Created in April 2017, SIGUL is a joint Special Interest Group of the European Language Resources Association (ELRA) and of the International Speech Communication Association (ISCA).
In October 2016, ELRA had set up the Workgroup on Less-Resourced Languages (LRL), convened by Claudia Soria, with the mission to support the maintenance of linguistic diversity through technology and ICT. Since language resources and technologies are a key component of any language-based technology, this special interest group intends to focus on the particular needs and requirements of less-resourced languages.
Through its participation in the Special Interest Group on Under-resourced Languages, ELRA reasserts its active commitment to enhancing support for languages with little or no technological support.
Register for the SIGUL mailing list: https://bit.ly/2PYhM66
SIGUL intends to bring together a number of professionals involved in the development of language resources and technologies for under-resourced languages. Its main objective is to build a community that not only supports linguistic diversity through technology and ICT but also commits to increasing the chances that lesser-resourced languages (regional, minority, or endangered) survive in the digital world through language and speech technology.
Porting an NLP system (for instance a speech recognition system or a syntactic parser) to a lesser-resourced language requires techniques that go far beyond the basic re-training of the models.
Indeed, processing a new language often leads to new challenges (special phonetic and phonological systems, word segmentation problems, fuzzy grammatical structure, unwritten language, etc.). The lack of resources, in turn, requires innovative data collection methodologies (via community sourcing, for instance), models in which information is shared between languages (e.g. multilingual acoustic models), or even approaches that do not need annotated data (e.g. zero-resource or zero-shot methods). In addition, some social and cultural aspects related to the context of the targeted language bring additional problems: languages with many dialects in different regions, code-switching phenomena, and a massive presence of non-native speakers. It is also important to bridge the gap between language experts, native speakers and technology experts. Finally, digital humanities offer new opportunities to work on ancient languages, which are inherently under-resourced. Therefore, the main goal of this SIG will be to increase interaction between researchers interested in all the above topics.
- Chair and ELRA liaison representative: Claudia Soria (CNR-ILC, Pisa, Italy)
- Co-chair and ISCA liaison representative: Laurent Besacier (LIG, Grenoble, France)
- Secretary: Sakriani Sakti (NAIST, Nara, Japan)
- Contributor: Dorothee Beermann (NTNU, Norway)
- Gilles Adda, LIMSI-CNRS
- Victoria Arranz, ELRA/ELDA
- Laurent Besacier, LIG-IMAG
- Khalid Choukri, ELRA/ELDA
- Thierry Declerck, DFKI
- Vera Ferreira, CIDLES
- Mikel Forcada, Universitat d'Alacant
- John Judge, ADAPT DCU
- Valérie Mapelli, ELRA/ELDA
- Yohei Murakami, Kyoto University
- Joseph Mariani, LIMSI-CNRS
- Sakriani Sakti, NAIST
- Claudia Soria, ILC-CNR
International Advisory Group
- Tunde Adegbola, African Languages Technology Initiative, Nigeria
- Shyam Agrawal, KIIT Group of Colleges, India
- Antti Arppe, University of Alberta, Canada
- Steven Bird, Charles Darwin University, Australia
- Pushpak Bhattacharyya, IIT Bombay, India
- Chris Cieri, LDC, USA
- Dafydd Gibbon, Bielefeld University, Germany
- Linne Ha, Google Inc., USA (tbc)
- Andras Kornai, Hungarian Academy of Sciences, Hungary
- Lori Levin, Carnegie Mellon University, USA
- Satoshi Nakamura, Nara Institute of Science and Technology, Japan
- Girish Nath Jha, JNU, India
- Guy de Pauw, Textgain, Belgium
- Laurette Pretorius, University of South Africa, South Africa
- Kevin Scannell, Saint Louis University, Missouri, USA
- Francis Tyers, UiT Norgga árktalaš universitehta, Norway
- Workshop on Ugandan Languages: Vitality, Resources and Capacity Building, at Makerere University, on Nov. 18–22, 2019, as part of the International Year of Indigenous Languages 2019
- INTERSPEECH 2017 Special session on Digital Revolution for Under-resourced Languages (DigRev-URL), download the presentation.
Spoken Language Technologies for Under-resourced Languages (SLTU) workshops
Workshops at LREC
- LREC 2018 (Miyazaki, Japan) "Collaboration and Computing for Under-Resourced Languages - Sustaining Knowledge Diversity in the Digital Age", proceedings
- LREC 2016 (Portorož, Slovenia) "Collaboration and Computing for Under-Resourced Languages - Towards an Alliance for Digital Language Diversity", proceedings
- LREC 2014 (Reykjavik, Iceland) "Collaboration and Computing for Under Resourced Languages in the Linked Open Data Era", proceedings
- LREC 2012 (Istanbul, Turkey) "Language technology for normalisation of less-resourced languages", proceedings
- LREC 2010 (Malta) "Creation and use of basic lexical resources for less-resourced languages", proceedings
- LREC 2008 (Marrakech, Morocco) "Collaboration: interoperability between people in the creation of language resources for less-resourced languages", proceedings
LRL Workshops at L&TC (Poznan)
- L&TC 2017 “Language Technology for Less Resourced Languages”
- L&TC 2015 “Language Technologies in Support of Less-Resourced Languages”
- L&TC 2013 “Less Resourced Languages, New Technologies, New Challenges and Opportunities”
- L&TC 2011 “Addressing the Gaps in Language Resources and Technologies”
- L&TC 2009 “Getting Less-Resourced Languages on-Board!”
Events endorsed by SIGUL
- Zero Resource Speech Challenge 2019 (deadline March 15, 2019).
- Digital Revolution for Under-resourced Languages in Asia, in Nara (Japan), February 19-2, 2019
- Developmental Language Datasets and Tools Match-Up Bootcamp, at the Max Planck Institute for Psycholinguistics in Nijmegen (The Netherlands) on May 22–24, 2019