Special Interest Group: Under-resourced Languages

Created in April 2017, SIGUL is a joint Special Interest Group of the ELRA Language Resources Association (ELRA) and of the International Speech Communication Association (ISCA).

In October 2016, ELRA had set up the Workgroup on Less-Resourced Languages (LRL), convened by Claudia Soria, with the mission to support the maintenance of linguistic diversity through technology and ICT. Since language resources and technologies are a key component of any language-based technology, this special interest group focuses on the particular needs and requirements of less-resourced languages.

Through its participation in the Special Interest Group on Under-resourced Languages, ELRA reasserts its active commitment to improving support for languages with little or no technological support.

To join SIGUL, please contact Sakriani Sakti (NAIST, Nara, Japan) or Maite Melero (Barcelona Supercomputing Center, Spain).

Subscribe to the SIGUL mailing list: https://bit.ly/2PYhM66

Aim

SIGUL intends to bring together professionals involved in the development of language resources and technologies for under-resourced languages. Its main objective is to build a community that not only supports linguistic diversity through technology and ICT but also commits to increasing the chances of lesser-resourced languages (regional, minority, or endangered) to survive in the digital world through language and speech technology.

Motivation

Porting an NLP system (for instance a speech recognition system or a syntactic parser) to a lesser-resourced language requires techniques that go far beyond the basic re-training of the models.
Indeed, processing a new language often leads to new challenges (special phonetic and phonological systems, word segmentation problems, fuzzy grammatical structure, unwritten language, etc.). The lack of resources in turn calls for innovative data collection methodologies (via community sourcing, for instance), models in which information is shared between languages (e.g. multilingual acoustic models), or even approaches that do not need annotated data (e.g. zero-resource or zero-shot methods). In addition, some social and cultural aspects related to the context of the targeted language bring additional problems: languages with many dialects in different regions, code-switching phenomena, and a large presence of non-native speakers. It is also important to bridge the gap between language experts, native speakers and technology experts. Finally, digital humanities offer new opportunities to work on ancient languages, which are inherently under-resourced. Therefore, the main goal of this SIG will be to increase interaction between researchers interested in all the above topics.

Officers

Chair and ISCA liaison representative: Sakriani Sakti (NAIST, Nara, Japan)
Co-chair and ELRA liaison representative: Claudia Soria (CNR-ILC, Pisa, Italy)
Secretary: Maite Melero (Barcelona Supercomputing Center, Spain)

Committee

Gilles Adda, LIMSI-CNRS
Victoria Arranz, ELRA/ELDA
Laurent Besacier, LIG-IMAG
Khalid Choukri, ELRA/ELDA
Thierry Declerck, DFKI
Vera Ferreira, CIDLES
Mikel Forcada, Universitat d'Alacant
John Judge, ADAPT DCU
Valérie Mapelli, ELRA/ELDA
Yohei Murakami, Kyoto University
Joseph Mariani, LIMSI-CNRS
Sakriani Sakti, NAIST
Claudia Soria, ILC-CNR

International Advisory Group

Tunde Adegbola, African Languages Technology Initiative, Nigeria
Shyam Agrawal, KIIT Group of Colleges, India
Antti Arppe, University of Alberta, Canada
Steven Bird, Charles Darwin University, Australia
Pushpak Bhattacharyya, IIT Bombay, India
Chris Cieri, LDC, USA
Dafydd Gibbon, Bielefeld University, Germany
Andras Kornai, Hungarian Academy of Sciences, Hungary
Lori Levin, Carnegie Mellon University, USA
Satoshi Nakamura, Nara Institute of Science and Technology, Japan
Girish Nath Jha, JNU, India
Guy de Pauw, Textgain, Belgium
Laurette Pretorius, University of South Africa, South Africa
Kevin Scannell, Saint Louis University, Missouri, USA
Francis Tyers, UiT Norgga árktalaš universitehta, Norway

SIGUL Events

SIGUL 2023 with INTERSPEECH, on 18-20 August 2023, Dublin, Ireland
Workshop on Ugandan Languages: Vitality, Resources and Capacity Building, at Makerere University, on Nov. 18–22, 2019, as part of the International Year of Indigenous Languages 2019
INTERSPEECH 2017 Special session on Digital Revolution for Under-resourced Languages (DigRev-URL), download the presentation.

Spoken Language Technologies for Under-resourced Languages (SLTU) workshops

SLTU 2018 (New Delhi, India)
SLTU 2016 (Yogyakarta, Indonesia), proceedings
SLTU 2014 (Saint Petersburg, Russia), proceedings
SLTU 2012 (Cape Town, South Africa), proceedings
SLTU 2010 (Penang, Malaysia), proceedings
SLTU 2008 (Hanoi, Vietnam), proceedings

Workshops at LREC

LREC-COLING 2024 (Turin, Italy) The 3rd Annual Meeting of the ELRA-ISCA Special Interest Group on Under-resourced Languages (SIGUL2024), 1st CfP
LREC 2022 (Marseille, France) SIGUL 2022 Workshop, proceedings
LREC 2020 (Event cancelled) "1st Joint SLTU and CCURL Workshop", proceedings
LREC 2018 (Miyazaki, Japan) "Collaboration and Computing for Under-Resourced Languages - Sustaining Knowledge Diversity in the Digital Age", proceedings
LREC 2016 (Portorož, Slovenia) "Collaboration and Computing for Under-Resourced Languages - Towards an Alliance for Digital Language Diversity", proceedings
LREC 2014 (Reykjavik, Iceland) "Collaboration and Computing for Under Resourced Languages in the Linked Open Data Era", proceedings
LREC 2012 (Istanbul, Turkey) "Language technology for normalisation of less-resourced languages", proceedings
LREC 2010 (Malta) "Creation and use of basic lexical resources for less-resourced languages", proceedings
LREC 2008 (Marrakech, Morocco) "Collaboration: interoperability between people in the creation of language resources for less-resourced languages", proceedings

LRL Workshops at L&TC (Poznan)

L&TC 2017 “Language Technology for Less Resourced Languages”
L&TC 2015 “Language Technologies in Support of Less-Resourced Languages”
L&TC 2013 “Less Resourced Languages, New Technologies, New Challenges and Opportunities”
L&TC 2011 “Addressing the Gaps in Language Resources and Technologies”
L&TC 2009 “Getting Less-Resourced Languages on-Board!”

Events endorsed by SIGUL

Zero Resource Speech Challenge 2019 (deadline March 15, 2019).
Digital Revolution for Under-resourced Languages in Asia, in Nara (Japan), February 19-2, 2019
Developmental Language Datasets and Tools Match-Up Bootcamp, at the Max Planck Institute for Psycholinguistics in Nijmegen (The Netherlands) on May 22–24, 2019

Reports on the SIGUL activities are available below:

SIGUL2024-1stCfP.pdf (542.4 KB)
