Utilizing Language Technology in the Documentation of Endangered Uralic Languages

Bibliographic Details
Main Authors: Ciprian Gerstenberger, Niko Partanen, Michael Rießler, Joshua Wilbur
Format: Article
Language: English
Published: Linköping University Electronic Press 2016-03-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/1660
collection DOAJ
description The paper describes work-in-progress by the Pite Saami, Kola Saami and Izhva Komi language documentation projects, all of which record new spoken language data, digitize available recordings and annotate these multimedia data in order to provide comprehensive language corpora as databases for future research on and for endangered – and under-described – Uralic speech communities. Applying language technology in language documentation helps us to create more systematically annotated corpora, rather than eclectic data collections. Specifically, we describe a script providing interactivity between different morphosyntactic analysis modules implemented as Finite State Transducers and ELAN, a Graphical User Interface tool for annotating and presenting multimodal corpora. Ultimately, the spoken corpora created in our projects will be useful for scientifically significant quantitative investigations on these languages in the future.
id doaj-art-f2b181dd025f41aa86c116d521b7c0f6
institution Kabale University
issn 2000-1533
affiliations Ciprian Gerstenberger: UiT – The Arctic University of Norway, Giellatekno – Saami Language Technology
Niko Partanen: University of Hamburg, Department of Uralic Studies
Michael Rießler: University of Freiburg, Department of Scandinavian Studies
Joshua Wilbur: University of Freiburg, Department of Scandinavian Studies
volume 4
doi 10.3384/nejlt.2000-1533.1643