Adding New Words Into A Language Model Using Parameters Of Known Words With Similar Behavior
This article presents a study on how to automatically add new words to a language model without re-training or adapting it (which requires a lot of new data). The proposed approach consists of finding a list of similar words for each new word to be added to the language model. Based on a small...
| Main Authors: | Luisa Orosanu, Denis Jouvet |
|---|---|
| Format: | Article |
| Language: | Arabic |
| Published: | Scientific and Technological Research Center for the Development of the Arabic Language, 2016-05-01 |
| Series: | Al-Lisaniyyat |
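| Subjects: | speech-to-text transcriptions, language modeling, OOV words, similar words, part-of-speech tags, lemmas |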
| Online Access: | https://www.crstdla.dz/ojs/index.php/allj/article/view/369 |
| _version_ | 1849424435709214720 |
|---|---|
| author | Luisa Orosanu Denis Jouvet |
| collection | DOAJ |
| description | This article presents a study on how to automatically add new words to a language model without re-training or adapting it (which requires a lot of new data). The proposed approach consists of finding a list of similar words for each new word to be added to the language model. Based on a small set of sentences containing the new words and on a set of n-gram counts containing the known words, we search for the known words whose neighbor distributions (over the few preceding and few following neighbor words) are most similar to those of the new words. The similar words are determined by computing KL divergences between the neighbor-word distributions. The n-gram parameter values associated with the similar words are then used to define the n-gram parameter values of the new words. In the context of speech recognition, a performance assessment on a large-vocabulary continuous speech recognition (LVCSR) task shows the benefit of the proposed approach. |
| format | Article |
| id | doaj-art-e253473a67de45e58d2f0dd31471953a |
| institution | Kabale University |
| issn | 1112-4393 2588-2031 |
| language | Arabic |
| publishDate | 2016-05-01 |
| publisher | Scientific and Technological Research Center for the Development of the Arabic Language |
| record_format | Article |
| series | Al-Lisaniyyat |
| spelling | doaj-art-e253473a67de45e58d2f0dd31471953a; 2025-08-20T03:30:10Z; ara; Scientific and Technological Research Center for the Development of the Arabic Language; Al-Lisaniyyat; ISSN 1112-4393, 2588-2031; 2016-05-01; vol. 22, no. 2; DOI 10.61850/allj.v22i2.369; Adding New Words Into A Language Model Using Parameters Of Known Words With Similar Behavior; Luisa Orosanu (University of Lorraine), Denis Jouvet (University of Lorraine); https://www.crstdla.dz/ojs/index.php/allj/article/view/369; speech-to-text transcriptions, language modeling, OOV words, similar words, part-of-speech tags, lemmas |
| title | Adding New Words Into A Language Model Using Parameters Of Known Words With Similar Behavior |
| topic | speech-to-text transcriptions, language modeling, OOV words, similar words, part-of-speech tags, lemmas |
| url | https://www.crstdla.dz/ojs/index.php/allj/article/view/369 |
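
The approach summarized in the description field (compare neighbor-word distributions with KL divergence, then reuse the parameters of the closest known words) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes raw sentences instead of the n-gram counts mentioned in the abstract, a two-word context window, additive smoothing, and a simple averaging rule for transferring unigram probabilities; all function names and parameters (`neighbor_distribution`, `kl_divergence`, `most_similar`, `transfer_unigram`, `window`, `alpha`, `top_k`) are hypothetical.

```python
import math
from collections import Counter


def neighbor_distribution(sentences, word, window=2):
    """Count (offset, neighbor) pairs within `window` positions of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    counts[(offset, tokens[j])] += 1
    return counts


def kl_divergence(p_counts, q_counts, alpha=0.1):
    """KL(P || Q) over the union of observed events.

    Additive smoothing with `alpha` is an assumption of this sketch; it keeps
    the divergence finite when an event is unseen on one side.
    """
    events = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(events)
    q_total = sum(q_counts.values()) + alpha * len(events)
    kl = 0.0
    for event in events:
        p = (p_counts[event] + alpha) / p_total
        q = (q_counts[event] + alpha) / q_total
        kl += p * math.log(p / q)
    return kl


def most_similar(new_word, new_sentences, known_sentences, known_words, top_k=2):
    """Return the `top_k` known words with the lowest KL divergence."""
    p = neighbor_distribution(new_sentences, new_word)
    scored = []
    for known in known_words:
        q = neighbor_distribution(known_sentences, known)
        scored.append((kl_divergence(p, q), known))
    scored.sort()
    return [known for _, known in scored[:top_k]]


def transfer_unigram(similar_words, unigram_prob):
    """Hypothetical transfer rule: average the similar words' probabilities."""
    return sum(unigram_prob[w] for w in similar_words) / len(similar_words)


# Toy usage: "matcha" is the new word; its contexts resemble "tea" and "coffee".
known_sentences = [
    "i drink tea every morning",
    "i drink coffee every morning",
    "i drive cars every weekend",
]
new_sentences = ["i drink matcha every morning"]
similar = most_similar("matcha", new_sentences, known_sentences,
                       ["tea", "cars", "coffee"], top_k=2)
new_prob = transfer_unigram(similar, {"tea": 0.02, "coffee": 0.04, "cars": 0.01})
```

A real system would operate on higher-order n-gram entries and would need to renormalize the model after inserting the new probabilities; the sketch only shows the similarity search and one possible transfer rule.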