Adding New Words Into A Language Model Using Parameters Of Known Words With Similar Behavior

Bibliographic Details
Main Authors: Luisa Orosanu, Denis Jouvet
Format: Article
Language: Arabic
Published: Scientific and Technological Research Center for the Development of the Arabic Language, 2016-05-01
Series: Al-Lisaniyyat
Online Access: https://www.crstdla.dz/ojs/index.php/allj/article/view/369

Abstract
This article presents a study of how to automatically add new words to a language model without retraining or adapting it, both of which require a large amount of new data. The proposed approach consists of finding, for each new word to be added to the language model, a list of known words with similar behavior. Based on a small set of sentences containing the new words and on a set of n-gram counts containing the known words, we search for the known words whose distribution of neighboring words (the few preceding and the few following words) is most similar to that of each new word. Similarity is measured by computing Kullback-Leibler (KL) divergences between these neighbor-word distributions. The n-gram parameter values associated with the similar words are then used to define the n-gram parameter values of the new words. In the context of speech recognition, a performance assessment on a large-vocabulary continuous speech recognition (LVCSR) task shows the benefit of the proposed approach.
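The method summarized in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation. It is a minimal sketch under several assumptions. It uses a one-word window on each side, since the abstract only says "a few" preceding and following words. It uses bigram counts as the known-word statistics. It applies add-alpha smoothing so the divergence stays finite, and it scores each candidate by the sum of KL(P_new || P_known) over the left and right contexts. It also assumes an n-gram table that maps word tuples to log10 probabilities. All function names and parameters (`most_similar_words`, `transfer_ngram_params`, `alpha`, `k`) are hypothetical. The transfer step simply averages the probabilities contributed by the top-k similar words and does not renormalize the model, which a real language model would require.

```python
import math
from collections import Counter


def neighbor_counts_from_sentences(sentences, word):
    """Count the words immediately before and after `word` in raw sentences."""
    left, right = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i, token in enumerate(tokens):
            if token != word:
                continue
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right


def neighbor_counts_from_bigrams(bigram_counts, word):
    """Derive left/right neighbor counts of a known word from bigram counts."""
    left, right = Counter(), Counter()
    for (w1, w2), count in bigram_counts.items():
        if w2 == word:
            left[w1] += count
        if w1 == word:
            right[w2] += count
    return left, right


def kl_divergence(p_counts, q_counts, vocab, alpha=0.1):
    """KL(P || Q) in nats over `vocab`, with add-alpha smoothing on both sides."""
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    divergence = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        divergence += p * math.log(p / q)
    return divergence


def most_similar_words(new_word, sentences, bigram_counts, known_words, k=2, alpha=0.1):
    """Return the k known words whose neighbor distributions best match new_word."""
    new_left, new_right = neighbor_counts_from_sentences(sentences, new_word)
    # One shared vocabulary keeps the smoothed scores comparable across candidates.
    vocab = {w for bigram in bigram_counts for w in bigram}
    vocab |= set(new_left) | set(new_right)
    scored = []
    for known in known_words:
        left, right = neighbor_counts_from_bigrams(bigram_counts, known)
        score = (kl_divergence(new_left, left, vocab, alpha)
                 + kl_divergence(new_right, right, vocab, alpha))
        scored.append((score, known))
    scored.sort()  # lowest divergence = most similar behavior
    return [known for _, known in scored[:k]]


def transfer_ngram_params(lm, new_word, similar_words):
    """Copy n-gram log10 probabilities from similar words onto new_word.

    When several similar words yield the same new n-gram, their probabilities
    are averaged. The result is NOT renormalized (simplifying assumption).
    """
    collected = {}
    for ngram, logprob in lm.items():
        for similar in similar_words:
            if similar in ngram:
                new_ngram = tuple(new_word if t == similar else t for t in ngram)
                collected.setdefault(new_ngram, []).append(10 ** logprob)
    return {ng: math.log10(sum(ps) / len(ps)) for ng, ps in collected.items()}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    bigrams = {("the", "cat"): 5, ("the", "dog"): 4, ("cat", "sleeps"): 3,
               ("dog", "sleeps"): 2, ("dog", "barks"): 2,
               ("a", "car"): 3, ("car", "stops"): 3}
    sentences = ["the kitten sleeps", "the kitten sleeps now"]
    similar = most_similar_words("kitten", sentences, bigrams, ["cat", "dog", "car"])
    print(similar)  # ['cat', 'dog']
    lm = {("the", "cat"): -0.5, ("cat", "sleeps"): -0.3,
          ("the", "dog"): -0.7, ("dog", "barks"): -0.4}
    print(transfer_ngram_params(lm, "kitten", similar))
```

With these toy counts, `cat` ranks first because its neighbors (`the` before it, `sleeps` after it) match those observed for `kitten`. `car`, which only appears after `a` and before `stops`, ranks last. The new n-gram `("the", "kitten")` receives the average of the `cat` and `dog` probabilities for that context.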

ISSN: 1112-4393, 2588-2031
Volume: 22, Issue: 2
DOI: 10.61850/allj.v22i2.369
Affiliation: University of Lorraine (both authors)
Keywords: speech-to-text transcriptions; language modeling; OOV words; similar words; part-of-speech tags; lemmas