Task-dependent Optimal Weight Combinations for Static Embeddings

Bibliographic Details
Main Authors: Nathaniel Robinson, Nathaniel Carlson, David Mortensen, Elizabeth Vargas, Thomas Fackrell, Nancy Fulda
Format: Article
Language: English
Published: Linköping University Electronic Press 2022-11-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/4438
Description: A variety of NLP applications use word2vec skip-gram, GloVe, and fastText word embeddings. These models learn two sets of embedding vectors, but most practitioners use only one of them, or alternatively an unweighted sum of both. This is the first study to systematically explore a range of linear combinations of the first and second embedding sets. We evaluate these combinations on a set of six NLP benchmarks including IR, POS tagging, and sentence similarity. We show that the default embedding combinations are often suboptimal and demonstrate improvements of 1.0% to 8.0%. Notably, GloVe’s default unweighted sum is its least effective combination across tasks. We provide a theoretical basis for weighting one set of embeddings more than the other according to the algorithm and task. We apply our findings to improve accuracy in applications of cross-lingual alignment and navigational knowledge by up to 15.2%.
ISSN: 2000-1533
Volume: 8, Issue: 1
DOI: 10.3384/nejlt.2000-1533.2022.4438
Author Affiliations: Nathaniel Robinson (Carnegie Mellon University); Nathaniel Carlson (Brigham Young University); David Mortensen (Carnegie Mellon University); Elizabeth Vargas (Brigham Young University); Thomas Fackrell (Brigham Young University); Nancy Fulda (Brigham Young University)
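
Illustrative note: the abstract describes taking linear combinations of a model's first and second embedding sets. The Python sketch below shows one way such a combination could be written. The parametrization `alpha * first + (1 - alpha) * second`, the function names, and the toy vectors are all assumptions made for illustration; the record does not state the weighting scheme the authors actually use. Under this assumed scheme, `alpha = 0.5` matches an unweighted sum up to scale, and `alpha = 1.0` uses the first set alone.

```python
import math


def combine(first, second, alpha):
    """Element-wise alpha * first + (1 - alpha) * second (assumed scheme)."""
    if len(first) != len(second):
        raise ValueError("embedding vectors must have the same dimension")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(first, second)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up 2-dimensional vectors standing in for the two learned sets.
first_set = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
second_set = {"cat": [0.0, 1.0], "dog": [1.0, 0.0]}


def sweep(word_a, word_b, alphas):
    """Similarity of two words under each candidate weight alpha."""
    results = {}
    for alpha in alphas:
        vec_a = combine(first_set[word_a], second_set[word_a], alpha)
        vec_b = combine(first_set[word_b], second_set[word_b], alpha)
        results[alpha] = cosine(vec_a, vec_b)
    return results


if __name__ == "__main__":
    for alpha, sim in sweep("cat", "dog", [0.0, 0.25, 0.5, 0.75, 1.0]).items():
        print(f"alpha={alpha:.2f}  cosine(cat, dog)={sim:.3f}")
```

In a task-dependent setting, a sweep like this would be scored on a held-out benchmark for each task, and the weight with the best score kept for that task.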