Contextualized embeddings for semantic change detection: Lessons learned


Bibliographic Details
Main Authors: Andrey Kutuzov, Erik Velldal, Lilja Øvrelid
Format: Article
Language: English
Published: Linköping University Electronic Press 2022-08-01
Series: Northern European Journal of Language Technology
Online Access: https://nejlt.ep.liu.se/article/view/3478
collection DOAJ
description We present a qualitative analysis of the (potentially erroneous) outputs of contextualized embedding-based methods for detecting diachronic semantic change. First, we introduce an ensemble method outperforming previously described contextualized approaches. This method is used as a basis for an in-depth analysis of the degrees of semantic change predicted for English words across 5 decades. Our findings show that contextualized methods can often predict high change scores for words which are not undergoing any real diachronic semantic shift in the lexicographic sense of the term (or at least the status of these shifts is questionable). Such challenging cases are discussed in detail with examples, and their linguistic categorization is proposed. Our conclusion is that pre-trained contextualized language models are prone to confound changes in lexicographic senses and changes in contextual variance, which naturally stem from their distributional nature, but is different from the types of issues observed in methods based on static embeddings. Additionally, they often merge together syntactic and semantic aspects of lexical entities. We propose a range of possible future solutions to these issues.
id doaj-art-232869529e3a4d5a9311658eb3893cda
institution Kabale University
issn 2000-1533
doi 10.3384/nejlt.2000-1533.2022.3478
volume 8
issue 1
author_affiliation Andrey Kutuzov, Erik Velldal, Lilja Øvrelid: University of Oslo