SEMbeddings: how to evaluate model misfit before data collection using large-language models

Introduction: Recent developments suggest that Large Language Models (LLMs) provide a promising approach for approximating empirical correlation matrices of item responses by using item embeddings and their cosine similarities. In this paper, we introduce a novel tool, which we label SEMbeddings.

Methods: SEMbeddings integrates mpnet-personality (a fine-tuned embedding model) with latent measurement models to assess model fit or misfit prior to data collection. To support this claim, we apply SEMbeddings to the 96 items of the VIA-IS-P, which measures 24 different character strengths, using responses from 31,697 participants.

Results: Our analysis shows a significant, though not perfect, correlation (r = 0.67) between the cosine similarities of the embeddings and the empirical correlations among items. We then demonstrate how to fit confirmatory factor analyses to the cosine similarity matrices produced by mpnet-personality and how to interpret the outcomes using modification indices. Relying on traditional fit indices with SEMbeddings can be misleading, as they often lead to more conservative conclusions than the empirical results; nevertheless, they provide valuable suggestions about possible misfit, and we argue that the modification indices obtained from these models could serve as a useful screening tool for making informed decisions about items prior to data collection.

Discussion: As LLMs become increasingly precise and new fine-tuned models are released, these procedures have the potential to deliver more reliable results, potentially transforming the way new questionnaires are developed.
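The core pipeline described above, embedding each questionnaire item, taking pairwise cosine similarities, and comparing them with the empirical inter-item correlations, can be sketched in a few lines of Python. The snippet below is a minimal illustration rather than the authors' code: it uses the public all-mpnet-base-v2 model from the sentence-transformers library as a stand-in for the paper's fine-tuned mpnet-personality model, and the items are invented examples.

```python
# Minimal sketch: item embeddings -> cosine-similarity matrix -> (later)
# comparison with empirical inter-item correlations. Uses the public
# all-mpnet-base-v2 model as a stand-in for the paper's mpnet-personality.
import numpy as np
from sentence_transformers import SentenceTransformer

items = [
    "I always keep my promises.",
    "I enjoy learning new things.",
    "I find it easy to forgive others.",
    "I look forward to each new day.",
]

model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
emb = model.encode(items, normalize_embeddings=True)  # (n_items, dim), unit-length rows

# With unit-length embeddings, cosine similarity reduces to a dot product.
cos_sim = emb @ emb.T                                 # (n_items, n_items)
print(np.round(cos_sim, 2))

# After data collection, the empirical correlations can be compared with the
# cosine similarities over the unique item pairs (the paper reports r = 0.67);
# left commented out here because no responses exist before data collection:
# emp_corr = np.corrcoef(responses, rowvar=False)     # responses: (n_persons, n_items)
# lower = np.tril_indices(len(items), k=-1)
# agreement = np.corrcoef(cos_sim[lower], emp_corr[lower])[0, 1]
```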

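The Results then fit confirmatory factor analyses directly to the cosine-similarity matrix and screen the resulting modification indices. The sketch below, again only an illustration with invented scale and item names, shows the data-preparation side of that idea: it builds lavaan-style measurement-model syntax from an item-to-scale mapping and labels the similarity matrix so it can be handed to an SEM engine that accepts a sample correlation matrix and a nominal sample size (for example, lavaan's cfa() with sample.cov and sample.nobs, followed by modificationIndices()).

```python
# Minimal sketch: turn an item-to-scale mapping into lavaan-style CFA syntax
# and label the cosine-similarity matrix so it can be treated as an input
# correlation matrix. Scales and items below are invented examples.
import numpy as np
import pandas as pd

scales = {
    "honesty":   ["hon_1", "hon_2"],
    "curiosity": ["cur_1", "cur_2"],
}
item_ids = [item for items in scales.values() for item in items]

# lavaan-style syntax: each latent factor is measured by its items.
model_syntax = "\n".join(
    f"{scale} =~ " + " + ".join(items) for scale, items in scales.items()
)
print(model_syntax)
# honesty =~ hon_1 + hon_2
# curiosity =~ cur_1 + cur_2

# `cos_sim` would be the (n_items, n_items) cosine-similarity matrix from the
# previous sketch; an identity matrix keeps this example self-contained.
cos_sim = np.eye(len(item_ids))
sim_matrix = pd.DataFrame(cos_sim, index=item_ids, columns=item_ids)

# sim_matrix plus a nominal sample size is what gets treated as the "sample
# correlation matrix" when fitting the CFA and inspecting modification indices.
```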
Bibliographic Details
Main Authors: Tommaso Feraco, Enrico Toffalini
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-02-01
Series: Frontiers in Psychology
Subjects: large language models; artificial intelligence; confirmatory factor analysis; validity; assessment; structural equation models
ISSN: 1664-1078
DOI: 10.3389/fpsyg.2024.1433339
Online Access: https://www.frontiersin.org/articles/10.3389/fpsyg.2024.1433339/full