Predicting Missing Values in Survey Data Using Prompt Engineering for Addressing Item Non-Response

Bibliographic Details
Main Authors: Junyung Ji, Jiwoo Kim, Younghoon Kim
Format: Article
Language: English
Published: MDPI AG 2024-09-01
Series: Future Internet
Subjects: survey data; item non-response; large language models; prompt engineering
Online Access: https://www.mdpi.com/1999-5903/16/10/351
collection DOAJ
description Survey data play a crucial role in various research fields, including economics, education, and healthcare, by providing insights into human behavior and opinions. However, item non-response, where respondents fail to answer specific questions, presents a significant challenge by creating incomplete datasets that undermine data integrity and can hinder or even prevent accurate analysis. Traditional methods for addressing missing data, such as statistical imputation techniques and deep learning models, often fall short when dealing with the rich linguistic content of survey data. These approaches are also hampered by high time complexity for training and the need for extensive preprocessing or feature selection. In this paper, we introduce an approach that leverages Large Language Models (LLMs) through prompt engineering for predicting item non-responses in survey data. Our method combines the strengths of both traditional imputation techniques and deep learning methods with the advanced linguistic understanding of LLMs. By integrating respondent similarities, question relevance, and linguistic semantics, our approach enhances the accuracy and comprehensiveness of survey data analysis. The proposed method bypasses the need for complex preprocessing and additional training, making it adaptable, scalable, and capable of generating explainable predictions in natural language. We evaluated the effectiveness of our LLM-based approach through a series of experiments, demonstrating its competitive performance against established methods such as Multivariate Imputation by Chained Equations (MICE), MissForest, and deep learning models like TabTransformer. The results show that our approach not only matches but, in some cases, exceeds the performance of these methods while significantly reducing the time required for data processing.
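The description above combines respondent similarity with LLM prompting to predict a missing answer. A minimal sketch of that idea in Python follows; all function names, the similarity measure, and the prompt wording are illustrative assumptions, not the authors' implementation, and the actual LLM call is omitted:

```python
def similarity(a: dict, b: dict) -> int:
    """Count questions both respondents answered identically (missing answers excluded)."""
    return sum(1 for q in a if q in b and a[q] is not None and a[q] == b[q])

def build_imputation_prompt(target: dict, others: list, missing_q: str, k: int = 2) -> str:
    """Assemble an LLM prompt from the k most similar respondents who answered missing_q."""
    donors = sorted(
        (r for r in others if r.get(missing_q) is not None),
        key=lambda r: similarity(target, r),
        reverse=True,
    )[:k]
    lines = ["Predict the missing survey answer.", "Similar respondents:"]
    for i, r in enumerate(donors, 1):
        lines.append(f"  Respondent {i}: {missing_q} = {r[missing_q]}")
    known = {q: v for q, v in target.items() if v is not None}
    lines.append(f"Target respondent's other answers: {known}")
    lines.append(f"Question to answer: {missing_q}")
    return "\n".join(lines)

# Toy example: respondent with q3 missing, two complete respondents in the pool.
target = {"q1": "yes", "q2": "no", "q3": None}
pool = [
    {"q1": "yes", "q2": "no", "q3": "agree"},
    {"q1": "no", "q2": "no", "q3": "disagree"},
]
prompt = build_imputation_prompt(target, pool, "q3", k=1)
# The prompt string would then be sent to an LLM; that call is omitted here.
```

This keeps the paper's stated advantage of requiring no model training or preprocessing: the only computation before the LLM call is selecting similar respondents and formatting their answers into natural language.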
id doaj-art-e4e7a4fe76a84897b32da2a7b6b27f9c
institution OA Journals
issn 1999-5903
spelling Future Internet, ISSN 1999-5903, Vol. 16, Iss. 10, Art. 351 (2024-09-01), DOI 10.3390/fi16100351. Author affiliation (all three authors): Department of Applied Artificial Intelligence, Hanyang University at Ansan, Ansan 15588, Republic of Korea.
topic survey data
item non-response
large language models
prompt engineering