Advances in Biomedical Missing Data Imputation: A Survey

Ensuring data quality in biomedical sciences is crucial for reliable research outcomes, particularly as precision medicine continues to gain prominence. Missing values compromise data quality and can make it difficult to perform data-based studies. The origins of missing values in biomedical dataset...

Bibliographic Details
Main Authors: Miriam Barrabes, Maria Perera, Victor Novelle Moriano, Xavier Giro-I-Nieto, Daniel Mas Montserrat, Alexander G. Ioannidis
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Subjects: Biomedical data imputation; data-centric AI; deep learning; genomics; health record; imputation
Online Access: https://ieeexplore.ieee.org/document/10795134/
_version_ 1832576783155724288
author Miriam Barrabes
Maria Perera
Victor Novelle Moriano
Xavier Giro-I-Nieto
Daniel Mas Montserrat
Alexander G. Ioannidis
collection DOAJ
description Ensuring data quality in biomedical sciences is crucial for reliable research outcomes, particularly as precision medicine continues to gain prominence. Missing values compromise data quality and can make it difficult to perform data-based studies. The origins of missing values in biomedical datasets are diverse, including experimental errors, equipment malfunctions, and variations in data collection protocols tailored to individual patient conditions. To address the complex nature of missing values and the unique characteristics of biomedical data, a diverse spectrum of computational imputation techniques has emerged. These methods range from traditional statistical analysis to more modern approaches such as discriminative machine learning models and deep generative networks. This survey paper provides a comprehensive overview of the extensive literature on missing data imputation techniques, with a specific focus on applications in genomics, single-cell RNA sequencing, health records, and medical imaging. We outline the fundamental principles underlying each imputation technique and present a detailed analysis of their advantages and disadvantages, categorized by missing data patterns. To aid practitioners in method selection, we offer practical recommendations based on critical factors such as dataset size, data type, and missingness rate. By synthesizing insights from existing literature, we provide a holistic perspective on the effectiveness of various imputation methods under different biomedical contexts, thereby facilitating informed decision-making for researchers and practitioners in applying imputation techniques to biomedical data processing.
format Article
id doaj-art-c68b363daf28407a96cde17b9a087d53
institution Kabale University
issn 2169-3536
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj-art-c68b363daf28407a96cde17b9a087d53
2025-01-31T00:01:12Z
eng
IEEE
IEEE Access
2169-3536
2025-01-01
13
16918
16932
10.1109/ACCESS.2024.3516506
10795134
Advances in Biomedical Missing Data Imputation: A Survey
Miriam Barrabes (0) https://orcid.org/0009-0007-7379-1658
Maria Perera (1)
Victor Novelle Moriano (2)
Xavier Giro-I-Nieto (3)
Daniel Mas Montserrat (4)
Alexander G. Ioannidis (5) https://orcid.org/0000-0002-4735-7803
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Signal Theory and Communications, Universitat Politècnica de Catalunya, Barcelona, Spain
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
Department of Biomedical Data Science, Stanford University, Stanford, CA, USA
https://ieeexplore.ieee.org/document/10795134/
Biomedical data imputation
data-centric AI
deep learning
genomics
health record
imputation
title Advances in Biomedical Missing Data Imputation: A Survey
topic Biomedical data imputation
data-centric AI
deep learning
genomics
health record
imputation
url https://ieeexplore.ieee.org/document/10795134/