Evaluation of Four Multiple Imputation Methods for Handling Missing Binary Outcome Data in the Presence of an Interaction between a Dummy and a Continuous Variable

Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. In the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. The default setting in the implementation of MICE is for imputation models to include varia...

Bibliographic Details
Main Authors: Sara Javadi, Abbas Bahrampour, Mohammad Mehdi Saber, Behshid Garrusi, Mohammad Reza Baneshi
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Journal of Probability and Statistics
Online Access: http://dx.doi.org/10.1155/2021/6668822
collection DOAJ
description Multiple imputation by chained equations (MICE) is the most common method for imputing missing data. Within the MICE algorithm, imputation can be performed using a variety of parametric and nonparametric methods. By default, MICE implementations specify imputation models that include variables as linear terms only, with no interactions, but omitting interaction terms may lead to biased results. Using simulated and real datasets, we investigated whether recursive partitioning creates appropriate variability between imputations and yields unbiased parameter estimates with appropriate confidence intervals. We compared four multiple imputation (MI) methods on a real and a simulated dataset: predictive mean matching with an interaction term in the imputation model (MICE-Interaction), classification and regression trees (CART) for specifying the imputation model (MICE-CART), random forest (RF) within MICE (MICE-RF), and the MICE-Stratified method. For the real-data study, we selected secondary data and devised an experimental design of 40 scenarios (2 × 5 × 4) that differed by missingness mechanism (MAR and MCAR), rate of simulated missing data (10%, 20%, 30%, 40%, and 50%), and imputation method (MICE-Interaction, MICE-CART, MICE-RF, and MICE-Stratified). We randomly drew 700 observations with replacement 300 times and then created the missing data. Evaluation was based on raw bias (RB) and five other measures, each averaged over the repetitions. Next, in a simulation study, we generated 1000 datasets, each with a sample size of 700, and created missing data once for each dataset. For all scenarios, the same criteria as for the real data were used to evaluate the performance of the methods in the simulation study. We conclude that, when there is an interaction effect between a dummy and a continuous predictor, substantial gains are possible by using recursive partitioning for imputation compared with parametric methods, and that the MICE-Interaction method is always more efficient and convenient for preserving interaction effects than the other methods.
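The experimental design in the description can be illustrated with a minimal Python sketch. It is not the authors' code. It enumerates the 2 × 5 × 4 scenario grid, creates missing values in a binary outcome under MCAR or MAR, and computes raw bias as the mean estimate over repetitions minus the true value. The function names, the logistic MAR rule driven by the continuous predictor, and the data-generating coefficients are all assumptions made for illustration; the record does not specify them.

```python
import itertools
import numpy as np

# Factor levels taken from the description; the grid layout itself is a sketch.
MECHANISMS = ["MCAR", "MAR"]
RATES = [0.10, 0.20, 0.30, 0.40, 0.50]
METHODS = ["MICE-Interaction", "MICE-CART", "MICE-RF", "MICE-Stratified"]


def scenario_grid():
    """Enumerate every (mechanism, rate, method) combination."""
    return list(itertools.product(MECHANISMS, RATES, METHODS))


def make_missing(y, x, rate, mechanism, rng):
    """Return a float copy of y with round(rate * n) entries set to NaN.

    MCAR: every observation is equally likely to be deleted.
    MAR (assumed rule): the chance of deletion rises with the fully
    observed continuous predictor x through a logistic weight.
    """
    n = len(y)
    n_miss = int(round(rate * n))
    if mechanism == "MCAR":
        weights = np.ones(n)
    elif mechanism == "MAR":
        z = (x - x.mean()) / x.std()
        weights = 1.0 / (1.0 + np.exp(-z))
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    idx = rng.choice(n, size=n_miss, replace=False, p=weights / weights.sum())
    out = y.astype(float).copy()
    out[idx] = np.nan
    return out


def raw_bias(estimates, true_value):
    """Raw bias: average estimate over repetitions minus the true value."""
    return float(np.mean(estimates) - true_value)


# Toy data with a dummy-by-continuous interaction (coefficients are hypothetical).
rng = np.random.default_rng(0)
n = 700
dummy = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
linear_predictor = -0.5 + 0.8 * dummy + 0.5 * x + 1.0 * dummy * x
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-linear_predictor)))

y_mcar = make_missing(y, x, rate=0.30, mechanism="MCAR", rng=rng)
y_mar = make_missing(y, x, rate=0.30, mechanism="MAR", rng=rng)
```

Each incomplete outcome would then be passed to one of the four imputation methods, and the pooled interaction estimate compared with the known value via `raw_bias`.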
id doaj-art-8bb3e24ffce24872a91e17ac8b7dca77
institution Kabale University
issn 1687-952X
1687-9538
affiliation Sara Javadi: Department of Biostatistics and Epidemiology, School of Public Health, Kerman University of Medical Sciences, Kerman, Iran
Abbas Bahrampour: Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
Mohammad Mehdi Saber: Department of Statistics, Higher Education Center of Eghlid, Eghlid, Iran
Behshid Garrusi: Kerman Neuroscience Research Center, Institute of Neuropharmacology, Kerman University of Medical Sciences, Kerman, Iran
Mohammad Reza Baneshi: Modeling in Health Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran