The proper application of logistic regression model in complex survey data: a systematic review

Bibliographic Details
Main Authors: Devjit Dey, Md. Samio Haque, Md. Mojahedul Islam, Umme Iffat Aishi, Sajida Sultana Shammy, Md. Sabbir Ahmed Mayen, Syed Toukir Ahmed Noor, Md. Jamal Uddin
Format: Article
Language: English
Published: BMC 2025-01-01
Series: BMC Medical Research Methodology
Subjects: Logistic regression; Complex survey data; Methodological challenges; Model selection; DHS; MICS
Online Access: https://doi.org/10.1186/s12874-024-02454-5
author Devjit Dey
Md. Samio Haque
Md. Mojahedul Islam
Umme Iffat Aishi
Sajida Sultana Shammy
Md. Sabbir Ahmed Mayen
Syed Toukir Ahmed Noor
Md. Jamal Uddin
collection DOAJ
description Abstract
Background: Logistic regression is a statistical technique widely used in fields such as healthcare, marketing, and finance to generate insights from binary outcomes (e.g., sick vs. not sick). However, when logistic regression is applied to complex survey data, which arise from complex sampling designs, specific methodological issues are often overlooked.
Methods: The systematic review searched the PubMed and ScienceDirect databases from January 2015 to December 2021, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines and focusing primarily on the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS). A total of 810 articles met the inclusion criteria and were included in the analysis. For each article's use of logistic regression, the review considered several methodological issues: model adequacy assessment, handling of dependence among observations, use of the complex survey design, and treatment of missing values and outliers, among others.
Results: Among the selected articles, DHS data were used most often (96%), with MICS accounting for only 3% and both DHS and MICS together for 1%. Only 19.7% of the studies employed multilevel mixed-effects logistic regression to account for data dependencies. Model validation techniques were not reported in 94.8% of the studies, with limited use of the bootstrap, jackknife, and other resampling methods. Sample weights, primary sampling units (PSUs), and strata variables were used together in 40.4% of the articles, and 41.7% of the studies did not use any of these variables, which could have produced biased results. Goodness-of-fit assessments were not mentioned in 75.3% of the articles; among those reported, the Hosmer–Lemeshow and likelihood ratio tests were the most common. Furthermore, 95.8% of studies did not mention outliers, and only 41.0% of studies corrected for missing information, while only 2.7% applied imputation techniques.
Conclusions: This systematic review highlights important gaps in the use of logistic regression with complex survey data: data dependencies, survey design, and proper validation techniques are often overlooked, and outliers, missing data, and goodness-of-fit assessments are often neglected. These gaps point to the need for clearer methodological standards and more thorough reporting to improve the reliability of results. Future research should focus on consistently following these standards to ensure stronger and more dependable findings.
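The abstract notes that many studies did not use sample weights, PSUs, and strata together. As an illustration only (this is not the authors' code, and nothing below comes from the article), the following minimal sketch shows one common design-based approach: weighted pseudo-maximum-likelihood logistic regression with a Taylor-linearization (sandwich) variance computed from PSU totals within strata. The function name `survey_logit`, all variable names, and the simulated data are hypothetical; a sampling-with-replacement approximation at the PSU level is assumed, and strata with a single PSU are simply skipped.

```python
import numpy as np


def survey_logit(X, y, weights, strata, psu, n_iter=50, tol=1e-10):
    """Weighted logistic regression with a stratified, PSU-clustered variance.

    Hypothetical helper for illustration; assumes PSUs are sampled with
    replacement within strata (a common approximation).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(weights, dtype=float)
    strata = np.asarray(strata)
    psu = np.asarray(psu)

    # Newton-Raphson on the weighted (pseudo) log-likelihood.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        score = X.T @ (w * (y - p))
        hessian = (X * (w * p * (1.0 - p))[:, None]).T @ X
        step = np.linalg.solve(hessian, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break

    # Recompute fitted probabilities and the "bread" at the final estimate.
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    hessian = (X * (w * p * (1.0 - p))[:, None]).T @ X
    bread = np.linalg.inv(hessian)

    # "Meat": between-PSU variation of weighted score totals within each stratum.
    u = X * (w * (y - p))[:, None]
    meat = np.zeros((X.shape[1], X.shape[1]))
    for h in np.unique(strata):
        in_h = strata == h
        psus_h = np.unique(psu[in_h])
        n_h = len(psus_h)
        if n_h < 2:
            continue  # assumption: single-PSU strata contribute no variance
        totals = np.array([u[in_h & (psu == c)].sum(axis=0) for c in psus_h])
        dev = totals - totals.mean(axis=0)
        meat += n_h / (n_h - 1) * (dev.T @ dev)

    cov = bread @ meat @ bread
    return beta, np.sqrt(np.diag(cov))


# Simulated data (not DHS or MICS data): 4 strata, 5 PSUs each, 30 units per PSU.
rng = np.random.default_rng(0)
n_strata, psus_per_stratum, n_per_psu = 4, 5, 30
strata = np.repeat(np.arange(n_strata), psus_per_stratum * n_per_psu)
psu = np.repeat(np.arange(n_strata * psus_per_stratum), n_per_psu)
n = strata.size
x = rng.normal(size=n)
cluster_effect = rng.normal(scale=0.5, size=n_strata * psus_per_stratum)[psu]
eta = -0.5 + 0.8 * x + cluster_effect
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)
w = rng.uniform(0.5, 3.0, size=n)  # hypothetical sampling weights
X = np.column_stack([np.ones(n), x])

beta, se = survey_logit(X, y, w, strata, psu)
print("coefficients:", beta)
print("design-based standard errors:", se)
```

Ignoring the weights changes the point estimates, while ignoring strata and PSUs changes the standard errors, which is why the three design variables are normally used together. In practice an established survey package would usually be preferable to a hand-rolled estimator like this one.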
format Article
id doaj-art-c9bc12e57eed4e76ba6f22ff04a5631b
institution Kabale University
issn 1471-2288
language English
publishDate 2025-01-01
publisher BMC
record_format Article
series BMC Medical Research Methodology
spelling doaj-art-c9bc12e57eed4e76ba6f22ff04a5631b 2025-01-26T12:39:33Z eng BMC BMC Medical Research Methodology 1471-2288 2025-01-01 25 1 1 18 10.1186/s12874-024-02454-5
author_affiliation Department of Statistics, Shahjalal University of Science and Technology (listed for all eight authors)
title The proper application of logistic regression model in complex survey data: a systematic review
topic Logistic regression
Complex survey data
Methodological challenges
Model selection
DHS
MICS
url https://doi.org/10.1186/s12874-024-02454-5