The Application of Machine Learning Algorithms to Predict HIV Testing in Repeated Adult Population–Based Surveys in South Africa: Protocol for a Multiwave Cross-Sectional Analysis

Background: HIV testing is the cornerstone of HIV prevention and a pivotal step in realizing the Joint United Nations Program on HIV/AIDS (UNAIDS) goal of ending AIDS by 2030. Despite the availability of relevant survey data, there exists a research gap in using machine learnin...

Bibliographic Details
Main Authors: Musa Jaiteh, Edith Phalane, Yegnanew A Shiferaw, Refilwe Nancy Phaswana-Mafuya
Format: Article
Language: English
Published: JMIR Publications 2025-01-01
Series: JMIR Research Protocols
Online Access: https://www.researchprotocols.org/2025/1/e59916
author Musa Jaiteh
Edith Phalane
Yegnanew A Shiferaw
Refilwe Nancy Phaswana-Mafuya
collection DOAJ
description Background: HIV testing is the cornerstone of HIV prevention and a pivotal step in realizing the Joint United Nations Program on HIV/AIDS (UNAIDS) goal of ending AIDS by 2030. Despite the availability of relevant survey data, there exists a research gap in using machine learning (ML) to analyze and predict HIV testing among adults in South Africa. Further investigation is needed to bridge this knowledge gap and inform evidence-based interventions to improve HIV testing.
Objective: This study aims to determine consistent predictors of HIV testing by applying supervised ML algorithms in repeated adult population-based surveys in South Africa.
Methods: A retrospective analysis of multiwave cross-sectional survey data will be conducted to determine the predictors of HIV testing among South African adults aged 18 years and older. A supervised ML technique will be applied across the five cycles of the South African National HIV Prevalence, Incidence, Behavior, and Communication Survey (SABSSM). The Human Sciences Research Council (HSRC) conducted the SABSSM surveys in 2002, 2005, 2008, 2012, and 2017. The available SABSSM datasets will be imported to RStudio (version 4.3.2; Posit Software, PBC) for cleaning and outlier removal. A chi-square test will be conducted to select important predictors of HIV testing. Each dataset will be split into 80% training and 20% test samples. Logistic regression, support vector machines, random forests, and decision trees will be used. A cross-validation technique will be used to divide the training sample into k folds, including a validation set, and models will be trained on each fold. The models' performance will be evaluated on the validation set using evaluation metrics such as accuracy, precision, recall, F1-score, area under the receiver operating characteristic curve, and confusion matrix.
Results: The SABSSM datasets are open access datasets available on the HSRC database. Ethics approval for this study was obtained from the University of Johannesburg Research and Ethics Committee on April 23, 2024 (REC-2725-2024). The authors were given access to all five SABSSM datasets by the HSRC on August 20, 2024. The datasets were explored to identify the independent variables likely influencing HIV testing uptake. The findings of this study will determine consistent variables predicting HIV testing uptake among the South African adult population over the course of 20 years. Furthermore, this study will evaluate and compare the performance metrics of the 4 different ML algorithms, and the best model will be used to develop an HIV testing predictive model.
Conclusions: This study will contribute to existing knowledge and deepen understanding of factors linked to HIV testing beyond traditional methods. Consequently, the findings will inform evidence-based policy recommendations that can guide policy makers to formulate more effective and targeted public health approaches toward strengthening HIV testing.
International Registered Report Identifier (IRRID): DERR1-10.2196/59916
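The steps named in the Methods (chi-square screening, an 80/20 split, k-fold cross-validation, and confusion-matrix metrics) can be sketched as follows. This is only an illustrative outline, not the authors' implementation: the protocol specifies R in RStudio, whereas this sketch uses plain Python, and every function name here is hypothetical. It also assumes binary (0/1) predictors and outcome, and it returns the raw chi-square statistic because the record does not state a selection threshold.

```python
import random


def split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices 0..n-1 and return (train, test) index lists."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold(train_indices, k=5):
    """Yield (fit, validation) index lists; each fold validates exactly once."""
    folds = [train_indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation


def chi_square_2x2(x, y):
    """Pearson chi-square statistic for two binary (0/1) variables."""
    n = len(x)
    observed = [[0, 0], [0, 0]]
    for xi, yi in zip(x, y):
        observed[xi][yi] += 1
    row = [sum(r) for r in observed]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n
            if expected > 0:  # skip empty margins to avoid dividing by zero
                stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "confusion": {"tp": tp, "tn": tn, "fp": fp, "fn": fn},
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

In this outline the 20% test indices are set aside first, and the folds are drawn only from the training indices, so the held-out test sample never takes part in model fitting or validation.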
format Article
id doaj-art-c326d6b227204e6e9491e27f4dd79a1a
institution Kabale University
issn 1929-0748
language English
publishDate 2025-01-01
publisher JMIR Publications
record_format Article
series JMIR Research Protocols
spelling doaj-art-c326d6b227204e6e9491e27f4dd79a1a, 2025-01-27T22:00:32Z, eng, JMIR Publications, JMIR Research Protocols, ISSN 1929-0748, 2025-01-01, volume 14, e59916, DOI 10.2196/59916
Musa Jaiteh, https://orcid.org/0000-0001-6920-9919
Edith Phalane, https://orcid.org/0000-0001-6128-2337
Yegnanew A Shiferaw, https://orcid.org/0000-0002-2422-4768
Refilwe Nancy Phaswana-Mafuya, https://orcid.org/0000-0001-9387-0432
https://www.researchprotocols.org/2025/1/e59916
title The Application of Machine Learning Algorithms to Predict HIV Testing in Repeated Adult Population–Based Surveys in South Africa: Protocol for a Multiwave Cross-Sectional Analysis
url https://www.researchprotocols.org/2025/1/e59916