Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach

Abstract Background: Previous machine learning approaches for prostate cancer detection using gene expression data have shown remarkable classification accuracies. However, prior studies overlook the influence of racial diversity within the population and the importance of selec...

Bibliographic Details
Main Authors: David Agustriawan, Adithama Mulia, Marlinda Vasty Overbeek, Vincent Kurniawan, Jheno Syechlo, Moeljono Widjaja, Muhammad Imran Ahmad
Format: Article
Language: English
Published: JMIR Publications 2025-07-01
Series: JMIR Bioinformatics and Biotechnology
Online Access: https://bioinform.jmir.org/2025/1/e72423
author David Agustriawan
Adithama Mulia
Marlinda Vasty Overbeek
Vincent Kurniawan
Jheno Syechlo
Moeljono Widjaja
Muhammad Imran Ahmad
collection DOAJ
description Abstract
Background: Previous machine learning approaches for prostate cancer detection using gene expression data have shown remarkable classification accuracies. However, prior studies overlook the influence of racial diversity within the population and the importance of selecting outlier genes based on expression profiles.
Objective: We aim to develop a classification method for diagnosing prostate cancer using gene expression in specific populations.
Methods: This research uses differentially expressed gene analysis, receiver operating characteristic analysis, and MSigDB (Molecular Signatures Database) verification as a feature selection framework to identify genes for constructing support vector machine models.
Results: Among the models evaluated, the highest observed accuracy was achieved using 139 gene features without oversampling, resulting in 98% accuracy for White patients and 97% for African American patients, based on 388 training samples and 92 testing samples. Notably, another model achieved a similarly strong performance, with 97% accuracy for White patients and 95% for African American patients, using only 9 gene features. It was trained on 374 samples and tested on 138 samples.
Conclusions: The findings identify a race-specific diagnosis method for prostate cancer detection using enhanced feature selection and machine learning. This approach emphasizes the potential for developing unbiased diagnostic tools in specific populations.
format Article
id doaj-art-da277ea0d7104e00a8fee349d48a08e5
institution Kabale University
issn 2563-3570
language English
publishDate 2025-07-01
publisher JMIR Publications
record_format Article
series JMIR Bioinformatics and Biotechnology
spelling doaj-art-da277ea0d7104e00a8fee349d48a08e5
2025-08-20T04:01:16Z
eng
JMIR Publications
JMIR Bioinformatics and Biotechnology
2563-3570
2025-07-01
6
e72423
e72423
10.2196/72423
Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach
David Agustriawan http://orcid.org/0000-0003-1185-1145
Adithama Mulia http://orcid.org/0009-0005-6885-0575
Marlinda Vasty Overbeek http://orcid.org/0000-0003-2590-843X
Vincent Kurniawan http://orcid.org/0009-0004-1238-5232
Jheno Syechlo http://orcid.org/0009-0001-5557-5085
Moeljono Widjaja http://orcid.org/0000-0003-3002-7426
Muhammad Imran Ahmad http://orcid.org/0000-0002-9157-5998
https://bioinform.jmir.org/2025/1/e72423
title Framework for Race-Specific Prostate Cancer Detection Using Machine Learning Through Gene Expression Data: Feature Selection Optimization Approach
url https://bioinform.jmir.org/2025/1/e72423
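
The abstract of this record describes a feature selection framework that combines differentially expressed gene analysis with receiver operating characteristic analysis before model building. The sketch below illustrates that general idea only and is not the authors' pipeline: the gene names, the toy expression values, and both thresholds (`min_abs_diff`, `min_auc`) are hypothetical, the data are assumed to be log2-scaled, and the MSigDB verification and support vector machine training steps are left out.

```python
from statistics import mean


def auc(pos, neg):
    """Empirical ROC AUC for one gene: the probability that a randomly chosen
    tumor value exceeds a randomly chosen normal value (ties count as 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_genes(expr, labels, min_abs_diff=1.0, min_auc=0.9):
    """Keep genes that pass both a fold-change filter and a per-gene AUC filter.

    expr   : dict mapping gene name -> list of log2 expression values
    labels : list of 1 (tumor) / 0 (normal), aligned with each value list
    Both thresholds are illustrative assumptions, not values from the article.
    """
    selected = []
    for gene, values in expr.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        # On log2-scaled data, a difference of means is a log2 fold change.
        diff = mean(tumor) - mean(normal)
        a = auc(tumor, normal)
        # A strongly down-regulated gene separates the classes as well as an
        # up-regulated one, so score the gene by its distance from 0.5.
        separability = max(a, 1 - a)
        if abs(diff) >= min_abs_diff and separability >= min_auc:
            selected.append((gene, diff, separability))
    return selected


# Hypothetical toy data: 4 tumor samples followed by 4 normal samples.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
expr = {
    "GENE_UP": [8.1, 7.9, 8.4, 8.0, 5.2, 5.0, 5.5, 4.9],
    "GENE_DOWN": [3.0, 3.2, 2.9, 3.1, 6.0, 6.3, 5.8, 6.1],
    "GENE_FLAT": [5.0, 5.1, 4.9, 5.2, 5.0, 5.1, 4.9, 5.2],
}
selected = select_genes(expr, labels)
for gene, diff, separability in selected:
    print(f"{gene}: log2 difference {diff:+.2f}, AUC-based score {separability:.2f}")
```

In a fuller workflow, the expression columns of the surviving genes would form the feature matrix for a classifier such as a support vector machine. Running the selection separately within each population is one way such a screen could be made population-specific, though the abstract does not say how the authors did this.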