‘Machine Learning’ multiclassification for stage diagnosis of Alzheimer’s disease utilizing augmented blood gene expression and feature fusion

Bibliographic Details
Main Authors: Manash Sarma, Subarna Chatterjee
Format: Article
Language: English
Published: Springer 2025-06-01
Series: Discover Applied Sciences
Online Access: https://doi.org/10.1007/s42452-025-07237-1
Description
Summary: Abstract
Objective: The present study explores the classification of Alzheimer’s disease (AD) stages, encompassing cognitive normalcy (CN), Mild Cognitive Impairment (MCI), and AD/Dementia, through the application of Machine Learning (ML) multiclassification algorithms. The investigation uses blood gene expression datasets obtained from participants in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and from the National Center for Biotechnology Information (NCBI). Three blood gene expression datasets of high dimensionality and low sample size (HDLSS) are used, one of which exhibits significant class imbalance. The study also integrates clinical data from electronic health records (EHRs) with the gene expression datasets, which significantly enhances the accuracy of stage diagnosis.
Methods: A combination of XGBoost and SFBS (“sequential floating backward selection”) is used to select features. From more than 49,000 transcripts in the ADNI gene expression dataset, we identified a subset of 95 gene transcripts with optimal efficacy. Likewise, from more than 30,000 candidates in the two integrated NCBI datasets, we identified 125 gene transcripts with superior effectiveness. These findings led to two distinct model categories: one derived from the ADNI dataset and the other from the integrated NCBI dataset. A deep learning (DL) classifier is used to develop models of both categories, while Gradient Boost (GB) and Support Vector Machine (SVM) classifier-based models are also built to identify AD stages in NCBI participants. Because the genomic data are highly imbalanced, borderline oversampling is explored for model training, while the original data are used for validation. We also conducted a multimodal analysis and stage classification by integrating the ADNI gene expression and clinical datasets using ‘Feature-Level Fusion’.
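As an illustration of the feature-level fusion mentioned in the Methods, the following minimal sketch concatenates each participant's gene expression vector with that participant's clinical vector before any classifier is trained. It is not the authors' implementation: the function name `fuse_features`, the participant IDs, and all values are hypothetical, and the choice to keep only participants present in both tables is an assumption.

```python
def fuse_features(gene_expr, clinical):
    """Concatenate per-participant feature vectors from two modalities.

    Assumption: only participants present in BOTH tables are kept
    (an inner join on participant ID).
    """
    shared_ids = sorted(set(gene_expr) & set(clinical))
    return {pid: list(gene_expr[pid]) + list(clinical[pid]) for pid in shared_ids}


# Hypothetical toy data, not taken from ADNI or NCBI.
gene_expr = {
    "P1": [0.12, 1.40, 0.88],   # selected transcript expression levels
    "P2": [0.95, 0.31, 1.10],
    "P3": [0.40, 0.77, 0.05],   # no clinical record available
}
clinical = {
    "P1": [72.0, 28.0],         # e.g. age and a cognitive test score
    "P2": [80.0, 21.0],
    "P4": [65.0, 30.0],         # no gene expression record available
}

fused = fuse_features(gene_expr, clinical)
for pid, vector in fused.items():
    print(pid, vector)
```

Each fused vector can then be passed to a single classifier, which is what distinguishes feature-level fusion from approaches that train one model per modality and combine their predictions.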
Results: For ADNI study participants, the best multiclassification performance yielded ROC AUC scores of 0.76, 0.76, and 0.71 for the CN, MCI, and Dementia stages, respectively, with F1 scores of 0.71, 0.77, and 0.53 for the same categories. For the NCBI-based model, the best AUC scores of 0.82, 0.74, and 0.79 (for CN, MCI, and AD, respectively) and F1 scores of 0.75, 0.60, and 0.77 were attained when evaluated on GSE3060 test data. When assessed on GSE3061 test data, the model achieved best AUC scores of 0.81, 0.75, and 0.78, and F1 scores of 0.74, 0.67, and 0.73. This research identified MAPK14, MID1, TEP1, PLG, DRAXIN, and USP47 as genes associated with AD. For the ADNI data, integrating clinical data with gene expression data raised the best F1 scores to 0.85, 0.86, and 0.83 for CN, MCI, and AD, respectively, and the ROC AUC scores to 0.90, 0.85, and 0.89.
Conclusion: Using machine learning multiclassification techniques on blood gene expression profile data from ADNI and NCBI, we achieved the most promising results to date for diagnosing multiple stages of Alzheimer’s disease. This demonstrates the efficacy of our feature selection techniques, which were able to find essential genes associated with AD. Highly accurate diagnosis of stages including MCI from genetic data can potentially provide a timely alert for individuals susceptible or predisposed to AD.
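The per-stage F1 and ROC AUC scores reported above are one-vs-rest metrics: each stage is scored in turn as the positive class against the other two. The sketch below shows how such scores can be computed from scratch. The helper names, labels, predictions, and scores are hypothetical toy values, not data or code from the study.

```python
def f1_for_class(y_true, y_pred, label):
    """F1 score treating `label` as positive and all other classes as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def auc_one_vs_rest(y_true, scores, label):
    """ROC AUC for `label` vs. the rest.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == label]
    neg = [s for t, s in zip(y_true, scores) if t != label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predictions for six participants.
y_true = ["CN", "CN", "MCI", "MCI", "AD", "AD"]
y_pred = ["CN", "MCI", "MCI", "MCI", "AD", "CN"]
# Hypothetical predicted probability of the CN class for each participant.
cn_scores = [0.8, 0.4, 0.3, 0.1, 0.2, 0.6]

f1_scores = {c: f1_for_class(y_true, y_pred, c) for c in ("CN", "MCI", "AD")}
cn_auc = auc_one_vs_rest(y_true, cn_scores, "CN")
print(f1_scores, cn_auc)
```

Reporting one score per stage, rather than a single averaged figure, makes it visible when one stage (here, for example, Dementia in the ADNI gene-expression-only model) is harder to identify than the others.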
ISSN: 3004-9261