Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction

Bibliographic Details
Main Authors: Sazzli Kasim, Sorayya Malek, JunJie Tang, Xue Ning Kiew, Song Cheen, Bryan Liew, Norashikin Saidon, Raja Ezman, Raja Shariff
Format: Article
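Language: English
Published: Nature Portfolio 2025-07-01
Series: Scientific Reports
Subjects: Leukemia; Deep learning; CNN; Acute lymphoblastic leukemia image database; Munich AML morphology dataset
Online Access: https://doi.org/10.1038/s41598-025-05585-x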
collection DOAJ
description Abstract Leukemia is the most prevalent form of blood cancer, affecting individuals across all age groups. Early and accurate diagnosis is crucial for effective treatment and improved clinical outcomes. Peripheral blood smear analysis, a key non-invasive diagnostic tool, often suffers from subjective interpretation, inter-observer variability, and a lack of readily available expertise. Although deep learning approaches, particularly Convolutional Neural Networks (CNNs), have demonstrated exceptional performance in binary classification tasks, multiclass classification of leukemia subtypes remains challenging due to limited data availability and morphological similarities between subtypes. This study presents a novel hybrid methodology that combines pre-trained CNN architectures, including VGG16, InceptionV3, and ResNet50, with advanced classification models such as Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), and the deep learning-based Multi-Layer Perceptron (MLP). The method leverages publicly available datasets, the Acute Lymphoblastic Leukemia Image Database (ALL-IDB) and the Munich AML Morphology Dataset, to classify healthy cells, lymphoblasts, and myeloblasts. Pre-trained CNNs are employed for feature extraction, while the classifiers refine the predictions for improved accuracy. The proposed approach demonstrated exceptional performance, with the InceptionV3 + SVM combination achieving the highest accuracy of 88%, followed closely by VGG16 + XGBoost at 87%. MLP-based models also achieved strong results, effectively capturing non-linear patterns in the data. In contrast, ResNet50 exhibited limitations, likely due to overfitting caused by the small dataset. The novelty of this work lies in the integration of pre-trained deep learning architectures with hybrid classification techniques, enabling robust multiclass classification in data-constrained scenarios. This innovative approach offers a scalable and precise diagnostic tool, improving the speed and reliability of leukemia subtype identification and providing significant potential to enhance clinical decision-making and patient care.
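The two-stage design described in the abstract (a pre-trained CNN used only as a feature extractor, followed by a conventional classifier) can be sketched roughly as below for the InceptionV3 + SVM pairing. This is a minimal illustration, not the authors' implementation: the record gives no image size, preprocessing, or hyperparameters, so the 299x299 input, average pooling, feature standardisation, RBF kernel, and `C=1.0` are all assumptions, and the names `extract_features`, `train_classifier`, and `CLASSES` are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The three target classes named in the abstract (label order is an assumption).
CLASSES = ("healthy", "lymphoblast", "myeloblast")


def extract_features(images):
    """Embed RGB images of shape (N, 299, 299, 3), pixel values 0-255.

    TensorFlow is imported lazily so the classifier stage can be used alone,
    for example on feature vectors that were extracted and cached earlier.
    """
    from tensorflow.keras.applications.inception_v3 import (
        InceptionV3,
        preprocess_input,
    )

    # include_top=False drops the ImageNet classification head; pooling="avg"
    # collapses the final feature map into one 2048-dimensional vector per image.
    backbone = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
    batch = preprocess_input(np.asarray(images, dtype="float32"))
    return backbone.predict(batch, verbose=0)


def train_classifier(features, labels):
    """Fit a standardised RBF-kernel SVM on pre-extracted feature vectors.

    Assumed settings: the source does not report kernel or regularisation.
    """
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    model.fit(features, labels)
    return model
```

In use, one would call `extract_features` once per image set, then pass the resulting vectors and integer labels (indices into `CLASSES`) to `train_classifier`; swapping the SVM for a Random Forest, XGBoost, or MLP classifier would only change the second function. Keeping the backbone frozen means only the small classifier is trained, which is one plausible reason such hybrids suit small datasets.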
format Article
id doaj-art-3bc1594de3c14fc5814e7266d7397eed
institution Kabale University
issn 2045-2322
spelling doaj-art-3bc1594de3c14fc5814e7266d7397eed; 2025-08-20T04:01:35Z; eng
Nature Portfolio. Scientific Reports, ISSN 2045-2322, 2025-07-01, vol. 15, no. 1, pp. 1-14. https://doi.org/10.1038/s41598-025-05585-x
Multiclass leukemia cell classification using hybrid deep learning and machine learning with CNN-based feature extraction
Authors and affiliations:
Sazzli Kasim (0): Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM)
Sorayya Malek (1): Institute of Biological Sciences, Faculty of Science, University Malaya
JunJie Tang (2): Institute of Biological Sciences, Faculty of Science, University Malaya
Xue Ning Kiew (3): Institute of Biological Sciences, Faculty of Science, University Malaya
Song Cheen (4): Microbiome Research Centre, Monash University Malaysia
Bryan Liew (5): Institute of Biological Sciences, Faculty of Science, University Malaya
Norashikin Saidon (6): Faculty of Medicine, Universiti Teknologi MARA (UiTM)
Raja Ezman (7): Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM)
Raja Shariff (8): Cardiology Department, Faculty of Medicine, Universiti Teknologi MARA (UiTM)
Keywords: Leukemia; Deep learning; CNN; Acute lymphoblastic leukemia image database; Munich AML morphology dataset
topic Leukemia
Deep learning
CNN
Acute lymphoblastic leukemia image database
Munich AML morphology dataset