Performance Analysis of Diabetes Detection Using Machine Learning Classifiers

Bibliographic Details
Main Authors: Hung Huynh, Liu Hui, Ngoc Han Nguyen, Ruixuan Qiao
Format: Article
Language: English
Published: IJMADA 2024-10-01
Series: International Journal of Management and Data Analytics
Subjects: diabetes prediction and diagnosis, machine learning, classifiers, algorithms
Online Access: https://ijmada.com/index.php/ijmada/article/view/50
author Hung Huynh
Liu Hui
Ngoc Han Nguyen
Ruixuan Qiao
collection DOAJ
description Diabetes is a chronic medical condition that poses a severe and long-standing public health challenge in Canada and worldwide, affecting millions of people and straining healthcare resources. Conventional diagnostic procedures sometimes rely on only a few data points and are prone to error, which can lead to premature action. In addition, the slow adoption of modern machine learning (ML) technologies in healthcare may stem from a limited understanding of how these systems reach their decisions. This study aims to fill that gap by applying a range of ML algorithms to the Pima Indians Diabetes Dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases, with the goal of improving the validity of diabetes prediction and diagnosis. Three types of classifiers are used: tree-based, function-based, and rule-based. The results show that Stochastic Gradient Descent (SGD, function), Logistic Regression (function), JRip (rules), and Random Forest (trees) are among the top-performing classifiers. The classifiers are evaluated on accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and ROC area. Although SGD performs well on almost all of these metrics, its low recall means it is not the best choice. Because recall is prioritized in clinical diagnostics, where a missed case is costly, Random Forest emerges as a strong candidate owing to its balanced performance across the key metrics.
format Article
id doaj-art-67316a78c4214f6e9a12d3047c766d12
institution Kabale University
issn 2816-9395
language English
publishDate 2024-10-01
publisher IJMADA
record_format Article
series International Journal of Management and Data Analytics
spelling doaj-art-67316a78c4214f6e9a12d3047c766d12; 2025-01-20T15:45:31Z; eng; IJMADA; International Journal of Management and Data Analytics; 2816-9395; 2024-10-01; 41435450; Performance Analysis of Diabetes Detection Using Machine Learning Classifiers; Hung Huynh (University Canada West), Liu Hui (University Canada West), Ngoc Han Nguyen (University Canada West), Ruixuan Qiao (University Canada West); https://ijmada.com/index.php/ijmada/article/view/50; diabetes prediction and diagnosis, machine learning, classifiers, algorithms.
title Performance Analysis of Diabetes Detection Using Machine Learning Classifiers
topic diabetes prediction and diagnosis, machine learning, classifiers, algorithms.
url https://ijmada.com/index.php/ijmada/article/view/50
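
The description names accuracy, precision, recall, specificity, F1 score, and MCC as evaluation metrics. As a minimal sketch of how these relate to one another, the Python function below derives them from the four cells of a binary confusion matrix, treating "diabetic" as the positive class. The function name `confusion_metrics` and the example counts are hypothetical illustrations, not values or code from the article. ROC area is left out because it requires per-sample scores rather than counts.

```python
from math import sqrt


def confusion_metrics(tp, fp, tn, fn):
    """Derive common classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, with "diabetic"
    assumed to be the positive class (an assumption of this sketch).
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    precision = ratio(tp, tp + fp)      # how many flagged cases are real
    recall = ratio(tp, tp + fn)         # how many real cases are caught
    specificity = ratio(tn, tn + fp)    # how many healthy cases are cleared
    f1 = ratio(2 * precision * recall, precision + recall)
    mcc = ratio(
        tp * tn - fp * fn,
        sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
example = confusion_metrics(tp=60, fp=20, tn=80, fn=40)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy (0.70) and specificity (0.80) look reasonable while recall is only 0.60, meaning 40 of 100 positive cases are missed. This is the kind of pattern the description points to when it sets aside a classifier with low recall despite good scores elsewhere.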