Performance Analysis of Diabetes Detection Using Machine Learning Classifiers

Bibliographic Details
Main Authors:	Hung Huynh, Liu Hui, Ngoc Han Nguyen, Ruixuan Qiao
Format:	Article
Language:	English
Published:	IJMADA 2024-10-01
Series:	International Journal of Management and Data Analytics
Subjects:	diabetes prediction and diagnosis, machine learning, classifiers, algorithms.
Online Access:	https://ijmada.com/index.php/ijmada/article/view/50

Description
Summary:	Diabetes is a chronic medical condition that has long posed severe public health challenges, not only in Canada but worldwide, affecting millions of people and straining healthcare resources. Conventional diagnostic procedures sometimes rely on few data points and are prone to error, which can lead to premature action. In addition, the healthcare industry's slow adoption of modern machine learning (ML) technologies may stem from a limited understanding of how these systems reach their decisions. This study aims to address that gap by applying a range of ML algorithms to the PIMA Indians Diabetes Dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases, with the goal of improving the validity of diabetes prediction and diagnosis. Three families of classifiers are used: tree-based, function-based, and rule-based. The results show that Stochastic Gradient Descent (function), Logistic Regression (function), JRip (rules), and Random Forest (trees) are among the top-performing classifiers. The classifiers are evaluated on accuracy, precision, recall, specificity, F1 score, Matthews correlation coefficient (MCC), and ROC area (area under the ROC curve). Although SGD performs well on almost all of these metrics, its low recall indicates that it is not the optimal choice. Because recall (the proportion of actual diabetes cases that are correctly identified) is prioritized in clinical diagnostics, Random Forest emerges as a strong candidate owing to its balanced performance across the key metrics.
ISSN:	2816-9395
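The evaluation metrics named in the summary can all be derived from a binary confusion matrix (plus, for ROC area, the classifier's scores). The following is a minimal Python sketch for illustration only, not the authors' code or tooling: the function names `binary_metrics` and `roc_auc` are hypothetical, "diabetic" is assumed to be the positive class, and the counts in the usage line are made up rather than taken from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper; the positive class is assumed to be 'diabetic'.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0          # a.k.a. sensitivity
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(labels: list[int], scores: list[float]) -> float:
    """ROC area as the probability that a random positive outscores a random
    negative (ties count as half), i.e. the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Illustrative counts only, not results reported in the article.
    print(binary_metrics(tp=40, fp=10, tn=90, fn=20))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.2, 0.5, 0.1]))
```

With these made-up counts, accuracy (about 0.81) and specificity (0.90) look strong while recall is only about 0.67. That kind of gap is what makes a classifier a weaker choice when missed diagnoses are the main concern, even if most of its other metrics are good.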

Similar Items