Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier

Diabetes is widespread throughout the world and is considered a health concern, which motivates work on advanced predictive techniques for early diagnosis. This paper discusses diabetes prediction using the K-Nearest Neighbors (KNN) classifier, which is a...

Bibliographic Details
Main Authors: Emad Majeed Hameed, Hardik Joshi
Format: Article
Language: English
Published: Middle Technical University 2024-09-01
Series: Journal of Techniques
Subjects: Diabetes, Prediction, Feature Selection, KNN
Online Access: https://journal.mtu.edu.iq/index.php/MTU/article/view/2587
author Emad Majeed Hameed
Hardik Joshi
collection DOAJ
description Diabetes is widespread throughout the world and is considered a health concern, which motivates work on advanced predictive techniques for early diagnosis. This paper discusses diabetes prediction using the K-Nearest Neighbors (KNN) classifier, a widely used machine learning algorithm. Most previous studies investigated only the optimal value of k in the KNN algorithm and did not address the choice of distance measure, either on its own or together with the optimal value of k, as a way to improve the efficiency of diabetes prediction. This study investigates the optimal value of k and the optimal distance measure simultaneously to improve the performance of KNN in predicting diabetes. Using the Pima Indians Diabetes dataset, it examines the extent to which these parameters, especially the value of k and the distance metric, affect the performance of the classifier. Experiments that applied different values of k and various distance measures yielded insights into maximizing the classifier's accuracy. The study shows that the choice of distance measure greatly affects classification accuracy and that selecting the optimal k value helps avoid overfitting and underfitting, a property of robust models for diabetes prediction. The best performance achieved was 80.5%, obtained with k = 35 and the Euclidean distance measure.
format Article
id doaj-art-b57b9cadc7f94d7a858c8c3c4e8ce4af
institution Kabale University
issn 1818-653X
2708-8383
language English
publishDate 2024-09-01
publisher Middle Technical University
record_format Article
series Journal of Techniques
spelling doaj-art-b57b9cadc7f94d7a858c8c3c4e8ce4af; 2025-01-19T10:56:29Z; eng; Middle Technical University; Journal of Techniques; 1818-653X; 2708-8383; 2024-09-01; 6; 3; 10.51173/jt.v6i3.2587; Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier; Emad Majeed Hameed [0]; Hardik Joshi [1]; https://orcid.org/0000-0002-0943-6383; Department of Computer Science, Gujarat University, Ahmedabad, India; Department of Computer Science, Gujarat University, Ahmedabad, India; https://journal.mtu.edu.iq/index.php/MTU/article/view/2587; Diabetes; Prediction; Feature Selection; KNN
title Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier
topic Diabetes
Prediction
Feature Selection
KNN
url https://journal.mtu.edu.iq/index.php/MTU/article/view/2587
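
The joint search over k and the distance measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic two-class points standing in for the diabetes dataset, the choice of Manhattan and Chebyshev as alternatives to Euclidean is an assumption (the record does not list which measures were compared), and every function name (`knn_predict`, `grid_search`, `make_toy_data`) is hypothetical.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


# Assumed candidate measures; the record only names Euclidean explicitly.
METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}


def knn_predict(train_X, train_y, query, k, dist):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k, dist):
    hits = sum(
        knn_predict(train_X, train_y, x, k, dist) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)


def grid_search(train_X, train_y, val_X, val_y, ks, metrics):
    """Score every (k, metric) pair on held-out data and return the best pair."""
    results = {}
    for name, dist in metrics.items():
        for k in ks:
            results[(k, name)] = accuracy(train_X, train_y, val_X, val_y, k, dist)
    best = max(results, key=results.get)
    return best, results


def make_toy_data(n, seed):
    """Synthetic stand-in data: two overlapping 2-D Gaussian classes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        X.append([rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)])
        y.append(label)
    return X, y


train_X, train_y = make_toy_data(200, seed=0)
val_X, val_y = make_toy_data(100, seed=1)

# Odd k values only, so a two-class vote can never tie.
best, results = grid_search(
    train_X, train_y, val_X, val_y, ks=range(1, 36, 2), metrics=METRICS
)
print("best (k, metric):", best, "validation accuracy:", results[best])
```

Searching k and the metric together, rather than tuning k under a fixed metric, matters because the best k can differ from one distance measure to another. On real tabular data whose features sit on different scales, the features would normally be standardized before any distance is computed, and cross-validation would give a steadier estimate than a single hold-out split.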