Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier
Diabetes is an illness that is widespread throughout the world and is considered a health concern, which requires work to explore advanced predictive techniques for early diagnosis of the illness. This paper discusses diabetes prediction by using the K-Nearest Neighbors (KNN) classifier, which is a...
Main Authors: | Emad Majeed Hameed, Hardik Joshi |
---|---|
Format: | Article |
Language: | English |
Published: | Middle Technical University, 2024-09-01 |
Series: | Journal of Techniques |
Subjects: | |
Online Access: | https://journal.mtu.edu.iq/index.php/MTU/article/view/2587 |
_version_ | 1832595210929963008 |
---|---|
author | Emad Majeed Hameed Hardik Joshi |
author_facet | Emad Majeed Hameed Hardik Joshi |
author_sort | Emad Majeed Hameed |
collection | DOAJ |
description |
Diabetes is widespread throughout the world and is considered a health concern, which motivates the exploration of advanced predictive techniques for early diagnosis. This paper discusses diabetes prediction using the K-Nearest Neighbors (KNN) classifier, a widely used machine learning algorithm. Most studies investigated only the optimal value of K in the KNN algorithm and did not address the best distance measure, either alone or together with the optimal value of K, to improve the efficiency of diabetes prediction. This study simultaneously investigates both the optimal value of K and the optimal distance measure to improve the performance of KNN in predicting diabetes. By analyzing the Pima Indians Diabetes dataset, this study seeks to discover the extent to which different parameters, especially the value of K and the distance metric, affect the performance of the classifier. Through experiments that applied different values of K and various distance measures, the study reached insights into maximizing the classifier's accuracy. The study shows that the choice of distance measure greatly affects classification accuracy and that selecting the optimal K value helps eliminate problems of overfitting and underfitting, which is a feature of robust models for diabetes prediction. The results showed that the best performance achieved was 80.5%, when K=35 and the Euclidean distance measure was used.
|
format | Article |
id | doaj-art-b57b9cadc7f94d7a858c8c3c4e8ce4af |
institution | Kabale University |
issn | 1818-653X 2708-8383 |
language | English |
publishDate | 2024-09-01 |
publisher | middle technical university |
record_format | Article |
series | Journal of Techniques |
spelling | doaj-art-b57b9cadc7f94d7a858c8c3c4e8ce4af2025-01-19T10:56:29Zengmiddle technical universityJournal of Techniques1818-653X2708-83832024-09-016310.51173/jt.v6i3.2587Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN ClassifierEmad Majeed Hameed0Hardik Joshi1https://orcid.org/0000-0002-0943-6383Department of Computer Science, Gujarat University, Ahmedabad, IndiaDepartment of Computer Science, Gujarat University, Ahmedabad, India Diabetes is an illness that is widespread throughout the world and is considered a health concern, which requires work to explore advanced predictive techniques for early diagnosis of the illness. This paper discusses diabetes prediction by using the K-Nearest Neighbors (KNN) classifier, which is a widely used algorithm in machine learning. Most studies only dealt with investigating the optimal value of k in the KNN algorithm and did not address the best method to measure distance alone or together with the optimal value of k to improve the efficiency of diabetes prediction. This study simultaneously investigates both the optimal value of k and the optimal method for measuring distance to improve the performance of the KNN technique in predicting diabetes. By using and analyzing the Indian Diabetes PIMA dataset, this study seeks to discover the extent to which different parameters, especially the optimal value of K and distance metrics, affect the performance of the classifier. Through experiments that included applying different values for the K factor and using various distance measures, the study reached insights into maximizing the classifier's accuracy. The study shows that choosing the distance measure greatly affects the accuracy of classification and selecting the optimal K value helps eliminate problems of overfitting and underfitting, which is a feature of robust models for diabetes prediction. 
The research results showed that the best performance achieved was 80.5% when K=35 and the Euclidean distance measure was used. https://journal.mtu.edu.iq/index.php/MTU/article/view/2587DiabetesPredictionFeature SelectionKNN |
spellingShingle | Emad Majeed Hameed Hardik Joshi Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier Journal of Techniques Diabetes Prediction Feature Selection KNN |
title | Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier |
title_full | Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier |
title_fullStr | Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier |
title_full_unstemmed | Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier |
title_short | Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier |
title_sort | improving diabetes prediction by selecting optimal k and distance measures in knn classifier |
topic | Diabetes Prediction Feature Selection KNN |
url | https://journal.mtu.edu.iq/index.php/MTU/article/view/2587 |
work_keys_str_mv | AT emadmajeedhameed improvingdiabetespredictionbyselectingoptimalkanddistancemeasuresinknnclassifier AT hardikjoshi improvingdiabetespredictionbyselectingoptimalkanddistancemeasuresinknnclassifier |
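
The description above refers to searching jointly over the value of K and the distance measure in a KNN classifier. The following is a minimal sketch of what such a joint search could look like, not the authors' implementation. It uses only the Python standard library and a small synthetic two-class dataset instead of the Pima dataset; the function names, the Manhattan and Chebyshev measures, the train/validation split, and the range of K values are all assumptions made for illustration. It is not expected to reproduce the 80.5% figure reported in the record.

```python
import math
import random
from collections import Counter

# Candidate distance measures. Only Euclidean is named in the record;
# Manhattan and Chebyshev are assumed here as common alternatives.
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def knn_predict(train_X, train_y, query, k, metric):
    """Majority vote among the k training points closest to query."""
    order = sorted(range(len(train_X)), key=lambda i: metric(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k, metric):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(train_X, train_y, q, k, metric) == label
               for q, label in zip(test_X, test_y))
    return hits / len(test_y)

def search_k_and_metric(train_X, train_y, val_X, val_y, k_values, metrics=METRICS):
    """Return (best_accuracy, best_k, best_metric_name) over the joint grid.

    Ties keep the first combination encountered.
    """
    best = None
    for name, metric in metrics.items():
        for k in k_values:
            acc = accuracy(train_X, train_y, val_X, val_y, k, metric)
            if best is None or acc > best[0]:
                best = (acc, k, name)
    return best

def make_synthetic(n, seed=0):
    """Two overlapping Gaussian blobs standing in for a real dataset."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_synthetic(200)
    train_X, train_y = X[:150], y[:150]
    val_X, val_y = X[150:], y[150:]
    # Odd k values avoid tied votes in a two-class problem.
    acc, k, name = search_k_and_metric(train_X, train_y, val_X, val_y, range(1, 40, 2))
    print(f"best accuracy={acc:.3f} at k={k}, metric={name}")
```

On real data with features on different scales, the features would normally be standardized before computing distances, and cross-validation would give a more stable estimate than a single hold-out split; both are omitted here to keep the sketch short.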