Improving Diabetes Prediction by Selecting Optimal K and Distance Measures in KNN Classifier
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Middle Technical University, 2024-09-01 |
Series: | Journal of Techniques |
Subjects: | |
Online Access: | https://journal.mtu.edu.iq/index.php/MTU/article/view/2587 |
Summary: | Diabetes is widespread throughout the world and is considered a health concern, which motivates the search for advanced predictive techniques for its early diagnosis. This paper addresses diabetes prediction with the K-Nearest Neighbors (KNN) classifier, a widely used machine learning algorithm. Most previous studies investigated only the optimal value of K in KNN and did not address the choice of distance measure, either on its own or together with K, as a way to improve diabetes prediction. This study investigates the optimal value of K and the optimal distance measure simultaneously to improve the performance of KNN in predicting diabetes. Using the Pima Indians Diabetes dataset, it examines how far these parameters affect the performance of the classifier. Experiments with different values of K and various distance measures yielded insights into maximizing the classifier's accuracy. The study shows that the choice of distance measure greatly affects classification accuracy and that selecting a suitable K value helps avoid both overfitting and underfitting, which robust diabetes prediction models require. The best performance achieved was 80.5%, obtained with K = 35 and the Euclidean distance measure.
|
---|---|
ISSN: | 1818-653X 2708-8383 |
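
The joint search over K and the distance measure described in the summary can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the three distance measures, the helper names (`knn_predict`, `accuracy`, `search`, `make_toy_data`) and the synthetic two-class data are all assumptions made for the example. It does not use the Pima Indians Diabetes dataset and does not reproduce the reported 80.5%.

```python
import math
import random
from collections import Counter

# Candidate distance measures (an illustrative selection, not the paper's list).
def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def chebyshev(a, b):
    return max(abs(x - y) for x, y in zip(a, b))

METRICS = {"euclidean": euclidean, "manhattan": manhattan, "chebyshev": chebyshev}

def knn_predict(train_X, train_y, query, k, dist):
    # Rank training points by distance to the query, then take a
    # majority vote among the k nearest labels.
    nearest = sorted(zip(train_X, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def accuracy(train_X, train_y, test_X, test_y, k, dist):
    # Fraction of held-out points whose predicted label matches the true one.
    hits = sum(
        knn_predict(train_X, train_y, x, k, dist) == y
        for x, y in zip(test_X, test_y)
    )
    return hits / len(test_y)

def search(train_X, train_y, val_X, val_y, ks, metrics=METRICS):
    # Score every (metric, k) pair on the validation split and return the
    # best one as (accuracy, k, metric_name).
    results = [
        (accuracy(train_X, train_y, val_X, val_y, k, fn), k, name)
        for name, fn in metrics.items()
        for k in ks
    ]
    return max(results, key=lambda r: r[0])

def make_toy_data(n, seed):
    # Hypothetical synthetic data: two Gaussian clusters, one per class.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 0.0 if label == 0 else 2.0
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    train_X, train_y = make_toy_data(120, seed=0)
    val_X, val_y = make_toy_data(60, seed=1)
    # Odd values of k avoid tied votes in a two-class problem.
    best_acc, best_k, best_metric = search(
        train_X, train_y, val_X, val_y, ks=range(1, 36, 2)
    )
    print(f"best: k={best_k}, metric={best_metric}, accuracy={best_acc:.3f}")
```

On real clinical features, which are measured on very different scales, the features would normally be standardized before computing distances, and the chosen pair would be confirmed on a separate test split or by cross-validation rather than on the validation split used for selection.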