An improved K‐means algorithm for big data
Main Authors: | Fatemeh Moodi, Hamid Saadatfar |
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2022-02-01 |
Series: | IET Software |
Subjects: | iterative methods; pattern clustering; Big Data |
Online Access: | https://doi.org/10.1049/sfw2.12032 |
Field | Value |
---|---|
author | Fatemeh Moodi, Hamid Saadatfar |
collection | DOAJ |
description | An improved version of the K‐means clustering algorithm is presented that can be applied to big data with a lower processing load and acceptable precision. The method uses the distances from each point to its two nearest centroids, together with how these distances changed over the last two iterations. Points whose equidistance threshold exceeds the equidistance index are excluded from the distance calculations and stabilised in their cluster. These points are still compared with a second index, the cluster radius, in later iterations: an excluded point is returned to the calculations if its distance from the stabilised cluster centroid exceeds the cluster radius. This can improve the clustering quality. Computer experiments on both synthetic and real datasets show that the method improves clustering quality by up to 41.85% in the best case. According to these findings, the proposed method is well suited to big data. |
id | doaj-art-ecab59d800fe4677a253e0cc10ca1cfd |
institution | Kabale University |
issn | 1751-8806 1751-8814 |
citation | IET Software, vol. 16, no. 1, pp. 48–59 (2022) |
affiliations | Fatemeh Moodi: Computer Engineering Department, Hormozan Higher Education Institute, Birjand, Iran; Hamid Saadatfar: Computer Engineering Department, University of Birjand, Birjand, Iran |
topic | iterative methods; pattern clustering; Big Data |
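
The abstract describes the method only in outline, so the following is a minimal sketch of the general idea in Python with NumPy, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the "margin" (gap between a point's distances to its two nearest centroids) stands in for the equidistance measure, a point freezes when its margin exceeds a hypothetical `margin_threshold` and has not shrunk since the previous iteration, and the cluster radius is taken as the largest member-to-centroid distance from the previous iteration. The names `kmeans_with_freezing`, `margin_threshold`, and `full_scans` are hypothetical.

```python
import numpy as np


def kmeans_with_freezing(X, k, init=None, margin_threshold=1.0,
                         max_iter=100, tol=1e-6, seed=0):
    """Hypothetical K-means variant that stops rescanning 'stable' points.

    Assumptions (not taken from the paper):
      * margin = d2 - d1, the gap between a point's distances to its two
        nearest centroids; a point is frozen when its margin exceeds
        `margin_threshold` and has not shrunk since its previous scan.
      * a cluster's radius is the largest member-to-centroid distance at the
        end of the previous iteration; a frozen point is reactivated when
        its distance to its (moved) centroid exceeds that radius.
    Returns (centroids, labels, full_scans, n_iter); `full_scans` counts
    point-to-all-centroids distance evaluations.
    """
    if k < 2:
        raise ValueError("need k >= 2 to compare two nearest centroids")
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if init is None:
        rng = np.random.default_rng(seed)
        init = X[rng.choice(n, size=k, replace=False)]
    centroids = np.array(init, dtype=float)
    labels = np.zeros(n, dtype=int)
    frozen = np.zeros(n, dtype=bool)
    prev_margin = np.full(n, np.nan)   # NaN: no earlier margin to compare with
    radius = np.full(k, np.inf)        # no radius known before the first pass
    full_scans = 0

    for it in range(1, max_iter + 1):
        active = np.flatnonzero(~frozen)
        newly_frozen = np.zeros(n, dtype=bool)
        if active.size:
            # k distances per active point: the step that freezing skips.
            d = np.linalg.norm(X[active][:, None, :] - centroids[None, :, :], axis=2)
            full_scans += active.size
            labels[active] = np.argmin(d, axis=1)
            two = np.sort(d, axis=1)[:, :2]
            margin = two[:, 1] - two[:, 0]
            # Comparisons with NaN are False, so a point cannot freeze
            # the first time it is scanned.
            newly_frozen[active] = (margin > margin_threshold) & (margin >= prev_margin[active])
            prev_margin[active] = margin

        old = centroids.copy()
        for c in range(k):
            members = labels == c
            if members.any():              # empty clusters keep their centroid
                centroids[c] = X[members].mean(axis=0)

        # One distance per point: to its own centroid only.
        own = np.linalg.norm(X - centroids[labels], axis=1)
        reactivate = frozen & (own > radius[labels])
        # New radius: largest member-to-centroid distance per cluster.
        radius = np.array([own[labels == c].max() if np.any(labels == c) else np.inf
                           for c in range(k)])
        frozen = (frozen & ~reactivate) | newly_frozen

        shift = np.linalg.norm(centroids - old, axis=1).max()
        if shift < tol and not reactivate.any():
            break
    return centroids, labels, full_scans, it
```

The saving in this sketch comes from frozen points costing one distance evaluation per iteration (to their own centroid) instead of k. The trade-off is that a point frozen too early keeps a label that plain K-means might later change; the radius check is the hedge against that, mirroring the re-inclusion step the abstract describes.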