An improved K‐means algorithm for big data

Abstract: An improved version of the K‐means clustering algorithm is presented that can be applied to big data by lowering the processing load while keeping precision acceptable. In this method, the distances from one point to its two nearest centroids were used along with their variations in the last t...

Bibliographic Details
Main Authors: Fatemeh Moodi, Hamid Saadatfar
Format: Article
Language: English
Published: Wiley 2022-02-01
Series: IET Software
Subjects: iterative methods; pattern clustering; Big Data
Online Access: https://doi.org/10.1049/sfw2.12032
author Fatemeh Moodi
Hamid Saadatfar
collection DOAJ
description Abstract: An improved version of the K‐means clustering algorithm is presented that can be applied to big data by lowering the processing load while keeping precision acceptable. The method uses the distances from each point to its two nearest centroids, together with how those distances varied over the last two iterations. Points with an equidistance threshold greater than the equidistance index are eliminated from the distance calculations and stabilised in their cluster. In later iterations these points are still compared with the research index (the cluster radius): an excluded point is included in the calculations again if its distance from the stabilised cluster centroid is longer than the cluster radius, which can improve the clustering quality. Computerised tests on synthetic and real samples show that the method improves clustering quality by up to 41.85% in the best‐case scenario. According to the findings, the proposed method is very beneficial to big data.
format Article
id doaj-art-ecab59d800fe4677a253e0cc10ca1cfd
institution Kabale University
issn 1751-8806
1751-8814
language English
publishDate 2022-02-01
publisher Wiley
record_format Article
series IET Software
spelling doaj-art-ecab59d800fe4677a253e0cc10ca1cfd
2025-02-03T01:29:37Z
eng
Wiley
IET Software
1751-8806
1751-8814
2022-02-01
volume 16, issue 1, pages 48-59
10.1049/sfw2.12032
An improved K‐means algorithm for big data
Fatemeh Moodi (Computer Engineering Department, Hormozan Higher Education Institute, Birjand, Iran)
Hamid Saadatfar (Computer Engineering Department, University of Birjand, Birjand, Iran)
https://doi.org/10.1049/sfw2.12032
iterative methods
pattern clustering
Big Data
title An improved K‐means algorithm for big data
topic iterative methods
pattern clustering
Big Data
url https://doi.org/10.1049/sfw2.12032
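
The pruning idea in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation. The record does not define the equidistance index or the cluster radius, so the sketch assumes the equidistance measure is the gap between a point's second-nearest and nearest centroid distances, and that the cluster radius is the mean member-to-centroid distance. It also leaves out the variation over the last two iterations that the abstract mentions. The function name `pruned_kmeans` and the parameter `gap_threshold` are hypothetical.

```python
import math
import random


def pruned_kmeans(points, k, gap_threshold, iters=20, seed=0):
    """K-means sketch that skips full distance scans for stabilised points.

    Assumptions (not taken from the paper): a point is stabilised when
    d2 - d1 > gap_threshold, where d1 and d2 are its distances to the
    nearest and second-nearest centroids; the cluster radius is the mean
    distance of a cluster's members to its centroid.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    frozen = [False] * len(points)   # True = stabilised in its cluster
    radius = [0.0] * k
    full_scans = 0                   # how many times all k distances were computed

    for _ in range(iters):
        for i, p in enumerate(points):
            if frozen[i]:
                # One distance instead of k: re-check against the cluster radius.
                if math.dist(p, centroids[labels[i]]) <= radius[labels[i]]:
                    continue
                frozen[i] = False    # drifted outside the radius, scan again
            ranked = sorted((math.dist(p, c), j) for j, c in enumerate(centroids))
            full_scans += 1
            labels[i] = ranked[0][1]
            if k > 1 and ranked[1][0] - ranked[0][0] > gap_threshold:
                frozen[i] = True

        # Standard centroid update; an empty cluster keeps its old centroid.
        members = [[] for _ in range(k)]
        for p, j in zip(points, labels):
            members[j].append(p)
        new_centroids = [
            [sum(xs) / len(m) for xs in zip(*m)] if m else centroids[j]
            for j, m in enumerate(members)
        ]
        radius = [
            sum(math.dist(p, c) for p in m) / len(m) if m else 0.0
            for m, c in zip(members, new_centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels, full_scans


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    print(pruned_kmeans(data, k=2, gap_threshold=5.0))
```

The `full_scans` counter gives a crude view of the saving: a stabilised point costs one distance computation per iteration instead of `k`, which is where the lower processing load would come from under these assumptions.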