Self-Adaptive K-Means Based on a Covering Algorithm

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to be determined, and the selection of the initial c...

Full description

Saved in:

Bibliographic Details
Main Authors:	Yiwen Zhang, Yuanyuan Zhou, Xing Guo, Jintao Wu, Qiang He, Xiao Liu, Yun Yang
Format:	Article
Language:	English
Published:	Wiley 2018-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2018/7698274
Tags:	Add Tag No Tags, Be the first to tag this record!

_version_	1832548969981411328
author	Yiwen Zhang Yuanyuan Zhou Xing Guo Jintao Wu Qiang He Xiao Liu Yun Yang
author_facet	Yiwen Zhang Yuanyuan Zhou Xing Guo Jintao Wu Qiang He Xiao Liu Yun Yang
author_sort	Yiwen Zhang
collection	DOAJ
description	The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to be determined, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable numbers of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperforms the existing algorithms under both sequential and parallel conditions.
format	Article
id	doaj-art-bb474532500943828954fb035e1ba1e4
institution	Kabale University
issn	1076-2787 1099-0526
language	English
publishDate	2018-01-01
publisher	Wiley
record_format	Article
series	Complexity
spelling	doaj-art-bb474532500943828954fb035e1ba1e42025-02-03T06:12:36ZengWileyComplexity1076-27871099-05262018-01-01201810.1155/2018/76982747698274Self-Adaptive K-Means Based on a Covering AlgorithmYiwen Zhang0Yuanyuan Zhou1Xing Guo2Jintao Wu3Qiang He4Xiao Liu5Yun Yang6School of Computer Science and Technology, Anhui University, Hefei 230601, ChinaSchool of Computer Science and Technology, Anhui University, Hefei 230601, ChinaSchool of Computer Science and Technology, Anhui University, Hefei 230601, ChinaSchool of Computer Science and Technology, Anhui University, Hefei 230601, ChinaSchool of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, VIC 3122, AustraliaSchool of Information Technology, Deakin University, Melbourne, VIC 3125, AustraliaSchool of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, VIC 3122, AustraliaThe K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to be determined, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable numbers of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperforms the existing algorithms under both sequential and parallel conditions.http://dx.doi.org/10.1155/2018/7698274
spellingShingle	Yiwen Zhang Yuanyuan Zhou Xing Guo Jintao Wu Qiang He Xiao Liu Yun Yang Self-Adaptive K-Means Based on a Covering Algorithm Complexity
title	Self-Adaptive K-Means Based on a Covering Algorithm
title_full	Self-Adaptive K-Means Based on a Covering Algorithm
title_fullStr	Self-Adaptive K-Means Based on a Covering Algorithm
title_full_unstemmed	Self-Adaptive K-Means Based on a Covering Algorithm
title_short	Self-Adaptive K-Means Based on a Covering Algorithm
title_sort	self adaptive k means based on a covering algorithm
url	http://dx.doi.org/10.1155/2018/7698274
work_keys_str_mv	AT yiwenzhang selfadaptivekmeansbasedonacoveringalgorithm AT yuanyuanzhou selfadaptivekmeansbasedonacoveringalgorithm AT xingguo selfadaptivekmeansbasedonacoveringalgorithm AT jintaowu selfadaptivekmeansbasedonacoveringalgorithm AT qianghe selfadaptivekmeansbasedonacoveringalgorithm AT xiaoliu selfadaptivekmeansbasedonacoveringalgorithm AT yunyang selfadaptivekmeansbasedonacoveringalgorithm

Self-Adaptive K-Means Based on a Covering Algorithm

Similar Items