Clustering Algorithm by Boundary Detection Base on Entropy of KNN

Bibliographic Details
Main Authors: Jiaman Ding, Jinyuan Yin, Lianyin Jia, Xiaodong Fu, Hongbin Wang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10945327/
Description
Summary: Clustering analysis has been widely applied in various fields, and boundary-detection-based clustering algorithms have shown effective performance. In this work, we propose a clustering algorithm by boundary detection based on entropy of KNN (CBDEK). The nearest neighbors of a border point lie only within a specific directional range. Thus, we define the entropy of KNN (EK) to accurately identify the boundary of a cluster. Since entropy measures uncertainty, it can be used to quantify the possibility that a point is a border point. A lower EK indicates an uneven neighbor distribution, increasing the possibility that the point is a border point. Then, the border points are clustered based on the directional similarity of their nearest neighbors. Specifically, if a border point is a neighbor of another border point and most of their nearest neighbors show directional similarity, they are considered to belong to the same cluster. Furthermore, we assign the label of a border point to the interior points located within the maximum nearest neighbor sub-block of this border point, which allows the remaining points (interior points) to be allocated efficiently. In addition, our algorithm incorporates noise mitigation techniques using average distance and box plot analysis. The effectiveness of CBDEK is demonstrated by a comparative evaluation of nine algorithms on 24 datasets.
ISSN: 2169-3536
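
Illustrative note: the summary describes scoring each point by the entropy of the directions of its nearest neighbors, with lower values suggesting a border point. The record does not give the paper's exact definition of EK, so the following is only a minimal sketch under stated assumptions: 2-D points, brute-force neighbor search, and neighbor directions binned into equal angular sectors. The function name `knn_direction_entropy` and the parameters `k` and `n_bins` are hypothetical and are not taken from the article.

```python
import math


def knn_direction_entropy(points, k=8, n_bins=8):
    """Shannon entropy (in bits) of the directions of each point's k nearest neighbors.

    Assumption: directions are binned into n_bins equal angular sectors.
    This is an illustrative stand-in, not the paper's EK definition.
    """
    width = 2 * math.pi / n_bins
    scores = []
    for i, (xi, yi) in enumerate(points):
        # Brute-force search for the k nearest neighbors, excluding the point itself.
        by_distance = sorted(
            (math.hypot(xj - xi, yj - yi), j)
            for j, (xj, yj) in enumerate(points)
            if j != i
        )
        neighbors = [j for _, j in by_distance[:k]]

        # Count how many neighbors fall into each angular sector.
        # Sectors are shifted by half a width so axis-aligned directions
        # land in the middle of a sector rather than on an edge.
        counts = [0] * n_bins
        for j in neighbors:
            xj, yj = points[j]
            angle = math.atan2(yj - yi, xj - xi)
            sector = int(((angle + width / 2) % (2 * math.pi)) // width) % n_bins
            counts[sector] += 1

        # Entropy of the sector proportions; empty sectors contribute nothing.
        total = len(neighbors)
        scores.append(-sum((c / total) * math.log2(c / total) for c in counts if c))
    return scores


# Toy data: a 5 x 5 grid. Index 0 is the corner (0, 0); index 12 is the centre (2, 2).
grid = [(x, y) for x in range(5) for y in range(5)]
ek = knn_direction_entropy(grid, k=8, n_bins=8)
print(ek[0], ek[12])  # the corner point scores lower than the centre point
```

In this toy grid the centre point has neighbors in every sector, while the corner point has neighbors on one side only, so its score is lower. Choosing a threshold that separates border points from interior points, and the later clustering and noise-handling steps, are left out because the record does not specify them.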