Label iteration-based clustering ensemble algorithm


Bibliographic Details
Main Authors:	HE Yulin, YANG Jin, HUANG Zhexue, YIN Jianfei
Format:	Article
Language:	Chinese (zho)
Published:	POSTS&TELECOM PRESS Co., LTD, 2024-12-01
Series:	智能科学与技术学报 (Chinese Journal of Intelligent Science and Technology)
Subjects:	clustering ensemble algorithm; ensemble learning; random sample partition; maximum mean discrepancy; label iteration
Online Access:	http://www.cjist.com.cn/zh/article/doi/10.11959/j.issn.2096-6652.202443/

collection	DOAJ
description	Existing training strategies for clustering ensemble algorithms generally apply different base clustering algorithms to the same data, and they commonly suffer from low performance on large-scale data and weak adaptability of the consensus function. To address these problems, this paper proposed a label iteration-based clustering ensemble (LICE) algorithm built on the opposite training strategy: different data with the same base clustering algorithm. First, multiple base clusterings were trained on random sample partition (RSP) data blocks. Second, base clustering results with the same number of clusters were fused using the maximum mean discrepancy (MMD) criterion, and a heuristic classifier was then trained on the labeled RSP data blocks. Third, the unlabeled sample points were labeled by the heuristic classifier, which was iteratively enhanced with the labeled sample points whose clustering and classification labels were consistent. Finally, a series of experiments was conducted to validate the feasibility and effectiveness of the LICE algorithm. The results showed that, at the 5th iteration compared with the initial iteration, the normalized mutual information (NMI), adjusted Rand index (ARI), Fowlkes-Mallows index (FMI), and purity of LICE increased by 17.23%, 16.75%, 31.29%, and 12.37% on average, respectively. Compared with seven state-of-the-art clustering ensemble algorithms on representative datasets, these four indexes increased by 11.76%, 16.50%, 9.36%, and 14.20% on average. These results demonstrate that LICE is an efficient and reasonable clustering ensemble algorithm with the potential to handle large-scale data clustering problems.
id	doaj-art-c20f8ff3a41c4e24ab65349272744dad
institution	Kabale University
issn	2096-6652
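The abstract above describes the LICE pipeline only at a high level. The following Python/NumPy sketch shows one way its three steps could fit together on toy data; it is not the authors' implementation. Assumptions not taken from the paper: RSP blocks are formed by splitting a random permutation of the rows; the shared base clusterer is Lloyd's k-means with farthest-point initialisation; fusion relabels each block's clusters to match a reference block by choosing the permutation with the smallest total RBF-kernel squared MMD; the "heuristic classifier" is replaced by a nearest-centroid classifier; and "consistent labeling" is read as the classifier's prediction equalling the aligned cluster label. All function names (`rsp_blocks`, `kmeans`, `mmd2`, `align`, `lice_sketch`) and parameters (`gamma`, `n_seed_blocks`, `n_iter`) are hypothetical.

```python
import itertools

import numpy as np


def rsp_blocks(X, n_blocks, seed=0):
    """Split row indices into disjoint random blocks (simple stand-in for RSP blocks)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    return np.array_split(idx, n_blocks)


def kmeans(X, k, iters=50, seed=0):
    """Lloyd's k-means with farthest-point initialisation (assumed base clusterer)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        d2 = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers


def mmd2(A, B, gamma=1.0):
    """Biased estimate of squared MMD between samples A and B (RBF kernel)."""
    if len(A) == 0 or len(B) == 0:
        return np.inf  # an empty cluster cannot be matched meaningfully

    def gram(P, Q):
        d2 = (P ** 2).sum(1)[:, None] + (Q ** 2).sum(1)[None, :] - 2 * P @ Q.T
        return np.exp(-gamma * np.maximum(d2, 0.0))

    return gram(A, A).mean() + gram(B, B).mean() - 2 * gram(A, B).mean()


def align(ref_X, ref_y, X, y, k, gamma=1.0):
    """Relabel the clusters of block (X, y) to match the reference block.

    Picks the one-to-one mapping with the smallest total MMD; exhaustive
    search over permutations is only practical for small k.
    """
    cost = np.array([[mmd2(X[y == i], ref_X[ref_y == j], gamma) for j in range(k)]
                     for i in range(k)])
    best = min(itertools.permutations(range(k)),
               key=lambda p: sum(cost[i, p[i]] for i in range(k)))
    return np.array(best)[y]


def lice_sketch(X, k, n_blocks=4, n_seed_blocks=2, n_iter=5, gamma=1.0, seed=0):
    """Cluster each block, fuse labels via MMD alignment, then grow the labeled set."""
    blocks = rsp_blocks(X, n_blocks, seed)
    ref = blocks[0]
    ref_y, _ = kmeans(X[ref], k, seed=seed)
    cluster_y = np.empty(len(X), dtype=int)
    cluster_y[ref] = ref_y
    for b in blocks[1:]:
        yb, _ = kmeans(X[b], k, seed=seed)
        cluster_y[b] = align(X[ref], ref_y, X[b], yb, k, gamma)
    # Seed the labeled set with the first few blocks.
    labeled = np.zeros(len(X), dtype=bool)
    for b in blocks[:n_seed_blocks]:
        labeled[b] = True
    for _ in range(n_iter):
        # Nearest-centroid classifier trained on the currently labeled points.
        centroids = np.array([X[labeled & (cluster_y == j)].mean(0) for j in range(k)])
        pred = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        # Accept unlabeled points whose classifier and cluster labels agree.
        agree = ~labeled & (pred == cluster_y)
        if not agree.any():
            break
        labeled |= agree
    # Points that never reached agreement keep the classifier's label.
    return np.where(labeled, cluster_y, pred)
```

On three well-separated Gaussian blobs, for example, the sketch should assign each blob a single distinct label. Real data would need a tuned `gamma`, and for larger numbers of clusters the permutation search would have to be replaced by an assignment solver such as the Hungarian algorithm.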

