Active Learning for Constrained Document Clustering with Uncertainty Region

Bibliographic Details
Main Authors:	M. A. Balafar, R. Hazratgholizadeh, M. R. F. Derakhshi
Format:	Article
Language:	English
Published:	Wiley 2020-01-01
Series:	Complexity
Online Access:	http://dx.doi.org/10.1155/2020/3207306
Collection:	DOAJ
Description:	Constrained clustering aims to improve accuracy and personalization by using constraints supplied by an oracle. This paper proposes a new constrained clustering algorithm that, during an iterative process, selects informative pairs of data points and presents them to the oracle, which answers whether each pair is a must-link (ML) or cannot-link (CL) pair. In each iteration, a support vector machine (SVM) is first trained on the labels produced by the current clustering, and a distance matrix is built from the distance of each document to the SVM hyperplane. A similarity matrix is also built from the cosine similarity between the word2vec representations of the documents. Two types of probability (similarity and degree of similarity) are then computed and smoothed with respect to neighborhood membership, where a neighborhood is a set of samples that the oracle has labeled as belonging to the same cluster. At the end of each iteration, the data with the highest uncertainty (in terms of these probabilities) are selected for querying the oracle. For evaluation, the proposed method is compared with well-known state-of-the-art methods on a standard dataset using two criteria. The results show higher accuracy and stability with fewer queries to the oracle.
ISSN:	1076-2787, 1099-0526
Affiliations:	M. A. Balafar and R. Hazratgholizadeh: Department of IT, Faculty of Engineering, University of Tabriz, Tabriz, Iran; M. R. F. Derakhshi: Department of Computer, Faculty of Engineering, University of Tabriz, Tabriz, Iran
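The iteration outlined in the description can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the description gives no formulas, so several choices here are assumptions. The SVM is a linear one trained by full-batch subgradient descent on the hinge loss, with one one-vs-rest hyperplane per cluster (a stand-in for whatever solver the paper used). The distance matrix is, hypothetically, the Euclidean distance between documents' vectors of signed hyperplane distances. The neighborhood probability is, hypothetically, a softmax (with an illustrative `temperature`) over a document's mean cosine similarity to each neighborhood's members, with entropy as the uncertainty measure. How the paper combines its two types of probability is not stated, so the SVM and similarity steps are kept separate. All function and parameter names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec = Sequence[float]
Matrix = List[List[float]]


# Step 1 (assumed form): linear SVM trained on the current cluster labels.
def train_linear_svm(X: Sequence[Vec], y: Sequence[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss.
    Labels y must be -1 or +1."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of (lam / 2) * ||w||^2
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) < 1.0:
                for j in range(d):  # hinge term is active for this point
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def svm_distance_matrix(X: Sequence[Vec], labels: Sequence[int]) -> Matrix:
    """Hypothetical construction: each document's signed distances to the
    one-vs-rest hyperplanes, then Euclidean distances between documents."""
    planes = [train_linear_svm(X, [1 if lab == k else -1 for lab in labels])
              for k in sorted(set(labels))]
    feats = []
    for x in X:
        row = []
        for w, b in planes:
            norm = math.sqrt(sum(wj * wj for wj in w))
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            row.append(score / norm if norm > 0.0 else 0.0)
        feats.append(row)
    return [[math.dist(f, g) for g in feats] for f in feats]


# Step 2: cosine similarity between document embeddings (e.g. word2vec).
def cosine(u: Vec, v: Vec) -> float:
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0  # treat all-zero vectors as dissimilar to everything
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def similarity_matrix(docs: Sequence[Vec]) -> Matrix:
    return [[cosine(u, v) for v in docs] for u in docs]


# Step 3 (assumed form): neighborhood probabilities and uncertainty.
def neighborhood_probs(i: int, sim: Matrix, neighborhoods: List[List[int]],
                       temperature: float = 0.1) -> List[float]:
    """Softmax over document i's mean similarity to each neighborhood."""
    scores = [sum(sim[i][j] for j in nb) / len(nb) for nb in neighborhoods]
    top = max(scores)  # shift by the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    return -sum(x * math.log(x) for x in p if x > 0.0)


def select_most_uncertain(sim: Matrix, neighborhoods: List[List[int]],
                          temperature: float = 0.1) -> Optional[int]:
    """Unlabeled document with the highest-entropy membership, or None."""
    labeled = {j for nb in neighborhoods for j in nb}
    candidates = [i for i in range(len(sim)) if i not in labeled]
    if not candidates:
        return None
    return max(candidates, key=lambda i: entropy(
        neighborhood_probs(i, sim, neighborhoods, temperature)))


def record_answer(neighborhoods: List[List[int]], i: int,
                  matched: Optional[int]) -> None:
    """Assumed update rule: a must-link to neighborhood `matched` joins it;
    None (cannot-link to every neighborhood) starts a new neighborhood."""
    if matched is None:
        neighborhoods.append([i])
    else:
        neighborhoods[matched].append(i)
```

For example, with `docs` holding one embedding per document and `neighborhoods = [[0, 1], [2, 3]]`, `select_most_uncertain(similarity_matrix(docs), neighborhoods)` returns the unlabeled document whose similarity is most evenly split between the two neighborhoods. The oracle's answer for that document would then be passed to `record_answer`, so a must-linked document grows an existing neighborhood and a document cannot-linked to all of them starts a new one.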

Similar Items