Active Learning for Constrained Document Clustering with Uncertainty Region

Bibliographic Details
Main Authors: M. A. Balafar, R. Hazratgholizadeh, M. R. F. Derakhshi
Format: Article
Language: English
Published: Wiley 2020-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2020/3207306
author M. A. Balafar
R. Hazratgholizadeh
M. R. F. Derakhshi
collection DOAJ
description Constrained clustering is intended to improve accuracy and personalization based on the constraints expressed by an Oracle. In this paper, a new constrained clustering algorithm is proposed in which informative data pairs are selected during an iterative process. Each selected pair is presented to the Oracle, which labels the relation between the two items as either "Must-link (ML)" or "Cannot-link (CL)." In each iteration, a support vector machine (SVM) is first trained on the labels produced by the current clustering. A distance matrix is created from the distance of each document to the hyperplane, and a similarity matrix is created from the cosine similarity of the word2vector representation of each document. Two types of probability (similarity and degree of similarity) are calculated and then smoothed with respect to membership in neighborhoods. A neighborhood consists of samples that the Oracle has labeled as belonging to the same cluster. Finally, at the end of each iteration, the data with the greatest level of uncertainty (in terms of probability) are selected for querying the Oracle. For evaluation, the proposed method is compared with well-known state-of-the-art methods on two criteria over a standard dataset. The results demonstrate increased accuracy and stability with fewer questions.
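The pair-selection idea in the description can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the SVM step, the distance matrix, and the neighborhood smoothing, and it assumes (purely for illustration) that a cosine similarity clamped to [0, 1] can stand in for the probability that two documents share a cluster. The function names `cosine_similarity`, `similarity_matrix`, and `most_uncertain_pair` are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between document vectors."""
    n = len(vectors)
    return [[cosine_similarity(vectors[i], vectors[j]) for j in range(n)]
            for i in range(n)]


def most_uncertain_pair(sim, asked=frozenset()):
    """Return the unasked pair (i, j), i < j, whose score is closest to 0.5.

    Assumption: the clamped similarity is treated as a stand-in for the
    probability that the two documents belong to the same cluster, so a
    value near 0.5 marks the pair the Oracle would be asked about next.
    """
    best_pair, best_gap = None, None
    for i, j in combinations(range(len(sim)), 2):
        if (i, j) in asked:
            continue
        p_same = min(1.0, max(0.0, sim[i][j]))
        gap = abs(p_same - 0.5)
        if best_gap is None or gap < best_gap:
            best_pair, best_gap = (i, j), gap
    return best_pair


# Toy document vectors (hypothetical stand-ins for averaged word embeddings).
docs = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]
sim = similarity_matrix(docs)
first_query = most_uncertain_pair(sim)
second_query = most_uncertain_pair(sim, asked={first_query})
```

In a full loop, the Oracle's ML or CL answer for each queried pair would be added to the constraint set before re-clustering; that step is left out here.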
format Article
id doaj-art-a93bc96c11204491a3f0ede7140cd725
institution Kabale University
issn 1076-2787
1099-0526
language English
publishDate 2020-01-01
publisher Wiley
record_format Article
series Complexity
spelling Complexity (Wiley), ISSN 1076-2787, 1099-0526; published 2020-01-01; volume 2020; DOI 10.1155/2020/3207306; article 3207306
M. A. Balafar: Department of IT, Faculty of Engineering, University of Tabriz, Tabriz, Iran
R. Hazratgholizadeh: Department of IT, Faculty of Engineering, University of Tabriz, Tabriz, Iran
M. R. F. Derakhshi: Department of Computer, Faculty of Engineering, University of Tabriz, Tabriz, Iran