SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings

Bibliographic Details
Main Authors: Rawan N. Al-Matham, Hend S. Al-Khalifa
Format: Article
Language: English
Published: Wiley 2021-01-01
Series: Complexity
Online Access: http://dx.doi.org/10.1155/2021/6627434
collection DOAJ
description Automatic synonym extraction plays an important role in many natural language processing systems, such as those for information retrieval and question answering. Recent research has focused on extracting semantic relations from word embeddings, since they capture relatedness and similarity between words. However, word embeddings alone are problematic for synonym extraction because they cannot determine whether the relation between two words is synonymy or some other semantic relation. In this paper, we address this problem by proposing the SynoExtractor pipeline, which filters the similar words retrieved from word embeddings and retains synonyms based on specified linguistic rules. Our experiments used embeddings trained on the King Saud University Corpus of Classical Arabic (KSUCCA) and the Gigaword corpus with the CBOW and SG models. We evaluated the automatically extracted synonyms by comparing them with the Alma’any Arabic synonym thesauri. We also arranged a manual evaluation by two Arabic linguists. The results show that the SynoExtractor pipeline improves the precision of synonym extraction compared to using the cosine similarity measure alone. SynoExtractor obtained a mean average precision (MAP) of 0.605 for KSUCCA, a 21% improvement over the baseline, and a MAP of 0.748 for the Gigaword corpus, a 25% improvement. SynoExtractor outperformed the Sketch Engine thesaurus for synonym extraction by 32% in terms of MAP. Our work shows promising results for synonym extraction, suggesting that the method can also be used with other languages.
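The evaluation outlined in the description (rank candidates by cosine similarity, optionally filter them, then score the ranking with MAP) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the toy vectors, the `keep` predicate, and all function names are assumptions introduced here, and the paper's actual linguistic rules are not given in this record. Note also that MAP has several variants; the one below divides by the number of gold synonyms that were retrieved.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest(word, vectors, k, keep=None):
    """Return up to k words ranked by cosine similarity to `word`.

    `keep` is a hypothetical filter predicate standing in for the
    pipeline's linguistic rules, which this record does not specify.
    """
    scored = [(other, cosine(vectors[word], vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    ranked = [w for w, _ in scored]
    if keep is not None:
        ranked = [w for w in ranked if keep(word, w)]
    return ranked[:k]

def average_precision(ranked, gold):
    """Average of precision@rank over the ranks where a gold synonym appears."""
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings, gold_sets):
    """Mean of the per-word average precision values."""
    aps = [average_precision(ranked, gold_sets.get(word, set()))
           for word, ranked in rankings.items()]
    return sum(aps) / len(aps) if aps else 0.0

# Toy two-dimensional "embeddings" (made up for illustration).
vectors = {
    "big": [1.0, 0.1],
    "large": [0.9, 0.2],
    "small": [0.8, -0.3],
    "car": [0.0, 1.0],
}
baseline = nearest("big", vectors, k=3)
# Placeholder rule that drops one known non-synonym; purely illustrative.
filtered = nearest("big", vectors, k=3, keep=lambda w, c: c != "small")
```

With these toy vectors the unfiltered ranking for "big" places "small" (a related word but not a synonym) second, which illustrates why cosine similarity alone cannot separate synonymy from other semantic relations.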
format Article
id doaj-art-087d3c51dfb54b3f85bd0c8fe06f8676
institution Kabale University
issn 1076-2787
1099-0526
url http://dx.doi.org/10.1155/2021/6627434