Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language

Bibliographic Details
Main Authors: Haya Rabih Alsulami, Amal Abdullah Almansour
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Applied Sciences
Subjects: Arabic; NLP; GPT-4; AraBERT; semantic similarity; paraphrasing
Online Access: https://www.mdpi.com/2076-3417/15/8/4139
collection DOAJ
description Paraphrasing means expressing the semantic meaning of a text using different words. Paraphrasing has a significant impact on numerous Natural Language Processing (NLP) applications, such as Machine Translation (MT) and Question Answering (QA). Machine Learning (ML) methods are frequently employed to generate new paraphrased text, and the generative method is commonly used for text generation. Generative Pre-trained Transformer (GPT) models have demonstrated effectiveness in various text generation tasks, including summarization, proofreading, and rephrasing of English texts. However, GPT-4’s capabilities in Arabic paraphrase generation have not been extensively studied, despite Arabic being one of the most widely spoken languages. In this paper, the researchers evaluate the capabilities of GPT-4 in text paraphrasing for Arabic. Furthermore, the paper presents a comprehensive evaluation method for paraphrase quality and develops a detailed framework for evaluation. The framework comprises Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Lexical Diversity (LD), Jaccard similarity, and word embeddings from the Arabic Bi-directional Encoder Representation from Transformers (AraBERT) model compared with cosine and Euclidean similarity. This paper illustrates that GPT-4 can effectively produce a new paraphrased sentence that is semantically equivalent to the original sentence, and that the quality framework efficiently ranks paraphrased pairs according to quality criteria.
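A few of the measures named in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation: whitespace tokenization, the type-token-ratio definition of lexical diversity, and all function names are assumptions made for illustration. The vectors passed to the cosine and Euclidean helpers stand in for sentence embeddings such as those an AraBERT model might produce; no model is loaded here, and the Euclidean helper returns a distance rather than a similarity score.

```python
import math


def jaccard_similarity(original: str, paraphrase: str) -> float:
    """Share of unique whitespace tokens common to both sentences."""
    a, b = set(original.split()), set(paraphrase.split())
    if not a and not b:
        return 1.0  # two empty sentences are treated as identical
    return len(a & b) / len(a | b)


def lexical_diversity(text: str) -> float:
    """Type-token ratio: unique tokens / total tokens (assumed definition)."""
    tokens = text.split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def euclidean_distance(u: list[float], v: list[float]) -> float:
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


if __name__ == "__main__":
    source = "ذهب الولد إلى المدرسة"
    candidate = "ذهب الطفل إلى المدرسة"
    print(jaccard_similarity(source, candidate))
    print(lexical_diversity(candidate))
```

A low Jaccard score combined with a high cosine similarity would suggest a paraphrase that changes the wording while preserving the meaning, which is the kind of trade-off a ranking framework like the one described would need to weigh.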
id doaj-art-b0e336eb9f0345d2b7f8f4ea3a11f11f
institution OA Journals
issn 2076-3417
doi 10.3390/app15084139
volume 15
issue 8
article_number 4139
affiliation Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia (both authors)
topic Arabic
NLP
GPT-4
AraBERT
semantic similarity
paraphrasing