Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language
Paraphrasing means expressing the semantic meaning of a text using different words. Paraphrasing has a significant impact on numerous Natural Language Processing (NLP) applications, such as Machine Translation (MT) and Question Answering (QA). Machine Learning (ML) methods are frequently employed to...
| Main Authors: | Haya Rabih Alsulami, Amal Abdullah Almansour |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-04-01 |
| Series: | Applied Sciences |
| Subjects: | Arabic; NLP; GPT-4; AraBERT; semantic similarity; paraphrasing |
| Online Access: | https://www.mdpi.com/2076-3417/15/8/4139 |
| _version_ | 1850183396454563840 |
|---|---|
| author | Haya Rabih Alsulami; Amal Abdullah Almansour |
| author_facet | Haya Rabih Alsulami; Amal Abdullah Almansour |
| author_sort | Haya Rabih Alsulami |
| collection | DOAJ |
| description | Paraphrasing means expressing the semantic meaning of a text using different words. Paraphrasing has a significant impact on numerous Natural Language Processing (NLP) applications, such as Machine Translation (MT) and Question Answering (QA). Machine Learning (ML) methods are frequently employed to generate new paraphrased text, and the generative method is commonly used for text generation. Generative Pre-trained Transformer (GPT) models have demonstrated effectiveness in various text generation tasks, including summarization, proofreading, and rephrasing of English texts. However, GPT-4’s capabilities in Arabic paraphrase generation have not been extensively studied, despite Arabic being one of the most widely spoken languages. In this paper, the researchers evaluate the capabilities of GPT-4 in text paraphrasing for Arabic. Furthermore, the paper presents a comprehensive evaluation method for paraphrase quality and develops a detailed framework for evaluation. The framework comprises Bilingual Evaluation Understudy (BLEU), Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Lexical Diversity (LD), Jaccard similarity, and word embedding using the Arabic Bi-directional Encoder Representation from Transformers (AraBERT) model with cosine and Euclidean similarity. This paper illustrates that GPT-4 can effectively produce a new paraphrased sentence that is semantically equivalent to the original sentence, and that the quality framework efficiently ranks paraphrased pairs according to quality criteria. |
| format | Article |
| id | doaj-art-b0e336eb9f0345d2b7f8f4ea3a11f11f |
| institution | OA Journals |
| issn | 2076-3417 |
| language | English |
| publishDate | 2025-04-01 |
| publisher | MDPI AG |
| record_format | Article |
| series | Applied Sciences |
| spelling | doaj-art-b0e336eb9f0345d2b7f8f4ea3a11f11f; 2025-08-20T02:17:21Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2025-04-01; 15; 8; 4139; 10.3390/app15084139; Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language; Haya Rabih Alsulami (Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia); Amal Abdullah Almansour (Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia); https://www.mdpi.com/2076-3417/15/8/4139; Arabic; NLP; GPT-4; AraBERT; semantic similarity; paraphrasing |
| spellingShingle | Haya Rabih Alsulami Amal Abdullah Almansour Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language Applied Sciences Arabic NLP GPT-4 AraBERT semantic similarity paraphrasing |
| title | Exploring GPT-4 Capabilities in Generating Paraphrased Sentences for the Arabic Language |
| title_sort | exploring gpt 4 capabilities in generating paraphrased sentences for the arabic language |
| topic | Arabic; NLP; GPT-4; AraBERT; semantic similarity; paraphrasing |
| url | https://www.mdpi.com/2076-3417/15/8/4139 |
| work_keys_str_mv | AT hayarabihalsulami exploringgpt4capabilitiesingeneratingparaphrasedsentencesforthearabiclanguage AT amalabdullahalmansour exploringgpt4capabilitiesingeneratingparaphrasedsentencesforthearabiclanguage |
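
The description above names several pair-level measures in the evaluation framework (Lexical Diversity, Jaccard similarity, and cosine and Euclidean comparison of embeddings). The following is a minimal sketch of how such measures might be computed for one sentence pair; it is not the authors' implementation. Whitespace tokenization, the type-token-ratio definition of lexical diversity, all function names, and the example sentences are assumptions made for illustration. The vector functions accept any pair of equal-length numeric vectors, such as sentence embeddings produced by an AraBERT model; BLEU and ROUGE are omitted.

```python
import math


def tokenize(text):
    # Assumption: plain whitespace tokenization; real Arabic pipelines
    # usually normalize and segment text before comparison.
    return text.split()


def jaccard_similarity(a, b):
    # |A ∩ B| / |A ∪ B| over the two sentences' sets of distinct tokens.
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def lexical_diversity(text):
    # Assumption: type-token ratio (distinct tokens / total tokens).
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def cosine_similarity(u, v):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    # Straight-line distance; smaller values mean closer embeddings.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Hypothetical sentence pair that differs in one word.
original = "ذهب الولد إلى المدرسة"
paraphrase = "توجه الولد إلى المدرسة"

print(jaccard_similarity(original, paraphrase))
print(lexical_diversity(paraphrase))
```

For paraphrase ranking, a low Jaccard score combined with a high embedding cosine similarity would indicate a pair that changes the wording while preserving the meaning, which matches the definition of paraphrasing given in the description.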