Abstractive Text Summarization in Arabic-Like Script Using Multi-Encoder Architecture and Semantic Extraction Techniques
| Main Authors: | , , , , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/11020615/ |
| Summary: | In Natural Language Processing (NLP), text summarization plays a vital role in understanding textual content and producing concise summaries. Summarization approaches are either extractive or abstractive, and the latter remains largely unexplored for Arabic-script languages. Previous research on abstractive summarization for the Urdu language has made progress in creating logical and succinct summaries, but it has failed to preserve the underlying semantics, often resulting in inconsistent and ambiguous outputs. This study designs and implements an abstractive text summarization model for the Urdu language that aims to capture the semantic meaning of the original text. The proposed model improves readability and comprehension by generating concise and meaningful overviews of the original content while introducing new words and expressions. The study implements a multi-layer transformer encoder as part of the mBART (Multilingual BART) model. The encoder consists of multiple stacked transformer layers and uses self-attention mechanisms to process the input sequences. The model achieves a BERTScore of 90%, a BLEU score of 43%, and ROUGE scores of 80.3% for ROUGE-1, 74.3% for ROUGE-2, and 80.3% for ROUGE-L. These results demonstrate the efficacy of the model in producing coherent, high-quality abstractive summaries for the Urdu language. |
| ISSN: | 2169-3536 |