Abstractive Text Summarization in Arabic-Like Script Using Multi-Encoder Architecture and Semantic Extraction Techniques

Bibliographic Details
Main Authors: Wajiha Fatima, Syed Saqib Raza Rizvi, Taher M. Ghazal, Qasem M. Kharma, Munir Ahmad, Sagheer Abbas, Muhammad Furqan, Khan Muhammad Adnan
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11020615/
Description
Summary: In Natural Language Processing (NLP), text summarization plays a vital role in understanding textual content and producing concise summaries. Summarization approaches are either extractive or abstractive, and the latter remains largely unexplored for Arabic-script languages. Previous research on abstractive summarization for Urdu has made progress in producing logical and succinct summaries, but it has failed to preserve the underlying semantics, often resulting in inconsistent and ambiguous outputs. This study designs and implements an abstractive text summarization model for Urdu that aims to capture the semantic meaning of the original text. The proposed model improves readability and comprehension by generating concise and meaningful overviews of the original content while introducing new words and expressions. The study implements a multi-layer transformer encoder as part of the mBART (Multilingual BART) model. The encoder consists of multiple stacked transformer layers and uses self-attention to process the input sequences. The model achieves a BERTScore of 90%, a BLEU score of 43%, and ROUGE scores of 80.3% for ROUGE-1, 74.3% for ROUGE-2, and 80.3% for ROUGE-L. These results demonstrate the efficacy of the model in producing coherent and high-quality abstractive summaries for Urdu.
ISSN: 2169-3536
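
Note on the reported metrics: the ROUGE-1 and ROUGE-2 figures in the summary measure n-gram overlap between a generated summary and a reference summary. The following is a minimal Python sketch of that idea, not the evaluation code used in the article. The function names `ngram_counts` and `rouge_n`, the whitespace tokenization, and the English example sentences are all assumptions made for illustration; the record does not state which tokenizer or which ROUGE variant (precision, recall, or F1) the authors report.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return ROUGE-N precision, recall, and F1 for two strings.

    Assumption: tokens are split on whitespace. A real Urdu evaluation
    would likely need a language-aware tokenizer.
    """
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)

    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat lay on the mat"
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

In this toy pair, five of the six unigrams and three of the five bigrams match, so ROUGE-2 comes out lower than ROUGE-1, the same ordering seen in the scores reported above.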