Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset

With the rise of smart devices and technological advancements, accessing vast amounts of information has become easier than ever before. However, sorting and categorising such an overwhelming volume of content has become increasingly challenging. This article introduces a new framework for classifyin...

Bibliographic Details
Main Authors: Luigi Serreli, Claudio Marche, Michele Nitti
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Open Journal of the Computer Society
Subjects: Data volume, deep learning, natural language processing, topic classification
Online Access: https://ieeexplore.ieee.org/document/10806791/
author Luigi Serreli
Claudio Marche
Michele Nitti
collection DOAJ
description With the rise of smart devices and technological advancements, accessing vast amounts of information has become easier than ever before. However, sorting and categorising such an overwhelming volume of content has become increasingly challenging. This article introduces a new framework for classifying news articles based on a Bidirectional LSTM (BiLSTM) network and an attention mechanism. The article also presents a new dataset of 60 000 news articles from various global sources. Furthermore, it proposes a methodology for reducing data volume by extracting key sentences with an algorithm, which yields inference times that are, on average, 50% shorter than those for the original documents without compromising the system's accuracy. Experimental evaluations demonstrate that the framework outperforms existing methodologies in terms of accuracy. The system's accuracy has been compared with various works on two popular datasets, AG News and BBC News, where it achieved 99.7% and 94.55%, respectively.
format Article
id doaj-art-2a41c73583a54d43b499be4c7bbf2bcf
institution Kabale University
issn 2644-1268
language English
publishDate 2025-01-01
publisher IEEE
record_format Article
series IEEE Open Journal of the Computer Society
spelling doaj-art-2a41c73583a54d43b499be4c7bbf2bcf
2025-01-21T00:02:38Z
eng
IEEE
IEEE Open Journal of the Computer Society
2644-1268
2025-01-01
Volume 6, pages 153-164
DOI 10.1109/OJCS.2024.3519747
10806791
Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset
Luigi Serreli (https://orcid.org/0000-0002-8793-9015)
Claudio Marche (https://orcid.org/0000-0002-1017-9046)
Michele Nitti (https://orcid.org/0000-0002-7832-7121)
Department of Electrical and Electronic Engineering (DIEE), University of Cagliari, Cagliari, Italy (all three authors)
https://ieeexplore.ieee.org/document/10806791/
Data volume; deep learning; natural language processing; topic classification
title Reducing Data Volume in News Topic Classification: Deep Learning Framework and Dataset
topic Data volume
deep learning
natural language processing
topic classification
url https://ieeexplore.ieee.org/document/10806791/