Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence

Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases....

Bibliographic Details
Main Authors: Abdullah, Muhammad Ateeb Ather, Olga Kolesnikova, Grigori Sidorov
Format: Article
Language: English
Published: MDPI AG 2025-07-01
Series: Big Data and Cognitive Computing
Subjects: bias detection; machine learning; deep learning; text analysis; neutrality; BERT
Online Access: https://www.mdpi.com/2504-2289/9/7/190
author Abdullah
Muhammad Ateeb Ather
Olga Kolesnikova
Grigori Sidorov
collection DOAJ
description Detecting biased language in large-scale corpora, such as the Wiki Neutrality Corpus, is essential for promoting neutrality in digital content. This study systematically evaluates a range of machine learning (ML) and deep learning (DL) models for the detection of biased and pre-conditioned phrases. Conventional classifiers, including Extreme Gradient Boosting (XGBoost), Light Gradient-Boosting Machine (LightGBM), and Categorical Boosting (CatBoost), are compared with advanced neural architectures such as Bidirectional Encoder Representations from Transformers (BERT), Long Short-Term Memory (LSTM) networks, and Generative Adversarial Networks (GANs). A novel hybrid architecture is proposed, integrating DistilBERT, LSTM, and GANs within a unified framework. Extensive experimentation with intermediate variants DistilBERT + LSTM (without GAN) and DistilBERT + GAN (without LSTM) demonstrates that the fully integrated model consistently outperforms all alternatives. The proposed hybrid model achieves a cross-validation accuracy of 99.00%, significantly surpassing traditional baselines such as XGBoost (96.73%) and LightGBM (96.83%). It also exhibits superior stability, statistical significance (paired t-tests), and favorable trade-offs between performance and computational efficiency. The results underscore the potential of hybrid deep learning models for capturing subtle linguistic bias and advancing more objective and reliable automated content moderation systems.
format Article
id doaj-art-4ab7ba6177aa47b08d1a1497c5cd4a2d
institution Kabale University
issn 2504-2289
language English
publishDate 2025-07-01
publisher MDPI AG
series Big Data and Cognitive Computing
spelling doaj-art-4ab7ba6177aa47b08d1a1497c5cd4a2d; 2025-08-20T03:58:25Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2025-07-01; volume 9, issue 7, article 190; 10.3390/bdcc9070190
Authors, in record order: Abdullah (0), Muhammad Ateeb Ather (1), Olga Kolesnikova (2), Grigori Sidorov (3)
Affiliations, in record order:
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
Department of Computer Science, Bahria University Lahore Campus, Lahore 54600, Pakistan
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
Centro de Investigación en Computación, Instituto Politécnico Nacional, Mexico City 07738, Mexico
title Detection of Biased Phrases in the Wiki Neutrality Corpus for Fairer Digital Content Management Using Artificial Intelligence
topic bias detection
machine learning
deep learning
text analysis
neutrality
BERT
url https://www.mdpi.com/2504-2289/9/7/190