Artificial Intelligence vs. Human: Decoding Text Authenticity with Transformers


Bibliographic Details
Main Authors: Daniela Gifu, Covaci Silviu-Vasile
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Future Internet
Subjects: large language models; natural language processing; content creation; text authenticity
Online Access: https://www.mdpi.com/1999-5903/17/1/38
collection DOAJ
description This paper presents a comprehensive study on detecting AI-generated text using transformer models. Our research extends the existing RODICA dataset to create the Enhanced RODICA for Human-Authored and AI-Generated Text (ERH) dataset. We enriched RODICA by incorporating machine-generated texts from various large language models (LLMs), ensuring a diverse and representative corpus. Methodologically, we fine-tuned several transformer architectures, including BERT, RoBERTa, and DistilBERT, on this dataset to distinguish between human-written and AI-generated text. Our experiments examined both monolingual and multilingual settings, evaluating the model’s performance across diverse datasets such as M4, AICrowd, Indonesian Hoax News Detection, TURNBACKHOAX, and ERH. The results demonstrate that RoBERTa-large achieved superior accuracy and F-scores of around 83%, particularly in monolingual contexts, while DistilBERT-multilingual-cased excelled in multilingual scenarios, achieving accuracy and F-scores of around 72%. This study contributes a refined dataset and provides insights into model performance, highlighting the transformative potential of transformer models in detecting AI-generated content.
id doaj-art-6f100b6f83834dbc896d7424b5588c9d
institution Kabale University
issn 1999-5903
doi 10.3390/fi17010038
citation Future Internet, vol. 17, no. 1, art. 38 (2025)
affiliation (Daniela Gifu) Institute of Computer Science, Romanian Academy—Iași Branch, Codrescu 2, 700481 Iași, Romania
affiliation (Covaci Silviu-Vasile) George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Târgu Mureș, Gheorghe Marinescu 38, 540142 Târgu Mureș, Romania
topic large language models
natural language processing
content creation
text authenticity