Artificial Intelligence vs. Human: Decoding Text Authenticity with Transformers
This paper presents a comprehensive study on detecting AI-generated text using transformer models. Our research extends the existing RODICA dataset to create the Enhanced RODICA for Human-Authored and AI-Generated Text (ERH) dataset. We enriched RODICA by incorporating machine-generated texts from various large language models (LLMs), ensuring a diverse and representative corpus. Methodologically, we fine-tuned several transformer architectures, including BERT, RoBERTa, and DistilBERT, on this dataset to distinguish between human-written and AI-generated text. Our experiments examined both monolingual and multilingual settings, evaluating the models’ performance across diverse datasets such as M4, AICrowd, Indonesian Hoax News Detection, TURNBACKHOAX, and ERH. The results demonstrate that RoBERTa-large achieved superior accuracy and F-scores of around 83%, particularly in monolingual contexts, while DistilBERT-multilingual-cased excelled in multilingual scenarios, achieving accuracy and F-scores of around 72%. This study contributes a refined dataset and provides insights into model performance, highlighting the transformative potential of transformer models in detecting AI-generated content.
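The abstract reports accuracy and F-scores (around 83% for RoBERTa-large, around 72% for DistilBERT-multilingual-cased) for the binary human-vs.-AI classification task. As a minimal sketch of how these two metrics are computed for such a classifier, assuming hypothetical gold labels and predictions (the data below is illustrative only, not results from the paper):

```python
# Minimal sketch: accuracy and F1 for a binary classifier that labels
# text as human-written (0) or AI-generated (1).
# The label lists below are hypothetical illustration data.

def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) treating `positive` as the target class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, f1

# Hypothetical gold labels and model predictions for eight texts:
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(y_true, y_pred)
print(acc, f1)  # → 0.75 0.75
```

In practice a library implementation (e.g., scikit-learn's `accuracy_score` and `f1_score`) would typically be used; the hand-rolled version above only makes the definitions explicit.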
Main Authors: Daniela Gifu, Covaci Silviu-Vasile
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Future Internet
Subjects: large language models; natural language processing; content creation; text authenticity
Online Access: https://www.mdpi.com/1999-5903/17/1/38
author | Daniela Gifu Covaci Silviu-Vasile |
author_sort | Daniela Gifu |
collection | DOAJ |
description | This paper presents a comprehensive study on detecting AI-generated text using transformer models. Our research extends the existing RODICA dataset to create the Enhanced RODICA for Human-Authored and AI-Generated Text (ERH) dataset. We enriched RODICA by incorporating machine-generated texts from various large language models (LLMs), ensuring a diverse and representative corpus. Methodologically, we fine-tuned several transformer architectures, including BERT, RoBERTa, and DistilBERT, on this dataset to distinguish between human-written and AI-generated text. Our experiments examined both monolingual and multilingual settings, evaluating the model’s performance across diverse datasets such as M4, AICrowd, Indonesian Hoax News Detection, TURNBACKHOAX, and ERH. The results demonstrate that RoBERTa-large achieved superior accuracy and F-scores of around 83%, particularly in monolingual contexts, while DistilBERT-multilingual-cased excelled in multilingual scenarios, achieving accuracy and F-scores of around 72%. This study contributes a refined dataset and provides insights into model performance, highlighting the transformative potential of transformer models in detecting AI-generated content. |
format | Article |
id | doaj-art-6f100b6f83834dbc896d7424b5588c9d |
institution | Kabale University |
issn | 1999-5903 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Future Internet |
spelling | Future Internet, Vol. 17, No. 1, Art. 38 (2025-01-01); MDPI AG; ISSN 1999-5903; DOI 10.3390/fi17010038. Daniela Gifu (Institute of Computer Science, Romanian Academy—Iași Branch, Codrescu 2, 700481 Iași, Romania); Covaci Silviu-Vasile (George Emil Palade University of Medicine, Pharmacy, Science, and Technology of Târgu Mureș, Gheorghe Marinescu 38, 540142 Târgu Mureș, Romania). https://www.mdpi.com/1999-5903/17/1/38 |
title | Artificial Intelligence vs. Human: Decoding Text Authenticity with Transformers |
topic | large language models natural language processing content creation text authenticity |
url | https://www.mdpi.com/1999-5903/17/1/38 |