Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models

Bibliographic Details
Main Authors: Tymon Lesław Żarski, Artur Janicki
Format: Article
Language: English
Published: MDPI AG, 2025-01-01
Series: Information
Online Access: https://www.mdpi.com/2078-2489/16/1/49
Affiliation (both authors): Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland
ISSN: 2078-2489
Volume: 16, Issue: 1, Article number: 49
DOI: 10.3390/info16010049
Keywords: natural language processing; named entity recognition; large language models; data security; deep learning
Collection: DOAJ
Catalog institution: Kabale University

Abstract
Data security is a critical concern for Internet users, especially as more people rely on social networks and online tools daily. Despite the convenience, many users are unaware of the risks posed to their sensitive and personal data. This study addresses this issue by presenting a comprehensive solution to prevent personal data leakage when using online tools. We developed a conceptual solution that enhances user privacy by identifying and anonymizing named entity classes representing sensitive data, while preserving the original context by replacing the source entities with functional substitute data. Our approach uses natural language processing methods, combining machine learning tools such as MITIE and spaCy with rule-based text analysis. We employed regular expressions and large language models to anonymize text while preserving its context, so that it can be processed further or restored to its original form after transformations. The results demonstrate the effectiveness of our custom-trained models, which achieved an F1 score of 0.8292. Additionally, the proposed algorithms preserved context in approximately 93.23% of test cases, indicating a promising solution for secure data handling in online environments.
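The approach summarized in the abstract follows a round trip: detect sensitive entities, replace them with plausible stand-ins, let an online tool or LLM transform the text, then map the stand-ins back. The Python sketch below illustrates that round trip using only regular expressions from the standard library. It is not the authors' implementation, which according to the abstract also relies on MITIE, spaCy, custom-trained models, and LLMs. The two entity patterns, the `Pseudonymizer` class, `make_substitute`, and the substitute formats are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical rule-based detectors; the record does not list the paper's actual patterns.
PATTERNS: dict[str, re.Pattern[str]] = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d{2,3}[ -]?\d{3}[ -]?\d{3}[ -]?\d{3}"),
}


def make_substitute(label: str, index: int) -> str:
    """Return a plausible stand-in of the same kind, so the text still reads naturally."""
    if label == "EMAIL":
        return f"person{index}@example.com"
    return f"+00 000 000 {index:03d}"


@dataclass
class Pseudonymizer:
    """Hypothetical helper that swaps entities out and keeps a reversible mapping."""

    forward: dict[str, str] = field(default_factory=dict)   # original -> substitute
    backward: dict[str, str] = field(default_factory=dict)  # substitute -> original

    def anonymize(self, text: str) -> str:
        for label, pattern in PATTERNS.items():

            def swap(match: re.Match[str], label: str = label) -> str:
                original = match.group(0)
                # Repeated mentions reuse one substitute, keeping references consistent.
                if original not in self.forward:
                    # The index is a counter shared across all entity types.
                    substitute = make_substitute(label, len(self.forward) + 1)
                    self.forward[original] = substitute
                    self.backward[substitute] = original
                return self.forward[original]

            text = pattern.sub(swap, text)
        return text

    def restore(self, text: str) -> str:
        # Works only if the transformation copied each substitute verbatim.
        # Longest first, as a guard in case one substitute format ever contains another.
        for substitute in sorted(self.backward, key=len, reverse=True):
            text = text.replace(substitute, self.backward[substitute])
        return text


if __name__ == "__main__":
    p = Pseudonymizer()
    masked = p.anonymize("Write to anna.nowak@mail.example or call +48 601 234 567.")
    print(masked)  # Write to person1@example.com or call +00 000 000 002.
    # Stand-in for an online tool or LLM rewrite that leaves the substitutes untouched.
    rewritten = masked.replace("Write to", "Please contact")
    print(p.restore(rewritten))  # Please contact anna.nowak@mail.example or call +48 601 234 567.
```

Because the substitutes still look like an e-mail address and a phone number, a downstream model can keep reasoning about them, which is one way to read "preserving context". The trade-off in this sketch is that restoration silently fails for any substitute the transformation rewrites or drops.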