Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models
Data security is a critical concern for Internet users, particularly as more people rely on social networks and online tools daily. Despite the convenience, many users are unaware of the risks posed to their sensitive and personal data. This study addresses this issue by presenting a comprehensive solu...
Main Authors: | Tymon Lesław Żarski, Artur Janicki |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2025-01-01 |
Series: | Information |
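Subjects: | natural language processing; named entity recognition; large language models; data security; deep learning |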
Online Access: | https://www.mdpi.com/2078-2489/16/1/49 |
_version_ | 1832588357255823360 |
---|---|
author | Tymon Lesław Żarski; Artur Janicki |
author_facet | Tymon Lesław Żarski; Artur Janicki |
author_sort | Tymon Lesław Żarski |
collection | DOAJ |
description | Data security is a critical concern for Internet users, particularly as more people rely on social networks and online tools daily. Despite the convenience, many users are unaware of the risks posed to their sensitive and personal data. This study addresses this issue by presenting a comprehensive solution to prevent personal data leakage when using online tools. We developed a conceptual solution that enhances user privacy by identifying and anonymizing named entity classes representing sensitive data while maintaining the original context by swapping source entities for functional data. Our approach utilizes natural language processing methods, combining machine learning tools such as MITIE and spaCy with rule-based text analysis. We employed regular expressions and large language models to anonymize text, preserving its context for further processing or enabling restoration to the original form after transformations. The results demonstrate the effectiveness of our custom-trained models, which achieved an F1 score of 0.8292. Additionally, the proposed algorithms preserved context in approximately 93.23% of test cases, indicating a promising solution for secure data handling in online environments. |
format | Article |
id | doaj-art-8e7ae36b35cf4fe7bf081265f231056c |
institution | Kabale University |
issn | 2078-2489 |
language | English |
publishDate | 2025-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Information |
spelling | doaj-art-8e7ae36b35cf4fe7bf081265f231056c; 2025-01-24T13:35:17Z; eng; MDPI AG; Information; 2078-2489; 2025-01-01; 16; 1; 49; 10.3390/info16010049; Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models; Tymon Lesław Żarski (Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland); Artur Janicki (Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland); https://www.mdpi.com/2078-2489/16/1/49; natural language processing; named entity recognition; large language models; data security; deep learning |
title | Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models |
topic | natural language processing; named entity recognition; large language models; data security; deep learning |
url | https://www.mdpi.com/2078-2489/16/1/49 |
work_keys_str_mv | AT tymonlesławzarski enhancingprivacywhilepreservingcontextintexttransformationsbylargelanguagemodels AT arturjanicki enhancingprivacywhilepreservingcontextintexttransformationsbylargelanguagemodels |
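
The description above mentions using regular expressions to swap sensitive entities for functional stand-in data, with the option of restoring the original text after a transformation. The following is a minimal sketch of that general idea only, not the authors' implementation: the two patterns, the surrogate formats, and the function names `anonymize` and `restore` are all assumptions made for illustration, and the record gives no detail on the MITIE, spaCy, or LLM components, so they are left out.

```python
import re

# Hypothetical patterns for two entity classes; the article's actual
# rules are not given in this record.
ENTITY_RE = re.compile(
    r"(?P<EMAIL>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,})"
    r"|(?P<PHONE>\+?\d[\d -]{7,}\d)"
)


def make_surrogate(label, index):
    """Build a plausible stand-in value of the same entity class (assumed formats)."""
    if label == "EMAIL":
        return f"user{index}@example.com"
    return f"+00 000 000 {index:03d}"


def anonymize(text):
    """Replace each detected entity with a surrogate.

    Returns the anonymized text and a surrogate -> original mapping.
    The same original value always receives the same surrogate.
    """
    forward = {}   # original -> surrogate
    counters = {}  # entity label -> number of surrogates issued so far

    def swap(match):
        original = match.group(0)
        if original not in forward:
            label = match.lastgroup
            counters[label] = counters.get(label, 0) + 1
            forward[original] = make_surrogate(label, counters[label])
        return forward[original]

    anonymized = ENTITY_RE.sub(swap, text)
    return anonymized, {s: o for o, s in forward.items()}


def restore(text, mapping):
    """Put the original values back in a single pass over the text."""
    if not mapping:
        return text
    # Longest surrogates first so a shorter one cannot shadow a longer one.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: mapping[m.group(0)], text)


text = "Contact anna.k@mail.pl or +48 600 700 800; anna.k@mail.pl is preferred."
anonymized, mapping = anonymize(text)
print(anonymized)
print(restore(anonymized, mapping))
```

Keeping the mapping on the user's side means that only surrogate values would be sent to an online tool, and the tool's output can be restored locally as long as it leaves the surrogates intact.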