Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models

Bibliographic Details
Main Authors:	Tymon Lesław Żarski, Artur Janicki
Format:	Article
Language:	English
Published:	MDPI AG 2025-01-01
Series:	Information
Subjects:	natural language processing; named entity recognition; large language models; data security; deep learning
Online Access:	https://www.mdpi.com/2078-2489/16/1/49

author	Tymon Lesław Żarski; Artur Janicki
collection	DOAJ
description	Data security is a critical concern for Internet users, particularly as more people rely on social networks and online tools daily. Despite the convenience, many users are unaware of the risks posed to their sensitive and personal data. This study addresses this issue by presenting a comprehensive solution to prevent personal data leakage when using online tools. We developed a conceptual solution that enhances user privacy by identifying and anonymizing named entity classes representing sensitive data while maintaining the original context by swapping source entities for functional data. Our approach utilizes natural language processing methods, combining machine learning tools such as MITIE and spaCy with rule-based text analysis. We employed regular expressions and large language models to anonymize text, preserving its context for further processing or enabling restoration to the original form after transformations. The results demonstrate the effectiveness of our custom-trained models, which achieved an F1 score of 0.8292. Additionally, the proposed algorithms preserved context in approximately 93.23% of test cases, indicating a promising solution for secure data handling in online environments.
format	Article
id	doaj-art-8e7ae36b35cf4fe7bf081265f231056c
institution	Kabale University
issn	2078-2489
language	English
publishDate	2025-01-01
publisher	MDPI AG
record_format	Article
series	Information
spelling	doaj-art-8e7ae36b35cf4fe7bf081265f231056c; 2025-01-24T13:35:17Z; eng; MDPI AG; Information; ISSN 2078-2489; 2025-01-01; vol. 16, no. 1, art. 49; DOI 10.3390/info16010049; Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models; Tymon Lesław Żarski (Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland); Artur Janicki (Faculty of Electronics and Information Technology, Warsaw University of Technology, 00-665 Warsaw, Poland); https://www.mdpi.com/2078-2489/16/1/49
title	Enhancing Privacy While Preserving Context in Text Transformations by Large Language Models
topic	natural language processing; named entity recognition; large language models; data security; deep learning
url	https://www.mdpi.com/2078-2489/16/1/49
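
The description field mentions using regular expressions to anonymize text in a way that allows restoration to the original form after transformations. The following is only a minimal Python sketch of that general idea, not the authors' implementation. The patterns, the `[LABEL_n]` placeholder format, and the function names `anonymize` and `restore` are all assumptions made for illustration; the article describes swapping entities for functional data and also relies on MITIE and spaCy models, neither of which is reproduced here.

```python
import re

# Hypothetical patterns for illustration; the record does not list the
# rules actually used in the article.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d -]{7,}\d"),
}


def anonymize(text):
    """Replace sensitive matches with placeholders and return the mapping."""
    mapping = {}  # placeholder -> original value
    seen = {}     # original value -> placeholder (keeps repeats consistent)
    counters = {label: 0 for label in PATTERNS}

    def make_substitute(label):
        def substitute(match):
            original = match.group(0)
            if original not in seen:
                counters[label] += 1
                placeholder = f"[{label}_{counters[label]}]"
                seen[original] = placeholder
                mapping[placeholder] = original
            return seen[original]
        return substitute

    for label, pattern in PATTERNS.items():
        text = pattern.sub(make_substitute(label), text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back in place of the placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


if __name__ == "__main__":
    message = "Contact anna@example.com or call +48 123 456 789."
    masked, mapping = anonymize(message)
    print(masked)
    print(restore(masked, mapping))
```

In a workflow like the one the abstract outlines, the masked text would be what is sent to an online tool, and `restore` would be applied to the returned text using the locally kept mapping. Restoration in this sketch only works if the placeholders come back unchanged.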
