Task-Oriented Adversarial Attacks for Aspect-Based Sentiment Analysis Models

Bibliographic Details
Main Authors: Monserrat Vázquez-Hernández, Ignacio Algredo-Badillo, Luis Villaseñor-Pineda, Mariana Lobato-Báez, Juan Carlos Lopez-Pimentel, Luis Alberto Morales-Rosales
Format: Article
Language: English
Published: MDPI AG 2025-01-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/15/2/855
Description
Summary: Adversarial attacks deliberately modify the inputs of deep learning models to mislead them and cause incorrect results. Previous adversarial attacks on sentiment analysis models have succeeded in misleading these models. However, most existing attacks in sentiment analysis apply a generalized approach to input modification, without considering the characteristics and objectives of the different analysis levels. Specifically, for aspect-based sentiment analysis, there is a lack of attack methods that modify inputs in accordance with the evaluated aspects. Consequently, unnecessary modifications are made, which compromises the input semantics, makes the changes more detectable, and hinders the identification of new vulnerabilities. In previous work, we proposed a model to generate adversarial examples specifically for aspect-based sentiment analysis. In this paper, we assess the effectiveness of our adversarial example model in degrading the results of aspect-based models while maintaining a high level of semantic similarity to the original inputs. To conduct this evaluation, we propose diverse adversarial attacks across different dataset domains and target architectures, and consider distinct levels of victim model knowledge, thus obtaining a comprehensive evaluation. The obtained results demonstrate that our approach outperforms existing attack methods in terms of accuracy reduction and semantic similarity, achieving a 65.30% reduction in model accuracy with a low perturbation ratio of 7.79%. These findings highlight the importance of considering task-specific characteristics when designing adversarial examples, as even simple modifications to elements that support task classification can successfully mislead models.
ISSN: 2076-3417
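
The summary reports two metrics, accuracy reduction and perturbation ratio, without defining them in this record. The sketch below illustrates one common way such metrics are computed. It is not taken from the article: the function names are hypothetical, the perturbation ratio is assumed to be the fraction of word positions changed by word-level substitution, and accuracy reduction is assumed to be the absolute difference between clean and adversarial accuracy.

```python
from typing import Sequence


def perturbation_ratio(original: str, adversarial: str) -> float:
    """Fraction of word positions that differ between the two texts.

    Assumption: the attack only substitutes words, so both texts have
    the same number of whitespace-separated tokens.
    """
    orig_tokens = original.split()
    adv_tokens = adversarial.split()
    if len(orig_tokens) != len(adv_tokens):
        raise ValueError("this sketch only handles word-level substitutions")
    if not orig_tokens:
        return 0.0
    changed = sum(o != a for o, a in zip(orig_tokens, adv_tokens))
    return changed / len(orig_tokens)


def accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Share of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_reduction(clean_accuracy: float, adversarial_accuracy: float) -> float:
    """Absolute drop in accuracy caused by the attack (an assumed definition)."""
    return clean_accuracy - adversarial_accuracy


# Hypothetical example: one sentiment word tied to the aspect "battery life"
# is replaced, and the rest of the sentence is left untouched.
original = "The battery life is great but the screen is dull"
adversarial = "The battery life is fine but the screen is dull"
ratio = perturbation_ratio(original, adversarial)
```

In this hypothetical example one of ten words changes, so `ratio` is 0.1. Under these assumed definitions, a low perturbation ratio combined with a large accuracy reduction means the attack misleads the model while altering only a small part of each input.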