ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose...
Main Authors: | Zengyu Cai, Liusen Xu, Jianwei Zhang, Yuan Feng, Liang Zhu, Fangmei Liu |
---|---|
Format: | Article |
Language: | English |
Published: | AIMS Press, 2024-12-01 |
Series: | Electronic Research Archive |
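Subjects: | pornographic image classification; vision transformer; convolutional block attention module; multi-head attention; convolutional neural network |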
Online Access: | https://www.aimspress.com/article/doi/10.3934/era.2024313 |
_version_ | 1832590783691096064 |
---|---|
author | Zengyu Cai, Liusen Xu, Jianwei Zhang, Yuan Feng, Liang Zhu, Fangmei Liu |
author_sort | Zengyu Cai |
collection | DOAJ |
description | Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a CNN-Transformer hierarchical structure, combining the strengths of Convolutional Neural Networks (CNNs) and Transformers to effectively capture and integrate both local and global features, thereby enhancing feature representation accuracy and diversity. Moreover, the model integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scrapper dataset, made publicly available on GitHub by data scientist Alexander Kim. The results show that ViT-DualAtt achieves a classification accuracy of 97.2% ± 0.1% on the pornographic image classification task, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%. Furthermore, the model achieves a pornographic image miss rate of only 1.6%, significantly reducing the risk of pornographic image dissemination on internet platforms. |
format | Article |
id | doaj-art-e9022109a6ff4566ab6955df2bc30cf2 |
institution | Kabale University |
issn | 2688-1594 |
language | English |
publishDate | 2024-12-01 |
publisher | AIMS Press |
record_format | Article |
series | Electronic Research Archive |
spelling | doaj-art-e9022109a6ff4566ab6955df2bc30cf2; 2025-01-23T07:53:06Z; eng; AIMS Press; Electronic Research Archive; 2688-1594; 2024-12-01; vol. 32, no. 12, pp. 6698-6716; 10.3934/era.2024313; ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention; Zengyu Cai (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Liusen Xu (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Jianwei Zhang (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Yuan Feng (School of Electronic Information, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Liang Zhu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Fangmei Liu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a CNN-Transformer hierarchical structure, combining the strengths of Convolutional Neural Networks (CNNs) and Transformers to effectively capture and integrate both local and global features, thereby enhancing feature representation accuracy and diversity. Moreover, the model integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scrapper dataset, made publicly available on GitHub by data scientist Alexander Kim. The results show that ViT-DualAtt achieves a classification accuracy of 97.2% ± 0.1% on the pornographic image classification task, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%. Furthermore, the model achieves a pornographic image miss rate of only 1.6%, significantly reducing the risk of pornographic image dissemination on internet platforms.; https://www.aimspress.com/article/doi/10.3934/era.2024313; pornographic image classification; vision transformer; convolutional block attention module; multi-head attention; convolutional neural network |
title | ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention |
topic | pornographic image classification; vision transformer; convolutional block attention module; multi-head attention; convolutional neural network |
url | https://www.aimspress.com/article/doi/10.3934/era.2024313 |
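
The description reports a classification accuracy and a "miss rate", but this record does not define either metric. The following is a minimal sketch, assuming the miss rate means the false negative rate with pornographic as the positive class (the share of pornographic images the classifier lets through). The names `BinaryCounts`, `count_outcomes`, `accuracy`, and `miss_rate`, and the toy labels, are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryCounts:
    """Confusion-matrix counts; label 1 = pornographic (positive), 0 = benign."""
    tp: int  # pornographic images correctly flagged
    fn: int  # pornographic images that were missed
    fp: int  # benign images wrongly flagged
    tn: int  # benign images correctly passed


def count_outcomes(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryCounts:
    """Tally the four outcomes from ground-truth and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return BinaryCounts(tp=tp, fn=fn, fp=fp, tn=tn)


def accuracy(c: BinaryCounts) -> float:
    """Fraction of all images classified correctly."""
    total = c.tp + c.fn + c.fp + c.tn
    if total == 0:
        raise ValueError("no samples")
    return (c.tp + c.tn) / total


def miss_rate(c: BinaryCounts) -> float:
    """Assumed definition: FN / (TP + FN), the false negative rate."""
    positives = c.tp + c.fn
    if positives == 0:
        raise ValueError("no positive samples")
    return c.fn / positives


if __name__ == "__main__":
    # Toy labels for illustration only; these are not data from the paper.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    counts = count_outcomes(y_true, y_pred)
    print(accuracy(counts))   # 0.75
    print(miss_rate(counts))  # 0.25
```

Under this assumed definition, a low miss rate and a high accuracy are separate properties: a filter can be accurate overall while still letting through a large share of the positive class when that class is rare, which is presumably why the description reports both figures.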