ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention

Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose...

Bibliographic Details
Main Authors: Zengyu Cai, Liusen Xu, Jianwei Zhang, Yuan Feng, Liang Zhu, Fangmei Liu
Format: Article
Language: English
Published: AIMS Press 2024-12-01
Series: Electronic Research Archive
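Subjects: pornographic image classification; vision transformer; convolutional block attention module; multi-head attention; convolutional neural network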
Online Access: https://www.aimspress.com/article/doi/10.3934/era.2024313
_version_ 1832590783691096064
author Zengyu Cai
Liusen Xu
Jianwei Zhang
Yuan Feng
Liang Zhu
Fangmei Liu
author_sort Zengyu Cai
collection DOAJ
description Pornographic images pollute the internet environment and can harm societal values and the mental health of young people, so classifying and filtering them accurately is crucial to keeping online communities safe. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a hierarchical CNN-Transformer structure that combines the strengths of Convolutional Neural Networks (CNNs) and Transformers to capture and integrate both local and global features, improving the accuracy and diversity of the feature representation. The model also integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scrapper dataset, which data scientist Alexander Kim made publicly available on GitHub. ViT-DualAtt achieved a classification accuracy of 97.2% ± 0.1% on the pornographic image classification task, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%, and a pornographic image miss rate of only 1.6%, which reduces the risk of pornographic images spreading on internet platforms.
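The two figures quoted in the description, classification accuracy and miss rate, can be illustrated with a minimal sketch. It assumes a binary framing (pornographic vs. not) and the usual definition of miss rate as the false negative rate, FN / (TP + FN); the record does not state how the authors computed theirs. The function names and the counts below are hypothetical and are not taken from the paper.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all images that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion counts must not all be zero")
    return (tp + tn) / total


def miss_rate(tp: int, fn: int) -> float:
    """False negative rate: share of pornographic images that were missed.

    Assumption: "miss rate" means FN / (TP + FN), the standard definition.
    """
    positives = tp + fn
    if positives == 0:
        raise ValueError("there must be at least one positive example")
    return fn / positives


if __name__ == "__main__":
    # Hypothetical confusion counts, chosen only to illustrate the formulas.
    tp, fn = 90, 10   # pornographic images: caught / missed
    tn, fp = 95, 5    # other images: passed / wrongly flagged
    print(f"accuracy:  {accuracy(tp, tn, fp, fn):.1%}")   # accuracy:  92.5%
    print(f"miss rate: {miss_rate(tp, fn):.1%}")          # miss rate: 10.0%
```

In a filtering setting the miss rate is often the more important of the two numbers, because a missed pornographic image is one that reaches the platform unfiltered.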
format Article
id doaj-art-e9022109a6ff4566ab6955df2bc30cf2
institution Kabale University
issn 2688-1594
language English
publishDate 2024-12-01
publisher AIMS Press
record_format Article
series Electronic Research Archive
spelling doaj-art-e9022109a6ff4566ab6955df2bc30cf2
2025-01-23T07:53:06Z
eng
AIMS Press
Electronic Research Archive, ISSN 2688-1594
2024-12-01, vol. 32, no. 12, pp. 6698-6716
DOI 10.3934/era.2024313
ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
Zengyu Cai (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
Liusen Xu (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
Jianwei Zhang (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
Yuan Feng (School of Electronic Information, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
Liang Zhu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
Fangmei Liu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China)
https://www.aimspress.com/article/doi/10.3934/era.2024313
title ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
title_sort vit dualatt an efficient pornographic image classification method based on vision transformer with dual attention
topic pornographic image classification
vision transformer
convolutional block attention module
multi-head attention
convolutional neural network
url https://www.aimspress.com/article/doi/10.3934/era.2024313