ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention

Pornographic images not only pollute the internet environment, but also potentially harm societal values and the mental health of young people. Therefore, accurately classifying and filtering pornographic images is crucial to maintaining the safety of the online community. In this paper, we propose...

Bibliographic Details
Main Authors:	Zengyu Cai, Liusen Xu, Jianwei Zhang, Yuan Feng, Liang Zhu, Fangmei Liu
Format:	Article
Language:	English
Published:	AIMS Press 2024-12-01
Series:	Electronic Research Archive
Subjects:	pornographic image classification vision transformer convolutional block attention module multi-head attention convolutional neural network
Online Access:	https://www.aimspress.com/article/doi/10.3934/era.2024313

_version_	1832590783691096064
author	Zengyu Cai Liusen Xu Jianwei Zhang Yuan Feng Liang Zhu Fangmei Liu
author_facet	Zengyu Cai Liusen Xu Jianwei Zhang Yuan Feng Liang Zhu Fangmei Liu
author_sort	Zengyu Cai
collection	DOAJ
description	Pornographic images not only pollute the internet environment but can also harm societal values and the mental health of young people. Accurately classifying and filtering pornographic images is therefore crucial to maintaining the safety of the online community. In this paper, we propose a novel pornographic image classification model named ViT-DualAtt. The model adopts a hierarchical CNN-Transformer structure that combines the strengths of Convolutional Neural Networks (CNNs) and Transformers to capture and integrate both local and global features, improving the accuracy and diversity of the feature representation. The model also integrates multi-head attention and convolutional block attention mechanisms to further improve classification accuracy. Experiments were conducted on the nsfw_data_scrapper dataset, made publicly available on GitHub by data scientist Alexander Kim. The results show that ViT-DualAtt achieves a classification accuracy of 97.2% ± 0.1% on pornographic image classification tasks, outperforming the current state-of-the-art model (RepVGG-SimAM) by 2.7%. Furthermore, the model achieves a pornographic image miss rate of only 1.6%, significantly reducing the risk of pornographic image dissemination on internet platforms.
format	Article
id	doaj-art-e9022109a6ff4566ab6955df2bc30cf2
institution	Kabale University
issn	2688-1594
language	English
publishDate	2024-12-01
publisher	AIMS Press
record_format	Article
series	Electronic Research Archive
spelling	doaj-art-e9022109a6ff4566ab6955df2bc30cf2; 2025-01-23T07:53:06Z; eng; AIMS Press; Electronic Research Archive; 2688-1594; 2024-12-01; vol. 32, no. 12, pp. 6698-6716; 10.3934/era.2024313; ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention; Zengyu Cai (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Liusen Xu (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Jianwei Zhang (School of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Yuan Feng (School of Electronic Information, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Liang Zhu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); Fangmei Liu (School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450003, China); https://www.aimspress.com/article/doi/10.3934/era.2024313; pornographic image classification; vision transformer; convolutional block attention module; multi-head attention; convolutional neural network
title	ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
topic	pornographic image classification vision transformer convolutional block attention module multi-head attention convolutional neural network
url	https://www.aimspress.com/article/doi/10.3934/era.2024313
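
The description reports two figures, classification accuracy and a pornographic image miss rate. The record does not define the miss rate, so the sketch below assumes it is the false-negative rate: the share of pornographic images that the classifier fails to flag. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all images that were classified correctly."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def miss_rate(tp: int, fn: int) -> float:
    """Assumed definition: false negatives / all truly positive images."""
    positives = tp + fn
    if positives == 0:
        raise ValueError("no positive samples")
    return fn / positives


if __name__ == "__main__":
    # Hypothetical counts (not from the article): 100 pornographic images,
    # of which 10 are missed, and 100 benign images, of which 5 are
    # wrongly flagged.
    tp, fn, tn, fp = 90, 10, 95, 5
    print(f"accuracy  = {accuracy(tp, tn, fp, fn):.3f}")
    print(f"miss rate = {miss_rate(tp, fn):.3f}")
```

Under this reading, a low miss rate matters independently of accuracy: a filter can score well overall while still letting through a large share of the images it is meant to block.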

Similar Items