IMViT: Adjacency Matrix-Based Lightweight Plain Vision Transformer
Transformers are becoming the dominant deep learning backbone for both computer vision and natural language processing. While extensive experiments prove their outstanding ability in large models, transformers with small sizes are not comparable with convolutional neural networks in various downstream t...
Saved in:
Main Authors: Qihao Chen, Yunfeng Yan, Xianbo Wang, Jishen Peng
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10849548/
Similar Items
- ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
  by: Zengyu Cai, et al.
  Published: (2024-12-01)
- Mirror Target YOLO: An Improved YOLOv8 Method With Indirect Vision for Heritage Buildings Fire Detection
  by: Jian Liang, et al.
  Published: (2025-01-01)
- Leveraging two-dimensional pre-trained vision transformers for three-dimensional model generation via masked autoencoders
  by: Muhammad Sajid, et al.
  Published: (2025-01-01)
- Transforming Alzheimer’s Disease Diagnosis: Implementing Vision Transformer (ViT) for MRI Images Classification
  by: Dian Kurniasari, et al.
  Published: (2025-01-01)
- Squeeze-and-Excitation Vision Transformer for Lung Nodule Classification
  by: Xiaozhong Xue, et al.
  Published: (2025-01-01)