LKDA-Net: Hierarchical transformer with large kernel depthwise convolution attention for 3D medical image segmentation

Bibliographic Details
Main Authors: Ming Li, Jingang Ma, Jing Zhao
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2025-01-01
Series: PLoS ONE (ISSN 1932-6203)
Online Access: https://doi.org/10.1371/journal.pone.0329806

Description:
Since Transformers have demonstrated excellent performance in two-dimensional medical image segmentation, recent works have also introduced them into 3D medical segmentation tasks. For example, hierarchical transformers such as Swin UNETR have reintroduced several priors from convolutional networks, further enhancing a model's volumetric segmentation ability on three-dimensional medical datasets. The effectiveness of these hybrid architectures is largely attributed to their large number of parameters and the large receptive fields of non-local self-attention. We believe that large-kernel volumetric depthwise convolutions can achieve large receptive fields with far fewer parameters. In this paper, we propose a lightweight three-dimensional convolutional network, LKDA-Net, for efficient and accurate volumetric segmentation. The network adopts a large-kernel depthwise convolution attention mechanism to simulate the self-attention mechanism of Transformers. First, inspired by the Swin Transformer block, we investigate large-kernel convolution attention mechanisms of different sizes to obtain larger global receptive fields, and we replace the MLP in the Swin Transformer with an Inverted Bottleneck with Depthwise Convolutional Augmentation to reduce channel redundancy and enhance feature expression and segmentation performance. Second, we propose a skip-connection fusion module that achieves smooth feature fusion, enabling the decoder to make effective use of encoder features. Finally, in experimental evaluations on three public datasets, Synapse, BTCV and ACDC, LKDA-Net outperforms existing models of various architectures in segmentation performance while using fewer parameters. Code: https://github.com/zouyunkai/LKDA-Net.
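
To make the central idea concrete, the following PyTorch sketch shows one common way a 3D large-kernel depthwise convolution attention can be built: a large kernel is decomposed into a depthwise convolution, a dilated depthwise convolution, and a pointwise convolution, and the result is applied as a multiplicative attention map over the input. The class name and kernel sizes here are illustrative assumptions, not the authors' exact implementation; see the linked repository for the real code.

import torch
import torch.nn as nn

class LargeKernelDepthwiseAttention3D(nn.Module):
    # Sketch of a 3D large-kernel depthwise convolution attention block.
    # A large effective receptive field is obtained by stacking a depthwise
    # conv, a dilated depthwise conv, and a 1x1x1 pointwise conv; the output
    # weights the input element-wise, loosely mimicking self-attention.
    def __init__(self, channels: int):
        super().__init__()
        # 5x5x5 depthwise conv captures local context.
        self.dw_conv = nn.Conv3d(channels, channels, kernel_size=5,
                                 padding=2, groups=channels)
        # 7x7x7 depthwise conv with dilation 3 approximates a 19^3 kernel.
        self.dw_dilated = nn.Conv3d(channels, channels, kernel_size=7,
                                    padding=9, dilation=3, groups=channels)
        # Pointwise conv mixes information across channels.
        self.pw_conv = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = self.pw_conv(self.dw_dilated(self.dw_conv(x)))
        return attn * x  # multiplicative attention over the input features

# Usage: y = LargeKernelDepthwiseAttention3D(32)(torch.randn(1, 32, 16, 96, 96))

Because every spatial convolution is depthwise, the parameter count grows linearly in the channel count rather than quadratically, which is what allows a large receptive field at low cost.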
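The abstract also describes replacing the Swin Transformer's MLP with an Inverted Bottleneck with Depthwise Convolutional Augmentation. A minimal sketch of that pattern, assuming the conventional 4x expansion ratio and a 3x3x3 depthwise convolution in the expanded space (both assumptions, not confirmed details of LKDA-Net):

import torch
import torch.nn as nn

class InvertedBottleneckDW3D(nn.Module):
    # Sketch of an inverted bottleneck used in place of a Transformer MLP:
    # expand channels with a 1x1x1 conv (the "fc1"), augment with a depthwise
    # conv for spatial mixing, then project back down (the "fc2").
    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        self.expand = nn.Conv3d(channels, hidden, kernel_size=1)
        self.dw = nn.Conv3d(hidden, hidden, kernel_size=3, padding=1,
                            groups=hidden)  # depthwise augmentation
        self.act = nn.GELU()
        self.project = nn.Conv3d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.project(self.act(self.dw(self.expand(x))))

Unlike a plain MLP, the depthwise convolution in the expanded space adds spatial mixing for almost no extra parameters, which matches the stated goal of reducing channel redundancy while enhancing feature expression.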
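Finally, a plausible form of the skip-connection fusion module: encoder and decoder features at the same resolution are concatenated and fused by a small convolutional block, so the decoder receives a smoothed combination rather than a raw concatenation. The concrete fusion used in LKDA-Net may differ; this only illustrates the general pattern.

import torch
import torch.nn as nn

class SkipFusion3D(nn.Module):
    # Sketch of a skip-connection fusion module: concatenate encoder and
    # decoder features along channels, then fuse with conv + norm + activation.
    def __init__(self, channels: int):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv3d(2 * channels, channels, kernel_size=3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.GELU(),
        )

    def forward(self, enc: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        return self.fuse(torch.cat([enc, dec], dim=1))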