SDMA-Net: Swin Transformer-Based Dynamic Memory-Attention Network for Endoscopic Navigation

Bibliographic Details
Main Authors: Runnan Zhang, Qi Tian, Jinghui Chu, Wei Lu
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11025809/
collection DOAJ
description Accurate endoscopic motion navigation is crucial for minimally invasive surgical procedures. However, endoscopic video data often exhibit low texture, variable lighting, and dynamic motion patterns, which pose significant challenges to existing methods. To address these issues, we propose a novel deep learning framework, the Swin Transformer-based Dynamic Memory-Attention Network (SDMA-Net). SDMA-Net integrates a Swin Transformer for multiscale feature extraction, a Dynamic Channel Attention (DCA) module for frequency-aware feature refinement, and a Channel-Level Masked AutoEncoder (CL-MAE) for self-supervised learning. Temporal dependencies are modeled using a Long Short-Term Memory (LSTM) network. Additionally, a Dynamic Memory Augmentation Module (DMAM) adaptively updates and retrieves motion patterns to enhance robustness against noise and occlusions. Experiments on a colonoscopy dataset of over 12,000 images demonstrate that SDMA-Net achieves higher classification accuracy and Area Under the Curve (AUC) than existing baselines. In conclusion, SDMA-Net provides an effective and efficient solution for endoscopic motion detection and classification.
id doaj-art-2603ec5e3d0e4b4a8fe727a9bec64ae2
institution OA Journals
issn 2169-3536
spelling SDMA-Net: Swin Transformer-Based Dynamic Memory-Attention Network for Endoscopic Navigation
IEEE Access (IEEE), ISSN 2169-3536, 2025-01-01, vol. 13, pp. 103786-103797
DOI: 10.1109/ACCESS.2025.3576739
IEEE document 11025809; record doaj-art-2603ec5e3d0e4b4a8fe727a9bec64ae2, 2025-08-20T02:36:59Z, language eng
Runnan Zhang, https://orcid.org/0009-0002-5818-2559, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Qi Tian, https://orcid.org/0009-0003-2676-5300, General Surgery Department, Tianjin Children’s Hospital/Children’s Hospital, Tianjin University, Tianjin, China
Jinghui Chu, https://orcid.org/0000-0001-7926-8824, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Wei Lu, https://orcid.org/0000-0002-6566-775X, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
https://ieeexplore.ieee.org/document/11025809/
Keywords: Endoscopic navigation; swin transformer; long short-term memory (LSTM); classification accuracy
topic Endoscopic navigation
swin transformer
long short-term memory (LSTM)
classification accuracy