SDMA-Net: Swin Transformer-Based Dynamic Memory-Attention Network for Endoscopic Navigation
Accurate endoscopic motion navigation is crucial for minimally invasive surgical procedures. Nevertheless, endoscopic video data often exhibit low texture, variable lighting, and dynamic motion patterns, which pose significant challenges to existing methods. To address these issues, we propose a no...
| Main Authors: | Runnan Zhang, Qi Tian, Jinghui Chu, Wei Lu |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | Endoscopic navigation; swin transformer; long short-term memory (LSTM); classification accuracy |
| Online Access: | https://ieeexplore.ieee.org/document/11025809/ |
| _version_ | 1850114093200965632 |
|---|---|
| author | Runnan Zhang; Qi Tian; Jinghui Chu; Wei Lu |
| author_facet | Runnan Zhang; Qi Tian; Jinghui Chu; Wei Lu |
| author_sort | Runnan Zhang |
| collection | DOAJ |
| description | Accurate endoscopic motion navigation is crucial for minimally invasive surgical procedures. Nevertheless, endoscopic video data often exhibit low texture, variable lighting, and dynamic motion patterns, which pose significant challenges to existing methods. To address these issues, we propose a novel deep learning framework, namely Swin Transformer-based Dynamic Memory-Attention Network (SDMA-Net). SDMA-Net integrates a Swin Transformer for multiscale feature extraction, a Dynamic Channel Attention (DCA) module for frequency-aware feature refinement, and a Channel-Level Masked AutoEncoder (CL-MAE) for self-supervised learning. Temporal dependencies are modeled using a Long Short-Term Memory (LSTM) network. Additionally, a Dynamic Memory Augmentation Module (DMAM) adaptively updates and retrieves motion patterns to enhance robustness against noise and occlusions. Experiments on a colonoscopy dataset of over 12,000 images demonstrate that SDMA-Net achieves superior classification accuracy and Area Under the Curve (AUC) compared to existing baselines. In conclusion, our proposed SDMA-Net provides an effective and efficient solution for endoscopic motion detection and classification. |
| format | Article |
| id | doaj-art-2603ec5e3d0e4b4a8fe727a9bec64ae2 |
| institution | OA Journals |
| issn | 2169-3536 |
| language | English |
| publishDate | 2025-01-01 |
| publisher | IEEE |
| record_format | Article |
| series | IEEE Access |
| spelling | doaj-art-2603ec5e3d0e4b4a8fe727a9bec64ae2; 2025-08-20T02:36:59Z; eng; IEEE; IEEE Access; 2169-3536; 2025-01-01; 13; 103786; 103797; 10.1109/ACCESS.2025.3576739; 11025809; SDMA-Net: Swin Transformer-Based Dynamic Memory-Attention Network for Endoscopic Navigation; Runnan Zhang (0), https://orcid.org/0009-0002-5818-2559; Qi Tian (1), https://orcid.org/0009-0003-2676-5300; Jinghui Chu (2), https://orcid.org/0000-0001-7926-8824; Wei Lu (3), https://orcid.org/0000-0002-6566-775X; (0) School of Electrical and Information Engineering, Tianjin University, Tianjin, China; (1) General Surgery Department, Tianjin Children’s Hospital/Children’s Hospital, Tianjin University, Tianjin, China; (2) School of Electrical and Information Engineering, Tianjin University, Tianjin, China; (3) School of Electrical and Information Engineering, Tianjin University, Tianjin, China; Accurate endoscopic motion navigation is crucial for minimally invasive surgical procedures. Nevertheless, endoscopic video data often exhibit low texture, variable lighting, and dynamic motion patterns, which pose significant challenges to existing methods. To address these issues, we propose a novel deep learning framework, namely Swin Transformer-based Dynamic Memory-Attention Network (SDMA-Net). SDMA-Net integrates a Swin Transformer for multiscale feature extraction, a Dynamic Channel Attention (DCA) module for frequency-aware feature refinement, and a Channel-Level Masked AutoEncoder (CL-MAE) for self-supervised learning. Temporal dependencies are modeled using a Long Short-Term Memory (LSTM) network. Additionally, a Dynamic Memory Augmentation Module (DMAM) adaptively updates and retrieves motion patterns to enhance robustness against noise and occlusions. Experiments on a colonoscopy dataset of over 12,000 images demonstrate that SDMA-Net achieves superior classification accuracy and Area Under the Curve (AUC) compared to existing baselines. In conclusion, our proposed SDMA-Net provides an effective and efficient solution for endoscopic motion detection and classification.; https://ieeexplore.ieee.org/document/11025809/; Endoscopic navigation; swin transformer; long short-term memory (LSTM); classification accuracy |
| title | SDMA-Net: Swin Transformer-Based Dynamic Memory-Attention Network for Endoscopic Navigation |
| topic | Endoscopic navigation; swin transformer; long short-term memory (LSTM); classification accuracy |
| url | https://ieeexplore.ieee.org/document/11025809/ |
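
The description above says that the Dynamic Memory Augmentation Module (DMAM) "adaptively updates and retrieves motion patterns" but gives no internal details. As a rough illustration of what retrieving from and updating a feature memory can look like in general, the sketch below uses dot-product attention for the read and a weighted moving-average step for the write. Every name here (`read_memory`, `write_memory`, `rate`) and both design choices are assumptions made for illustration only; this is not the authors' DMAM.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def read_memory(memory, query):
    """Hypothetical read: attend over memory slots with dot-product scores.

    Returns the attention weights and the weighted sum of the slots.
    """
    weights = softmax([dot(slot, query) for slot in memory])
    retrieved = [
        sum(w * slot[i] for w, slot in zip(weights, memory))
        for i in range(len(query))
    ]
    return weights, retrieved


def write_memory(memory, query, weights, rate=0.1):
    """Hypothetical write: move each slot toward the query, scaled by its weight."""
    return [
        [s + rate * w * (q - s) for s, q in zip(slot, query)]
        for slot, w in zip(memory, weights)
    ]


# Toy example: two memory slots, one incoming feature vector.
memory = [[1.0, 0.0], [0.0, 1.0]]
query = [2.0, 0.0]
weights, retrieved = read_memory(memory, query)
updated = write_memory(memory, query, weights)
```

In this toy setup the slot most similar to the query receives the larger attention weight and is therefore moved furthest toward it, which is one simple way a memory can adapt to recurring patterns.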