Abnormality-aware multimodal learning for WSI classification
Whole slide images (WSIs) play a vital role in cancer diagnosis and prognosis. However, their gigapixel resolution, lack of pixel-level annotations, and reliance on unimodal visual data present challenges for accurate and efficient computational analysis. Existing methods typically divide WSIs into...
| Main Authors: | Thao M. Dang, Qifeng Zhou, Yuzhi Guo, Hehuan Ma, Saiyang Na, Thao Bich Dang, Jean Gao, Junzhou Huang |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Frontiers Media S.A., 2025-02-01 |
| Series: | Frontiers in Medicine |
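| Subjects: | WSI analysis; multimodal fusion; abnormal detection; foundation model; Gaussian Mixture Variational Autoencoder |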
| Online Access: | https://www.frontiersin.org/articles/10.3389/fmed.2025.1546452/full |
| _version_ | 1850189611263852544 |
|---|---|
| author | Thao M. Dang, Qifeng Zhou, Yuzhi Guo, Hehuan Ma, Saiyang Na, Thao Bich Dang, Jean Gao, Junzhou Huang |
| author_facet | Thao M. Dang, Qifeng Zhou, Yuzhi Guo, Hehuan Ma, Saiyang Na, Thao Bich Dang, Jean Gao, Junzhou Huang |
| author_sort | Thao M. Dang |
| collection | DOAJ |
| description | Whole slide images (WSIs) play a vital role in cancer diagnosis and prognosis. However, their gigapixel resolution, lack of pixel-level annotations, and reliance on unimodal visual data present challenges for accurate and efficient computational analysis. Existing methods typically divide WSIs into thousands of patches, which increases computational demands and makes it challenging to effectively focus on diagnostically relevant regions. Furthermore, these methods frequently rely on feature extractors pretrained on natural images, which are not optimized for pathology tasks, and overlook multimodal data sources such as cellular and textual information that can provide critical insights. To address these limitations, we propose the Abnormality-Aware MultiModal (AAMM) learning framework, which integrates abnormality detection and multimodal feature learning for WSI classification. AAMM incorporates a Gaussian Mixture Variational Autoencoder (GMVAE) to identify and select the most informative patches, reducing computational complexity while retaining critical diagnostic information. It further integrates multimodal features from pathology-specific foundation models, combining patch-level, cell-level, and text-level representations through cross-attention mechanisms. This approach enhances the ability to comprehensively analyze WSIs for cancer diagnosis and subtyping. Extensive experiments on normal-tumor classification and cancer subtyping demonstrate that AAMM achieves superior performance compared to state-of-the-art methods. By combining abnormal detection with multimodal feature integration, our framework offers an efficient and scalable solution for advancing computational pathology. |
| format | Article |
| id | doaj-art-eecbbbe4b775490c951c5f8a090ca734 |
| institution | OA Journals |
| issn | 2296-858X |
| language | English |
| publishDate | 2025-02-01 |
| publisher | Frontiers Media S.A. |
| record_format | Article |
| series | Frontiers in Medicine |
| spelling | doaj-art-eecbbbe4b775490c951c5f8a090ca734; 2025-08-20T02:15:34Z; eng; Frontiers Media S.A.; Frontiers in Medicine; 2296-858X; 2025-02-01; 12; 10.3389/fmed.2025.1546452; 1546452; Abnormality-aware multimodal learning for WSI classification; Thao M. Dang (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Qifeng Zhou (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Yuzhi Guo (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Hehuan Ma (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Saiyang Na (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Thao Bich Dang (Department of Pulmonary and Critical Care, University of Arizona, Phoenix, AZ, United States); Jean Gao (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); Junzhou Huang (Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States); https://www.frontiersin.org/articles/10.3389/fmed.2025.1546452/full; WSI analysis; multimodal fusion; abnormal detection; foundation model; Gaussian Mixture Variational Autoencoder |
| title | Abnormality-aware multimodal learning for WSI classification |
| title_sort | abnormality aware multimodal learning for wsi classification |
| topic | WSI analysis; multimodal fusion; abnormal detection; foundation model; Gaussian Mixture Variational Autoencoder |
| url | https://www.frontiersin.org/articles/10.3389/fmed.2025.1546452/full |
| work_keys_str_mv | AT thaomdang abnormalityawaremultimodallearningforwsiclassification AT qifengzhou abnormalityawaremultimodallearningforwsiclassification AT yuzhiguo abnormalityawaremultimodallearningforwsiclassification AT hehuanma abnormalityawaremultimodallearningforwsiclassification AT saiyangna abnormalityawaremultimodallearningforwsiclassification AT thaobichdang abnormalityawaremultimodallearningforwsiclassification AT jeangao abnormalityawaremultimodallearningforwsiclassification AT junzhouhuang abnormalityawaremultimodallearningforwsiclassification |
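
The description above outlines two steps: selecting the most informative patches by an abnormality criterion, then fusing patch-level, cell-level, and text-level representations through cross-attention. The following is a minimal sketch of that flow, not the authors' implementation. The record does not say how abnormality scores are computed from the GMVAE, so the scores here are random stand-ins. The attention has no learned projections, the final mean pooling is an assumption, and all function and variable names (`select_top_k`, `cross_attention`, `fuse`) are hypothetical.

```python
import math
import random


def select_top_k(features, scores, k):
    """Keep the k patches with the highest abnormality score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = sorted(order[:k])  # restore the original patch order
    return [features[i] for i in keep], keep


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Scaled dot-product attention; context serves as both keys and values.

    Assumption: no learned query/key/value projections, for brevity.
    """
    scale = math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        logits = [sum(a * b for a, b in zip(q, c)) / scale for c in context]
        weights = softmax(logits)
        attended.append([sum(w * c[j] for w, c in zip(weights, context))
                         for j in range(len(q))])
    return attended


def fuse(patches, cells, text):
    """Concatenate each patch feature with its cell- and text-attended views."""
    attended_cells = cross_attention(patches, cells)
    attended_text = cross_attention(patches, text)
    return [p + c + t for p, c, t in zip(patches, attended_cells, attended_text)]


def random_features(n, dim, rng):
    """Hypothetical stand-in for features from a foundation model."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim = 4
patch_features = random_features(10, dim, rng)
abnormality_scores = [rng.random() for _ in range(10)]  # stand-in scores
cell_features = random_features(6, dim, rng)
text_features = random_features(3, dim, rng)

selected, kept_indices = select_top_k(patch_features, abnormality_scores, k=5)
fused = fuse(selected, cell_features, text_features)

# Assumed aggregation: mean-pool fused patch vectors into one slide vector.
slide_embedding = [sum(col) / len(fused) for col in zip(*fused)]
```

Selecting patches before fusion means the attention cost scales with the number of kept patches rather than with every patch in the slide, which is the efficiency argument the description makes. A classifier head on `slide_embedding` would complete the pipeline and is omitted here.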