Abnormality-aware multimodal learning for WSI classification

Whole slide images (WSIs) play a vital role in cancer diagnosis and prognosis. However, their gigapixel resolution, lack of pixel-level annotations, and reliance on unimodal visual data present challenges for accurate and efficient computational analysis. Existing methods typically divide WSIs into thousands of patches, which increases computational demands and makes it challenging to effectively focus on diagnostically relevant regions. Furthermore, these methods frequently rely on feature extractors pretrained on natural images, which are not optimized for pathology tasks, and overlook multimodal data sources such as cellular and textual information that can provide critical insights. To address these limitations, we propose the Abnormality-Aware MultiModal (AAMM) learning framework, which integrates abnormality detection and multimodal feature learning for WSI classification. AAMM incorporates a Gaussian Mixture Variational Autoencoder (GMVAE) to identify and select the most informative patches, reducing computational complexity while retaining critical diagnostic information. It further integrates multimodal features from pathology-specific foundation models, combining patch-level, cell-level, and text-level representations through cross-attention mechanisms. This approach enhances the ability to comprehensively analyze WSIs for cancer diagnosis and subtyping. Extensive experiments on normal-tumor classification and cancer subtyping demonstrate that AAMM achieves superior performance compared to state-of-the-art methods. By combining abnormality detection with multimodal feature integration, our framework offers an efficient and scalable solution for advancing computational pathology.
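
The two mechanisms named in the abstract can be pictured with a short sketch. Everything below (module names, dimensions, the abnormality scoring rule, and the top-k patch selection) is an illustrative assumption reconstructed from the abstract alone, not the authors' released implementation: a toy GMVAE scores each patch by reconstruction error plus distance to its nearest mixture component, and a cross-attention block lets the retained patch tokens query cell- and text-level tokens before pooling for slide-level classification.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GMVAEScorer(nn.Module):
    """Toy GMVAE: patches that fit the learned mixture poorly are
    treated as abnormal and kept for downstream classification.
    (One plausible scoring rule, not necessarily the paper's.)"""

    def __init__(self, feat_dim=768, latent_dim=64, n_components=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, feat_dim))
        # Learned mixture-component means in latent space.
        self.components = nn.Parameter(torch.randn(n_components, latent_dim))

    def forward(self, x):                       # x: (n_patches, feat_dim)
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        recon = self.decoder(z)
        # Abnormality score: reconstruction error plus distance to the
        # nearest mixture component.
        recon_err = F.mse_loss(recon, x, reduction="none").mean(dim=1)
        dist = torch.cdist(mu, self.components).min(dim=1).values
        return recon_err + dist                 # higher = more abnormal


class CrossModalFusion(nn.Module):
    """Patch tokens attend to cell-level tokens, then to text-level
    tokens; the fused tokens are mean-pooled into slide logits."""

    def __init__(self, dim=768, n_heads=8, n_classes=2):
        super().__init__()
        self.cell_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.text_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, patch, cell, text):       # all: (1, n_tokens, dim)
        h, _ = self.cell_attn(patch, cell, cell)   # patches query cells
        h, _ = self.text_attn(h, text, text)       # then query text
        return self.head(h.mean(dim=1))            # mean-pool -> logits


# Usage: keep only the top-k most abnormal patches, then fuse modalities.
feats = torch.randn(1000, 768)                  # patch embeddings (dummy)
scores = GMVAEScorer()(feats)
keep = scores.topk(64).indices                  # k=64 is an arbitrary choice
patch_tokens = feats[keep].unsqueeze(0)
cell_tokens, text_tokens = torch.randn(1, 32, 768), torch.randn(1, 8, 768)
logits = CrossModalFusion()(patch_tokens, cell_tokens, text_tokens)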

Bibliographic Details
Main Authors: Thao M. Dang, Qifeng Zhou, Yuzhi Guo, Hehuan Ma, Saiyang Na, Thao Bich Dang, Jean Gao, Junzhou Huang
Format: Article
Language: English
Published: Frontiers Media S.A., 2025-02-01
Series: Frontiers in Medicine
Volume: 12
ISSN: 2296-858X
DOI: 10.3389/fmed.2025.1546452
Subjects: WSI analysis; multimodal fusion; abnormal detection; foundation model; Gaussian Mixture Variational Autoencoder
Online Access: https://www.frontiersin.org/articles/10.3389/fmed.2025.1546452/full
Author Affiliations: Thao M. Dang, Qifeng Zhou, Yuzhi Guo, Hehuan Ma, Saiyang Na, Jean Gao, and Junzhou Huang: Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX, United States; Thao Bich Dang: Department of Pulmonary and Critical Care, University of Arizona, Phoenix, AZ, United States