High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images
Abstract Background Lung cancer is widely recognized as a prevalent malignant neoplasm. Traditional genetic testing methods face limitations such as high costs and lengthy procedures. The prediction of clinically relevant genetic mutations via histopathological images could facilitate the expedited...
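| Main Authors: | Lingyu Zhao, Na Zhao, Ruiqi Zhong, Yiru Niu, Ziyi Chang, Peng Su, Zhihui Wang, Lifang Cui, Bei Wang, Huang Chen, Xiaowen Wang, Xiangbing Kong, Baolin Du, Fei Ren, Dingrong Zhong |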
|---|---|
| Format: | Article |
| Language: | English |
| Published: | BMC, 2025-06-01 |
| Series: | Diagnostic Pathology |
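| Subjects: | Artificial intelligence; Lung adenocarcinoma; Gene mutation; Multiple instance learning; Self-supervised |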
| Online Access: | https://doi.org/10.1186/s13000-025-01663-w |
| _version_ | 1849470237913645056 |
|---|---|
| author | Lingyu Zhao Na Zhao Ruiqi Zhong Yiru Niu Ziyi Chang Peng Su Zhihui Wang Lifang Cui Bei Wang Huang Chen Xiaowen Wang Xiangbing Kong Baolin Du Fei Ren Dingrong Zhong |
| author_sort | Lingyu Zhao |
| collection | DOAJ |
| description | Abstract. Background: Lung cancer is a prevalent malignant neoplasm. Traditional genetic testing is costly and slow. Predicting clinically relevant genetic mutations from histopathological images could speed up the identification of genetic mutations in clinical settings. Methods: We collected 2,221 slides from 1,999 patients diagnosed with lung adenocarcinoma. The data include whole-slide images, mutation status for EGFR, KRAS, ALK, HER2, and other rare genes (ROS1, RET, BRAF, PIK3CA, NRAS), and related clinical information. The self-supervised model DINO and the two-stage multi-instance network GAMIL were used to identify mutation status in 9 genes linked to tumorigenesis and cancer progression. Model performance was evaluated against a foundation model (UNI) and classification models (CLAM and Inception v3), on external datasets (TCGA and other medical institutions), and in comparison with human pathologists. Results: Our approach outperforms the CLAM and Inception v3 models, achieving AUC values of 0.825 to 0.987 for predicting gene mutations. On the external test data, AUC values range from 0.516 to 0.843. For EGFR mutation prediction, GAMIL achieved an AUC of 0.810, significantly higher than the average AUC of 0.508 achieved by pathologists. Conclusion: The GAMIL models perform well in delineating tumor regions in lung adenocarcinoma and in predicting gene mutations. These models have substantial potential to improve molecular testing efficiency and to open new pathways for personalized treatment. Trial registration: Not applicable. |
| format | Article |
| id | doaj-art-88a2e67f30c348c8b3d2907b81bdb58d |
| institution | Kabale University |
| issn | 1746-1596 |
| language | English |
| publishDate | 2025-06-01 |
| publisher | BMC |
| record_format | Article |
| series | Diagnostic Pathology |
| spelling | doaj-art-88a2e67f30c348c8b3d2907b81bdb58d; 2025-08-20T03:25:12Z; eng; BMC; Diagnostic Pathology; 1746-1596; 2025-06-01; 201114; 10.1186/s13000-025-01663-w; High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images; Lingyu Zhao (Department of Pathology, China-Japan Friendship Hospital); Na Zhao (Chongqing Zhijian Life Technology Co. LTD); Ruiqi Zhong (Chinese Academy of Medical Sciences & Peking Union Medical College); Yiru Niu (Department of Pathology, China-Japan Friendship Hospital); Ziyi Chang (Department of Pathology, China-Japan Friendship Hospital); Peng Su (Department of Pathology, Ordos Central Hospital); Zhihui Wang (Department of Pathology, China-Japan Friendship Hospital); Lifang Cui (Department of Pathology, China-Japan Friendship Hospital); Bei Wang (Department of Pathology, China-Japan Friendship Hospital); Huang Chen (Department of Pathology, China-Japan Friendship Hospital); Xiaowen Wang (Chongqing Zhijian Life Technology Co. LTD); Xiangbing Kong (Chongqing Zhijian Life Technology Co. LTD); Baolin Du (Chongqing Zhijian Life Technology Co. LTD); Fei Ren (State Key Lab of Processors, Institute of Computing Technology, CAS); Dingrong Zhong (Department of Pathology, China-Japan Friendship Hospital); https://doi.org/10.1186/s13000-025-01663-w; Artificial intelligence; Lung adenocarcinoma; Gene mutation; Multiple instance learning; Self-supervised |
| title | High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images |
| topic | Artificial intelligence Lung adenocarcinoma Gene mutation Multiple instance learning Self-supervised |
| url | https://doi.org/10.1186/s13000-025-01663-w |