High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images

Bibliographic Details
Main Authors: Lingyu Zhao, Na Zhao, Ruiqi Zhong, Yiru Niu, Ziyi Chang, Peng Su, Zhihui Wang, Lifang Cui, Bei Wang, Huang Chen, Xiaowen Wang, Xiangbing Kong, Baolin Du, Fei Ren, Dingrong Zhong
Format: Article
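Language: English
Published: BMC 2025-06-01
Series: Diagnostic Pathology
Subjects: Artificial intelligence; Lung adenocarcinoma; Gene mutation; Multiple instance learning; Self-supervised
Online Access: https://doi.org/10.1186/s13000-025-01663-w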
collection DOAJ
description Abstract. Background: Lung cancer is a prevalent malignant neoplasm. Traditional genetic testing is costly and slow. Predicting clinically relevant genetic mutations from histopathological images could speed up the identification of these mutations in clinical settings. Methods: We collected 2,221 slides from 1,999 patients diagnosed with lung adenocarcinoma. The data comprise whole-slide images, mutation status for EGFR, KRAS, ALK, HER2, and other rare genes (ROS1, RET, BRAF, PIK3CA, NRAS), and related clinical information. The self-supervised model DINO and the two-stage multi-instance network GAMIL were used to identify mutation status in 9 genes linked to tumorigenesis and cancer progression. Model performance was compared against a foundation model (UNI) and classification models (CLAM and Inception v3), evaluated on external datasets (TCGA and other medical institutions), and compared with human pathologists. Results: Our approach outperformed the CLAM and Inception v3 models, achieving AUC values from 0.825 to 0.987 for predicting gene mutations. On external test data, AUC values ranged from 0.516 to 0.843. In predicting EGFR mutation, GAMIL achieved an AUC of 0.810, significantly higher than the pathologists' average AUC of 0.508. Conclusion: The GAMIL models perform strongly in delineating tumor regions in lung adenocarcinoma and in predicting gene mutations. These models have substantial potential to improve the efficiency of molecular testing and to open new pathways for personalized treatment. Trial registration: Not applicable.
format Article
id doaj-art-88a2e67f30c348c8b3d2907b81bdb58d
institution Kabale University
issn 1746-1596
spelling doaj-art-88a2e67f30c348c8b3d2907b81bdb58d (indexed 2025-08-20T03:25:12Z; eng)
High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images. Diagnostic Pathology (BMC), ISSN 1746-1596, 2025-06-01, vol. 20, no. 1, pp. 1-14. https://doi.org/10.1186/s13000-025-01663-w
Author affiliations:
Lingyu Zhao: Department of Pathology, China-Japan Friendship Hospital
Na Zhao: Chongqing Zhijian Life Technology Co. LTD
Ruiqi Zhong: Chinese Academy of Medical Sciences & Peking Union Medical College
Yiru Niu: Department of Pathology, China-Japan Friendship Hospital
Ziyi Chang: Department of Pathology, China-Japan Friendship Hospital
Peng Su: Department of Pathology, Ordos Central Hospital
Zhihui Wang: Department of Pathology, China-Japan Friendship Hospital
Lifang Cui: Department of Pathology, China-Japan Friendship Hospital
Bei Wang: Department of Pathology, China-Japan Friendship Hospital
Huang Chen: Department of Pathology, China-Japan Friendship Hospital
Xiaowen Wang: Chongqing Zhijian Life Technology Co. LTD
Xiangbing Kong: Chongqing Zhijian Life Technology Co. LTD
Baolin Du: Chongqing Zhijian Life Technology Co. LTD
Fei Ren: State Key Lab of Processors, Institute of Computing Technology, CAS
Dingrong Zhong: Department of Pathology, China-Japan Friendship Hospital
title High-accuracy prediction of mutations in nine genes in lung adenocarcinoma via two-stage multi-instance learning on large-scale whole-slide images
topic Artificial intelligence
Lung adenocarcinoma
Gene mutation
Multiple instance learning
Self-supervised
url https://doi.org/10.1186/s13000-025-01663-w