Clinical validation of explainable AI for fetal growth scans through multi-level, cross-institutional prospective end-user evaluation

Abstract: We aimed to develop and evaluate Explainable Artificial Intelligence (XAI) for fetal ultrasound that provides actionable concepts as feedback to end-users, using a prospective, cross-center, multi-level approach. We developed, implemented, and tested a deep-learning model for fetal growth scans using both retrospective and prospective data. We used a modified Progressive Concept Bottleneck Model with pre-established clinical concepts as explanations (feedback on image optimization and the presence of anatomical landmarks) as well as segmentations (outlining anatomical landmarks). The model was evaluated prospectively by assessing its ability to judge standard-plane quality, the correctness of its explanations, the clinical usefulness of its explanations, and its ability to discriminate between clinicians with different levels of expertise. We used 9352 annotated images for model development and 100 videos for prospective evaluation. Overall classification accuracy was 96.3%, and the model’s performance in assessing standard-plane quality was on par with that of clinicians. Expert clinicians agreed with the model’s segmentations in 83.3% of cases and with its explanations in 74.2% of cases. A panel of clinicians rated the segmentations as useful in 72.4% of cases and the explanations as useful in 75.0% of cases. Finally, the model reliably discriminated between the performances of clinicians with different levels of experience (p-values < 0.01 for all measures). Our study developed an Explainable AI model that gives real-time feedback to clinicians performing fetal growth scans. This work addresses a gap in the clinical validation of Explainable AI models within fetal medicine, emphasizing the importance of multi-level, cross-institutional, prospective evaluation with clinician end-users. The prospective validation uncovered challenges and opportunities that retrospective development and validation alone could not have anticipated, such as leveraging AI to gauge operator competence in fetal ultrasound.
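The Progressive Concept Bottleneck design described in the abstract forces the final standard-plane quality decision to pass through human-readable intermediate outputs, which is what makes the feedback actionable. The PyTorch sketch below is a minimal illustration of that idea, not the authors' published architecture: it assumes a shared encoder, a segmentation head outlining anatomical landmarks, a concept head emitting one logit per clinical concept, and a quality classifier that sees only the predicted concepts. All layer sizes, the concept count, and the landmark count are placeholder values.

import torch
import torch.nn as nn

class ConceptBottleneckSketch(nn.Module):
    # Hypothetical concept-bottleneck model: the plane-quality classifier
    # is restricted to explicit clinical concepts, so every prediction is
    # traceable to concept-level feedback and landmark segmentations.
    def __init__(self, n_concepts: int = 10, n_landmarks: int = 5):
        super().__init__()
        # Shared encoder over a single-channel ultrasound frame.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation head: per-pixel logits outlining anatomical landmarks.
        self.seg_head = nn.Sequential(
            nn.Conv2d(64, n_landmarks, kernel_size=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
        )
        # Concept head: one logit per clinical concept, e.g. "landmark
        # visible" or "image optimization criterion met" (illustrative names).
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.concept_head = nn.Linear(64, n_concepts)
        # Bottleneck: the quality classifier sees only predicted concepts,
        # never the raw image features.
        self.quality_head = nn.Linear(n_concepts, 2)

    def forward(self, x):
        z = self.encoder(x)                    # (B, 64, H/4, W/4)
        seg = self.seg_head(z)                 # (B, n_landmarks, H, W)
        concepts = torch.sigmoid(self.concept_head(self.pool(z).flatten(1)))
        quality = self.quality_head(concepts)  # (B, 2) class logits
        return seg, concepts, quality

# Usage: one 224x224 frame in; segmentations, concepts, and quality out.
model = ConceptBottleneckSketch()
seg, concepts, quality = model(torch.randn(1, 1, 224, 224))

Because the quality logits depend only on the sigmoid concept activations, a "non-standard plane" verdict can always be traced back to the specific concepts or missing landmarks that drove it, which is the property the study relies on for real-time operator feedback.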

Bibliographic Details
Main Authors: Zahra Bashir, Manxi Lin, Aasa Feragen, Kamil Mikolaj, Caroline Taksøe-Vester, Anders Nymark Christensen, Morten B. S. Svendsen, Mette Hvilshøj Fabricius, Lisbeth Andreasen, Mads Nielsen, Martin Grønnebæk Tolsgaard
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Reports
ISSN: 2045-2322
Subjects: Artificial intelligence; Fetal growth scans; Explainable AI; Human-AI collaboration
Online Access: https://doi.org/10.1038/s41598-025-86536-4
Author affiliations:
Zahra Bashir: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen
Manxi Lin: Technical University of Denmark (DTU)
Aasa Feragen: Technical University of Denmark (DTU)
Kamil Mikolaj: Technical University of Denmark (DTU)
Caroline Taksøe-Vester: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen
Anders Nymark Christensen: Technical University of Denmark (DTU)
Morten B. S. Svendsen: Copenhagen Academy for Medical Education and Simulation (CAMES)
Mette Hvilshøj Fabricius: Department of Obstetrics and Gynecology, Slagelse Hospital
Lisbeth Andreasen: Department of Obstetrics and Gynecology, Hvidovre Hospital
Mads Nielsen: Department of Computer Science, University of Copenhagen
Martin Grønnebæk Tolsgaard: Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen