STAGER checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability
Abstract Generative artificial intelligence (AI) holds immense potential for medical applications, but the lack of a comprehensive evaluation framework and methodological deficiencies in existing studies hinder its effective implementation. Standardized assessment guidelines are crucial for ensuring...
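Main Authors: | Jinghong Chen, Lingxuan Zhu, Weiming Mou, Anqi Lin, Dongqiang Zeng, Chang Qi, Zaoqu Liu, Aimin Jiang, Bufu Tang, Wenjie Shi, Ulf D. Kahlert, Jianguo Zhou, Shipeng Guo, Xiaofan Lu, Xu Sun, Trunghieu Ngo, Zhongji Pu, Baolei Jia, Che Ok Jeon, Yongbin He, Haiyang Wu, Shuqin Gu, Wisit Cheungpasitporn, Haojie Huang, Weipu Mao, Shixiang Wang, Xin Chen, Loïc Cabannes, Gerald Sng Gui Ren, Iain S. Whitaker, Stephen Ali, Quan Cheng, Kai Miao, Shuofeng Yuan, Peng Luo |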
---|---|
Format: | Article |
Language: | English |
Published: | Wiley, 2024-09-01 |
Series: | iMetaOmics |
Subjects: | generative AI; medical and life science contexts; reliability; standardized assessment guidelines |
Online Access: | https://doi.org/10.1002/imo2.7 |
_version_ | 1832575941152342016 |
---|---|
author | Jinghong Chen, Lingxuan Zhu, Weiming Mou, Anqi Lin, Dongqiang Zeng, Chang Qi, Zaoqu Liu, Aimin Jiang, Bufu Tang, Wenjie Shi, Ulf D. Kahlert, Jianguo Zhou, Shipeng Guo, Xiaofan Lu, Xu Sun, Trunghieu Ngo, Zhongji Pu, Baolei Jia, Che Ok Jeon, Yongbin He, Haiyang Wu, Shuqin Gu, Wisit Cheungpasitporn, Haojie Huang, Weipu Mao, Shixiang Wang, Xin Chen, Loïc Cabannes, Gerald Sng Gui Ren, Iain S. Whitaker, Stephen Ali, Quan Cheng, Kai Miao, Shuofeng Yuan, Peng Luo |
author_sort | Jinghong Chen |
collection | DOAJ |
description | Abstract: Generative artificial intelligence (AI) holds immense potential for medical applications, but the lack of a comprehensive evaluation framework and methodological deficiencies in existing studies hinder its effective implementation. Standardized assessment guidelines are crucial for ensuring reliable and consistent evaluation of generative AI in healthcare. Our objective is to develop robust, standardized guidelines tailored for evaluating generative AI performance in medical contexts. Through a rigorous literature review utilizing the Web of Science, Cochrane Library, PubMed, and Google Scholar, we focused on research testing generative AI capabilities in medicine. Our multidisciplinary team of experts conducted discussion sessions to develop a comprehensive 32‐item checklist. This checklist encompasses critical evaluation aspects of generative AI in medical applications, addressing key dimensions such as question collection, querying methodologies, and assessment techniques. The checklist and its broader assessment framework provide a holistic evaluation of AI systems, delineating a clear pathway from question gathering to result assessment. It guides researchers through potential challenges and pitfalls, enhancing research quality and reporting and aiding the evolution of generative AI in medicine and life sciences. Our framework furnishes a standardized, systematic approach for testing generative AI's applicability in medicine. For a concise checklist, please refer to Table S or visit GenAIMed.org. |
format | Article |
id | doaj-art-d662ca60c5b34c5db1a774cdbef12048 |
institution | Kabale University |
issn | 2996-9506 2996-9514 |
language | English |
publishDate | 2024-09-01 |
publisher | Wiley |
record_format | Article |
series | iMetaOmics |
spelling | doaj-art-d662ca60c5b34c5db1a774cdbef12048; 2025-01-31T16:15:20Z; eng; Wiley; iMetaOmics; 2996-9506; 2996-9514; 2024-09-01; 1; 1; n/a; n/a; 10.1002/imo2.7; STAGER checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability; Jinghong Chen (Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China); Lingxuan Zhu (Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China); Weiming Mou (Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China); Anqi Lin (Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China); Dongqiang Zeng (Department of Oncology, Nanfang Hospital, Southern Medical University, Guangzhou, China); Chang Qi (Institute of Logic and Computation, TU Wien, Wien, Austria); Zaoqu Liu (Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China); Aimin Jiang (Department of Urology, Changhai Hospital, Naval Medical University (Second Military Medical University), Shanghai, China); Bufu Tang (Department of Radiation Oncology, Zhongshan Hospital, Fudan University, Shanghai, China); Wenjie Shi (Molecular and Experimental Surgery, University Clinic for General‐, Visceral‐, Vascular‐ and Trans‐Plantation Surgery, Medical Faculty, University Hospital Magdeburg, Otto‐von Guericke University, Magdeburg, Germany); Ulf D. Kahlert (Molecular and Experimental Surgery, University Clinic for General‐, Visceral‐, Vascular‐ and Trans‐Plantation Surgery, Medical Faculty, University Hospital Magdeburg, Otto‐von Guericke University, Magdeburg, Germany); Jianguo Zhou (Department of Oncology, The Second Affiliated Hospital of Zunyi Medical University, Zunyi, China); Shipeng Guo (GZDLab, Chongqing, China); Xiaofan Lu (Department of Cancer and Functional Genomics, Institute of Genetics and Molecular and Cellular Biology, CNRS/INSERM/UNISTRA, Illkirch, France); Xu Sun (Linguistique Informatique, UFR‐Linguistique, Université Paris Cité, Paris, France); Trunghieu Ngo (Linguistique Informatique, UFR‐Linguistique, Université Paris Cité, Paris, France); Zhongji Pu (Xianghu Laboratory, Hangzhou, China); Baolei Jia (Xianghu Laboratory, Hangzhou, China); Che Ok Jeon (Department of Life Science, Chung‐Ang University, Seoul, Korea); Yongbin He (School of Sport Medicine and Rehabilitation, Beijing Sport University, Beijing, China); Haiyang Wu (Department of Graduate School, Tianjin Medical University, Tianjin, China); Shuqin Gu (Duke Human Vaccine Institute, Duke University Medical Center, Durham, North Carolina, USA); Wisit Cheungpasitporn (Department of Medicine, Mayo Clinic, Rochester, New York, USA); Haojie Huang (Department of Biochemistry and Molecular Biology, Mayo Clinic College of Medicine and Science, Rochester, New York, USA); Weipu Mao (Department of Urology, Zhongda Hospital, Southeast University, Nanjing, China); Shixiang Wang (Bioinformatics Platform, Department of Experimental Research, State Key Laboratory of Oncology in South China, Guangdong Key Laboratory of Nasopharyngeal Carcinoma Diagnosis and Therapy, Guangdong Provincial Clinical Research Center for Cancer, Sun Yat‐sen University Cancer Center, Guangzhou, China); Xin Chen (Department of Pulmonary and Critical Care Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China); Loïc Cabannes (Linguistique Informatique, UFR‐Linguistique, Université Paris Cité, Paris, France); Gerald Sng Gui Ren (Department of Endocrinology, Singapore General Hospital, Singapore, Singapore); Iain S. Whitaker (Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK); Stephen Ali (Reconstructive Surgery and Regenerative Medicine Research Centre, Institute of Life Sciences, Swansea University Medical School, Swansea, UK); Quan Cheng (Department of Neurosurgery, Xiangya Hospital, Central South University, Changsha, China); Kai Miao (Cancer Centre and Institute of Translational Medicine, Faculty of Health Sciences, University of Macau, Macau, China); Shuofeng Yuan (Department of Infectious Disease and Microbiology, The University of Hong Kong‐Shenzhen Hospital, Shenzhen, China); Peng Luo (Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China); https://doi.org/10.1002/imo2.7; generative AI; medical and life science contexts; reliability; standardized assessment guidelines |
title | STAGER checklist: Standardized testing and assessment guidelines for evaluating generative artificial intelligence reliability |
topic | generative AI; medical and life science contexts; reliability; standardized assessment guidelines |
url | https://doi.org/10.1002/imo2.7 |