A phenotype-based AI pipeline outperforms human experts in differentially diagnosing rare diseases using EHRs

Abstract Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated a...

Full description

Saved in:
Bibliographic Details
Main Authors: Xiaohao Mao, Yu Huang, Ye Jin, Lun Wang, Xuanzhong Chen, Honghong Liu, Xinglin Yang, Haopeng Xu, Xiaodong Luan, Ying Xiao, Siqin Feng, Jiahao Zhu, Xuegong Zhang, Rui Jiang, Shuyang Zhang, Ting Chen
Format: Article
Language:English
Published: Nature Portfolio 2025-01-01
Series:npj Digital Medicine
Online Access:https://doi.org/10.1038/s41746-025-01452-1
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Abstract Rare diseases, affecting ~350 million people worldwide, pose significant challenges in clinical diagnosis due to the lack of experienced physicians and the complexity of differentiating between numerous rare diseases. To address these challenges, we introduce PhenoBrain, a fully automated artificial intelligence pipeline. PhenoBrain utilizes a BERT-based natural language processing model to extract phenotypes from clinical texts in EHRs and employs five new diagnostic models for differential diagnoses of rare diseases. The AI system was developed and evaluated on diverse, multi-country rare disease datasets, comprising 2271 cases with 431 rare diseases. In 1936 test cases, PhenoBrain achieved an average predicted top-3 recall of 0.513 and a top-10 recall of 0.654, surpassing 13 leading prediction methods. In a human-computer study with 75 cases, PhenoBrain exhibited exceptional performance with a top-3 recall of 0.613 and a top-10 recall of 0.813, surpassing the performance of 50 specialist physicians and large language models like ChatGPT and GPT-4. Combining PhenoBrain’s predictions with specialists increased the top-3 recall to 0.768, demonstrating its potential to enhance diagnostic accuracy in clinical workflows.
ISSN:2398-6352