Towards evaluating and building versatile large language models for medicine
Abstract In this study, we present MedS-Bench, a comprehensive benchmark for evaluating large language models (LLMs) in clinical contexts, spanning 11 high-level clinical tasks. We evaluated nine leading LLMs, e.g., MEDITRON, Llama 3, Mistral, GPT-4, Claude-3.5, etc., and found that most mode...
Main Authors: | Chaoyi Wu, Pengcheng Qiu, Jinxin Liu, Hongfei Gu, Na Li, Ya Zhang, Yanfeng Wang, Weidi Xie |
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | npj Digital Medicine |
Online Access: | https://doi.org/10.1038/s41746-024-01390-4 |
Similar Items
- Toward cultural interpretability: A linguistic anthropological framework for describing and evaluating large language models
  by: Graham M Jones, et al.
  Published: (2025-03-01)
- Application, Challenges, and Prospects of Large Language Model in the Field of Traditional Chinese Medicine
  by: CHEN Zijia, et al.
  Published: (2024-08-01)
- Toward the Development of Large-Scale Word Embedding for Low-Resourced Language
  by: Shahzad Nazir, et al.
  Published: (2022-01-01)
- Software-Defined Radio FPGA Cores: Building towards a Domain-Specific Language
  by: Lekhobola Tsoeunyane, et al.
  Published: (2017-01-01)
- Realization of DVCCTA Based Versatile Modulator
  by: Neeta Pandey, et al.
  Published: (2014-01-01)