Voice EHR: introducing multimodal audio data for health
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2025-01-01 |
Series: | Frontiers in Digital Health |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/fdgth.2024.1448351/full |
Summary: | Introduction: Artificial intelligence (AI) models trained on audio data may have the potential to rapidly perform clinical tasks, enhancing medical decision-making and potentially improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income countries, which challenges deployment in resource-constrained, high-volume settings where audio data may have a profound impact on health equity. Methods: This report introduces a novel protocol for audio data collection and a corresponding application that captures health information through guided questions. Results: To demonstrate the potential of Voice EHR as a biomarker of health, initial experiments on data quality and multiple case studies are presented in this report. Large language models (LLMs) were used to compare transcribed Voice EHR data with data (from the same patients) collected through conventional techniques like multiple choice questions. Information contained in the Voice EHR samples was consistently rated as equally or more relevant to a health evaluation. Discussion: The HEAR application facilitates the collection of an audio electronic health record (“Voice EHR”) that may contain complex biomarkers of health from conventional voice/respiratory features, speech patterns, and spoken language with semantic meaning and longitudinal context, potentially compensating for the typical limitations of unimodal clinical datasets. |
ISSN: | 2673-253X |