A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case

Bibliographic Details
Main Authors: Ana González-Moreno, Alberto Ramos-González, Israel González-Carrasco, M. Dolores Alonso Díaz de Durana, Beatriz Sellers Gutiérrez-Argumosa, Alicia Moncada Salinero, Ana Belén Pastor-Magro, Beatriz González-Piñeiro, Miguel A. Tejedor-Alonso, Paloma Martínez
Format: Article
Language: English
Published: Nature Portfolio 2025-01-01
Series: Scientific Data
Online Access: https://doi.org/10.1038/s41597-025-04503-0
collection DOAJ
description Abstract This article describes a dataset on nut allergy extracted from Spanish clinical records provided by the Hospital Universitario Fundación de Alcorcón (HUFA) in Madrid, Spain, in collaboration with its Allergology Unit and Information Systems and Technologies Department. Few clinical texts in Spanish are publicly available, and more are needed to train and test information extraction systems. In total, 828 clinical notes in Spanish were used, and several experts took part in the annotation process, categorizing the annotated entities into medical semantic groups related to allergies. To evaluate inter-annotator agreement, 8% of the texts were triple-annotated. The guidelines followed to create the corpus are also provided. To validate the corpus and introduce a real use case, we ran experiments using it in a supervised named entity recognition (NER) task, fine-tuning encoder-based transformers. These experiments achieved an average F-measure of 86.2%. The results indicate that the corpus is suitable for training and testing NER approaches in the field of allergology.
id doaj-art-8f052481bce94a0eb742ee3138156b2e
institution Kabale University
issn 2052-4463
citation Scientific Data, volume 12, issue 1, pages 1-13 (2025-01-01)
affiliation Ana González-Moreno: Allergy Unit, Hospital Universitario Fundación Alcorcón
Alberto Ramos-González: Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad
Israel González-Carrasco: Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad
M. Dolores Alonso Díaz de Durana: Allergy Unit, Hospital Universitario Fundación Alcorcón
Beatriz Sellers Gutiérrez-Argumosa: Allergy Unit, Hospital Universitario Fundación Alcorcón
Alicia Moncada Salinero: Allergy Unit, Hospital Universitario Fundación Alcorcón
Ana Belén Pastor-Magro: Information Systems and Technologies Department, Hospital Universitario Fundación Alcorcón
Beatriz González-Piñeiro: Information Systems and Technologies Department, Hospital Universitario Fundación Alcorcón
Miguel A. Tejedor-Alonso: Allergy Unit, Hospital Universitario Fundación Alcorcón
Paloma Martínez: Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad