A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case
Abstract This article describes a dataset on nut allergy extracted from Spanish clinical records provided by the Hospital Universitario Fundación de Alcorcón (HUFA) in Madrid, Spain, in collaboration with its Allergology Unit and Information Systems and Technologies Department. There are few publicl...
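Main Authors: | Ana González-Moreno, Alberto Ramos-González, Israel González-Carrasco, M. Dolores Alonso Díaz de Durana, Beatriz Sellers Gutiérrez-Argumosa, Alicia Moncada Salinero, Ana Belén Pastor-Magro, Beatriz González-Piñeiro, Miguel A. Tejedor-Alonso, Paloma Martínez |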
---|---|
Format: | Article |
Language: | English |
Published: | Nature Portfolio, 2025-01-01 |
Series: | Scientific Data |
Online Access: | https://doi.org/10.1038/s41597-025-04503-0 |
_version_ | 1832572016204447744 |
---|---|
author | Ana González-Moreno Alberto Ramos-González Israel González-Carrasco M. Dolores Alonso Díaz de Durana Beatriz Sellers Gutiérrez-Argumosa Alicia Moncada Salinero Ana Belén Pastor-Magro Beatriz González-Piñeiro Miguel A. Tejedor-Alonso Paloma Martínez |
author_facet | Ana González-Moreno Alberto Ramos-González Israel González-Carrasco M. Dolores Alonso Díaz de Durana Beatriz Sellers Gutiérrez-Argumosa Alicia Moncada Salinero Ana Belén Pastor-Magro Beatriz González-Piñeiro Miguel A. Tejedor-Alonso Paloma Martínez |
author_sort | Ana González-Moreno |
collection | DOAJ |
description | Abstract This article describes a dataset on nut allergy extracted from Spanish clinical records provided by the Hospital Universitario Fundación de Alcorcón (HUFA) in Madrid, Spain, in collaboration with its Allergology Unit and Information Systems and Technologies Department. Few clinical texts in Spanish are publicly available, and more are needed to train and test information extraction systems. In total, 828 clinical notes in Spanish were used, and several experts took part in the annotation process, categorizing the annotated entities into medical semantic groups related to allergies. To evaluate inter-annotator agreement, 8% of the texts were annotated in triplicate. The guidelines followed to create the corpus are also provided. To validate the corpus and introduce a real use case, we ran experiments using this resource in a supervised named entity recognition (NER) task, fine-tuning encoder-based transformers. These experiments achieved an average F-measure of 86.2%. The results indicate that the corpus is suitable for training and testing NER approaches in the field of allergology. |
format | Article |
id | doaj-art-8f052481bce94a0eb742ee3138156b2e |
institution | Kabale University |
issn | 2052-4463 |
language | English |
publishDate | 2025-01-01 |
publisher | Nature Portfolio |
record_format | Article |
series | Scientific Data |
spelling | doaj-art-8f052481bce94a0eb742ee3138156b2e; 2025-02-02T12:08:13Z; eng; Nature Portfolio; Scientific Data; 2052-4463; 2025-01-01; 121113; 10.1038/s41597-025-04503-0; A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case; Ana González-Moreno (0), Alberto Ramos-González (1), Israel González-Carrasco (2), M. Dolores Alonso Díaz de Durana (3), Beatriz Sellers Gutiérrez-Argumosa (4), Alicia Moncada Salinero (5), Ana Belén Pastor-Magro (6), Beatriz González-Piñeiro (7), Miguel A. Tejedor-Alonso (8), Paloma Martínez (9); (0) Allergy Unit, Hospital Universitario Fundación Alcorcón; (1) Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad; (2) Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad; (3) Allergy Unit, Hospital Universitario Fundación Alcorcón; (4) Allergy Unit, Hospital Universitario Fundación Alcorcón; (5) Allergy Unit, Hospital Universitario Fundación Alcorcón; (6) Information Systems and Technologies Department, Hospital Universitario Fundación Alcorcón; (7) Information Systems and Technologies Department, Hospital Universitario Fundación Alcorcón; (8) Allergy Unit, Hospital Universitario Fundación Alcorcón; (9) Computer Science and Engineering Department, Universidad Carlos III de Madrid, Av. Universidad; Abstract This article describes a dataset on nut allergy extracted from Spanish clinical records provided by the Hospital Universitario Fundación de Alcorcón (HUFA) in Madrid, Spain, in collaboration with its Allergology Unit and Information Systems and Technologies Department. There are few publicly available clinical texts in Spanish and having more is essential as a valuable resource to train and test information extraction systems. In total, 828 clinical notes in Spanish were employed and several experts participated in the annotation process by categorizing the annotated entities into medical semantic groups related to allergies. To evaluate inter-annotator agreement, a triple annotation was performed on 8% of the texts. The guidelines followed to create the corpus are also provided. To determine the validation of the corpus and introduce a real use case, we performed some experiments using this resource in the context of a supervised named entity recognition (NER) task by fine-tuning encoder-based transformers. In these experiments, an average F-measure of 86.2% was achieved. These results indicate that the corpus used is suitable for training and testing approaches to NER related to the field of allergology.; https://doi.org/10.1038/s41597-025-04503-0 |
spellingShingle | Ana González-Moreno Alberto Ramos-González Israel González-Carrasco M. Dolores Alonso Díaz de Durana Beatriz Sellers Gutiérrez-Argumosa Alicia Moncada Salinero Ana Belén Pastor-Magro Beatriz González-Piñeiro Miguel A. Tejedor-Alonso Paloma Martínez A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case Scientific Data |
title | A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case |
title_full | A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case |
title_fullStr | A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case |
title_full_unstemmed | A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case |
title_short | A clinical narrative corpus on nut allergy: annotation schema, guidelines and use case |
title_sort | clinical narrative corpus on nut allergy annotation schema guidelines and use case |
url | https://doi.org/10.1038/s41597-025-04503-0 |
work_keys_str_mv | AT anagonzalezmoreno aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT albertoramosgonzalez aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT israelgonzalezcarrasco aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT mdoloresalonsodiazdedurana aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT beatrizsellersgutierrezargumosa aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT aliciamoncadasalinero aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT anabelenpastormagro aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT beatrizgonzalezpineiro aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT miguelatejedoralonso aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT palomamartinez aclinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT anagonzalezmoreno clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT albertoramosgonzalez clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT israelgonzalezcarrasco clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT mdoloresalonsodiazdedurana clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT beatrizsellersgutierrezargumosa clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT aliciamoncadasalinero clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT anabelenpastormagro clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT beatrizgonzalezpineiro clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT miguelatejedoralonso clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase AT palomamartinez clinicalnarrativecorpusonnutallergyannotationschemaguidelinesandusecase |