A text-speech multimodal Chinese named entity recognition model for crop diseases and pests

Bibliographic Details
Main Authors: Ruilin Liu, Xuchao Guo, HongMei Zhu, Lu Wang
Format: Article
Language: English
Published: Nature Portfolio 2025-02-01
Series:Scientific Reports
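Subjects: Crop diseases and pests; Multimodal named entity recognition; Data augmentation; Cross-modal attention; Masked CTC Loss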
Online Access:https://doi.org/10.1038/s41598-025-88874-9
author Ruilin Liu
Xuchao Guo
HongMei Zhu
Lu Wang
collection DOAJ
description Abstract: Named entity recognition for crop diseases and pests (NER-CDP) is an important task in agricultural information extraction and provides essential data support for downstream knowledge services and retrieval. However, existing NER-CDP methods rely heavily on plain text or on external features such as character radicals and font types, and they do little to improve word segmentation. In this paper, we propose CDP-MCNER, a multimodal named entity recognition model based on cross-modal attention, to address the performance degradation that potential word segmentation errors cause in NER models. We introduce the audio modality into NER-CDP for the first time and use the pauses in spoken sentences to assist Chinese word segmentation. CDP-MCNER adopts cross-modal attention as its main architecture to fully integrate the textual and acoustic modalities. In addition, data augmentation techniques, such as introducing perturbations in the text encoder and frequency-domain enhancement in the acoustic encoder, are used to increase the diversity of the multimodal inputs. To improve the accuracy of the predicted labels, a Masked CTC (Connectionist Temporal Classification) loss is used to further align the multimodal semantic representations. In experiments comparing our model with classical text-only models, lexicon-enhanced models, and multimodal models, it achieves the best precision, recall, and F1 score, at 91.32%, 93.05%, and 92.18%, respectively. Furthermore, our method achieves the best F1 scores of 81.05% and 79.23% on the public datasets CNERTA and Ai-SHELL, respectively. These results demonstrate the effectiveness and generalization ability of CDP-MCNER on the NER-CDP task.
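The abstract describes cross-modal attention as the backbone that fuses textual and acoustic features, but the record gives no layer details. The following is a minimal sketch of generic single-head scaled dot-product cross-attention, assuming text token features act as queries and audio frame features supply keys and values. The function names, shapes, and toy inputs are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an (n x k) matrix and a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text: Matrix, audio: Matrix,
                          w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical sketch: text tokens (queries) attend over audio frames (keys/values).

    text:  (n_tokens x d_text) features from a text encoder
    audio: (n_frames x d_audio) features from an acoustic encoder
    Returns audio-informed token features (n_tokens x d_model) and the
    (n_tokens x n_frames) attention weights.
    """
    q = matmul(text, w_q)
    k = matmul(audio, w_k)
    v = matmul(audio, w_v)
    scale = math.sqrt(len(q[0]))               # scaled dot-product attention
    scores = matmul(q, transpose(k))           # (n_tokens x n_frames)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights         # weighted average of audio values


# Toy usage: identity projections stand in for learned weights (assumption).
identity = [[1.0, 0.0], [0.0, 1.0]]
text = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]                 # 3 text tokens
audio = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]    # 4 audio frames
fused, weights = cross_modal_attention(text, audio, identity, identity, identity)
print(len(fused), len(fused[0]))  # prints: 3 2
```

Because each token's output is a weighted average of audio frame values, acoustic cues such as pauses can in principle reach the token representations. A full model would add learned projections, multiple heads, padding masks, and residual connections, all omitted here for brevity.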
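As a quick consistency check on the reported scores: assuming they are micro-averaged over entities, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). (Under macro-averaging, F1 need not equal the harmonic mean of the averaged precision and recall.) The sketch below recomputes F1 from the reported precision and recall; the helper name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 as the harmonic mean of precision and recall (any consistent unit, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported in the abstract for CDP-MCNER on the NER-CDP data.
computed = f1_score(91.32, 93.05)
print(f"{computed:.2f}")  # prints 92.18, matching the reported F1
```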
format Article
id doaj-art-0ef5bf2da0b946e8a55896f995db8db5
institution DOAJ
issn 2045-2322
language English
publishDate 2025-02-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
affiliation Ruilin Liu: School of Information Management, Nanjing Agricultural University
Xuchao Guo: School of Information Science and Engineering, Shandong Agricultural University
HongMei Zhu: School of Information Science and Engineering, Shandong Agricultural University
Lu Wang: School of Artificial Intelligence, Shandong Women’s University
volume 15
issue 1
pages 1-13
title A text-speech multimodal Chinese named entity recognition model for crop diseases and pests
topic Crop diseases and pests
Multimodal named entity recognition
Data augmentation
Cross-modal attention
Masked CTC Loss
url https://doi.org/10.1038/s41598-025-88874-9