Specialized Large Language Model for Standardization of Locomotive Maintenance Data

Bibliographic Details
Main Authors: CHEN Ao, LI Chen, YAN Jiayun, PENG Liantie, TIAN Ye, LIU Leixinyuan
Format: Article
Language: Chinese (zho)
Published: Editorial Office of Control and Information Technology, 2024-06-01
Series: Kongzhi Yu Xinxi Jishu
Online Access: http://ctet.csrzic.com/thesisDetails#10.13889/j.issn.2096-5427.2024.03.200
Collection: DOAJ
Description: Standardization is a key step in analyzing locomotive overhaul data for reliability-centered maintenance (RCM). However, traditional manual methods face challenges such as small sample sizes, non-standardized data formats, analytical complexity, and high labour costs, all of which hinder data standardization. Large language models (LLMs), which perform strongly at natural language comprehension and processing and at handling complex tasks, have advanced considerably in both academia and industry in recent years. This study first investigated how well LLMs extract information from locomotive overhaul data and arrived at three findings: the universal information extraction (UIE) LLM is suitable for information extraction in the locomotive overhaul domain; enlarging the locomotive dataset improves UIE performance in extracting information from overhaul data; and balancing the fault label types does not notably improve this performance. Subsequent work addressed the difficulty of data annotation. Scripts were written to annotate the data automatically, and ChatGLM was used to standardize the locomotive overhaul data, achieving BLEU-4, ROUGE-1, ROUGE-2, and ROUGE-L scores of 86.87%, 89.60%, 87.54%, and 94.26%, respectively, which meet the requirements of engineering applications. Finally, an auxiliary data standardization pre-processing tool that encapsulates the LLM was developed to streamline the standardization process.
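The description reports ROUGE-1, ROUGE-2, and ROUGE-L scores but does not say how they were computed. As a minimal illustration of what these metrics measure, the sketch below computes ROUGE-N (clipped n-gram overlap) and ROUGE-L (longest common subsequence) as F1 scores between a model output and a reference. It is not the authors' evaluation code: the tokenization, the F1 form, the function names, and the example strings are all assumptions for illustration. Scores fall in [0, 1]; multiplying by 100 gives percentages comparable in form to those in the record.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, cand_total, ref_total):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(candidate, reference, n):
    """ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # Counter '&' keeps the minimum count per n-gram
    return f1(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence."""
    return f1(lcs_length(candidate, reference), len(candidate), len(reference))


if __name__ == "__main__":
    # Hypothetical example using whitespace tokens for readability; for Chinese
    # text one might instead use list(text) to obtain character-level tokens.
    cand = "traction motor bearing worn".split()
    ref = "traction motor bearing abnormally worn".split()
    print(round(rouge_n(cand, ref, 1), 4))  # 0.8889
    print(round(rouge_n(cand, ref, 2), 4))  # 0.5714
    print(round(rouge_l(cand, ref), 4))     # 0.8889
```

In the example, the output misses one reference word, so recall drops while precision stays perfect for unigrams; ROUGE-2 falls further because the omission also breaks bigrams. ROUGE-L rewards in-order matches without requiring them to be contiguous, which is one reason it can exceed ROUGE-2 on lightly edited standardized text.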
Record ID: doaj-art-005a4c4fd77d4f8693baf1d55702912a
Holding institution: Kabale University
ISSN: 2096-5427
Topics: locomotive overhaul data; RCM (reliability-centered maintenance); large language model; data standardization; data preprocessing; information extraction