Specialized Large Language Model for Standardization of Locomotive Maintenance Data
Standardization is one of the key steps to analyze locomotive overhaul data with a focus on reliability-centered maintenance (RCM). However, traditional manual methods encounter challenges such as small sample sizes, non-standardized data formats, analytical complexities, and high labour costs, hind...
| Main Authors: | CHEN Ao, LI Chen, YAN Jiayun, PENG Liantie, TIAN Ye, LIU Leixinyuan |
|---|---|
| Format: | Article |
| Language: | zho |
| Published: | Editorial Office of Control and Information Technology, 2024-06-01 |
| Series: | Kongzhi Yu Xinxi Jishu |
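| Subjects: | locomotive overhaul data; RCM (reliability centered maintenance); large language model; data standardization; data preprocessing; information extraction |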
| Online Access: | http://ctet.csrzic.com/thesisDetails#10.13889/j.issn.2096-5427.2024.03.200 |
| _version_ | 1849224921860800512 |
|---|---|
| author | CHEN Ao, LI Chen, YAN Jiayun, PENG Liantie, TIAN Ye, LIU Leixinyuan |
| collection | DOAJ |
| description | Standardization is a key step in analyzing locomotive overhaul data for reliability-centered maintenance (RCM). However, traditional manual methods face small sample sizes, non-standardized data formats, analytical complexity, and high labour costs, all of which hinder data standardization. Large language models (LLMs), which perform strongly in natural language understanding and complex tasks, have made great academic and industrial progress in recent years. This study first investigated how LLMs perform in information extraction from locomotive overhaul data and reports three findings: the universal information extraction (UIE) LLM is suitable for information extraction in the field of locomotive overhaul; expanding the locomotive dataset improves UIE performance on this task; and balancing the types of fault labels does not notably improve it. Subsequent work addressed the difficulty of data annotation. Scripts were written to annotate the data automatically, and ChatGLM was used to standardize locomotive overhaul data, yielding Bleu-4, Rouge-1, Rouge-2, and Rouge-L scores of 86.87%, 89.60%, 87.54%, and 94.26%, respectively, meeting the requirements of engineering applications. Finally, an auxiliary data standardization pre-processing tool that encapsulates the LLM was developed to streamline the standardization process. |
| format | Article |
| id | doaj-art-005a4c4fd77d4f8693baf1d55702912a |
| institution | Kabale University |
| issn | 2096-5427 |
| language | zho |
| publishDate | 2024-06-01 |
| publisher | Editorial Office of Control and Information Technology |
| record_format | Article |
| series | Kongzhi Yu Xinxi Jishu |
| title | Specialized Large Language Model for Standardization of Locomotive Maintenance Data |
| topic | locomotive overhaul data; RCM (reliability centered maintenance); large language model; data standardization; data preprocessing; information extraction |
| url | http://ctet.csrzic.com/thesisDetails#10.13889/j.issn.2096-5427.2024.03.200 |
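
The description above reports Rouge-L among its evaluation metrics. As a rough illustration of what that score measures, the following sketch computes an F-measure from the longest common subsequence (LCS) of a reference text and a model output. It assumes whitespace tokenization and equal weighting of precision and recall; the record does not state which tokenizer, weighting, or tooling the authors used, and the function names and sample strings here are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """LCS-based F1 between two strings (assumption: whitespace tokens)."""
    ref = reference.split()
    cand = candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up strings for illustration only, not data from the study.
    reference = "traction motor bearing abnormal noise replace bearing"
    candidate = "traction motor bearing noise replace bearing"
    print(f"Rouge-L F1: {rouge_l_f1(reference, candidate):.4f}")
```

For Chinese-language records such as those in the study, a character-level or word-segmented tokenization would be needed in place of `str.split`; that choice is left open here.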