cLegal-QA: a Chinese legal question answering with natural language generation methods

Bibliographic Details
Main Authors: Yizhen Wang, Xueying Shen, Zixian Huang, Lihui Niu, Shiyan Ou
Format: Article
Language: English
Published: Springer 2024-12-01
Series: Complex & Intelligent Systems
Subjects: Legal question answering; Chinese civil law; Answering generation; Natural language generation
Online Access: https://doi.org/10.1007/s40747-024-01675-x
author Yizhen Wang
Xueying Shen
Zixian Huang
Lihui Niu
Shiyan Ou
collection DOAJ
description Abstract Legal question answering (Legal QA) aims to provide accurate and timely answers to legal questions, significantly reducing the workload of legal professionals. This approach improves the efficiency of the judiciary and ensures prompt, professional legal assistance to the public. Currently, a major challenge is the absence of a large-scale dataset tailored for Chinese generative legal question answering. To address this, our study developed a comprehensive automatic question answering dataset for Chinese civil law, named cLegal-QA, which comprises 14,000 high-frequency questions from Chinese legal communities. This dataset spans various legal disputes and includes questions, disputes, scenarios, multiple lawyer responses, and gold-standard answers from human annotators. Additionally, we employed a generative QA model specifically designed for the cLegal-QA dataset. The results indicate that fully-supervised models, notably UniLM, T5, and BART, substantially outperform zero-shot models on this dataset, with ChatYuan being the most effective among the zero-shot models. Our analysis also reveals that answers labeled with 60–80% accuracy yield the highest efficiency. Furthermore, we evaluated the real-world performance of these models with expert validation and applied transfer learning to new civil disputes. While the QA models demonstrate commendable performance on the dataset, there is still potential for further improvement.
format Article
id doaj-art-b35c9abd940945d1bdaf68d1eb23c5f2
institution Kabale University
issn 2199-4536
2198-6053
language English
publishDate 2024-12-01
publisher Springer
record_format Article
series Complex & Intelligent Systems
spelling doaj-art-b35c9abd940945d1bdaf68d1eb23c5f2
2025-02-02T12:49:00Z
eng
Springer
Complex & Intelligent Systems
2199-4536
2198-6053
2024-12-01
Volume 11, Issue 1, pp. 1–21
10.1007/s40747-024-01675-x
cLegal-QA: a Chinese legal question answering with natural language generation methods
Yizhen Wang (School of Information Management, Nanjing University)
Xueying Shen (School of Information Management, Nanjing University)
Zixian Huang (State Key Laboratory for Novel Software Technology, Nanjing University)
Lihui Niu (School of Information Management, Wuhan University)
Shiyan Ou (School of Information Management, Nanjing University)
https://doi.org/10.1007/s40747-024-01675-x
Legal question answering
Chinese civil law
Answering generation
Natural language generation
title cLegal-QA: a Chinese legal question answering with natural language generation methods
topic Legal question answering
Chinese civil law
Answering generation
Natural language generation
url https://doi.org/10.1007/s40747-024-01675-x