Arch-Eval benchmark for assessing Chinese architectural domain knowledge in large language models
Abstract: The burgeoning application of Large Language Models (LLMs) in Natural Language Processing (NLP) has prompted scrutiny of their domain-specific knowledge processing, especially in the construction industry. Despite high demand, there is a scarcity of evaluative studies for LLMs in this area....
| Main Authors: | Jie Wu, Mincheng Jiang, Juntian Fan, Shimin Li, Hongtao Xu, Ye Zhao |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | Nature Portfolio, 2025-04-01 |
| Series: | Scientific Reports |
| Subjects: | |
| Online Access: | https://doi.org/10.1038/s41598-025-98236-0 |
Similar Items
- Meticulous Thought Defender: Fine-Grained Chain-of-Thought (CoT) for Detecting Prompt Injection Attacks of Large Language Models
  by: Lijuan Shi, et al. Published: (2025-01-01)
- Measuring and Improving the Efficiency of Python Code Generated by LLMs Using CoT Prompting and Fine-Tuning
  by: Ramya Jonnala, et al. Published: (2025-01-01)
- SHIELD: an evaluation benchmark for face spoofing and forgery detection with multimodal large language models
  by: Yichen Shi, et al. Published: (2025-06-01)
- Capability-based training framework for generative AI in higher education
  by: Pablo Burneo-Arteaga, et al. Published: (2025-06-01)
- Correction: Capability-based training framework for generative AI in higher education
  by: Pablo Burneo-Arteaga, et al. Published: (2025-08-01)