Semantic and lexical analysis of pre-trained vision language artificial intelligence models for automated image descriptions in civil engineering


Bibliographic Details
Main Authors: Pedram Bazrafshan, Kris Melag, Arvin Ebrahimkhanlou
Format: Article
Language: English
Published: Springer Nature 2025-08-01
Series: AI in Civil Engineering
Online Access: https://doi.org/10.1007/s43503-025-00063-9
Description
Summary: This paper investigates the application of pre-trained Vision-Language Models (VLMs) for describing images of civil engineering materials and construction sites, with a focus on construction components, structural elements, and materials. The novelty of this study lies in investigating VLMs for this specialized domain, which has not been previously addressed. As a case study, the paper evaluates ChatGPT-4v's ability to serve as a descriptor tool by comparing its output with descriptions from three human annotators (a civil engineer and two engineering interns). The contributions of this work include adapting a pre-trained VLM to civil engineering applications without additional fine-tuning and benchmarking its performance using both semantic similarity analysis (SentenceTransformers) and lexical similarity methods. Using two datasets, one from a publicly available online repository and another manually collected by the authors, the study employs whole-text and sentence pair-wise similarity analyses to assess the model's alignment with human descriptions. Results show that the best-performing model achieved an average similarity of 76% (4% standard deviation) against human-generated descriptions. The analysis also reveals better performance on the publicly available dataset.
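The abstract describes benchmarking VLM output against human descriptions with both semantic (SentenceTransformers) and lexical similarity. As a minimal illustration of the lexical side, the sketch below computes cosine similarity over bag-of-words counts; the paper does not specify its exact lexical metric, and the example texts are hypothetical.

```python
# Hedged sketch: cosine similarity between bag-of-words vectors as one
# common lexical-similarity metric for comparing a VLM description with a
# human one. The paper's actual lexical method may differ.
import math
import re
from collections import Counter

def lexical_cosine(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va = Counter(re.findall(r"[a-z]+", a.lower()))
    vb = Counter(re.findall(r"[a-z]+", b.lower()))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0

# Hypothetical VLM-generated and human-written descriptions.
vlm = "A reinforced concrete beam with visible steel reinforcement."
human = "Concrete beam showing exposed steel reinforcement bars."
print(round(lexical_cosine(vlm, human), 2))  # → 0.53
```

The semantic counterpart in the paper would replace the word-count vectors with SentenceTransformers embeddings and compute the same cosine measure over those embeddings.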
ISSN: 2097-0943
2730-5392