Exploring Large Language Models’ Ability to Describe Entity-Relationship Schema-Based Conceptual Data Models

Bibliographic Details
Main Authors: Andrea Avignone, Alessia Tierno, Alessandro Fiori, Silvia Chiusano
Format: Article
Language: English
Published: MDPI AG 2025-04-01
Series: Information
Subjects: relational database, large language models, database design, entity-relationship
Online Access: https://www.mdpi.com/2078-2489/16/5/368
author Andrea Avignone
Alessia Tierno
Alessandro Fiori
Silvia Chiusano
collection DOAJ
description In the field of databases, Large Language Models (LLMs) have recently been studied for generating SQL queries from textual descriptions, while their use for conceptual or logical data modeling remains less explored. The conceptual design of relational databases commonly relies on the entity-relationship (ER) data model, where translation rules enable mapping an ER schema into corresponding relational tables with their constraints. Our study investigates the capability of LLMs to describe in natural language a database conceptual data model based on the ER schema. Whether for documentation, onboarding, or communication with non-technical stakeholders, LLMs can significantly improve the process of explaining the ER schema by generating accurate descriptions about how the components interact as well as the represented information. To guide the LLM with challenging constructs, specific hints are defined to provide an enriched ER schema. Different LLMs have been explored (ChatGPT 3.5 and 4, Llama2, Gemini, Mistral 7B) and different metrics (F1 score, ROUGE, perplexity) are used to assess the quality of the generated descriptions and compare the different LLMs.
format Article
id doaj-art-93486bd208564c099ad75d9be00d018f
institution Kabale University
issn 2078-2489
language English
publishDate 2025-04-01
publisher MDPI AG
record_format Article
series Information
spelling doaj-art-93486bd208564c099ad75d9be00d018f; 2025-08-20T03:47:58Z; eng; MDPI AG; Information; ISSN 2078-2489; 2025-04-01; vol. 16, no. 5, art. 368; DOI 10.3390/info16050368
Andrea Avignone, Alessia Tierno, Alessandro Fiori, Silvia Chiusano (all: Department of Control and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy)
title Exploring Large Language Models’ Ability to Describe Entity-Relationship Schema-Based Conceptual Data Models
topic relational database
large language models
database design
entity-relationship
url https://www.mdpi.com/2078-2489/16/5/368