Decoding the Mystery: How Can LLMs Turn Text Into Cypher in Complex Knowledge Graphs?
| Main Authors: | , , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | IEEE, 2025-01-01 |
| Series: | IEEE Access |
| Subjects: | |
| Online Access: | https://ieeexplore.ieee.org/document/10990239/ |
| Summary: | The integration of Knowledge Graphs (KGs) with Question Answering (QA) systems is transforming the landscape of Artificial Intelligence (AI). Combining these technologies enables novel features for translating natural-language questions into database queries. Although a lot of work is emerging in this domain, little of it addresses the translation of text to Cypher queries, even though Cypher is one of the dominant query languages used for the development of KGs (e.g., based on the Neo4j technology). In this context, this paper provides a robust and efficient framework to systematically assess how well Large Language Models (LLMs) support Text-to-Cypher conversion, focusing on the evaluation of open-source LLMs. The framework uses metrics and validators offered by CyVer, an open-source software library that we developed. This study also assesses the impact of different schema representations of the KG on schema-aware query generation, as well as the performance of LLMs on questions of different complexity, each requiring a different depth of reasoning over the KG. A case study applies the framework to a KG with a large and complex schema that hosts data for tracking information related to the Sustainable Development Goals (SDGs). The experimental results demonstrate the effectiveness of the proposed framework, highlight the importance of the size of open-source models for the semantic comprehension of questions and the generation of valid Cypher queries, and underline how challenging it is to generate accurate queries for questions requiring complex Cypher logic. |
|---|---|
| ISSN: | 2169-3536 |