Decoding the Mystery: How Can LLMs Turn Text Into Cypher in Complex Knowledge Graphs?

Bibliographic Details
Main Authors: Ioanna Mandilara, Christina Maria Androna, Eleni Fotopoulou, Anastasios Zafeiropoulos, Symeon Papavassiliou
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/10990239/
Description
Summary: The integration of Knowledge Graphs (KGs) with Question Answering (QA) systems is transforming the landscape of Artificial Intelligence (AI). Combining these technologies enables novel features for translating natural-language questions into database queries. Although a lot of work is emerging in this domain, this is not the case for the translation of text to Cypher queries, where Cypher is one of the dominant query languages used for the development of KGs (e.g., based on the Neo4j technology). In this context, this paper provides a robust and efficient framework to systematically assess the efficiency of Large Language Models (LLMs) in supporting Text-to-Cypher conversion, focusing on the evaluation of open-source LLMs. The framework uses metrics and validators offered by CyVer, an open-source software library that we developed. This study also assesses the impact of different schema representations of the KG on schema-aware query generation, as well as the performance of LLMs on questions of varying complexity that require deeper reasoning over the KG. A case study applies the framework to a KG with a large and complex schema that hosts data for tracking information related to the Sustainable Development Goals (SDGs). The experimental results demonstrate the effectiveness of the proposed framework, highlight the importance of the size of open-source models for the semantic comprehension of questions and the generation of valid Cypher queries, and underscore the difficulty of generating accurate queries for questions requiring complex Cypher logic.
ISSN: 2169-3536