Title: SPARQLGEN: One-Shot Prompt-based Approach for SPARQL Query Generation
Authors: Kovriguina, Liubov; Teucher, Roman; Radyush, Daniil; Mouromtsev, Dmitry
Type: conference paper
Date issued: 2023
Date available: 2024-01-11
Handle: https://publica.fraunhofer.de/handle/publica/458678
Scopus ID: 2-s2.0-85176588073
Language: en
Keywords: Augmented Large Language Models; Knowledge Graphs Question Answering; Prompt Template Design; SPARQL query generation

Abstract: In this work, we present a one-shot generative approach (further referred to as SPARQLGEN) for generating SPARQL queries by augmenting Large Language Models (LLMs) with relevant context within a single prompt. The prompt includes heterogeneous data sources: the question itself, an RDF subgraph required to answer the question, and an example of a correct SPARQL query for a different question. In the experiments, we use GPT-3, a popular pre-trained language model from OpenAI, but the approach can be extended to any other generative LLM. We evaluate how different types of context in the prompt influence query generation performance on QALD-9, QALD-10, and the Bestiary dataset (BESTIARY), which was created to test LLM performance on unseen data, and provide a detailed error analysis. One of the findings is that providing the model with the underlying KG and a random correct query improves the generation results. The approach shows strong results on the QALD-9 dataset but does not generalize to QALD-10 and BESTIARY, which may be caused by a memorization problem.
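For illustration, the prompt composition described in the abstract (question, RDF subgraph, and one correct question/query pair for a different question) could be assembled roughly as follows. This is a minimal sketch, not the paper's actual template: the prompt wording, field order, the helper build_sparqlgen_prompt, and the example data are all assumptions made for this illustration.

def build_sparqlgen_prompt(question: str, subgraph_turtle: str,
                           example_question: str, example_query: str) -> str:
    # Compose a single one-shot prompt from the question, an RDF subgraph,
    # and one correct question/SPARQL pair for an unrelated question.
    return (
        "Given the RDF data below, write a SPARQL query that answers the question.\n\n"
        f"RDF data:\n{subgraph_turtle}\n\n"
        f"Example question: {example_question}\n"
        f"Example SPARQL query:\n{example_query}\n\n"
        f"Question: {question}\n"
        "SPARQL query:"
    )

prompt = build_sparqlgen_prompt(
    question="Who developed Skype?",
    subgraph_turtle="dbr:Skype dbo:developer dbr:Skype_Technologies .",
    example_question="What is the capital of Germany?",
    example_query="SELECT ?capital WHERE { dbr:Germany dbo:capital ?capital }",
)
# 'prompt' would then be sent to a generative LLM (GPT-3 in the paper's experiments).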