Why We Use an Ontology for Text to SQL

We spent our first 12 months learning the hard way that LLM-generated text-to-SQL doesn’t work.

Today, we still talk to dozens of Fortune 500 companies that are attempting this, burning millions of dollars on fine-tuning and building custom models. So I wanted to take a minute to share why we no longer use this approach, and what we do instead.

The Problem: Data is Complex

No matter how much table and column metadata we fed in – schemas, documentation from dbt, Alation, and more – it wasn’t enough. LLMs would still generate unreliable SQL queries because they lacked the right context. Without structure, these models had too much freedom, like giving a self-driving car no road to follow.

So, we went back to the drawing board.

The Solution: An Ontology Architecture

What we ended up building was an ontology architecture. You can think of it as a map that defines the key objects in your business – tables, actions, metrics, and dashboards – and how they relate to one another. This map gives the LLM much-needed guidance. When a query request comes in, the ontology acts like tracks for the model to follow, ensuring the output is accurate and relevant. Without it, the LLM is prone to going off the rails.
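To make that concrete, here is a rough sketch of what a single ontology entry could look like. This is illustrative Python, not TextQL's actual schema: the object, metric, and relationship names are made up, but the idea is that joins and metric formulas are declared once and never left for the model to guess.

```python
from dataclasses import dataclass, field

# Illustrative only: a minimal shape for an ontology entry.

@dataclass
class Relationship:
    target: str   # the related object, e.g. "customers"
    join_on: str  # the join condition the LLM is never asked to invent

@dataclass
class Metric:
    name: str
    formula: str  # the only formula ever used for this metric

@dataclass
class OntologyObject:
    name: str
    table: str    # physical table backing this object
    metrics: list[Metric] = field(default_factory=list)
    relationships: list[Relationship] = field(default_factory=list)

orders = OntologyObject(
    name="orders",
    table="analytics.fct_orders",
    metrics=[Metric(name="revenue", formula="SUM(order_amount)")],
    relationships=[
        Relationship(target="customers", join_on="orders.customer_id = customers.id"),
    ],
)
```

With a map like this in place, the model's job shrinks from writing SQL from scratch to selecting objects, metrics, and relationships that already exist in the map.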

It’s a bit like autonomous driving: Waymo spent years and billions solving self-driving cars. But what if you gave those cars better tracks to follow? That’s exactly what an ontology does for LLMs generating SQL.

Now we’re seeing how LLMs, when properly guided by an ontology, can reliably generate SQL and solve real-world challenges.

This table highlights the key differences between traditional methods and an ontology-driven approach for AI-powered data solutions.

Building AI for Data

| Catalog, RAG & Fine-tuning (table & column schemas with additional documentation) | Ontology (a structured representation of concepts and their relationships) |
| --- | --- |
| Verification of the output SQL requires manual intervention | Queries are visually presented in a natural language DSL |
| No reliable performance increase with more docs or re-training ($$) | Each improvement predictably expands the question space |
| Schema changes require full-model re-trainings | Schema changes require deterministic configuration changes |
| You cannot tell why an LLM hallucinated a specific line of code | Every line of code generated with transparent logic |
| Can never guarantee it won’t make up a formula for a metric | Always uses defined specific formulas for key business metrics |
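Building on the illustrative sketch above (and still only a sketch, not TextQL's actual compiler), "transparent logic" can be read as: the SQL is rendered from the ontology's configured tables, formulas, and joins, so every clause traces back to a definition rather than to a model's guess.

```python
def compile_metric_query(obj: OntologyObject, metric_name: str, group_by: str) -> str:
    """Render SQL from the ontology alone; every clause comes from configuration."""
    metric = next(m for m in obj.metrics if m.name == metric_name)
    return (
        f"SELECT {group_by}, {metric.formula} AS {metric.name}\n"
        f"FROM {obj.table}\n"
        f"GROUP BY {group_by}"
    )

# Using the illustrative `orders` object defined earlier:
# compile_metric_query(orders, "revenue", "customer_id")
#   SELECT customer_id, SUM(order_amount) AS revenue
#   FROM analytics.fct_orders
#   GROUP BY customer_id
```

Because the formula for "revenue" lives in the ontology, a schema change becomes a configuration edit rather than a re-training run, which is the contrast the table above draws.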


What Inspired Us

When we realized we needed an ontology, we borrowed the best ideas from existing systems:

• Palantir Technologies: Their intuitive UI lets users visualize entities on a canvas.

• Tobiko’s SQLGlot: A powerful tool for translating SQL across different syntaxes (see the short example after this list).

• Looker’s LookML: A pioneer in metric and dimension modeling.

• Tableau’s LOD Expressions: Brilliant at managing complex joins and fan-out paths.
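For instance, translating a query between dialects is a one-liner with SQLGlot; the dialects chosen below are just for illustration:

```python
import sqlglot

# Parse a DuckDB expression and re-render it as Hive SQL.
print(sqlglot.transpile("SELECT EPOCH_MS(1618088028295)", read="duckdb", write="hive")[0])
```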

The data space is full of great concepts, but architects often over-focus on individual improvements and miss the bigger picture. A truly effective ontology captures everything – every asset, table, metric, and dashboard – combining the best ideas from across platforms.
