We spent our first 12 months learning the hard way that LLM generated text-to-SQL doesn’t work.
Today, we still talk to dozens of Fortune 500 companies who are attempting this and are burning millions of dollars fine tuning and building custom models. So I wanted to take a minute to share why we no longer use this approach and the better approach to solve for it.
The Problem: Data is Complex
No matter how much table and column metadata we fed in – schemas, documentation from dbt, Alation, and more – it wasn’t enough. LLMs would still generate unreliable SQL queries because they lacked the right context. Without structure, these models had too much freedom, like giving a self-driving car no road to follow.
So, we went back to the drawing board.
The Solution: Am Ontology Architecture
What we ended up building was an ontology architecture. You can think of it as a map that defines the key objects in your business – tables, actions, metrics, and dashboards – and how they relate to one another. This map gives the LLM much-needed guidance. When a query request comes in, the ontology acts like tracks for the model to follow, ensuring the output is accurate and relevant. Without it, the LLM is prone to going off the rails.
It’s a bit like autonomous driving: Waymo spent years and billions solving self-driving cars. But what if you gave those cars better tracks to follow? That’s exactly what an ontology does for LLMs generating SQL.
Now we’re seeing how LLMs, when properly guided by an ontology, can reliably generate SQL and solve real-world challenges.
This table highlights the key differences between traditional methods and an ontology-driven approach for AI-powered data solutions.
What Inspired Us
When we realized we needed an ontology, we borrowed the best ideas from existing systems:
• Palantir Technologies: Their intuitive UI lets users visualize entities on a canvas.
• Tobiko’s SQLGlot: A powerful tool for translating SQL across different syntaxes.
• Looker’s LookML: A pioneer in metric and dimension modeling.
• Tableau’s LOD Expressions: Brilliant at managing complex joins and fan-out paths.
The data space is full of great concepts, but architects often over-focus on individual improvements and miss the bigger picture. A truly effective ontology captures everything – every asset, table, metric, and dashboard – combining the best ideas from across platforms.
Stay Informed with TextQL's Newsletter
Sign up to receive updates on product news and future blog posts.