
Databricks SQL Warehouse

Databricks SQL is the lakehouse doing its best warehouse impression: a high-performance SQL engine (Photon) and a BI surface built specifically to compete with Snowflake.

A Databricks SQL Warehouse is a cluster (or serverless pool) that runs SQL queries against Delta Lake tables, optimized for the BI workloads that Databricks historically lost to Snowflake. It is what you point Tableau, Power BI, Looker, dbt, or a JDBC client at when you want to query the lakehouse without using a notebook.

The simplest way to think about it: Databricks SQL is a warehouse-shaped UI and execution mode bolted onto a Spark-and-Delta-Lake platform. Underneath, it's still the lakehouse architecture — Parquet files in object storage, Delta transactions, Unity Catalog governance — but the user experience and the engine (Photon) are tuned for low-latency SQL the way Snowflake is.

Origin Story

For most of Databricks' history (2013–2020), it was a notebook company. Customers used PySpark in a notebook, ran ETL jobs, did data science, and trained ML models. The BI team — the analysts who lived in Tableau and Looker — did not use Databricks. They used Snowflake. This was an enormous structural problem. The BI workload is where the recurring revenue lives, where the seat counts are highest, and where the C-suite forms its opinion of which platform is "the data platform." Databricks was being shut out of it.

Databricks SQL was the response. It was previewed in November 2020 as SQL Analytics, rebranded Databricks SQL in 2021, and went GA later that year. The launch had three coupled pieces, all of which had to work for the strategy to make sense:

  1. Photon, a new vectorized query execution engine written in C++, designed to give Databricks competitive scan and aggregation performance against Snowflake's engine. Photon was announced in June 2020 and is the technical heart of Databricks SQL. Without Photon, the lakehouse would have been too slow on BI workloads to compete; with it, the gap closed dramatically.
  2. A SQL editor and BI surface: a query editor, dashboards, alerts, JDBC/ODBC endpoints, and integration with the BI tools customers actually use. This was the "warehouse experience" that notebook-first Databricks had never offered.
  3. Serverless SQL Warehouses (GA 2022), which removed the cluster-spin-up latency that had made Databricks feel slow and operational compared to Snowflake's instant virtual warehouses.

The reason Databricks SQL exists is the mirror image of why Snowpark exists: Databricks needed a warehouse story to fight Snowflake. Just as Snowflake couldn't credibly sell to data engineers without a Python story, Databricks couldn't credibly sell to BI buyers without a low-latency SQL story. Databricks SQL is that story.

How It Works

A SQL Warehouse is a managed compute resource sized in T-shirt increments (2X-Small through 4X-Large). You pick a size, optionally make it serverless, and connect a BI tool to its JDBC endpoint. Behind the scenes:

  • Compute is one or more cluster nodes running Photon. Serverless warehouses run on a Databricks-managed pool that starts in seconds; classic warehouses run in your own cloud account and have minute-scale spin-up.
  • Storage is whatever Delta Lake tables you've registered in Unity Catalog. The data lives in your S3/ADLS/GCS bucket as Parquet files with a Delta transaction log.
  • The query engine is Photon, which executes most operators (filters, aggregates, joins, scans) in vectorized C++ rather than the JVM Spark code path. Photon is roughly 2–4x faster than the JVM engine on typical BI queries and is what makes Databricks SQL competitive on warehouse-style benchmarks.
  • Caching happens at multiple layers: a remote results cache, a local SSD disk cache for hot Parquet files, and predictive I/O for prefetching. These are all designed to mimic the "feels instant" behavior of Snowflake's result cache.
  • Governance flows through Unity Catalog: row/column security, audit logs, lineage, and data sharing all hang off the same metadata layer the rest of the platform uses.
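The connection mechanics above can be sketched in Python. The hostname and warehouse ID below are hypothetical placeholders; the HTTP-path and JDBC-URL shapes follow the formats Databricks documents for SQL Warehouses, and the commented-out query at the end assumes the `databricks-sql-connector` package.

```python
# Sketch of the coordinates a BI tool or driver needs to reach a SQL Warehouse.
# All identifiers here are hypothetical placeholders.

def warehouse_http_path(warehouse_id: str) -> str:
    """HTTP path that JDBC/ODBC clients point at for a given SQL Warehouse."""
    return f"/sql/1.0/warehouses/{warehouse_id}"

def jdbc_url(hostname: str, warehouse_id: str) -> str:
    """JDBC URL in the shape the Databricks JDBC driver expects."""
    return f"jdbc:databricks://{hostname}:443;httpPath={warehouse_http_path(warehouse_id)}"

print(jdbc_url("adb-1234567890123456.7.azuredatabricks.net", "abc123def456"))

# With `pip install databricks-sql-connector`, an actual query would look like:
#
#   from databricks import sql
#   with sql.connect(
#       server_hostname="adb-1234567890123456.7.azuredatabricks.net",
#       http_path=warehouse_http_path("abc123def456"),
#       access_token="<personal-access-token>",
#   ) as conn:
#       with conn.cursor() as cur:
#           cur.execute("SELECT count(*) FROM main.analytics.orders")
#           print(cur.fetchone())
```

The point of the sketch: a SQL Warehouse is addressed by an HTTP path, not a cluster ID, which is why BI tools can treat it like any other warehouse endpoint.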

What It's Good At

  • BI dashboards on lakehouse data. Tableau, Power BI, Looker, ThoughtSpot, Mode — all connect via JDBC and benefit from Photon and the result cache.
  • dbt-on-Databricks workflows. Databricks SQL is the preferred dbt target on the platform; the dbt-databricks adapter is mature and well-supported.
  • Querying open-format data without copying it. The whole point is that the data is already in Delta/Parquet on object storage. You don't have to ingest it into a proprietary warehouse to query it.
  • Mixed warehouse + ML shops. Teams that use Databricks for ML and Spark already get warehouse functionality without standing up a second platform.
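For the dbt workflow mentioned above, a minimal profile targeting a SQL Warehouse looks roughly like the following. The project name, catalog, schema, hostname, and warehouse ID are all hypothetical; the field names are the ones used by the dbt-databricks adapter.

```yaml
my_project:                  # hypothetical dbt project name
  target: prod
  outputs:
    prod:
      type: databricks
      catalog: main          # Unity Catalog catalog (placeholder)
      schema: analytics      # target schema (placeholder)
      host: adb-1234567890123456.7.azuredatabricks.net   # placeholder workspace hostname
      http_path: /sql/1.0/warehouses/abc123def456        # SQL Warehouse HTTP path (placeholder ID)
      token: "{{ env_var('DATABRICKS_TOKEN') }}"         # personal access token via env var
```

Pointing dbt at a SQL Warehouse rather than an all-purpose cluster is what gets you Photon execution and per-warehouse cost attribution for the transform workload.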

What It's Bad At (or still catching up on)

  • Tiny, latency-sensitive BI on a cold cluster. Serverless helps a lot, but for the very lowest-latency dashboard hits, Snowflake's result cache and instant warehouse model still feel more responsive in some benchmarks.
  • Pure SQL shops with no Python or Spark needs. If your team only does SQL and BI, the operational complexity of the broader Databricks platform (workspaces, clusters, notebooks, jobs) is overhead you don't need. Snowflake or BigQuery is simpler.
  • Concurrency at extreme scale. Snowflake's multi-cluster warehouse model is still the gold standard for very high BI concurrency; Databricks SQL is competitive but not always equivalent at the upper end.

The Opinionated Take

Databricks SQL is the most strategically important product Databricks has ever shipped, and it's the product that most clearly proves the lakehouse thesis. For years, "lakehouse" was a slide in a deck. Databricks SQL plus Photon plus Unity Catalog plus Delta Lake is the actual implementation — a system that does warehouse things on lake storage, well enough that you can run Tableau against it without apology.

The convergence story is now impossible to deny. Snowflake added Iceberg tables on customer-managed object storage; Databricks added Photon and a serverless SQL endpoint. Both companies are building the same thing from opposite directions. A 2026 Snowflake deployment with Iceberg looks shockingly similar to a 2026 Databricks deployment with SQL Warehouses: open table format on cloud storage, fast vectorized SQL engine, governance layer, BI surface. The differences are real but increasingly secondary.

The honest assessment of where Databricks SQL still trails Snowflake: developer experience polish, the "credit-card-and-go" simplicity of provisioning, and the smoothness of small-cluster cold-start. Where it leads: it doesn't lock your data in proprietary storage, it shares one metadata layer with your ML and Spark workloads, and it's typically cheaper at the high end on heavy ELT jobs. Which one wins for a given customer comes down to organizational gravity (BI team vs. ML team) more than it does to product capability.

How TextQL Fits

TextQL Ana connects to Databricks via the SQL Warehouse JDBC endpoint and treats it as a first-class target alongside Snowflake. The Unity Catalog metadata layer is particularly valuable for an AI analyst — column descriptions, table tags, and lineage all come through in a structured way that helps TextQL generate correct queries on the first attempt. For Databricks customers running BI workloads, TextQL is best deployed against a dedicated SQL Warehouse rather than a general-purpose cluster, both for performance isolation and for cost transparency.

See TextQL in action

Databricks SQL Warehouse
Released 2020 (preview as "SQL Analytics"); GA 2021; rebranded "Databricks SQL" 2021; Serverless GA 2022
Vendor Databricks
Type SQL query engine and BI service
Engine Photon (vectorized C++ execution engine)
Category Data Warehouse
Monthly mindshare ~100K · ~20% of Databricks customers; the SQL/BI persona use case