Google BigQuery
Google BigQuery — Google Cloud's serverless, pay-per-query data warehouse. Pioneered separation of storage and compute and remains the easiest warehouse to get started with.
Google BigQuery is a data warehouse you don't have to run. There's no cluster to provision, no nodes to resize, no "warehouse size" dropdown to pick. You drop data in, write SQL, and Google figures out the rest. The bill shows up at the end of the month based on how much data your queries scanned (or how many "slots" of compute you reserved).
If Snowflake is a rental car where you still have to pick your engine size, BigQuery is an Uber — you say where you want to go, and a car shows up. You never think about the car.
That "just show up and query" experience is BigQuery's single biggest advantage, and it comes from the fact that BigQuery is not really a product Google built for you. It's a commercial wrapper around a system Google built for itself almost two decades ago to search its own logs.
The story starts in 2006 inside Google, with a system called Dremel. Google engineers needed to run interactive SQL-like queries across petabytes of log data — web crawl stats, ad click streams, production traces — and MapReduce was too slow. MapReduce is a batch system; you submit a job and come back hours later. Dremel was designed to return results in seconds over trillions of rows.
The key insight of Dremel, published in a famous 2010 paper ("Dremel: Interactive Analysis of Web-Scale Datasets"), was to combine two ideas:
1. Columnar storage for nested data. Records are shredded into per-column streams (with repetition and definition levels to reconstruct the nesting), so a query reads only the columns it actually references.
2. Multi-level serving trees. Borrowed from Google's web-search serving infrastructure: a tree of servers splits the query, scans the leaves in parallel, and aggregates partial results back up the tree.
At Google's internal scale, Dremel could scan tens of billions of rows in seconds. In 2010, Google announced BigQuery as a commercial version of Dremel, and it became generally available in 2012 as one of the earliest Google Cloud products. The team was led by engineers including Jordan Tigani and Siddartha Naidu; Tigani later co-founded MotherDuck around DuckDB, partly as a reaction to what he saw running BigQuery for a decade.
BigQuery's 2010 debut predates Snowflake's public launch by about four years and Redshift's by about two. It was, depending on how you count, the first true serverless cloud data warehouse.
BigQuery looks like a single black box, but inside it's four services bolted together:
1. Colossus (storage). Your tables live as columnar files in Colossus, Google's successor to GFS. The format is called Capacitor and it's Google's proprietary columnar encoding, with heavy compression and value-level indexing. Crucially, storage is fully separated from compute — Colossus is a planet-scale object store that BigQuery queries read from directly.
2. Dremel (query engine). When you run a query, BigQuery parses the SQL, builds a query plan, and dispatches it as a tree of workers called "slots." A slot is a unit of CPU + memory. Large queries get more slots and finish faster; small queries get fewer. You never see slots as individual machines because they're scheduled out of a shared pool.
3. Jupiter (network). Google's internal petabit-scale datacenter network. This is the unsung hero of BigQuery — because Jupiter is so fast, BigQuery can shuffle data between workers at speeds that would melt a normal cloud network. This is why BigQuery handles joins at scale surprisingly well.
4. Borg (scheduling). Google's cluster manager (the ancestor of Kubernetes) schedules slots onto physical machines. You never see this layer.
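The columnar layout in layer 1 is what makes selective queries cheap. A toy sketch in plain Python (nothing to do with the real Capacitor format, just the concept) of why a column store only touches the columns a query names:

```python
# Row store: to read one field, you must deserialize entire rows.
row_store = [
    {"user_id": 1, "country": "US", "revenue": 9.99, "user_agent": "...very wide string..."},
    {"user_id": 2, "country": "DE", "revenue": 4.50, "user_agent": "...very wide string..."},
]

# The same data "shredded" into columns: each column is its own contiguous stream,
# which is conceptually what Capacitor-style columnar storage does.
col_store = {
    "user_id": [1, 2],
    "country": ["US", "DE"],
    "revenue": [9.99, 4.50],
    "user_agent": ["...very wide string...", "...very wide string..."],
}

def scan_columns(table, columns):
    """Read only the named columns; bytes touched scale with those columns alone."""
    return {c: table[c] for c in columns}

# SELECT SUM(revenue): only the revenue column is read.
# The wide user_agent column is never touched, so you never pay for it.
result = sum(scan_columns(col_store, ["revenue"])["revenue"])
```

The same query against `row_store` would have to deserialize every row in full, wide strings included — which is exactly the cost difference that per-byte pricing exposes.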
The practical result: you type SELECT ... FROM huge_table and within a few hundred milliseconds, thousands of workers across Google's fleet are scanning your data in parallel. No warehouse to wake up, no autoscaling lag.
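A back-of-the-envelope model of why "thousands of workers" matters — the throughput constant here is a made-up illustrative number, not a published BigQuery figure:

```python
def scan_seconds(bytes_to_scan: float, slots: int,
                 bytes_per_slot_per_sec: float = 500e6) -> float:
    """Idealized, embarrassingly-parallel scan: total bytes split evenly across
    slots. The per-slot throughput is an assumed constant for illustration."""
    return bytes_to_scan / (slots * bytes_per_slot_per_sec)

ten_tb = 10e12
small_allocation = scan_seconds(ten_tb, slots=100)    # ~200 s
large_allocation = scan_seconds(ten_tb, slots=2000)   # ~10 s
```

In this simplified model scan time falls linearly with slot count, which is roughly why the same query is interactive on BigQuery and a batch job elsewhere; in reality shuffles, joins, and slot scheduling keep the scaling sublinear.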
BigQuery has two main pricing models, and picking the wrong one can cost you ten times more than the right one.
On-demand pricing charges per byte scanned — roughly $6.25 per TB as of 2026. This sounds simple but has a nasty property: a SELECT * on a wide table is dramatically more expensive than a selective query, because columnar storage means you pay for the columns you touch. Teams that don't know this rack up $50,000 surprise bills.
Capacity (reserved slots) pricing lets you buy slots — either flat-rate monthly commitments or the newer BigQuery Editions (Standard, Enterprise, Enterprise Plus) launched in 2023, which auto-scale slot usage within a range. This is flat-rate compute, like a traditional warehouse.
The rough rule: if you're spending more than about $10K/month on on-demand, switch to Editions. Below that, on-demand is almost always cheaper and simpler.
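That rule of thumb can be sketched as a breakeven comparison. The slot-hour rate below is an illustrative, Standard-edition-style number, not a quote — check current Editions pricing for your region:

```python
HOURS_PER_MONTH = 730  # average hours in a month

def editions_monthly_cost(baseline_slots: int,
                          per_slot_hour: float = 0.04) -> float:
    """Flat cost of holding a steady slot baseline for a month.
    $0.04/slot-hour is an assumed illustrative rate."""
    return baseline_slots * per_slot_hour * HOURS_PER_MONTH

def cheaper_option(on_demand_monthly: float, baseline_slots: int) -> str:
    """Compare last month's on-demand bill against a reserved baseline."""
    reserved = editions_monthly_cost(baseline_slots)
    return "editions" if reserved < on_demand_monthly else "on-demand"
```

With these assumptions, 100 baseline slots run about $2,920/month — so a $12K on-demand bill clearly favors Editions, while a $2K bill does not, which is roughly where the $10K/month rule of thumb comes from.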
This is where BigQuery's positioning gets awkward. For years, Google pitched on-demand as "pay only for what you use." In practice, it's pay-per-query-shape, which is an alien concept to most data teams who think in terms of compute hours. The 2023 introduction of Editions was a quiet admission that the consumption model needed to look more like Snowflake's for BigQuery to compete in large enterprises.
Google's official positioning: BigQuery is the analytical heart of a broader "data cloud" that includes BigLake (open-format table support), BigQuery ML (in-warehouse machine learning), Vertex AI integration, and Gemini-powered natural language querying. The pitch is that BigQuery is the only warehouse deeply integrated with a tier-one AI stack, because Google owns both ends.
What Google would rather you not focus on: BigQuery is structurally married to Google Cloud. If your org runs on AWS or Azure, the cross-cloud story — BigQuery Omni, which runs BigQuery compute inside other clouds — exists but is a second-class experience with real feature gaps. BigQuery is by far the strongest warehouse if you're on GCP and a much weaker fit if you're not.
BigQuery vs Snowflake. Snowflake is multi-cloud first; BigQuery is Google-first. Snowflake gives you fine-grained control over warehouses (T-shirt sizes, auto-suspend, per-workload isolation); BigQuery is more opaque and more automatic. Most teams pick Snowflake when they want control and multi-cloud portability, and BigQuery when they want to write zero operational code and they already live on GCP.
BigQuery vs Redshift. Not really a contest anymore. Redshift is faster than it used to be, but BigQuery is dramatically easier to operate and scales more smoothly. Redshift wins mostly on "we're already all-in on AWS and IAM integration matters."
BigQuery vs Databricks. Databricks is optimized for ML/data-science workloads and raw file access; BigQuery is optimized for SQL analytics and BI. They're converging but still clearly different cultures — Databricks feels like an ML platform that learned SQL, BigQuery feels like a SQL engine that learned ML.
BigQuery is excellent at:
In-warehouse machine learning via CREATE MODEL SQL. It's not Databricks, but it's shockingly good for forecasting, clustering, and logistic regression without leaving SQL.
BigQuery is bad at:
Cost predictability on on-demand pricing. A careless SELECT * can blow a month's budget.
Three trends worth watching:
1. BigLake and Iceberg. Google's answer to the open-format trend is BigLake, which lets BigQuery query Apache Iceberg and Parquet files in Google Cloud Storage (and increasingly S3) with the same engine. This is Google hedging against the "warehouse as closed store" critique from Databricks.
2. Gemini in BigQuery. Google is integrating Gemini models into BigQuery Studio for natural-language-to-SQL and code assist. Expect the warehouse boundary to blur with the AI layer over the next two years.
3. BigQuery Editions as the new default. On-demand pricing will quietly become a niche option for small users. Enterprises will live on Editions, which looks much more like Snowflake credits.
TextQL Ana connects natively to BigQuery through the standard SQL dialect and BigQuery's authorization model, including row-level and column-level security. Because BigQuery tables are schema-enforced and typically well-governed (especially under a Dataplex or Dataform setup), they tend to be the cleanest substrate for LLM-generated SQL. TextQL respects BigQuery slot reservations and project-level cost controls, which matters for teams on Editions pricing that want to avoid runaway on-demand scans.