ClickHouse
ClickHouse is the dominant real-time analytics database. It was created at Yandex in 2009 by Alexey Milovidov to power Yandex.Metrica, open-sourced in 2016, and commercialized by ClickHouse Inc. starting in 2021. It is the speed champion of the OLAP world.
ClickHouse is the dominant real-time analytics database in 2026. It is fast, it is open source, it is operationally simple by the standards of its category, and it has eaten the lunch of every other real-time OLAP engine. If you want to run analytical queries over event-shaped data with sub-second latency at high concurrency, ClickHouse is the default answer and the burden is on alternatives to justify themselves.
The shortest version of the ClickHouse story is: a Russian search-engine company built it in-house to make their web analytics product faster, open-sourced it almost as an afterthought in 2016, and accidentally launched the most successful new analytical database since BigQuery.
ClickHouse began life inside Yandex — the Russian search giant — in 2009. Yandex needed an analytics engine to power Yandex.Metrica, their web analytics product (essentially Russia's Google Analytics). Metrica's problem was the same problem every web analytics product has: billions of events per day, queries that need to scan billions of rows to compute aggregations, and users who expect dashboards to render in under a second.
Yandex tried existing options. They didn't fit: each was too slow, too expensive, or both. So Alexey Milovidov, an engineer at Yandex, started building a custom column-store analytical database in C++. Development began in 2009, and the system went into production for Metrica in 2012. Over the following years, ClickHouse became Yandex's internal standard for analytics workloads, processing trillions of rows of data per day across many internal services.
In June 2016, Yandex open-sourced ClickHouse under the Apache 2.0 license. There was no commercial company, no big launch, no marketing. It was just an open-source release of a tool Yandex had built for itself. The bet, if there was one, was that other engineers would find it useful and contribute back.
The bet paid off in a way nobody at Yandex anticipated. By 2018-2019, ClickHouse had become the favorite of a small but growing community of engineers at companies with serious analytics workloads — Cloudflare, Uber, eBay — who had tried Druid, Pinot, Vertica, Redshift, BigQuery, and found that ClickHouse was just faster at the kinds of queries they cared about. Often by an order of magnitude. At a fraction of the hardware cost.
In September 2021, Aaron Katz (formerly of Elastic) co-founded ClickHouse Inc. with Milovidov and Yury Izrailevsky (formerly of Netflix), pulling the project out of Yandex into an independent commercial company headquartered in San Francisco. The company raised a $50M Series A and, within weeks, a $250M Series B that valued it at $2 billion. ClickHouse Cloud, the managed service, launched in late 2022.
ClickHouse's speed is not magic. It is the accumulation of dozens of well-chosen design decisions and years of micro-optimization in C++ by an engineering team that genuinely cares about cycles. The most important pieces:
1. Vectorized query execution. Instead of processing one row at a time, ClickHouse processes batches of values (columns) at once, exploiting SIMD instructions on modern CPUs. This is the same idea as MonetDB/X100, Vertica, and DuckDB — but ClickHouse implements it with extreme care.
2. Columnar storage with aggressive compression. Each column is stored separately, with codec selection per column (Delta for monotonic sequences, DoubleDelta, which is delta-of-delta, for timestamps, Gorilla for floats, with LZ4 or ZSTD on top). The result: data on disk is often 5-20x smaller than the row-store equivalent, and queries scan less data.
3. The MergeTree engine. ClickHouse's primary storage engine sorts data by a primary key, writes it to immutable parts on disk, and merges parts in the background. The primary key acts as a sparse index: it stores one entry per block of rows (a granule, 8,192 rows by default), so queries that filter on the key can skip every granule whose key range cannot match. There are many MergeTree variants (ReplacingMergeTree, AggregatingMergeTree, SummingMergeTree, CollapsingMergeTree) that handle different update patterns.
4. No JVM, no garbage collection. ClickHouse is a single C++ binary. There is no JVM warmup, no GC pauses, no off-heap memory management. The operational simplicity of "deploy a binary" is genuinely a feature.
5. Query parallelism. A single query is split across all available CPU cores within a node, and across all nodes in a distributed cluster. ClickHouse uses every cycle you give it.
6. Materialized views as ETL. ClickHouse's standard (incremental) materialized views are not periodic recomputations. They are triggers that fire on insert and write derived data into another table. This makes them effective for streaming aggregations and rollups.
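To make the sparse primary index from the MergeTree item more concrete, here is a toy Python sketch. It is not ClickHouse's implementation: the function names (`build_sparse_index`, `granules_to_scan`, `count_in_range`) and the tiny granule size are assumptions chosen for illustration. The idea it shows is that keeping only the first key of each block of sorted rows is enough to rule out most blocks before reading them.

```python
from bisect import bisect_left, bisect_right

# Toy value for readability; ClickHouse's default index_granularity is 8192 rows.
GRANULE_SIZE = 4


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule (block of consecutive rows)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_scan(marks, lo, hi):
    """Return the indices of granules that might hold keys in [lo, hi].

    Granule g holds keys between marks[g] and marks[g + 1], so the index can
    only exclude granules conservatively: a selected granule may still turn
    out to contain no matching row.
    """
    first = max(0, bisect_left(marks, lo) - 1)
    last_exclusive = bisect_right(marks, hi)
    return range(first, last_exclusive)


def count_in_range(sorted_keys, marks, lo, hi, granule_size=GRANULE_SIZE):
    """Count keys in [lo, hi], reading only the granules the index selects."""
    matches = 0
    rows_read = 0
    for g in granules_to_scan(marks, lo, hi):
        block = sorted_keys[g * granule_size:(g + 1) * granule_size]
        rows_read += len(block)
        matches += sum(1 for key in block if lo <= key <= hi)
    return matches, rows_read


# Hypothetical data: 20 sorted keys, so 5 granules of 4 rows each.
keys = list(range(0, 40, 2))
marks = build_sparse_index(keys)
matches, rows_read = count_in_range(keys, marks, 10, 14)
print(f"matched {matches} rows after reading {rows_read} of {len(keys)}")
```

In this toy run the index holds 5 entries for 20 rows, and the range filter touches a single granule. At the real default granularity the same ratio is one index entry per 8,192 rows, which is why the index stays small enough to keep in memory.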
The combined result: queries that take seconds on Snowflake or Redshift often take tens of milliseconds on ClickHouse. Benchmarks bear this out (with the usual caveat that benchmarks are imperfect).
ClickHouse is the right answer for a remarkably wide range of use cases:
The honest weaknesses:
ClickHouse won the real-time OLAP category, and it won it through engineering excellence, operational simplicity, and a permissive open-source license. The Apache 2.0 license mattered: ClickHouse was easy to adopt without a vendor relationship and easy to commercialize without legal friction. Against its closest open-source rivals, it also avoided Druid's operational complexity and Pinot's smaller community.
ClickHouse Cloud is the strategic question for the next few years. Open-source ClickHouse is so good that it competes with the managed offering — many sophisticated users self-host. The bet of ClickHouse Inc. is that the operational benefits of cloud (auto-scaling, separation of compute and storage, multi-region, governance) will pull large enterprises toward Cloud, while the open-source version remains the path for smaller users and the source of the funnel.
In 2026, if you are picking a real-time OLAP database for a new project and you do not have a strong reason to choose otherwise, you should pick ClickHouse.
ClickHouse sits downstream of event streaming and stream processing, and in parallel to (not downstream of) data warehouses. Events flow in via Kafka (the Kafka table engine), HTTP, or batch loads from S3/Parquet. Queries come from BI tools, application backends, or end users via the ClickHouse SQL interface (HTTP or native protocol).
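As a rough illustration of the HTTP query path, the sketch below builds a request for ClickHouse's HTTP interface using only the Python standard library. It assumes a server on `localhost` at the default HTTP port 8123. The helper names (`build_clickhouse_request`, `parse_json_each_row`) and the `events` table in the usage comment are hypothetical. The query is sent as the POST body, and the `database` and `default_format` URL parameters select the database and the output format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_clickhouse_request(sql, host="localhost", port=8123, database="default"):
    """Build (but do not send) a POST request for the ClickHouse HTTP interface."""
    params = urlencode({"database": database, "default_format": "JSONEachRow"})
    url = f"http://{host}:{port}/?{params}"
    return Request(url, data=sql.encode("utf-8"), method="POST")


def parse_json_each_row(body):
    """Parse a JSONEachRow response: one JSON object per non-empty line."""
    text = body.decode("utf-8")
    return [json.loads(line) for line in text.splitlines() if line.strip()]


request = build_clickhouse_request("SELECT 1")

# Sending requires a running server, so it is left commented out here:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       rows = parse_json_each_row(response.read())
#
# A query against a hypothetical `events` table would look the same:
#   build_clickhouse_request("SELECT count() FROM events WHERE ts > now() - 60")
```

The native protocol is usually accessed through a client library instead of hand-built requests. The HTTP interface is shown here because it needs no dependencies.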
ClickHouse is a first-class TextQL connection. TextQL Ana connects to ClickHouse over its native or HTTP interface and queries it the same way it queries Snowflake, BigQuery, or Postgres. The interesting use case is freshness: ClickHouse's sub-second query latency on data that is seconds old means TextQL users can ask questions about events happening right now, not events from yesterday's batch load. For organizations that have invested in a streaming pipeline ending in ClickHouse, TextQL becomes the natural-language interface to the freshest data in the company.