ClickHouse | Data Ecosystem Wiki

ClickHouse

ClickHouse is the dominant real-time analytics database. It was created at Yandex in 2009 by Alexey Milovidov to power Yandex.Metrica, open-sourced in 2016, and commercialized by ClickHouse Inc. starting in 2021. It is the speed champion of the OLAP world.

ClickHouse is the dominant real-time analytics database in 2026. It is fast, it is open source, it is operationally simple by the standards of its category, and it has eaten the lunch of every other real-time OLAP engine. If you want to run analytical queries over event-shaped data with sub-second latency at high concurrency, ClickHouse is the default answer and the burden is on alternatives to justify themselves.

The shortest version of the ClickHouse story is: a Russian search-engine company built it in-house to make their web analytics product faster, open-sourced it almost as an afterthought in 2016, and accidentally launched the most successful new analytical database since BigQuery.

The Yandex Origin

ClickHouse began life inside Yandex — the Russian search giant — in 2009. Yandex needed an analytics engine to power Yandex.Metrica, their web analytics product (essentially Russia's Google Analytics). Metrica's problem was the same problem every web analytics product has: billions of events per day, queries that need to scan billions of rows to compute aggregations, and users who expect dashboards to render in under a second.

Yandex tried existing options. They didn't fit: they were too slow, too expensive, or both. So Alexey Milovidov, an engineer at Yandex, started building a custom column-store analytical database in C++. Development began in 2009, and the system went into production for Metrica around 2012. Over the following years, ClickHouse became Yandex's internal standard for analytics workloads, processing trillions of rows of data per day across many internal services.

In June 2016, Yandex open-sourced ClickHouse under the Apache 2.0 license. There was no commercial company, no big launch, no marketing. It was just an open-source release of a tool Yandex had built for itself. The bet, if there was one, was that other engineers would find it useful and contribute back.

The bet paid off in a way nobody at Yandex anticipated. By 2018-2019, ClickHouse had become the favorite of a small but growing community of engineers at companies with serious analytics workloads — Cloudflare, Uber, eBay — who had tried Druid, Pinot, Vertica, Redshift, BigQuery, and found that ClickHouse was just faster at the kinds of queries they cared about. Often by an order of magnitude. At a fraction of the hardware cost.

In September 2021, Aaron Katz (formerly of Elastic) co-founded ClickHouse Inc. with Milovidov and Yury Izrailevsky (formerly of Netflix), pulling the project out of Yandex into an independent commercial company headquartered in San Francisco. The company raised a $50M Series A and a $250M Series B in the same year, with the Series B valuing it at $2 billion. ClickHouse Cloud, the managed service, launched in late 2022.

Why ClickHouse Is So Fast

ClickHouse's speed is not magic. It is the accumulation of dozens of well-chosen design decisions and years of micro-optimization in C++ by an engineering team that genuinely cares about cycles. The most important pieces:

1. Vectorized query execution. Instead of processing one row at a time, ClickHouse processes batches of values (columns) at once, exploiting SIMD instructions on modern CPUs. This is the same idea as MonetDB/X100, Vertica, and DuckDB — but ClickHouse implements it with extreme care.

2. Columnar storage with aggressive compression. Each column is stored separately, with codec selection per column (Delta for monotonic sequences, DoubleDelta, also known as delta-of-delta, for timestamps, Gorilla for floats, with LZ4 or ZSTD on top). The result: data on disk is often 5-20x smaller than the row-store equivalent, and queries scan less data.

3. The MergeTree engine. ClickHouse's primary storage engine sorts data by a primary key, writes it to immutable parts on disk, and merges parts in the background. The primary key acts as a sparse index: it stores one entry per block of rows (a granule, 8,192 rows by default), so queries that filter on it can skip large ranges of rows without reading them. There are many MergeTree variants (ReplacingMergeTree, AggregatingMergeTree, SummingMergeTree, CollapsingMergeTree) that handle different update patterns.

4. No JVM, no garbage collection. ClickHouse is a single C++ binary. There is no JVM warmup, no GC pauses, no off-heap memory management. The operational simplicity of "deploy a binary" is genuinely a feature.

5. Query parallelism. A single query is split across all available CPU cores within a node, and across all nodes in a distributed cluster. ClickHouse uses every cycle you give it.

6. Materialized views as ETL. ClickHouse materialized views are not periodic recomputations — they are triggers that fire on insert and write derived data into another table. This makes them effective for streaming aggregations and rollups.
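The trigger-on-insert behavior of materialized views can be illustrated with a small simulation. This is a minimal sketch in plain Python, not ClickHouse code: the `Table` and `HourlyCountView` classes and the row fields `ts` and `page` are hypothetical names chosen for illustration. The point it shows is that the view only ever sees the newly inserted block, never the full source table.

```python
from collections import defaultdict


class Table:
    """Toy append-only table that hands every inserted block to attached views."""

    def __init__(self):
        self.rows = []
        self._views = []

    def attach_view(self, view):
        self._views.append(view)

    def insert(self, block):
        # Store the block, then pass the same block to each attached view.
        self.rows.extend(block)
        for view in self._views:
            view.on_insert(block)


class HourlyCountView:
    """Hypothetical rollup: event count per (hour, page), updated incrementally."""

    def __init__(self):
        self.target = defaultdict(int)  # stands in for the view's target table

    def on_insert(self, block):
        # Only the new block is processed; earlier rows are never rescanned.
        for row in block:
            hour = row["ts"] // 3600
            self.target[(hour, row["page"])] += 1


events = Table()
view = HourlyCountView()
events.attach_view(view)

events.insert([{"ts": 10, "page": "/home"}, {"ts": 20, "page": "/home"}])
events.insert([{"ts": 3700, "page": "/home"}, {"ts": 30, "page": "/docs"}])

print(dict(view.target))
```

In a real deployment the target would typically be a table whose engine combines partial results during merges (for example SummingMergeTree or AggregatingMergeTree), rather than an in-memory dictionary.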

The combined result: queries that take seconds on Snowflake or Redshift often take tens of milliseconds on ClickHouse. Benchmarks bear this out (with the usual caveat that benchmarks are imperfect).
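To make the sparse index in the MergeTree item more concrete, here is a toy sketch of how keeping one index entry per block of rows lets a range filter skip most of the data. It assumes a single sorted integer key and a tiny block size of 4 rows; the function names are hypothetical, and real ClickHouse index handling is considerably more involved.

```python
from bisect import bisect_left, bisect_right

GRANULE_SIZE = 4  # toy value for illustration; real granules are far larger


def build_sparse_index(sorted_keys, granule_size=GRANULE_SIZE):
    """Keep only the first key of every granule (one mark per granule)."""
    return [sorted_keys[i] for i in range(0, len(sorted_keys), granule_size)]


def granules_to_scan(marks, lo, hi):
    """Return indexes of granules that may hold keys in the range [lo, hi].

    The answer is conservative: a granule is included whenever the marks
    alone cannot rule it out.
    """
    # The granule before the first mark >= lo may still end with keys >= lo.
    first = max(bisect_left(marks, lo) - 1, 0)
    # The last candidate is the last granule whose first key is <= hi.
    last = bisect_right(marks, hi) - 1
    return list(range(first, last + 1))


keys = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]
marks = build_sparse_index(keys)

# Only one of the four granules has to be read for this filter.
print(marks, granules_to_scan(marks, 10, 12))
```

The index holds 4 entries for 16 rows here; at production granule sizes the ratio is thousands of rows per entry, which is why the index stays small enough to keep in memory.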

Where ClickHouse Wins

ClickHouse is the right answer for a remarkably wide range of use cases:

In-product analytics. Posthog, Tinybird, Sentry, and many other developer-tools companies are built on ClickHouse to serve customer-facing analytics.
Observability and log analytics. Cloudflare uses ClickHouse to serve logs from their entire global network. Uber uses it for observability. The "ClickHouse instead of Elasticsearch for logs" pattern has become common.
Ad-tech and clickstream. The original use case, still huge.
Time-series analytics. ClickHouse's time-series performance rivals or exceeds purpose-built TSDBs for many workloads.
General OLAP at any scale where you want speed and operational simplicity.
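Part of the time-series story is the delta-of-delta idea mentioned among the compression codecs earlier. The sketch below shows only the arithmetic, under the assumption of integer timestamps at a roughly regular interval; the function names are hypothetical, and the real codec additionally bit-packs the resulting small numbers, which this example does not attempt.

```python
def double_delta_encode(values):
    """Return (first value, first delta, list of second-order deltas)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    deltas = [b - a for a, b in zip(values, values[1:])]
    second_order = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[0], deltas[0], second_order


def double_delta_decode(first, first_delta, second_order):
    """Invert double_delta_encode by replaying the deltas."""
    values = [first, first + first_delta]
    delta = first_delta
    for change in second_order:
        delta += change
        values.append(values[-1] + delta)
    return values


# Timestamps arriving about every 10 seconds, with one second of jitter.
timestamps = [1700000000, 1700000010, 1700000020, 1700000031, 1700000041]

first, first_delta, second_order = double_delta_encode(timestamps)
print(second_order)  # values near zero, which need very few bits to store
```

Regularly spaced timestamps turn into a stream of values at or near zero, which is what makes the subsequent bit-packing and general-purpose compression so effective.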

Where ClickHouse Has Rough Edges

The honest weaknesses:

Updates and deletes are awkward. ClickHouse is optimized for append-only event data. Update and delete operations exist but are expensive: they run as asynchronous mutations that rewrite the affected data parts in the background, so changes are not visible immediately. If your workload is "update individual rows frequently," ClickHouse is not the right tool.
Joins, while improved, are not its strength. Older versions of ClickHouse effectively supported only in-memory hash joins, with the right-hand table broadcast to every shard in distributed queries. Newer versions add sort-merge joins and grace hash joins, but join planning and execution are still less mature than in Snowflake or BigQuery. Star schema joins work fine; complex multi-way joins on large tables are still where ClickHouse is weakest.
Distributed cluster operations require care. Choosing shard keys, configuring replication, and rebalancing data after adding nodes are all manual operations. ClickHouse Cloud abstracts much of this away, but self-hosted clusters require expertise.
Schema migrations on huge tables can be expensive. Adding or modifying columns on petabyte-scale tables is doable but requires planning.
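One way the MergeTree variants mentioned earlier soften the update problem is to append a new version of a row and let background merges discard the old one, which is the idea behind ReplacingMergeTree. The following is a simplified simulation of that merge step, assuming rows are `(key, version, payload)` tuples; the function name and sample data are hypothetical.

```python
def merge_replacing(parts):
    """Merge parts, keeping only the highest-version row for each key.

    On equal versions the row seen later wins, so later inserts take priority.
    """
    latest = {}
    for part in parts:
        for key, version, payload in part:
            current = latest.get(key)
            if current is None or version >= current[0]:
                latest[key] = (version, payload)
    # The merged part is sorted by key again, like any other part.
    return [(key, ver, data) for key, (ver, data) in sorted(latest.items())]


# An "update" is just a second insert with a higher version number.
part_1 = [("user-1", 1, "free"), ("user-2", 1, "free")]
part_2 = [("user-1", 2, "pro")]

merged = merge_replacing([part_1, part_2])
print(merged)
```

Until the merge actually runs, both versions of `user-1` exist on disk, so queries that need exact results have to deduplicate at read time. That is the practical meaning of "eventually consistent" for this pattern.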

The Honest Vendor Take

ClickHouse won the real-time OLAP category, and it won it through engineering excellence, operational simplicity, and a permissive open-source license. The Apache 2.0 license mattered: ClickHouse was easy to adopt without a vendor relationship and easy to commercialize without legal friction. It also avoided the operational complexity of Druid and the smaller community of Pinot.

ClickHouse Cloud is the strategic question for the next few years. Open-source ClickHouse is so good that it competes with the managed offering — many sophisticated users self-host. The bet of ClickHouse Inc. is that the operational benefits of cloud (auto-scaling, separation of compute and storage, multi-region, governance) will pull large enterprises toward Cloud, while the open-source version remains the path for smaller users and the source of the funnel.

In 2026, if you are picking a real-time OLAP database for a new project and you do not have a strong reason to choose otherwise, you should pick ClickHouse.

Where ClickHouse Sits in the Stack

ClickHouse sits downstream of event streaming and stream processing, and in parallel to (not downstream of) data warehouses. Events flow in via Kafka (the Kafka table engine), HTTP, or batch loads from S3/Parquet. Queries come from BI tools, application backends, or end users via the ClickHouse SQL interface (HTTP or native protocol).
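As an illustration of the HTTP ingestion path, the sketch below builds (but does not send) an insert request using only the Python standard library. It assumes a server listening on the default HTTP port 8123 on localhost and a hypothetical `events` table with `ts` and `page` columns; the helper name is made up for this example, and authentication and error handling are omitted.

```python
import json
import urllib.parse
import urllib.request


def build_insert_request(base_url, table, rows):
    """Build an HTTP POST that inserts rows as newline-delimited JSON.

    The table name is interpolated directly, so it must come from trusted code.
    """
    query = f"INSERT INTO {table} FORMAT JSONEachRow"
    params = urllib.parse.urlencode({"query": query}, quote_via=urllib.parse.quote)
    # One JSON object per line is what the JSONEachRow format expects.
    body = "\n".join(json.dumps(row) for row in rows).encode("utf-8")
    return urllib.request.Request(f"{base_url}/?{params}", data=body, method="POST")


req = build_insert_request(
    "http://localhost:8123",  # assumption: local server on the default HTTP port
    "events",                 # hypothetical table name
    [{"ts": 1700000000, "page": "/home"}],
)

print(req.get_method(), req.full_url)
# To actually send it against a running server: urllib.request.urlopen(req)
```

Batching many rows into one request matters in practice, because each insert creates a new part that the background merge process then has to combine.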

How TextQL Works with ClickHouse

ClickHouse is a first-class TextQL connection. TextQL Ana connects to ClickHouse over its native or HTTP interface and queries it the same way it queries Snowflake, BigQuery, or Postgres. The interesting use case is freshness: ClickHouse's sub-second query latency on data that is seconds old means TextQL users can ask questions about events happening right now, not events from yesterday's batch load. For organizations that have invested in a streaming pipeline ending in ClickHouse, TextQL becomes the natural-language interface to the freshest data in the company.

ClickHouse

Created: 2009 at Yandex (internal)
Open-sourced: June 2016
Original creator: Alexey Milovidov (Yandex)
Company founded: September 2021 (ClickHouse Inc.)
CEO: Aaron Katz (co-founder)
License: Apache 2.0
Written in: C++
Notable users: Cloudflare, Uber, eBay, Spotify, GitLab, Tinybird, Posthog, Sentry
Category: Real-time Analytics
Monthly mindshare: ~100K · ~36K GitHub stars; the speed champion; rising fast