Apache Pinot | Data Ecosystem Wiki

Apache Pinot

Apache Pinot is a real-time analytics database created at LinkedIn in 2013-2014 to power user-facing analytics features like 'Who Viewed Your Profile.' Open-sourced in 2015, it became an Apache top-level project in 2021, and is commercialized by StarTree.

Apache Pinot is a real-time OLAP database designed for the specific use case of user-facing analytics at extreme scale — the kind of analytics where every LinkedIn user, every Uber driver, every Slack admin, can pull up a personalized dashboard that aggregates millions of underlying events in under 100 milliseconds. It was built at LinkedIn, by LinkedIn engineers, to solve LinkedIn's specific problem: powering features like "Who Viewed Your Profile," "Talent Insights," and "Article Analytics" against the entire LinkedIn dataset, for hundreds of millions of users, with sub-second latency.

If Apache Druid was the engine of ad-tech analytics in the 2010s, Pinot was the engine of social-media-scale user-facing analytics. The two share a similar architectural lineage but were designed with different scale assumptions and different query patterns. In 2026, both are being squeezed by ClickHouse, but Pinot has a more defensible niche than Druid because of its specific strengths in extreme-concurrency, low-latency queries keyed to a single user.

The LinkedIn Origin

Pinot was built starting around 2013-2014 inside LinkedIn by an engineering team led by Kishore Gopalakrishna, Praveen Neppalli Naga, and Jean-François Im. The motivating problem was a feature LinkedIn product managers kept wanting to ship: "show every user a personalized dashboard with stats about their profile, their posts, their network." The naive implementation — precomputed daily aggregates per user — was too stale and didn't allow drill-down. The "right" implementation — live aggregations over the raw event stream — was too slow on existing infrastructure to serve hundreds of millions of users at LinkedIn's traffic.

LinkedIn already had Druid in production for some use cases, but Druid wasn't optimized for the kinds of workloads LinkedIn wanted to support (millions of QPS, high-cardinality user-keyed queries, sub-100ms p99 latencies). So they built Pinot, with a deliberate design goal: support the LinkedIn-specific pattern of "every user gets a fast personalized dashboard," not just the ad-tech pattern of "a few analysts drill into aggregate data."

Pinot was open-sourced in 2015, entered the Apache Incubator in 2018, and graduated to a top-level Apache project in December 2021. In 2019, several of the original creators left LinkedIn to found StarTree, the company that commercializes Pinot and offers StarTree Cloud as a managed Pinot service.

How Pinot Differs From Druid and ClickHouse

Pinot, Druid, and ClickHouse all share the basic real-time OLAP recipe (columnar storage, time-partitioned segments, streaming ingestion from Kafka). The differences are in the optimizations:

Pinot's distinctive strengths:

Star-tree indexes. Pinot's signature feature. A star-tree is a precomputed multi-dimensional index that lets the engine answer aggregation queries without scanning raw rows — you trade ingest-time computation for query-time speed. For LinkedIn's use case (the same dashboard being served to millions of different users with the same query shape), this is enormously powerful.
Extreme query concurrency. Pinot is designed for thousands or tens of thousands of QPS on a single cluster. Most real-time OLAP systems can do high QPS for simple lookups and lower QPS for complex aggregations; Pinot is engineered to keep both fast at LinkedIn-scale concurrency.
Upserts (real-time mutability). Pinot supports primary-key-based upserts with low latency, which is unusual for OLAP databases. This suits workloads where the latest state of an entity matters (e.g., "current order status by customer").
JSON and text indexes. Pinot has native support for JSON columns and full-text search via Lucene-backed indexes, blurring the line between OLAP database and search engine.
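
The trade behind star-tree indexes (pay at ingest time, save at query time) can be illustrated with a small Python sketch. This is not Pinot's actual data structure: it precomputes a flat dictionary of sums for every combination of concrete and wildcard dimension values, whereas a real star-tree organizes pre-aggregates in a tree and limits how much is materialized. The function names, the `STAR` marker, and the sample rows are all hypothetical.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # hypothetical wildcard meaning "any value of this dimension"


def build_preaggregates(rows, dimensions, metric):
    """Precompute SUM(metric) for every concrete-or-wildcard dimension combination."""
    cube = defaultdict(int)
    for row in rows:
        # Each row contributes to its own value and to the wildcard, per dimension.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            cube[key] += row[metric]
    return dict(cube)


def query_sum(cube, dimensions, filters):
    """Answer SUM(metric) with equality filters using one dictionary lookup."""
    key = tuple(filters.get(d, STAR) for d in dimensions)
    return cube.get(key, 0)


# Illustrative data only.
rows = [
    {"country": "US", "device": "mobile", "views": 3},
    {"country": "US", "device": "desktop", "views": 5},
    {"country": "IN", "device": "mobile", "views": 7},
]
dims = ["country", "device"]
cube = build_preaggregates(rows, dims, "views")

print(query_sum(cube, dims, {"country": "US"}))     # 8
print(query_sum(cube, dims, {"device": "mobile"}))  # 10
print(query_sum(cube, dims, {}))                    # 15
```

Every query with the same shape becomes a lookup instead of a scan, which is why the approach pays off when one dashboard query is served to millions of users. The cost is visible too: each ingested row touches 2^d keys for d dimensions, which is the kind of blow-up a real index has to bound.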

Druid's strengths over Pinot:

Larger commercial ecosystem (Imply has been around longer than StarTree).
Wider deployment outside its origin company.
Slightly more mature tooling.

ClickHouse's strengths over Pinot:

Operational simplicity (single C++ binary vs Pinot's multi-process Java architecture with ZooKeeper, controller, broker, server, minion).
Faster on most general-purpose benchmarks.
Larger community and ecosystem.
Better SQL support, especially for joins.

The honest summary: if your workload is "thousands of users hitting personalized dashboards with similar query shapes," Pinot is the best-fit engine in this category. If your workload is "general-purpose real-time OLAP with diverse query patterns," ClickHouse is probably faster, simpler, and easier to operate.

The Pinot User-Facing Analytics Niche

LinkedIn's "Who Viewed Your Profile" page is the canonical Pinot use case. Every LinkedIn user, when they load that page, triggers a query like "show this user the list of people who viewed their profile in the last 90 days, with counts and aggregations." The query is keyed by a single user ID. The data is sourced from billions of profile-view events across all of LinkedIn. The latency budget is under 100 milliseconds. The concurrency is hundreds of thousands of these queries per second across the LinkedIn user base.
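
As a rough illustration of that query shape, the Python sketch below builds the SQL for one user's dashboard. The table name `profileViews` and the columns `vieweeId`, `viewerCompany`, and `viewedAtMillis` are assumptions made for this example, not LinkedIn's actual schema.

```python
import time

MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def profile_views_sql(viewee_id, days=90, now_ms=None, limit=10):
    """Build a per-user aggregation query over a hypothetical profileViews table."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    cutoff_ms = now_ms - days * MILLIS_PER_DAY
    # int() coercion keeps caller-supplied values from being injected as SQL text.
    return (
        "SELECT viewerCompany, COUNT(*) AS views "
        "FROM profileViews "
        f"WHERE vieweeId = {int(viewee_id)} "
        f"AND viewedAtMillis >= {int(cutoff_ms)} "
        "GROUP BY viewerCompany "
        "ORDER BY COUNT(*) DESC "
        f"LIMIT {int(limit)}"
    )


print(profile_views_sql(42))
```

The key property is that every user issues the same statement with a different `vieweeId`: a narrow filter on one key, a bounded time range, and a small grouped result.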

This is a hard problem. The combination of star-tree indexes, high concurrency, and real-time ingestion from Kafka is exactly what Pinot was built for, and it does this kind of work better than anything else in its category.

Other companies have adopted Pinot for similar patterns:

Uber uses Pinot to power Uber Eats restaurant analytics, driver earnings dashboards, and several other in-product analytics features.
Stripe uses Pinot for parts of its Sigma analytics product.
Slack uses Pinot for workspace analytics shown to admins.
Walmart, DoorDash, and Target use Pinot for real-time merchandising and operational analytics.

The pattern: large consumer platforms with millions of end users who each get a personalized analytical view. Pinot is the right tool when the workload looks like this.

Where Pinot Loses

Outside its niche, Pinot has the same problems as Druid:

Operational complexity. Multiple process types (controller, broker, server, minion), a dependency on Apache Helix and ZooKeeper for cluster coordination, and deep storage. Operating all of this takes real engineering investment.
JVM tax. Java runtime, garbage collection, off-heap memory tuning.
Smaller community than ClickHouse. Fewer integrations, fewer Stack Overflow answers, fewer engineers who already know the tool.
SQL is improving but still not at ClickHouse parity. Pinot's multi-stage query engine, added in recent versions, has improved join support significantly, but ClickHouse remains ahead on general SQL flexibility.

For general-purpose real-time analytics, ClickHouse is the easier and faster choice. Pinot's case rests on its specific optimizations for the user-facing analytics pattern.

The StarTree Bet

StarTree, like Imply for Druid and Confluent for Kafka, is the commercial company hoping to turn an open-source project into a sustainable business. StarTree's bet is that the user-facing analytics niche is large enough to support a managed-Pinot business, and that the operational simplification of StarTree Cloud will pull customers away from self-hosting. The competitive question is whether ClickHouse Cloud (which has more momentum, more capital, and more brand recognition) will absorb the user-facing analytics market over time.

Where Pinot Sits in the Stack

Pinot sits downstream of event streaming (Kafka is the standard ingest path) and serves queries to applications, BI tools, and end users. It is an analytical serving database, not a transformation engine: like Druid and ClickHouse, it is typically paired with Flink or another stream processor for non-trivial enrichment.

How TextQL Works with Pinot

TextQL Ana connects to Pinot via its SQL interface (REST or JDBC) and queries it the same way it queries other SQL backends. Where Pinot is genuinely interesting for TextQL users is in organizations that have already built user-facing analytics on Pinot — TextQL becomes a natural-language interface to data that already powers customer-visible features, with the same freshness and concurrency characteristics.
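
For readers unfamiliar with Pinot's REST path, here is a minimal Python sketch of a client that posts SQL to a Pinot broker's `/query/sql` endpoint and reshapes the result. It is not a description of how TextQL connects internally. The broker address `http://localhost:8099` is an assumed local default, the helper names are hypothetical, and the sample response is hand-written to show the response shape rather than captured from a real cluster.

```python
import json
import urllib.request


def run_pinot_sql(sql, broker_url="http://localhost:8099"):
    """POST a SQL string to a Pinot broker and return the parsed JSON response."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    request = urllib.request.Request(
        f"{broker_url}/query/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def rows_as_dicts(result):
    """Turn a broker response into a list of {column: value} dictionaries."""
    if result.get("exceptions"):
        raise RuntimeError(f"Pinot returned errors: {result['exceptions']}")
    table = result["resultTable"]
    columns = table["dataSchema"]["columnNames"]
    return [dict(zip(columns, row)) for row in table["rows"]]


# Hand-written sample illustrating the response shape (not real data).
sample_response = {
    "resultTable": {
        "dataSchema": {
            "columnNames": ["viewerCompany", "views"],
            "columnDataTypes": ["STRING", "LONG"],
        },
        "rows": [["Acme", 12], ["Globex", 7]],
    },
    "exceptions": [],
}

print(rows_as_dicts(sample_response))
# [{'viewerCompany': 'Acme', 'views': 12}, {'viewerCompany': 'Globex', 'views': 7}]
```

Against a running cluster, `rows_as_dicts(run_pinot_sql("SELECT ..."))` would chain the two helpers. The sample above avoids the network call so the reshaping logic can be checked on its own.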

Apache Pinot

Created: 2013-2014 at LinkedIn
Open-sourced: 2015
Apache TLP: December 2021
Original creators: Kishore Gopalakrishna, Praveen Neppalli Naga, Jean-François Im (LinkedIn)
Commercial sponsor: StarTree (founded 2019)
License: Apache 2.0
Written in: Java
Notable users: LinkedIn, Uber, Stripe, Walmart, Slack, DoorDash
Category: Real-time Analytics
Monthly mindshare: ~20K · ~5K GitHub stars; user-facing analytics niche; LinkedIn origin