NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →

Materialize

Materialize is a streaming database that maintains SQL views incrementally as data changes. Founded in 2019 by Frank McSherry and Arjun Narayan, it is the commercial flagship of differential dataflow, the most academically interesting branch of stream processing.

Materialize is a streaming database. The simplest way to understand it: you write a SQL CREATE MATERIALIZED VIEW statement, and Materialize keeps that view up to date automatically, in real time, as the underlying data changes. No matter how complex the query — joins across multiple sources, multi-level aggregations, recursive CTEs — the view is always consistent with the latest input, and queries against it are sub-millisecond.

That sounds modest until you realize what it actually means. In a normal database, materialized views are recomputed periodically (slow) or maintained with hand-written triggers (fragile). In Materialize, view maintenance is the core engine, built from the ground up on a piece of computer science research called differential dataflow that was specifically designed to solve incremental computation as a first-class problem.

The Differential Dataflow Origin

The most important thing to understand about Materialize is that it is the commercial vehicle for Frank McSherry's decade-plus of research on incremental computation. McSherry was a researcher at Microsoft Research Silicon Valley in the early 2010s, where he and collaborators (including Derek Murray, Martín Abadi, Michael Isard, and Rebecca Isaacs) built Naiad — a system for incremental, iterative dataflow computation that produced one of the most cited systems papers of the decade. When Microsoft Research SV closed in 2014, McSherry continued the work in open source, building Timely Dataflow and Differential Dataflow as Rust libraries.

The mathematical core: a SQL query can be compiled into a dataflow graph, and that graph can be maintained incrementally. When an input record changes, you compute the delta through the dataflow and apply only that delta to the output, so the work done is proportional to the size of the change, not the size of the underlying tables. The result is that complex SQL views update in milliseconds when their inputs change. This is provably correct, handles arbitrarily complex queries (including recursion and joins), and scales to large state.
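The delta idea can be made concrete with a toy Python sketch (an illustration of the concept only, not Materialize's or differential dataflow's actual implementation): a collection is a multiset of records, an update is a (record, diff) pair, and an operator like COUNT(*) GROUP BY turns input deltas into output deltas by retracting its stale result row and asserting the fresh one.

```python
# Toy sketch of the differential-dataflow idea (illustrative only):
# collections are multisets of records, updates are (record, diff)
# pairs, and an operator produces output deltas from input deltas
# instead of recomputing its result from scratch.

from collections import defaultdict

class CountByKey:
    """Maintains COUNT(*) GROUP BY key incrementally."""

    def __init__(self):
        self.counts = defaultdict(int)  # key -> current count

    def apply(self, input_deltas):
        """input_deltas: list of ((key, value), diff) pairs.
        Returns output deltas: retract the stale result row,
        assert the fresh one."""
        output = []
        for (key, _value), diff in input_deltas:
            old = self.counts[key]
            new = old + diff
            self.counts[key] = new
            if old > 0:
                output.append(((key, old), -1))   # retract stale result
            if new > 0:
                output.append(((key, new), +1))   # assert fresh result
        return output

view = CountByKey()
# Two insertions for region "us", one for "eu".
view.apply([(("us", "a"), +1), (("us", "b"), +1), (("eu", "c"), +1)])
# Deleting one "us" row produces deltas only for "us";
# the "eu" result is untouched.
deltas = view.apply([(("us", "a"), -1)])
```

The key property: the work per update depends on the size of the delta, not on how many records the operator has already absorbed.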

In 2019, Arjun Narayan (a database researcher who had worked at Cockroach Labs) co-founded Materialize with McSherry and Nikhil Benesch to commercialize differential dataflow as a database. The pitch from day one: "your CTO has been promising real-time analytics for years, and you have been faking it with cron jobs. We will give you actual real-time, with SQL, with strong consistency, and you do not have to learn Flink."

What Materialize Actually Does

You connect Materialize to a source — typically a Kafka topic, a Postgres CDC stream, or a webhook — and the source becomes available as a streaming SQL table. Then you write SQL views:

CREATE MATERIALIZED VIEW active_user_summary AS
SELECT
  u.region,
  COUNT(DISTINCT s.user_id) AS active_users,
  SUM(s.duration_seconds) / 60.0 AS total_minutes
FROM sessions s
JOIN users u ON s.user_id = u.id
WHERE mz_now() < s.started_at + INTERVAL '1 hour' -- Materialize uses mz_now(), not NOW(), for temporal filters
GROUP BY u.region;

In a normal warehouse, this query runs whenever you ask for it — maybe in seconds, maybe in minutes, depending on data size. In Materialize, the result is precomputed and maintained incrementally. As new sessions arrive, the view updates within milliseconds. When you SELECT from active_user_summary, you read from a precomputed table, not a recomputation.
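A few lines of Python make the contrast concrete (a toy model under simplifying assumptions, not Materialize internals): the maintained view touches only the affected region's row as each new session arrives, while the warehouse-style path rescans every fact on every query, and the two always agree.

```python
# Toy contrast between a maintained view and recompute-on-read
# (illustrative only, not Materialize internals).

from collections import defaultdict

sessions = []                    # raw facts: (region, user_id, seconds)
minutes = defaultdict(float)     # maintained view: region -> total minutes

def ingest(region, user_id, seconds):
    """Apply one new session: touch only the affected region's row."""
    sessions.append((region, user_id, seconds))
    minutes[region] += seconds / 60.0

def recompute():
    """What a warehouse does on every query: scan all facts."""
    out = defaultdict(float)
    for region, _user, seconds in sessions:
        out[region] += seconds / 60.0
    return out

ingest("us", 1, 120)
ingest("eu", 2, 60)
ingest("us", 3, 180)
assert minutes == recompute()    # maintained view stays consistent
```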

Materialize speaks the PostgreSQL wire protocol, so any tool that talks to Postgres — BI tools, ORMs, application code, dbt — can talk to Materialize. This is a deliberate strategic choice: instead of inventing a new client API, Materialize meets developers where they already are.

It also supports SUBSCRIBE — a way for clients to get pushed notifications whenever a view changes. This turns Materialize into a backend for real-time UIs: your application subscribes to a view, and the database streams updates as they happen, with no polling.
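The consumption pattern can be sketched in Python; the update feed below is simulated rather than read over a real connection, but it mirrors the (timestamp, diff, row) shape of SUBSCRIBE output (Materialize names those columns mz_timestamp and mz_diff), where each update either asserts (+1) or retracts (-1) a row.

```python
# Sketch of the SUBSCRIBE consumption pattern. The feed is simulated
# here; a real client would read these updates over the Postgres
# protocol. The client folds pushed diffs into local state instead
# of polling the view.

from collections import defaultdict

def simulated_subscribe():
    """Stand-in for `SUBSCRIBE TO active_user_summary`: yields
    (mz_timestamp, mz_diff, row) tuples as the view changes."""
    yield (1, +1, ("us", 10))   # region "us" appears with 10 active users
    yield (2, -1, ("us", 10))   # old row retracted...
    yield (2, +1, ("us", 11))   # ...new row asserted: now 11 users
    yield (2, +1, ("eu", 3))

state = defaultdict(int)        # row -> multiplicity (live rows sum to +1)
for _ts, diff, row in simulated_subscribe():
    state[row] += diff

live = {row for row, mult in state.items() if mult > 0}
```

Folding the diffs into a local multiset gives the client the current contents of the view at all times, with no polling loop.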

Where Materialize Fits and Where It Doesn't

The honest case for Materialize: you have a use case where the result of a SQL query needs to be always-up-to-date, where the query is too complex for a hand-rolled cache, and where latency matters in milliseconds, not seconds. Examples:

  • Real-time leaderboards, counters, and live dashboards.
  • Operational alerting where the alert condition is a join across multiple streams.
  • Online feature pipelines for ML models that need consistent, low-latency feature lookups.
  • Anti-fraud and policy enforcement where rules are expressed as SQL over event streams.
  • Live, user-facing analytics in SaaS applications where each user's view depends on streaming data.

The cases where Materialize is not the right tool:

  • Bulk analytical queries over historical data. That's what a warehouse is for. Materialize is optimized for keeping a small-to-medium-sized result set continuously up to date, not for scanning a year of facts.
  • Simple stream-to-warehouse ingestion. Upsolver, Snowpipe Streaming, or Flink SQL is a better fit.
  • Massive-scale streaming joins or aggregations. Flink scales further than Materialize for the largest workloads. Materialize's sweet spot is "complex SQL over moderate data with millisecond freshness."

The Honest Vendor Take

Materialize is, technically, one of the most interesting databases in the world. The differential dataflow foundation is real computer science, the engineering team is exceptional, and the product solves real problems that no other tool solves quite the same way. For the use cases it fits, there is no substitute.

The strategic challenge is category size. The market for "streaming database that maintains SQL views incrementally" is real but smaller than the market for "stream processor that pushes events into a warehouse" (Flink + Snowflake/Databricks). Materialize is competing in a niche where the alternatives are either much more complex (Flink + a serving layer) or much simpler (just refresh a warehouse table every minute). Convincing customers that they need exactly incremental view maintenance — not eventual consistency, not micro-batch — is the perpetual education task.

Recent direction: Materialize has been investing heavily in productionization (multi-replica clusters, role-based access control, cloud-native operations) and in making the developer experience feel more like Postgres-with-superpowers than a research project. The bet is that once developers experience true incremental SQL, they don't want to go back.

Where Materialize Sits in the Stack

Materialize sits between event sources and consumers. Sources: Kafka, Pulsar, Kinesis, Postgres CDC, MySQL CDC, webhooks, S3. Consumers: BI tools (via the Postgres protocol), application code, real-time UIs (via SUBSCRIBE), and downstream systems via Kafka sinks. It is simultaneously a stream processor (it computes) and a serving database (you query it directly), which is unusual in the streaming world where those layers are typically separated.

How TextQL Works with Materialize

Materialize is queryable via standard SQL through the Postgres protocol, which means TextQL Ana can connect to it like any Postgres-compatible database. The interesting use case is real-time business questions: a Materialize view that maintains "current sales by region by product line, updated continuously as orders flow through Kafka" can be queried by TextQL with seconds of freshness, not yesterday's batch. Materialize is one of the few backends where TextQL users can ask questions about events that are happening right now.

See TextQL in action

Materialize
Founded 2019, New York, NY
Founders Arjun Narayan (CEO), Frank McSherry, Nikhil Benesch
Underlying tech Differential Dataflow / Timely Dataflow (Rust)
Wire protocol PostgreSQL-compatible
License BSL (source-available); SaaS commercial
Notable funding Lightspeed, Kleiner Perkins, Redpoint
Category Stream Processing
Monthly mindshare ~10K; incremental view maintenance niche, small but technically respected; ~6K GitHub stars