ksqlDB | Data Ecosystem Wiki



ksqlDB


ksqlDB is a SQL interface for stream processing on top of Apache Kafka, built and maintained by Confluent. It lets you write declarative SQL queries that compile into long-running stream processing jobs against Kafka topics, with no Java or Scala code required. Originally launched as KSQL at Kafka Summit in August 2017 and rebranded to ksqlDB in November 2019, it was Confluent's flagship answer to "how do we make stream processing accessible to people who know SQL but don't want to write Java stream processing applications?"

The honest 2026 take: ksqlDB is still useful for simple use cases inside Kafka-native shops, but Confluent itself has shifted strategically to Apache Flink as its preferred stream processing engine. ksqlDB still exists, still works, and still ships, but it is no longer the future Confluent is selling.

The Origin and the Pitch

In 2017, Confluent had a problem. Kafka had won the streaming transport war, and Kafka Streams (a Java library for stream processing on Kafka) existed for engineers who wanted to write code. But there was a much larger audience — analysts, data engineers, application developers — who knew SQL but did not want to write a Kafka Streams Java application. Meanwhile, Apache Flink was emerging as the most powerful stream processing engine, but Flink was not part of the Kafka ecosystem and had its own learning curve.

Confluent's response was KSQL, designed by Hojjat Jafarpour (who joined Confluent specifically to lead the project) and the Confluent streaming team. The pitch was simple: write SQL, get streaming. The query syntax looked familiar to anyone who knew SQL, but with streaming primitives like WINDOW TUMBLING (SIZE 5 MINUTES) and continuous queries that returned results as new data arrived rather than running once and stopping.

The 2019 rebrand to ksqlDB came with an architectural expansion: Confluent positioned it not just as a stream processor but as a streaming database. They added pull queries (point lookups against materialized state), connector integration (running Kafka Connect connectors from inside ksqlDB), and a richer set of database-like features. The pitch became: "ksqlDB is your one-stop streaming database — ingest, transform, store, serve, all from SQL."

How ksqlDB Actually Works

Under the hood, ksqlDB is built on Kafka Streams — it compiles your KSQL queries into Kafka Streams topologies, which then run as long-lived stream processing applications. A KSQL query like:

CREATE TABLE user_session_counts AS
  SELECT user_id, COUNT(*) AS session_count
  FROM user_sessions
  WINDOW TUMBLING (SIZE 1 HOUR)
  GROUP BY user_id
  EMIT CHANGES;

becomes a Kafka Streams application that consumes from the topic behind user_sessions, maintains a count per user and per one-hour window in local state (RocksDB), and emits an updated count to a result topic as the counts change. The state lives on the ksqlDB server's local disk; Kafka itself stores the changelog topic that backs the state, so the state can be rebuilt after a failure.
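To make the windowed aggregation concrete, here is a minimal Python sketch of what the query above computes. It is a model of the semantics only, not how ksqlDB or Kafka Streams is implemented: the dictionary stands in for the RocksDB state store, the function names are hypothetical, and late-arriving data, grace periods, output caching, and fault tolerance are all ignored.

```python
from collections import defaultdict

WINDOW_MS = 60 * 60 * 1000  # mirrors WINDOW TUMBLING (SIZE 1 HOUR)


def window_start(ts_ms, size_ms=WINDOW_MS):
    """Align an event timestamp to the start of its tumbling window."""
    return ts_ms - (ts_ms % size_ms)


def tumbling_counts(events, size_ms=WINDOW_MS):
    """Yield (user_id, window_start_ms, count) every time a count changes.

    `events` is an iterable of (user_id, timestamp_ms) pairs, standing in
    for records read from the user_sessions topic.
    """
    state = defaultdict(int)  # stand-in for the local state store
    for user_id, ts_ms in events:
        key = (user_id, window_start(ts_ms, size_ms))
        state[key] += 1
        # Each change to the aggregate is emitted as an update record.
        yield (user_id, key[1], state[key])


events = [
    ("alice", 1_000),
    ("bob", 2_000),
    ("alice", 3_000),
    ("alice", WINDOW_MS + 5),  # falls into the next one-hour window
]
updates = list(tumbling_counts(events))
for update in updates:
    print(update)
```

The last event for alice starts a fresh count because tumbling windows do not overlap: every timestamp belongs to exactly one window.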

The two core abstractions are streams (unbounded, append-only sequences of events, like a Kafka topic) and tables (the latest value for each key, derived from a stream by aggregation or compaction). You can convert between them, join them, window them, and emit results back to Kafka.
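The relationship between the two abstractions can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not ksqlDB code: the function name and sample data are hypothetical, and a value of None is treated as a delete marker, following Kafka's tombstone convention for compacted topics.

```python
def materialize(stream):
    """Fold a stream of (key, value) events into a table of latest values."""
    table = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)  # tombstone: remove the key from the table
        else:
            table[key] = value  # a later event overwrites an earlier one
    return table


# The stream keeps every event, in order.
plan_changes = [
    ("u1", "free"),
    ("u2", "pro"),
    ("u1", "pro"),  # u1 upgrades
    ("u2", None),   # u2 is deleted
]

# The table keeps only the current value for each key.
current_plans = materialize(plan_changes)
print(current_plans)
```

The stream retains all four events, while the table derived from it holds a single row for u1.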

Strengths and Real Use Cases

Where ksqlDB genuinely shines:

Simple Kafka topic-to-topic transformations. Filtering, projection, format conversion, simple joins. KSQL is the fastest way to express "take topic A, do X, write to topic B."
Light enrichment of streams with reference data. Joining a fast-moving event stream with a slowly-changing lookup table, where both live in Kafka.
Quick prototyping of stream processing logic. You can iterate in a SQL prompt much faster than you can compile a Flink job.
Confluent Platform shops where Kafka is the center of gravity. If your team is already deep in Confluent and you don't want to introduce a separate Flink cluster, ksqlDB is the integrated option.

For these cases, ksqlDB is productive and a good fit.
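The enrichment case is the easiest to misread, so here is a hedged Python sketch of stream-table left join semantics. It assumes a single, already-ordered sequence of records and hypothetical field names (user_id, page, tier). In a real deployment the two inputs are separate Kafka topics, and ordering between them depends on timestamps and partitioning.

```python
def stream_table_left_join(records):
    """Enrich stream events with the table value current at arrival time.

    `records` is an ordered iterable of (kind, key, value) tuples, where
    kind is "table" for a lookup-table update and "stream" for an event.
    """
    table = {}
    for kind, key, value in records:
        if kind == "table":
            # Table updates change the lookup state but produce no output.
            table[key] = value
        else:
            # Left join: emit the event even when no table row matches.
            yield {"user_id": key, "page": value, "tier": table.get(key)}


records = [
    ("table", "u1", "free"),
    ("stream", "u1", "/home"),
    ("table", "u1", "pro"),       # u1's tier changes between events
    ("stream", "u1", "/pricing"),
    ("stream", "u2", "/home"),    # no table row for u2
]
enriched = list(stream_table_left_join(records))
for row in enriched:
    print(row)
```

Each event is joined against the table as it stood when the event arrived, so the change to u1's tier affects only the later event and does not re-emit the earlier one.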

Why ksqlDB Lost Strategic Momentum

By 2022-2023, several things had become clear:

1. Flink was a more capable engine. For complex stateful processing, low-latency joins, exactly-once across multiple sources and sinks, large-state workloads, and the most demanding production use cases, Flink simply did more and did it better. ksqlDB inherited Kafka Streams' design constraints, and Kafka Streams was optimized for "embedded library inside an application," not for "central streaming compute platform."

2. Flink SQL closed the accessibility gap. ksqlDB's main advantage over Flink was its SQL interface. As Flink SQL matured, that advantage shrank. By 2023, you could write the same kinds of SQL queries against Flink that you could against ksqlDB, with a much more powerful engine underneath.

3. The market consolidated around Flink. AWS, Alibaba, and almost every major vendor running managed stream processing converged on Flink. Confluent's customers, especially the larger ones, started asking for Flink rather than ksqlDB.

4. The licensing controversy. ksqlDB ships under the Confluent Community License, which is source-available but not OSI-approved open source. This limited its adoption outside Confluent shops and created friction for cloud vendors who would have hosted it.

In January 2023, Confluent acquired Immerok, a Flink company founded by ex-Ververica engineers, and made Flink SQL a first-class offering on Confluent Cloud. The strategic message, even if not stated explicitly, was that Flink is Confluent's stream processing future and ksqlDB is the legacy product for existing users.

Should You Use ksqlDB in 2026?

The honest framework:

You have a small, simple Kafka transformation use case and you're already on Confluent Platform. Use ksqlDB. It's there, it works, and it's the simplest path.
You're starting a new streaming compute project on Confluent Cloud. Use Flink SQL on Confluent Cloud. Confluent itself will steer you there.
You need complex stateful processing, large state, or maximum performance. Use Apache Flink directly, or a managed Flink service.
You want incremental SQL view maintenance with millisecond freshness. Use Materialize, not ksqlDB.

The category ksqlDB was built for — "SQL on streams, accessible to non-Java developers" — was a real category. Flink SQL ended up being its winner, not KSQL.

Where ksqlDB Sits in the Stack

ksqlDB sits between Kafka topics. Inputs are Kafka topics; outputs are Kafka topics (or, for pull queries, point lookups against ksqlDB's internal state stores). It does not connect to external sources or sinks directly except via Kafka Connect connectors that ksqlDB can manage on your behalf. The architectural assumption is "Kafka is everything and ksqlDB is the SQL layer over it."

How TextQL Works with ksqlDB

ksqlDB pushes processed events into Kafka topics, which downstream systems then consume into warehouses, lakehouses, or real-time OLAP databases. TextQL Ana connects to those downstream destinations, not to ksqlDB directly. The role ksqlDB plays in a TextQL stack is to do lightweight transformations on Kafka data before it lands in the analytical layer that TextQL queries.


ksqlDB

Original release: August 2017 (as KSQL)
Rebranded: November 2019 (as ksqlDB)
Vendor: Confluent
License: Confluent Community License (source-available, not OSI-approved)
Built on: Kafka Streams
Interface: SQL-like (KSQL)
Strategic status: De-emphasized in favor of Flink SQL on Confluent Cloud
Category: Stream Processing
Monthly mindshare: ~15K · Confluent's SQL-on-Kafka; de-emphasized after Immerok acquisition