ksqlDB
ksqlDB is Confluent's SQL interface for stream processing on top of Apache Kafka. Originally launched as KSQL in 2017 and rebranded to ksqlDB in 2019, it has been quietly de-emphasized as Confluent has pivoted to Apache Flink as its strategic stream processing engine.
ksqlDB is a SQL interface for stream processing on top of Apache Kafka, built and maintained by Confluent. It lets you write declarative SQL queries that compile into long-running stream processing jobs against Kafka topics, with no Java or Scala code required. Originally launched as KSQL at Kafka Summit in August 2017 and rebranded to ksqlDB in November 2019, it was Confluent's flagship answer to "how do we make stream processing accessible to people who don't want to write Kafka Streams applications?"
The honest 2026 take: ksqlDB is still useful for simple use cases inside Kafka-native shops, but Confluent itself has shifted strategically to Apache Flink as its preferred stream processing engine. ksqlDB still exists, still works, and still ships, but it is no longer the future Confluent is selling.
In 2017, Confluent had a problem. Kafka had won the streaming transport war, and Kafka Streams (a Java library for stream processing on Kafka) existed for engineers who wanted to write code. But there was a much larger audience — analysts, data engineers, application developers — who knew SQL but did not want to write a Kafka Streams Java application. Meanwhile, Apache Flink was emerging as the most powerful stream processing engine, but Flink was not part of the Kafka ecosystem and had its own learning curve.
Confluent's response was KSQL, designed by Hojjat Jafarpour (who joined Confluent specifically to lead the project) and the Confluent streaming team. The pitch was simple: write SQL, get streaming. The query syntax looked familiar to anyone who knew SQL, but with streaming primitives like WINDOW TUMBLING (SIZE 5 MINUTES) and continuous queries that returned results as new data arrived rather than running once and stopping.
The 2019 rebrand to ksqlDB came with an architectural expansion: Confluent positioned it not just as a stream processor but as a streaming database. They added pull queries (point lookups against materialized state), connector integration (running Kafka Connect connectors from inside ksqlDB), and a richer set of database-like features. The pitch became: "ksqlDB is your one-stop streaming database — ingest, transform, store, serve, all from SQL."
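The difference between a continuous (push) query and a pull query can be sketched with a toy model in plain Python. This is not ksqlDB's API: the class MaterializedCounts and its methods are hypothetical names chosen for illustration, and the in-memory dict only stands in for materialized state.

```python
class MaterializedCounts:
    """Toy stand-in for a materialized table of per-key counts (hypothetical)."""

    def __init__(self):
        self._counts = {}        # plays the role of the materialized state
        self._subscribers = []   # callbacks registered by push queries

    def subscribe(self, callback):
        """Push query: the callback is invoked on every future change."""
        self._subscribers.append(callback)

    def apply(self, key):
        """Process one input event, update state, and notify subscribers."""
        self._counts[key] = self._counts.get(key, 0) + 1
        for callback in self._subscribers:
            callback(key, self._counts[key])

    def pull(self, key):
        """Pull query: return the current value for one key, then finish."""
        return self._counts.get(key)


view = MaterializedCounts()
pushed = []
view.subscribe(lambda key, count: pushed.append((key, count)))

for user_id in ["alice", "bob", "alice"]:
    view.apply(user_id)

print(pushed)              # every intermediate change, in arrival order
print(view.pull("alice"))  # only the current value for one key
```

The push subscriber sees each update as it happens, while the pull call answers once from whatever state exists at that moment. That is the distinction the "streaming database" positioning relied on.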
Under the hood, ksqlDB is built on Kafka Streams — it compiles your KSQL queries into Kafka Streams topologies, which then run as long-lived stream processing applications. A KSQL query like:
CREATE TABLE user_session_counts AS
SELECT user_id, COUNT(*) AS session_count
FROM user_sessions
WINDOW TUMBLING (SIZE 1 HOUR)
GROUP BY user_id
EMIT CHANGES;
becomes a Kafka Streams application that consumes from the user_sessions topic, maintains a session count per user and per one-hour window in local state (RocksDB), and emits updated counts to a result topic as the counts change. The state lives on the ksqlDB server's local disk; Kafka itself stores the changelog topic backing that state, so the state can be rebuilt after a failure.
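To make those mechanics concrete, here is a minimal sketch in plain Python of what such a windowed count does. It is a toy model, not Kafka Streams or ksqlDB code. The function names, the (user_id, timestamp_ms) event shape, and the in-memory dict standing in for RocksDB are all assumptions made for illustration.

```python
from collections import defaultdict

HOUR_MS = 60 * 60 * 1000  # mirrors WINDOW TUMBLING (SIZE 1 HOUR)


def window_start(timestamp_ms, size_ms=HOUR_MS):
    """Align an event timestamp to the start of its tumbling window."""
    return timestamp_ms - (timestamp_ms % size_ms)


def count_sessions(events):
    """Yield (user_id, window_start_ms, count) after every input event.

    `events` stands in for records on the user_sessions topic. The dict
    plays the role of the local state store, and each yielded tuple plays
    the role of a record written to the changelog and result topic.
    """
    state = defaultdict(int)
    for user_id, timestamp_ms in events:
        key = (user_id, window_start(timestamp_ms))
        state[key] += 1
        yield (user_id, key[1], state[key])


def restore_state(changelog):
    """Rebuild the state store by replaying the changelog (last write wins)."""
    state = {}
    for user_id, start_ms, count in changelog:
        state[(user_id, start_ms)] = count
    return state


events = [
    ("alice", 0),
    ("alice", 10 * 60 * 1000),  # same hourly window as the first event
    ("bob", 20 * 60 * 1000),
    ("alice", HOUR_MS + 5),     # falls into the next hourly window
]
changelog = list(count_sessions(events))
for record in changelog:
    print(record)

recovered = restore_state(changelog)
```

Replaying the changelog yields the same counts the processor held before, which is the recovery path described above. The sketch emits one record per input event; a real deployment may coalesce rapid updates to the same key before writing them out.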
The two core abstractions are streams (unbounded, append-only sequences of events, like a Kafka topic) and tables (the latest value for each key, derived from a stream by aggregation or compaction). You can convert between them, join them, window them, and emit results back to Kafka.
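The relationship between the two abstractions can be sketched in a few lines of plain Python. This is an illustrative model only: the function names are hypothetical, and using None as a delete marker mirrors the tombstone convention of compacted Kafka topics rather than any ksqlDB API.

```python
def stream_to_table(stream):
    """Fold an append-only stream of (key, value) events into a table.

    Later events for a key overwrite earlier ones. A value of None is
    treated as a tombstone that deletes the key.
    """
    table = {}
    for key, value in stream:
        if value is None:
            table.pop(key, None)
        else:
            table[key] = value
    return table


def table_to_stream(old, new):
    """Emit the change events that turn table `old` into table `new`."""
    changes = []
    for key in sorted(new):
        if old.get(key) != new[key]:
            changes.append((key, new[key]))   # insert or update
    for key in sorted(old):
        if key not in new:
            changes.append((key, None))       # delete (tombstone)
    return changes


plan_events = [
    ("alice", "free"),
    ("bob", "free"),
    ("alice", "pro"),   # overwrites alice's earlier value
    ("bob", None),      # tombstone: bob is removed from the table
]
table = stream_to_table(plan_events)
print(table)

changes = table_to_stream({}, table)
print(changes)
```

Folding the emitted changes back through stream_to_table reproduces the same table, which is why a table can always be shipped around as a stream of its changes.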
Where ksqlDB genuinely shines:
For these cases, ksqlDB is genuinely productive and well-fit.
By 2022-2023, several things had become clear:
1. Flink was a more capable engine. For complex stateful processing, low-latency joins, exactly-once across multiple sources and sinks, large-state workloads, and the most demanding production use cases, Flink simply did more and did it better. ksqlDB inherited the design constraints of Kafka Streams, which was optimized for "embedded library inside an application," not for "central streaming compute platform."
2. Flink SQL closed the accessibility gap. ksqlDB's main advantage over Flink was its SQL interface. As Flink SQL matured, that advantage shrank. By 2023, you could write the same kinds of SQL queries against Flink that you could against ksqlDB, with a much more powerful engine underneath.
3. The market consolidated around Flink. AWS, Alibaba, and almost every major vendor running managed stream processing converged on Flink. Confluent's customers, especially the larger ones, started asking for Flink rather than ksqlDB.
4. The licensing controversy. ksqlDB ships under the Confluent Community License, which is source-available but not OSI-approved open source. This limited its adoption outside Confluent shops and created friction for cloud vendors who would have hosted it.
In January 2023, Confluent acquired Immerok, a Flink company founded by ex-Ververica engineers, and made Flink SQL a first-class offering on Confluent Cloud. The strategic message, even if not stated explicitly, was that Flink is Confluent's stream processing future and ksqlDB is the legacy product for existing users.
The honest framework:
The category ksqlDB was built for, "SQL on streams, accessible to non-Java developers," was a real category. Flink SQL, not ksqlDB, ended up winning it.
ksqlDB sits between Kafka topics. Inputs are Kafka topics; outputs are Kafka topics (or, for pull queries, point lookups against ksqlDB's internal state stores). It does not connect to external sources or sinks directly except via Kafka Connect connectors that ksqlDB can manage on your behalf. The architectural assumption is "Kafka is everything and ksqlDB is the SQL layer over it."
ksqlDB pushes processed events into Kafka topics, which downstream systems then consume into warehouses, lakehouses, or real-time OLAP databases. TextQL Ana connects to those downstream destinations, not to ksqlDB directly. The role ksqlDB plays in a TextQL stack is to do lightweight transformations on Kafka data before it lands in the analytical layer that TextQL queries.