Redpanda | Data Ecosystem Wiki



Redpanda


Redpanda is a Kafka-compatible streaming platform built from scratch in C++. It speaks the Apache Kafka wire protocol, so existing Kafka clients (producers, consumers, Connect, Streams, schema clients) can talk to it without modification, but everything underneath the protocol is different: no JVM, no ZooKeeper, no separate broker-and-coordinator processes, a Raft-based replication protocol, and a thread-per-core execution model designed to extract maximum performance from modern hardware.

The plain-English version: Redpanda is what happens when someone looks at Kafka and decides the protocol is genius but the implementation is a 2010-era artifact. The bet is that the wire protocol — not the codebase — is what the ecosystem actually depends on, and that a clean-sheet rewrite can deliver the same compatibility with materially better performance and operational simplicity.

The Founding Story

Redpanda Data (originally Vectorized) was founded in 2019 by Alexander Gallego, a former Concord Systems and Akamai engineer who had spent years working on high-performance distributed systems. Gallego's pitch was deliberately contrarian: Kafka is the right abstraction, but Java was the wrong implementation language for a low-latency storage system. He wanted to rebuild Kafka in C++, using the Seastar framework (the same shared-nothing, thread-per-core C++ runtime that powers ScyllaDB), and with Raft as the consensus protocol instead of ZooKeeper.

The company raised a $15 million Series A from Lightspeed in 2020, followed by a $50 million Series B in 2021 led by GV (Google Ventures), and a $100 million Series C in 2022 led by Lightspeed at a unicorn valuation. The company renamed itself from Vectorized to Redpanda Data in 2022 to align with the product name.

Gallego has remained CEO. The engineering team is a relatively small group of senior systems engineers. The C++ codebase requires a different talent profile than the Java/Scala Kafka ecosystem, which is both a moat and a constraint.

What Redpanda Actually Is

Strip away the marketing and Redpanda is three things:

1. A Kafka-compatible broker. Redpanda implements the Kafka wire protocol, so Kafka client libraries, Kafka Connect connectors, Kafka Streams applications, and schema registry clients generally work against Redpanda without code changes. From the application's perspective, Redpanda is Kafka. This is the most important property of the product: it avoids the migration tax that held back Pulsar's adoption.

2. A C++ rewrite of the broker internals. Redpanda's implementation uses the Seastar framework, which is built around shared-nothing, thread-per-core execution. Each CPU core runs an independent reactor loop with no shared state between cores; coordination happens through explicit message passing. The result is dramatically reduced contention, no JVM garbage collection pauses, and predictable tail latencies.

3. A simplified operational model. Redpanda ships as a single binary with no external dependencies. There is no separate ZooKeeper or KRaft quorum to operate; metadata management is built into the brokers via Raft. There is no separate JVM to tune. Cluster operations — adding brokers, rebalancing partitions, recovering from failures — are designed to be simpler than the Kafka equivalent.
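To make the first point concrete: "speaking the wire protocol" means accepting the same bytes on the socket that a Kafka broker would. The sketch below hand-encodes one small Kafka request, ApiVersions version 0, using only the standard library. It is an illustration of the framing, not something a real application would do (a client library handles this), and the function name and the client ID "demo" are made up for the example.

```python
import struct

API_KEY_API_VERSIONS = 18  # numeric key of the ApiVersions request in the Kafka protocol


def encode_api_versions_request(correlation_id: int, client_id: str) -> bytes:
    """Encode an ApiVersions v0 request: request header v1 followed by an empty body."""
    client = client_id.encode("utf-8")
    # Header fields, all big-endian: api_key (int16), api_version (int16),
    # correlation_id (int32), then client_id as an int16 length plus UTF-8 bytes.
    header = struct.pack(">hhih", API_KEY_API_VERSIONS, 0, correlation_id, len(client))
    payload = header + client
    # Every Kafka request is prefixed with its own length as a big-endian int32.
    return struct.pack(">i", len(payload)) + payload


request_frame = encode_api_versions_request(correlation_id=1, client_id="demo")
print(request_frame.hex())
```

Any broker that claims Kafka compatibility has to parse this frame and answer with the list of API versions it supports, which is how existing clients negotiate features without knowing what is on the other end.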

The differentiation Redpanda markets most aggressively is latency. Their public benchmarks claim sub-10ms p99 tail latencies under load, compared to multi-hundred-millisecond p99 latencies for tuned Kafka clusters at similar throughput. The benchmark methodology is debated, but the underlying point — that a non-JVM, thread-per-core implementation has structural latency advantages — is real.
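A small worked example shows why the argument is framed around p99 rather than averages. The numbers below are invented for illustration, not taken from any benchmark: 985 requests complete in 2 ms and 15 stall for 200 ms, as they might during a pause. The percentile helper uses the simple nearest-rank definition.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    # Integer arithmetic before the division keeps the rank exact for whole-number pct.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a few long stalls.
latencies_ms = [2.0] * 985 + [200.0] * 15

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)
print(f"p50={p50} ms  mean={mean:.2f} ms  p99={p99} ms")
```

With only 1.5% of requests stalling, the median and mean still look healthy while the p99 is dominated entirely by the stalls. That is the shape of the problem that pause-free runtimes are trying to remove.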

The Architecture in Slightly More Detail

A few specifics worth knowing:

No ZooKeeper, no KRaft (in the Kafka sense). Redpanda uses Raft directly as its consensus protocol for both metadata and data replication. Each partition is a Raft group; leadership election, log replication, and fault tolerance are built into the partition itself. This is a more uniform model than Kafka's traditional ZooKeeper-plus-ISR architecture, and it predates Kafka's own KRaft transition.
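The core of the per-partition Raft idea is that a record counts as committed once a majority of the partition's replicas have stored it. The following is a minimal sketch of that rule in generic Raft terms, not Redpanda's actual code; the function name and the input shape are assumptions made for the example.

```python
def raft_commit_index(match_index):
    """Return the highest log index that is stored on a majority of replicas.

    match_index lists, for every replica in the Raft group (leader included),
    the highest log index known to be stored on that replica.
    Real Raft additionally requires the entry at that index to belong to the
    leader's current term; this sketch leaves that check out.
    """
    ordered = sorted(match_index, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th largest value is held by at least `majority` replicas.
    return ordered[majority - 1]


# Three replicas: the leader is at 10, followers at 9 and 7.
print(raft_commit_index([10, 9, 7]))
# Five replicas with two slow followers.
print(raft_commit_index([12, 12, 8, 5, 5]))
```

Because each partition runs this rule independently, a slow or failed replica only delays the partitions it participates in, and there is no separate in-sync-replica list to maintain alongside the consensus protocol.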

Thread-per-core via Seastar. Each broker process pins one thread to each CPU core. Each core handles a disjoint subset of partitions; there is no shared mutable state across cores. This architecture is borrowed from ScyllaDB, which credits the same approach for its large throughput advantage over Cassandra on the same hardware.
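The ownership model can be sketched in a few lines. This is a single-threaded toy, not Seastar: each "shard" stands in for one core, owns its own partition logs, and receives work only through its inbox. The class names and the hash-modulo placement rule are assumptions for illustration; the source does not describe how Redpanda actually assigns partitions to cores.

```python
import queue
import zlib


class Shard:
    """Stands in for one core: private state plus an inbox for messages."""

    def __init__(self, core_id):
        self.core_id = core_id
        self.inbox = queue.Queue()
        self.logs = {}  # (topic, partition) -> records; only this shard touches it

    def run_once(self):
        # Drain the inbox and append to logs this shard owns. No locks are
        # needed because no other shard ever reads or writes self.logs.
        while not self.inbox.empty():
            key, record = self.inbox.get()
            self.logs.setdefault(key, []).append(record)


def owner_core(topic, partition, n_cores):
    # Hypothetical placement rule: a stable hash of the partition name.
    return zlib.crc32(f"{topic}/{partition}".encode("utf-8")) % n_cores


class Broker:
    def __init__(self, n_cores):
        self.shards = [Shard(i) for i in range(n_cores)]

    def submit(self, topic, partition, record):
        # Route the request to the owning shard by message passing.
        core = owner_core(topic, partition, len(self.shards))
        self.shards[core].inbox.put(((topic, partition), record))
        return core

    def poll(self):
        for shard in self.shards:
            shard.run_once()


broker = Broker(n_cores=4)
first = broker.submit("orders", 0, "created")
second = broker.submit("orders", 0, "paid")
broker.poll()
print(first == second)  # the same partition always lands on the same core
```

The point of the design is visible in run_once: because a partition's log has exactly one owner, appends need no synchronization, and cross-core coordination is limited to placing a message in another shard's inbox.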

Tiered storage to S3. Like Kafka and Pulsar, Redpanda supports automatic offload of cold segments to object storage (S3, GCS, Azure Blob). This dramatically reduces the cost of long retention.
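A back-of-the-envelope calculation shows where the saving comes from. Every number here is an assumption chosen for illustration (ingest rate, replication factor, and per-GB prices), and the model ignores request charges, egress, and compression, so treat it as a sketch of the arithmetic rather than a cost estimate.

```python
def monthly_storage_cost(ingest_gb_per_day, retention_days, local_days,
                         replication=3, local_price=0.08, object_price=0.023):
    """Rough monthly storage cost in dollars.

    Assumed model: data younger than local_days sits on broker disks and is
    replicated `replication` times; older data is kept as a single copy in
    object storage. Prices are hypothetical dollars per GB-month.
    """
    local_days = min(local_days, retention_days)
    local_gb = ingest_gb_per_day * local_days * replication
    remote_gb = ingest_gb_per_day * (retention_days - local_days)
    return local_gb * local_price + remote_gb * object_price


# 100 GB/day retained for 90 days, first entirely on local disks,
# then with only the most recent day kept local.
all_local = monthly_storage_cost(100, retention_days=90, local_days=90)
tiered = monthly_storage_cost(100, retention_days=90, local_days=1)
print(f"all local: ${all_local:.2f}/month, tiered: ${tiered:.2f}/month")
```

Under these assumptions most of the saving comes from two effects that compound: the cold data is no longer multiplied by the replication factor, and each remaining gigabyte is billed at the lower object-storage rate.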

Redpanda Connect. In 2024, Redpanda acquired the open-source streaming ETL project Benthos and rebranded it as Redpanda Connect, giving the platform a native data movement framework analogous to Kafka Connect.

Schema Registry compatibility. Redpanda ships its own schema registry that is wire-compatible with Confluent Schema Registry, so existing Avro/Protobuf clients work without changes.
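Wire compatibility here comes down to a small framing convention that Confluent-style serializers put in front of every Avro payload: one zero "magic" byte, then the schema ID as a 4-byte big-endian integer, then the encoded record. The sketch below shows that framing with the standard library; the function names are invented for the example, and Protobuf messages carry additional index bytes that are not modeled here.

```python
import struct

MAGIC_BYTE = 0  # first byte of a Confluent-framed message


def frame_message(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-serialized payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe_message(message: bytes):
    """Split a framed message into (schema_id, payload), validating the prefix."""
    if len(message) < 5:
        raise ValueError("message too short to contain a schema ID prefix")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


framed = frame_message(42, b"avro-bytes")
print(unframe_message(framed))
```

A consumer reads the schema ID from the prefix and fetches the matching schema from the registry over HTTP, so any registry that serves the same IDs over the same REST endpoints is interchangeable from the client's point of view.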

The Licensing Note: BSL, not Apache

This is the most important non-technical thing to know about Redpanda. The Redpanda core is licensed under the Business Source License (BSL), not Apache 2.0. Apache Kafka is Apache 2.0 — a true permissive open-source license. Redpanda is "source-available with restrictions": you can read the code, modify it, and run it, but you cannot offer Redpanda as a competing managed service. After a defined period (typically four years), each release converts to Apache 2.0.

This is the same source-available playbook that CockroachDB, Sentry, and HashiCorp adopted with the BSL and that MongoDB followed with its own SSPL, and it has the same trade-off: it protects the vendor from cloud-provider clones (the AWS-RDS-for-Redpanda scenario) but disqualifies Redpanda from being called "open source" in the strict OSI sense. For most customers this distinction does not matter: you can still self-host Redpanda for free. For customers who care about license purity (large enterprises with open-source policies, government agencies, anyone burned by license changes), it does matter, and Apache Kafka remains the only fully Apache-licensed option in the Kafka-compatible ecosystem.

Where Redpanda Fits

Redpanda's pitch lands strongest in three scenarios:

1. Latency-sensitive workloads. Trading systems, real-time gaming, ad tech, fraud detection. Workloads where p99 tail latency in the hundreds of milliseconds is unacceptable and the JVM garbage collection floor is a real constraint.

2. Operational simplicity at small-to-medium scale. Teams that want a Kafka-compatible streaming platform without running a Kafka platform team. The single-binary, no-ZooKeeper deployment model genuinely is simpler than running Kafka 3.x with ZooKeeper, and arguably still simpler than Kafka 4.x with KRaft.

3. Cost-sensitive deployments at scale. Tiered storage plus more efficient hardware utilization (thread-per-core means you can run hot on a cheaper instance) can make Redpanda meaningfully cheaper than Confluent Cloud for the same workload, particularly with large retention windows.

The customers Redpanda has won publicly tend to fall into these buckets: financial services firms (latency), modern application teams (simplicity), and cost-conscious large-scale users (efficiency).

Where Redpanda Has to Beat Kafka — on Two Dimensions

The honest assessment of Redpanda's competitive position:

The Kafka ecosystem is sticky. Most of the value of Kafka in 2026 is not the broker itself — it is the ecosystem of connectors, stream processors, schema registries, monitoring tools, training material, and engineers who already know how it works. Redpanda inherits the protocol-compatible layer of that ecosystem (clients, Connect, Streams), but it has to reproduce or replace the parts that are tied to Confluent's commercial products. That is real work, and the gap is closing but not closed.

Confluent is a moving target. The KRaft transition in Kafka 4.0 took away one of Redpanda's best operational arguments (no ZooKeeper). Tiered storage in Kafka took away another (better cold-storage economics). Confluent's WarpStream acquisition adds an S3-backed cost-optimized option. Every Redpanda differentiator forces Confluent to respond, and Confluent has more resources and a larger contributor base to respond with.

Redpanda has to win on both performance AND operational simplicity to displace Kafka. Either alone is not enough. A faster Kafka that is harder to run will lose to vanilla Kafka. A simpler Kafka that is no faster will lose to managed Kafka (MSK, Confluent Cloud, Aiven). Redpanda's bet is that it can credibly claim both, and that the combination is enough to win greenfield workloads and migrate the latency-sensitive subset of existing Kafka users.

So far the evidence is mixed. Redpanda has real customers and real adoption, particularly in latency-sensitive niches. It has not displaced Kafka as the default choice for new streaming projects, and it is not clear it can. The BSL license also limits its appeal to a subset of the market that Apache Kafka does not have to think about.

The Honest Vendor Take

Redpanda is the most credible "Kafka but better" attempt to date. The engineering is real, the product works, the latency advantages are genuine for the workloads where they matter, and the operational simplicity is meaningful at small-to-medium scale. The team is one of the strongest distributed systems groups in commercial data infrastructure, and Alexander Gallego's bet on a C++ rewrite has aged well technically.

But the Kafka ecosystem is enormous, the protocol is the moat, and Confluent has been steadily eroding Redpanda's differentiators with each release. Redpanda's path to dominance requires either Confluent stumbling, a meaningful shift in the latency-vs-cost calculus, or a sustained execution gap that Redpanda can widen. None of those are impossible; none are guaranteed.

For customers, the practical advice is: if you have explicit latency or operational-simplicity requirements that vanilla Kafka does not meet, Redpanda is the strongest alternative on the market. If you are picking a streaming platform with no specific reason to look beyond the default, Apache Kafka — run on Confluent Cloud, MSK, or self-hosted — is still the rational choice.

How TextQL Works with Redpanda

TextQL does not connect to Redpanda directly — like all streaming platforms, Redpanda is a transport, not a query engine. TextQL Ana connects to the analytical destinations that Redpanda events flow into: warehouses, lakehouses, real-time OLAP databases. From TextQL's perspective Redpanda looks identical to Kafka, because it speaks the same wire protocol — which is exactly the point.


Redpanda

Founded: 2019 (as Vectorized)
Founder: Alexander Gallego
Headquarters: San Francisco, CA
Renamed: Redpanda Data (2022)
Compatible with: Apache Kafka wire protocol
Written in: C++ (using Seastar framework)
License: Business Source License (BSL), not Apache 2.0
Key products: Redpanda (self-managed), Redpanda Cloud, Redpanda Connect
Category: Event Streaming
Monthly mindshare: ~15K. Kafka-compatible C++ rewrite; ~9K GitHub stars; rising but small