Amazon Kinesis | Data Ecosystem Wiki

Amazon Kinesis

Amazon Kinesis is AWS's managed event streaming service, launched in 2013 as Amazon's homegrown answer to Apache Kafka. It is the path of least resistance for streaming inside AWS, but loses to Kafka almost everywhere else.

Functionally, Kinesis is AWS's answer to Apache Kafka: a durable, partitioned log that producers write to and consumers read from, built entirely as an AWS service with AWS APIs and AWS-flavored everything. Amazon launched it at re:Invent in November 2013, just as Kafka was becoming popular outside LinkedIn, and the strategic intent was clear: keep Kafka from becoming an unmanaged piece of infrastructure that AWS customers have to run themselves on EC2.

A decade later, the verdict is mixed. Kinesis is genuinely useful and very widely deployed inside AWS shops, but it never escaped the AWS ecosystem and Kafka is still the dominant cross-cloud, cross-vendor standard. Amazon eventually conceded the point and launched Amazon MSK (Managed Streaming for Apache Kafka) in 2019, giving customers a "real Kafka" option on AWS. Today, the two services coexist awkwardly inside AWS's own portfolio.

The Four Kinesis Services

"Kinesis" is actually a family of four services that share branding and not much else:

Kinesis Data Streams. This is the core: a Kafka-like distributed log. You create a stream, configure shards (partitions), and write records. Consumers read from shards and track their position. Records are retained for 24 hours by default, configurable up to 365 days. This is the service people mean when they say "Kinesis" without qualification.
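To make the shard model concrete, here is a minimal sketch of how a record's partition key selects a shard. Kinesis Data Streams hashes the partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value. The even split of the hash space and the helper names below are assumptions for illustration, not AWS code.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the 128-bit hash space into contiguous ranges (assumed even layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hashed = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash value falls outside every shard range")
```

Because the mapping is deterministic, every record with the same partition key lands on the same shard, which is what preserves per-key ordering.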

Kinesis Data Firehose (renamed Amazon Data Firehose in 2024). A simpler "fire and forget" service that takes a stream of records and lands them in S3, Redshift, OpenSearch, or a few other destinations. There is no consumer code to write; you just configure a destination. Firehose is genuinely useful for the "I want all my events in S3 with minimal effort" use case, and it has no good Kafka equivalent.

Kinesis Data Analytics (now Managed Service for Apache Flink). A managed Apache Flink service for stream processing. It launched in 2016 with a SQL-only interface; AWS added an Apache Flink runtime in 2018 and renamed the service Amazon Managed Service for Apache Flink in 2023. The current product is genuinely Flink under the hood.

Kinesis Video Streams. A separate product for streaming video and audio from connected devices into AWS. Different use case, shares almost nothing technically with Data Streams. Mostly relevant to IoT and surveillance workloads.

How Kinesis Compares to Kafka

The conceptual model is nearly identical — streams of records, partitioned for parallelism, consumed by independent consumers tracking their own position. The differences are mostly about operations and ecosystem:

Shards vs. partitions. Kinesis calls its parallelism unit a "shard." Each shard has fixed capacity: writes are capped at 1 MB/sec or 1,000 records/sec, whichever limit is hit first, and reads at 2 MB/sec. To scale up, you split shards; to scale down, you merge them. This is more rigid than Kafka partitions and creates surprises when traffic spikes: many Kinesis users have been bitten by ProvisionedThroughputExceededException errors during a peak such as Black Friday. AWS introduced on-demand mode in 2021, which auto-scales shard capacity and partly addresses this.
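The per-shard limits above translate directly into a sizing calculation. The following is a rough sketch for provisioned mode: the function name is hypothetical, and it ignores headroom for spikes and enhanced fan-out consumers.

```python
import math

# Per-shard limits quoted in the text above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def required_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SHARD)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SHARD)
    by_reads = math.ceil(read_mb_per_sec / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_bytes, by_records, by_reads)
```

For example, a workload writing 3.5 MB/sec as 12,000 small records/sec is bound by the record limit, not the byte limit, so `required_shards(3.5, 12000, 7.0)` returns 12.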

Retention. Kafka can retain data effectively forever (limited by disk and now, with tiered storage, by S3). Kinesis maxes out at 365 days, and the longer the retention, the higher the per-GB cost. For "replay last year's events" use cases, Kafka with tiered storage wins.
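As a small illustration of the retention bounds, the sketch below converts a retention period in days into request parameters and rejects values outside the 24-hour to 365-day window described above. The helper name is hypothetical. The returned keys match the parameters of boto3's `increase_stream_retention_period` call on the Kinesis client, which is assumed as the eventual consumer of this dictionary.

```python
MIN_RETENTION_HOURS = 24        # default retention: 24 hours
MAX_RETENTION_HOURS = 365 * 24  # maximum retention: 365 days (8,760 hours)


def retention_request(stream_name, days):
    """Build retention parameters, enforcing the documented bounds."""
    hours = days * 24
    if not MIN_RETENTION_HOURS <= hours <= MAX_RETENTION_HOURS:
        raise ValueError(
            f"retention must be between 1 and 365 days, got {days}"
        )
    return {"StreamName": stream_name, "RetentionPeriodHours": hours}
```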

Ecosystem. This is where Kinesis loses badly. Nearly every modern data tool that consumes streams (Flink, Spark, Snowflake, Databricks, ClickHouse, Debezium, Fivetran) has first-class Kafka support. Kinesis support is usually present but secondary, sometimes via an adapter, sometimes lagging in features. If you want to plug your stream into the broader data ecosystem, Kafka is the wire protocol that everyone speaks.

Multi-cloud and portability. Kinesis is AWS-only. There is no Kinesis on GCP, no Kinesis on-prem, no open-source Kinesis. If your company ever wants to move workloads to another cloud, your Kinesis pipelines do not move with you. Kafka does.

Pricing and operations. Kinesis is fully managed with no servers to run, which is its core selling point. You pay per shard-hour and per PUT payload unit. At small scale this is cheap; at very large scale it gets expensive enough that customers start considering self-managed Kafka or Confluent Cloud as alternatives.
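The two billing dimensions can be combined into a back-of-the-envelope estimate for provisioned mode. This is a sketch under stated assumptions: a PUT payload unit is taken to be one 25 KB chunk of a record, a month is approximated as 730 hours, and both prices are parameters the caller must supply, since actual rates vary by region and change over time.

```python
import math

PAYLOAD_UNIT_BYTES = 25 * 1024  # assumption: one PUT payload unit per 25 KB chunk


def put_payload_units(record_bytes):
    """Number of PUT payload units billed for a single record."""
    return max(1, math.ceil(record_bytes / PAYLOAD_UNIT_BYTES))


def monthly_cost(shards, records_per_sec, record_bytes,
                 shard_hour_price, price_per_million_units, hours=730):
    """Estimate monthly cost from shard-hours plus PUT payload units."""
    shard_cost = shards * hours * shard_hour_price
    units = records_per_sec * 3600 * hours * put_payload_units(record_bytes)
    put_cost = units / 1_000_000 * price_per_million_units
    return shard_cost + put_cost
```

Note that a 26 KB record bills as two units while a 1 KB record still bills as one, so batching many tiny events into a single record can change the PUT side of the bill considerably.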

When Kinesis Is the Right Answer

The honest case for Kinesis: you are an all-AWS shop, you do not want to operate Kafka, you do not need cross-cloud portability, and your use case fits within Kinesis's capacity model. This describes a lot of companies, especially those whose data team is a few engineers managing infrastructure on the side. For these teams, Kinesis Data Streams plus Firehose plus Lambda is a genuinely productive stack: events flow from producers into streams, Lambda functions process them, and Firehose dumps the raw stream into S3 for the warehouse to pick up.
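The Lambda step in that stack can be sketched as follows. When Lambda is triggered by a Kinesis stream, each invocation receives a batch of records whose payloads are base64-encoded under `record["kinesis"]["data"]`. The JSON payload shape and the `type` field are hypothetical; a real handler would do whatever processing the application needs.

```python
import base64
import json


def handler(event, context):
    """Decode each Kinesis record in the batch and count events by type."""
    counts = {}
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded inside the Lambda event.
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        event_type = message.get("type", "unknown")  # hypothetical field
        counts[event_type] = counts.get(event_type, 0) + 1
    return counts
```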

Kinesis Firehose specifically is the unsung hero of this story. The "I just want my clickstream events in S3" use case is enormous, and Firehose solves it with about ten lines of configuration. There is no equivalent in the Kafka world that is as turnkey — you have to set up Kafka Connect with the S3 sink, manage the cluster, handle failures, etc.
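To give a sense of how little configuration is involved, here is a sketch that builds the request for a delivery stream reading from a Kinesis Data Stream and writing compressed objects to S3. The key names follow the Firehose `CreateDeliveryStream` API; the function name, the `clickstream/` prefix, the buffering values, and any ARNs passed in are illustrative assumptions.

```python
def firehose_to_s3_config(name, stream_arn, bucket_arn, role_arn):
    """Build CreateDeliveryStream parameters for a stream-to-S3 pipeline."""
    return {
        "DeliveryStreamName": name,
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": "clickstream/",  # hypothetical S3 key prefix
            # Flush when 64 MB accumulate or 300 seconds pass, whichever is first.
            "BufferingHints": {"SizeInMBs": 64, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
        },
    }
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("firehose").create_delivery_stream(**config)`. The IAM role it references still has to grant Firehose read access to the stream and write access to the bucket.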

When Kinesis Is the Wrong Answer

If any of the following apply, Kinesis is probably the wrong choice:

You operate across clouds or have on-prem infrastructure. Use Kafka.
You need the broader streaming ecosystem (Flink, Spark Streaming, Debezium CDC, ksqlDB). Use Kafka. The Kinesis adapters exist but are second-class.
You need long retention for replay or event sourcing. Use Kafka with tiered storage.
You have a team that can operate Kafka, or you can pay for Confluent Cloud. Then there is no operational benefit to Kinesis, and Kafka's ecosystem advantage is decisive.

There is also a third option that more and more AWS shops are picking: Amazon MSK, AWS's own managed Kafka. MSK gives you the Kafka wire protocol, the Kafka ecosystem, and AWS's billing and IAM integration. It is, in many ways, AWS's tacit admission that Kinesis Data Streams did not win the protocol war.

Where Kinesis Sits in the Stack

Kinesis lives in the same layer as Kafka — it is a transport between producers (application code, IoT devices, CDC streams) and consumers (stream processors, warehouses, real-time databases, Lambda functions). The most common pattern in AWS shops: producers write to a Kinesis Data Stream, a Lambda function or Flink job processes events, and Firehose lands the raw stream in S3 for downstream analytics in Athena, Redshift, or Snowflake.

How TextQL Works with Kinesis

Like Kafka, Kinesis is a transport layer that TextQL does not query directly. TextQL Ana connects to the systems where Kinesis events eventually land — most commonly S3 (queried via Athena or loaded into Snowflake/Redshift), or directly into a data warehouse via Firehose. The Kinesis layer determines how fresh the data is by the time TextQL sees it; the typical Firehose-to-S3-to-warehouse path lands events in front of analytics within minutes.

Amazon Kinesis

Launched: November 2013 at AWS re:Invent
Vendor: Amazon Web Services
License: Proprietary, AWS-only
Components: Data Streams, Data Firehose, Data Analytics, Video Streams
Pricing model: Per shard-hour and per PUT payload unit (or on-demand)
Category: Event Streaming
Monthly mindshare: ~100K · AWS-native streaming; smaller than Kafka by Stack Overflow tag activity (~5x less)