Amazon Redshift | Data Ecosystem Wiki



Amazon Redshift

Amazon Redshift — AWS's MPP columnar data warehouse. The first major cloud data warehouse, launched in 2012 based on ParAccel, and still the default warehouse for deeply AWS-native shops.

Amazon Redshift is AWS's cloud data warehouse. It was the first one on the market at meaningful scale, launching in late 2012, and for a few years it basically was the cloud data warehouse category. Today it's the quietest of the "big four" — not the hype vendor at conferences, not the darling of data-influencer Twitter — but it's still running some of the largest warehouses on earth inside AWS's biggest customers.

The simple way to think about Redshift: it's what you get when you take a classic on-prem warehouse, run it as a managed service, and sell it at an AWS price point. Unlike BigQuery (which started serverless) or Snowflake (which was built around separated storage and compute from day one), Redshift was born as a managed version of a traditional MPP database. That original DNA is both its greatest strength (predictable, fast, deeply AWS-native) and its greatest liability (it was slower to adopt the decoupled-storage, serverless world that Snowflake and BigQuery built their brands on).

Origin Story: ParAccel, 2012, and "Redshift" the Name

In 2012, AWS was looking for a warehouse engine it could run as a managed service. It settled on ParAccel, an analytical database company that had built an MPP columnar database as a fork of PostgreSQL 8.0. AWS licensed ParAccel's code, hired a team to AWS-ify it, and on November 28, 2012 at re:Invent, Andy Jassy announced it under the name Redshift, a tongue-in-cheek jab at Oracle (whose corporate color is red, and which AWS was actively trying to move enterprise customers away from). In astronomy, redshift is the signature of an object moving away from the observer; the message was "move off Oracle and onto us."

Redshift went GA in February 2013 at a price AWS could shout from the rooftops: roughly $1,000 per terabyte per year, less than a tenth of the cost of Teradata or Oracle Exadata at the time. For an industry used to seven-figure warehouse capex, this was a shock. Redshift became the default starter warehouse for a generation of startups and data teams; Airbnb, Pinterest, Lyft, and thousands of others ran on it.

The ParAccel ancestry still shows: Redshift's SQL dialect is PostgreSQL-flavored, its query planner lineage is traditional MPP, and its distribution-key / sort-key tuning knobs are direct descendants of ParAccel's physical design options. For years, being good at Redshift meant being good at dist keys and sort keys — an art form most data engineers were glad to leave behind when Snowflake showed up.
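To make the dist-key idea concrete, here is a minimal Python sketch of key-based distribution: each row is hashed on its distribution-key column and assigned to a slice. The MD5-based `slice_for` helper is an assumption for illustration, not Redshift's internal hash; the point it shows is that a low-cardinality key can only ever populate a handful of slices, which is exactly the skew that dist-key tuning tried to avoid.

```python
import hashlib
from collections import Counter


def slice_for(key, num_slices):
    """Map a distribution-key value to a slice number.

    MD5 is an illustrative stand-in; it is not Redshift's internal hash function.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def distribute(rows, dist_key, num_slices):
    """Count how many rows land on each slice under key-based distribution."""
    counts = Counter(slice_for(row[dist_key], num_slices) for row in rows)
    return [counts.get(s, 0) for s in range(num_slices)]


def skew_ratio(slice_counts):
    """Rows on the busiest slice divided by the mean; 1.0 means perfectly even."""
    mean = sum(slice_counts) / len(slice_counts)
    return max(slice_counts) / mean if mean else 0.0


# Hypothetical orders table: high-cardinality customer_id vs a 3-value status column.
rows = [{"customer_id": i, "status": ("open", "shipped", "returned")[i % 3]}
        for i in range(10_000)]
print(skew_ratio(distribute(rows, "customer_id", 8)))  # close to 1.0
print(skew_ratio(distribute(rows, "status", 8)))       # at least 8/3: at most 3 slices get rows
```

With 8 slices and only 3 distinct status values, at least 5 slices sit idle for any query that scans by that key, which is why a high-cardinality, frequently joined column was the usual dist-key choice.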

Architecture: Leader + Compute, Then RA3, Then Serverless

Redshift has gone through three major architectural eras, and understanding them is most of understanding Redshift today.

Era 1: Classic clusters (2012–2019). You provisioned a cluster of nodes, one leader node and N compute nodes, and storage lived on the compute nodes' local disks (SSDs on dense-compute node types, HDDs on dense-storage types). This meant compute and storage scaled together. If you ran out of disk, you added nodes and paid for their compute whether you needed it or not; if you needed more compute, you also got more disk. It was fast, but inelastic. Sound familiar? This is the exact problem Snowflake was founded to solve.
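The coupling problem is easy to see with a little arithmetic. The sketch below sizes a cluster where every node brings a fixed bundle of storage and compute; the `NodeType` numbers are hypothetical and not tied to any real node type. Whichever requirement is larger dictates the node count, and the other resource is simply overprovisioned.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeType:
    """Hypothetical node spec; these numbers are illustrative, not a real Redshift SKU."""
    storage_tb: float
    compute_units: float


def nodes_needed_coupled(node, storage_tb, compute_units):
    """Storage and compute come bundled, so the larger requirement decides the node count."""
    for_storage = math.ceil(storage_tb / node.storage_tb)
    for_compute = math.ceil(compute_units / node.compute_units)
    return max(for_storage, for_compute)


def idle_compute(node, storage_tb, compute_units):
    """Compute paid for but not needed when storage forces extra nodes."""
    n = nodes_needed_coupled(node, storage_tb, compute_units)
    return n * node.compute_units - compute_units


node = NodeType(storage_tb=2.0, compute_units=1.0)
# 40 TB of data but only 4 units of compute demand: 20 nodes, 16 units idle.
print(nodes_needed_coupled(node, 40, 4), idle_compute(node, 40, 4))  # 20 16.0
```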

Era 2: RA3 and managed storage (2019). AWS launched RA3 node types, which separate storage from compute by using S3 as the backing store, with local SSDs acting as a cache. This was Redshift's "catch up to Snowflake" moment. You could finally scale compute and storage independently. Data Sharing (copy-free sharing between clusters) and cross-cluster queries followed.
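A toy model helps show why "S3 as the backing store, local SSD as a cache" decouples the two resources: total data can grow without bound while the node's cache stays a fixed size. AWS describes managed storage in terms of automatic eviction and prefetching rather than a published algorithm, so the plain LRU policy below is an assumed stand-in, not RA3's actual logic.

```python
from collections import OrderedDict


class TieredStore:
    """Toy RA3-style storage: S3 holds every block, a small local cache holds hot ones.

    LRU eviction is an assumption for illustration, not Redshift's real policy.
    """

    def __init__(self, cache_blocks):
        self.cache_blocks = cache_blocks
        self.s3 = {}                  # durable copy of every block
        self.cache = OrderedDict()    # block_id -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def write(self, block_id, data):
        self.s3[block_id] = data      # storage grows independently of cache size

    def read(self, block_id):
        if block_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(block_id)   # mark as most recently used
            return self.cache[block_id]
        self.misses += 1
        data = self.s3[block_id]               # miss: fetch from S3
        self.cache[block_id] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)     # evict least recently used block
        return data


store = TieredStore(cache_blocks=10)
for i in range(1_000):            # 1,000 blocks in "S3", room for 10 locally
    store.write(i, f"block-{i}")
for _ in range(3):                # a dashboard re-reading the same hot blocks
    for i in range(5):
        store.read(i)
print(store.hits, store.misses)   # 10 5
```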

Era 3: Redshift Serverless (2021, GA 2022). AWS launched Redshift Serverless, which drops the concept of a "cluster" entirely. You specify a base RPU (Redshift Processing Units) capacity, and AWS auto-scales compute up and down. You only pay when queries run. This is Redshift's answer to BigQuery's zero-ops pitch and Snowflake's auto-suspend, and for many workloads it's now the recommended starting point.
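"You only pay when queries run" still leaves room for surprises with bursty workloads. Below is a minimal cost sketch assuming per-second RPU billing with a minimum charge per active period; AWS's published pricing described a 60-second minimum at the time of writing, but both that floor and the per-RPU-hour price are parameters here, to be filled in from current regional pricing. The RPU count per period is taken as an input, since Serverless can scale above the base capacity on its own.

```python
import math


def serverless_cost(active_periods, price_per_rpu_hour, min_billed_seconds=60):
    """Estimate Redshift Serverless compute charges.

    active_periods: (rpus, seconds) pairs, one per stretch of activity.
    price_per_rpu_hour and min_billed_seconds are assumptions to fill in from
    current AWS pricing for your region, not hard-coded facts.
    """
    total = 0.0
    for rpus, seconds in active_periods:
        # Billed per second, but never less than the minimum for each active period.
        billed_seconds = max(math.ceil(seconds), min_billed_seconds)
        total += rpus * billed_seconds / 3600 * price_per_rpu_hour
    return total


# Hypothetical day: two 10-second lookups at 8 RPUs, one 30-minute batch at 32 RPUs,
# priced at an assumed $0.40 per RPU-hour.
day = [(8, 10), (8, 10), (32, 1800)]
print(round(serverless_cost(day, 0.40), 2))  # 6.51
```

Each 10-second lookup is billed as 60 seconds under this model, so many tiny, spread-out queries cost more than their runtime suggests.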

Under the hood, Redshift is still a columnar MPP database: each compute node is divided into slices (the number depends on node type), data is distributed across those slices, queries are compiled to native code (with compiled segments cached for reuse), and AQUA (Advanced Query Accelerator), available on certain RA3 node types, uses custom hardware including FPGAs to push filtering and aggregation down toward storage. Redshift's raw performance is genuinely competitive with Snowflake and BigQuery on most workloads; the difference is almost always operational experience, not speed.
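The MPP data flow can be sketched as a two-phase aggregation: each slice filters and pre-aggregates its own rows, and the leader combines the small partial results. This is a simplified conceptual model in plain Python (Redshift actually runs compiled code, and its planner may instead redistribute rows between slices by group key before aggregating); it corresponds roughly to `SELECT region, SUM(amount) FROM sales WHERE amount > 10 GROUP BY region`, where the table and column names are hypothetical.

```python
from collections import defaultdict


def slice_partial_sum(rows, predicate, group_col, value_col):
    """Work done on one slice: filter locally, then pre-aggregate per group."""
    partial = defaultdict(float)
    for row in rows:
        if predicate(row):
            partial[row[group_col]] += row[value_col]
    return partial


def leader_merge(partials):
    """Leader-node step: combine the small per-slice results into the final answer."""
    final = defaultdict(float)
    for partial in partials:
        for group, subtotal in partial.items():
            final[group] += subtotal
    return dict(final)


def mpp_group_sum(slices, predicate, group_col, value_col):
    # On a real cluster the slices work in parallel; here they run one after another.
    partials = [slice_partial_sum(rows, predicate, group_col, value_col) for rows in slices]
    return leader_merge(partials)


slices = [
    [{"region": "us", "amount": 5.0}, {"region": "eu", "amount": 20.0}],
    [{"region": "us", "amount": 15.0}, {"region": "eu", "amount": 30.0},
     {"region": "us", "amount": 12.0}],
]
print(mpp_group_sum(slices, lambda r: r["amount"] > 10, "region", "amount"))
# {'eu': 50.0, 'us': 27.0}
```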

Pricing and How It's Sold

Redshift pricing is simpler than BigQuery's but has more dials than Snowflake's:

Provisioned clusters (RA3) — pay per node-hour, with managed storage billed separately per GB-month; 1- or 3-year reserved instances cut compute costs by up to ~75%.
Redshift Serverless — pay per RPU-hour of compute used (metered per second), plus managed storage per GB-month.
Redshift Spectrum — a separate pricing line for querying S3 data from a Redshift cluster, billed per TB scanned (similar to BigQuery's on-demand).
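Per-TB-scanned billing rewards columnar formats, because a query only pays for the columns it reads. The sketch below assumes decimal terabytes and an illustrative $5/TB price (check current regional pricing); any per-query minimum AWS applies is omitted, and the table layout is hypothetical.

```python
def columnar_bytes_scanned(column_sizes, columns_read):
    """Bytes read from a columnar file format: only the referenced columns count."""
    return sum(column_sizes[name] for name in columns_read)


def spectrum_query_cost(bytes_scanned, price_per_tb):
    """Per-query charge: bytes scanned times a per-TB price (decimal TB assumed)."""
    return bytes_scanned / 1e12 * price_per_tb


# Hypothetical 1 TB Parquet table: ten columns of 100 GB each.
columns = {f"col{i}": 100e9 for i in range(10)}
price = 5.0  # assumed $/TB; check the current Spectrum price for your region
full_scan = spectrum_query_cost(sum(columns.values()), price)
two_cols = spectrum_query_cost(columnar_bytes_scanned(columns, ["col0", "col3"]), price)
print(full_scan, two_cols)  # 5.0 1.0
```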

AWS heavily pushes reserved instances to enterprise customers because it locks in revenue. The effective discount vs Snowflake credits depends enormously on utilization. A well-utilized reserved Redshift cluster can be meaningfully cheaper than Snowflake; a poorly-utilized one is meaningfully more expensive.
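The utilization argument reduces to a break-even point: fixed-price capacity wins once a workload is busy for more hours than the fixed price buys at a usage-based rate. The sketch models the alternative as "pay only for busy hours", a deliberate simplification that ignores minimums, idle timeouts, and storage; the dollar figures are hypothetical.

```python
def breakeven_utilization(reserved_monthly_cost, cost_per_busy_hour, hours_in_month=730):
    """Fraction of the month a workload must be busy before fixed-price capacity wins.

    Models the alternative as paying only for busy hours, a simplification of
    usage-based billing (it ignores minimums, idle timeouts, and storage).
    """
    breakeven_hours = reserved_monthly_cost / cost_per_busy_hour
    return min(breakeven_hours / hours_in_month, 1.0)


def cheaper_option(reserved_monthly_cost, cost_per_busy_hour, busy_hours):
    """Compare a fixed monthly price against paying per busy hour."""
    usage_cost = busy_hours * cost_per_busy_hour
    return "reserved" if reserved_monthly_cost < usage_cost else "pay-per-use"


# Hypothetical: $2,000/month reserved vs $8 per busy hour on a usage-billed service.
print(f"{breakeven_utilization(2000, 8):.0%}")   # 34%
print(cheaper_option(2000, 8, busy_hours=600))   # reserved
print(cheaper_option(2000, 8, busy_hours=100))   # pay-per-use
```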

AWS's Positioning, and Where It's Biased

AWS's pitch is straightforward: Redshift is the warehouse for AWS-native data estates. Their argument has three legs.

IAM and VPC integration. Redshift lives inside your VPC, uses your IAM roles, respects your security groups, and integrates with AWS Lake Formation for lake governance. For regulated industries where the security review is half the procurement process, "it's another AWS service" is a huge advantage.
Zero-ETL. AWS has been aggressively rolling out "zero-ETL" integrations — automatic replication from Aurora, DynamoDB, RDS, and S3 into Redshift without any pipeline code. The pitch is "your operational data just shows up in the warehouse." This is AWS's ecosystem advantage weaponized.
Price/performance at scale. AWS publishes benchmarks (take them with salt, as you should any vendor benchmark) claiming Redshift beats Snowflake and BigQuery on price/performance for large, steady workloads.
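One concrete form of the IAM integration is the Redshift Data API, which lets code run SQL using the caller's IAM credentials instead of a stored database password. The sketch below uses the boto3 `redshift-data` client's `execute_statement`, `describe_statement`, and `get_statement_result` calls against a Serverless workgroup; the workgroup and database names are placeholders, and provisioned clusters would pass `ClusterIdentifier` (with `DbUser` or `SecretArn`) instead of `WorkgroupName`. The client is passed in, so the polling logic can be exercised without AWS access.

```python
import time


def _field_value(field):
    """The Data API returns each cell as a one-key dict, e.g. {"longValue": 3} or {"isNull": True}."""
    if field.get("isNull"):
        return None
    return next(iter(field.values()))


def run_query(client, sql, database, workgroup_name, poll_seconds=0.5, timeout_seconds=60):
    """Run SQL on a Redshift Serverless workgroup through the Data API.

    `client` is expected to be boto3.client("redshift-data"). Authentication uses
    the caller's IAM credentials, so no database password appears here.
    """
    statement_id = client.execute_statement(
        WorkgroupName=workgroup_name, Database=database, Sql=sql
    )["Id"]
    deadline = time.monotonic() + timeout_seconds
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            break
        if status in ("FAILED", "ABORTED"):
            raise RuntimeError(f"statement {status}: {desc.get('Error', '')}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"statement {statement_id} still {status}")
        time.sleep(poll_seconds)

    if not desc.get("HasResultSet"):
        return []
    rows, token = [], None
    while True:  # results are paginated via NextToken
        kwargs = {"Id": statement_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_statement_result(**kwargs)
        rows.extend([_field_value(cell) for cell in record] for record in page["Records"])
        token = page.get("NextToken")
        if not token:
            return rows


# Usage (workgroup and database names are placeholders):
# import boto3
# rows = run_query(boto3.client("redshift-data"), "SELECT 1", "dev", "my-workgroup")
```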

Where AWS's pitch is biased: they rarely talk about developer experience, ecosystem momentum, or multi-cloud. Snowflake has a much richer partner ecosystem, a much simpler mental model, and can run on AWS, Azure, or GCP. Databricks has a stronger data science story. BigQuery has a better serverless story. Redshift's honest answer to all of these is "yes, but you're already on AWS, so who cares?" For a large minority of customers, that answer is sufficient. For everyone else, it's why Redshift has quietly lost market share to Snowflake since ~2018.

The internal AWS story is also more complicated than the marketing. AWS has several overlapping analytical products (Athena for serverless Presto/Trino queries on S3, EMR for managed Spark/Hadoop, OpenSearch, and Redshift itself), and customers regularly struggle to pick between them. The recent SageMaker Lakehouse and Amazon S3 Tables launches (the latter offering managed Iceberg tables in S3) suggest AWS is nudging its lake/warehouse story toward a more unified architecture, but the lines are still blurry.

What Redshift Is Good At (and Not)

Redshift is good at:

Deep AWS integration. IAM, VPC, KMS, CloudWatch, Lake Formation, Glue, S3, and Kinesis are all first-class integrations. If your security team requires that data never leave your AWS account, Redshift is a very clean answer.
Predictable, steady-state workloads. A well-sized reserved RA3 cluster running a stable dbt job set is cost-effective and fast.
Federated queries. Redshift can query across S3 (Spectrum), Aurora, RDS, and other sources in one SQL statement. This is genuinely useful and underappreciated.
PostgreSQL-compatible SQL. Most Postgres tooling and drivers just work. Migration from Postgres warehouses is easier than from Oracle or SQL Server.

Redshift is bad at:

Zero-ops simplicity. Provisioned Redshift still asks you to think about node types, resize operations, WLM queues, vacuum, and dist/sort keys. Serverless hides most of this but not all.
Multi-tenant workload isolation. Snowflake's per-warehouse isolation is cleaner than Redshift's WLM (Workload Management) queues, which are notoriously fiddly.
Multi-cloud. Redshift runs on AWS only. For organizations that are strategically multi-cloud, Redshift is a non-starter.
Developer hype. Redshift has fallen out of the "cool kid" discourse. Recruiting data engineers who are excited to work on Redshift (vs Snowflake or Databricks) is harder in 2026 than it was in 2016.
Concurrency at extreme scale. Serverless helps a lot, but provisioned Redshift historically struggled with thundering-herd BI workloads unless you enabled Concurrency Scaling, which adds transient capacity for bursts of queued queries.
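Why WLM feels fiddly is easier to see in a toy model: each queue has a fixed number of concurrency slots, a query lands in the first queue whose user group or query group matches (the last queue acts as the default), and once the slots are full it either waits or, if Concurrency Scaling is on for that queue, overflows to extra capacity. This is a loose, simplified model of manual WLM written for illustration; it ignores memory allocation, slot release, automatic WLM, and the real Concurrency Scaling eligibility rules, and the queue names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    name: str
    slots: int                                          # concurrent queries allowed
    user_groups: set = field(default_factory=set)
    query_groups: set = field(default_factory=set)
    concurrency_scaling: bool = False                  # overflow instead of waiting
    running: int = 0


def route(queues, user_group, query_group=None):
    """First queue whose user group or query group matches; the last queue is the default."""
    for queue in queues[:-1]:
        if user_group in queue.user_groups or query_group in queue.query_groups:
            return queue
    return queues[-1]


def admit(queues, user_group, query_group=None):
    """Decide where a new query goes: a free slot, extra scaling capacity, or the wait line."""
    queue = route(queues, user_group, query_group)
    if queue.running < queue.slots:
        queue.running += 1
        return queue.name
    return f"{queue.name}:scaling" if queue.concurrency_scaling else f"{queue.name}:waiting"


queues = [
    Queue("etl", slots=2, user_groups={"etl"}),
    Queue("bi", slots=3, query_groups={"dashboard"}, concurrency_scaling=True),
    Queue("default", slots=5),
]
# A burst of 5 dashboard queries: 3 take slots, 2 overflow to scaling capacity.
print([admit(queues, "analyst", "dashboard") for _ in range(5)])
# ['bi', 'bi', 'bi', 'bi:scaling', 'bi:scaling']
```

Every slot count and group mapping is a hand-tuned guess about future load, which is the operational burden Snowflake's one-warehouse-per-workload model sidesteps.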

Where the Puck Is Going

The honest read: Redshift is consolidating into the AWS-native segment rather than competing head-on with Snowflake or Databricks for net-new logos. AWS's strategy appears to be:

Make Redshift Serverless the default surface so small teams stop being scared off by cluster management.
Use zero-ETL and S3 Tables (Iceberg) to make it trivially easy to get data into Redshift from elsewhere in AWS.
Compete on security, compliance, and enterprise procurement rather than on developer experience.
Let SageMaker Lakehouse and Athena pick up the "open-format, lake-centric" persona that Databricks owns.

Expect Redshift to remain the warehouse of choice for tens of thousands of AWS-centric enterprises for years, while ceding the innovation narrative to others.

TextQL and Redshift

TextQL Ana connects to Redshift (both provisioned and Serverless) via the standard JDBC/ODBC interface and respects Redshift's role-based access and row/column security. Because Redshift is PostgreSQL-compatible, TextQL's SQL generation handles it cleanly without dialect-specific workarounds. For AWS-native customers, TextQL runs inside the customer's VPC and queries Redshift using a least-privileged IAM role — meaning business users get natural-language analytics without data leaving the AWS security perimeter.


Amazon Redshift

Founded 2012 (GA February 2013)

HQ Seattle, WA

Parent Amazon Web Services

Category Data Warehouse

Based on ParAccel (PostgreSQL fork, MPP)

Architecture MPP columnar, leader + compute nodes; RA3 / Serverless

SQL dialect PostgreSQL-compatible

Monthly mindshare ~300K · ~30K AWS accounts; was the original cloud DW; declining vs Snowflake/BQ but still entrenched