Amazon Redshift
Amazon Redshift — AWS's MPP columnar data warehouse. The first major cloud data warehouse, launched in 2012 based on ParAccel, and still the default warehouse for deeply AWS-native shops.
Amazon Redshift is AWS's cloud data warehouse. It was the first one on the market at meaningful scale, launching in late 2012, and for a few years it basically was the cloud data warehouse category. Today it's the quietest of the "big four" — not the hype vendor at conferences, not the darling of data-influencer Twitter — but it's still running some of the largest warehouses on earth inside AWS's biggest customers.
The simple way to think about Redshift: it's what happens when you take a classic on-prem warehouse and put it on an AWS price tag. Unlike BigQuery (which started serverless) or Snowflake (which started multi-cloud), Redshift was born as a managed version of a traditional MPP database. That original DNA is both its greatest strength (predictable, fast, deeply AWS-native) and its greatest liability (it was slower to adopt the decoupled-storage, serverless world that Snowflake and BigQuery built their brands on).
In 2012, AWS was shopping for a warehouse engine to offer as a managed service. It settled on ParAccel, a California-based analytical database company that had built an MPP columnar database as a fork of PostgreSQL 8.0. AWS licensed ParAccel's code, hired a team to AWS-ify it, and on November 28, 2012 at re:Invent, Andy Jassy announced it under the name Redshift, widely read as a tongue-in-cheek jab at Oracle (whose corporate color is red, and which AWS was actively trying to move enterprise customers away from). In astronomy, "redshift" is the shift toward red in light from objects moving away from the observer; the message was "move off Oracle and onto us."
Redshift went GA in February 2013 at a price AWS could shout from rooftops: roughly $1,000 per terabyte per year, more than 10x cheaper than Teradata or Oracle Exadata at the time. For an industry used to seven-figure warehouse capex, this was a shock. Redshift became the default starter warehouse for a generation of startups and data teams — Airbnb, Pinterest, Lyft, and thousands of others ran on it.
The ParAccel ancestry still shows: Redshift's SQL dialect is PostgreSQL-flavored, its query planner lineage is traditional MPP, and its distribution-key / sort-key tuning knobs are direct descendants of ParAccel's physical design options. For years, being good at Redshift meant being good at dist keys and sort keys — an art form most data engineers were glad to leave behind when Snowflake showed up.
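To make the dist-key and sort-key knobs concrete, here is a minimal sketch that assembles a Redshift-style `CREATE TABLE` statement. The `events` table, its columns, and the `build_events_ddl` helper are hypothetical and exist only for illustration; the `DISTSTYLE KEY`, `DISTKEY`, and `COMPOUND SORTKEY` clauses are the physical-design options the paragraph above refers to.

```python
from typing import Sequence


def build_events_ddl(table: str, dist_key: str, sort_keys: Sequence[str]) -> str:
    """Return a CREATE TABLE statement with an explicit dist key and sort key.

    The column list is a made-up example schema, not taken from any real system.
    """
    sort_cols = ", ".join(sort_keys)
    return (
        f"CREATE TABLE {table} (\n"
        "    user_id    BIGINT,\n"
        "    event_time TIMESTAMP,\n"
        "    event_type VARCHAR(64)\n"
        ")\n"
        "DISTSTYLE KEY\n"               # rows are placed by hashing the dist key
        f"DISTKEY ({dist_key})\n"       # column that decides which slice holds a row
        f"COMPOUND SORTKEY ({sort_cols});"  # on-disk ordering within each slice
    )


ddl = build_events_ddl("events", "user_id", ["event_time", "event_type"])
print(ddl)
```

Choosing `user_id` as the dist key is a guess that most joins happen on that column; a skewed or rarely joined column would be a poor choice, which is exactly why this tuning was considered an art form.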
Redshift has gone through three major architectural eras, and understanding them is most of understanding Redshift today.
Era 1: Classic clusters (2012–2019). You provisioned a cluster of nodes (one leader node and N compute nodes), and storage lived on the local disks of the compute nodes. This meant compute and storage scaled together. If you ran out of disk, you added nodes (and paid for more compute whether you needed it or not). If you needed more compute, you also got more disk. It was fast, but inelastic. Sound familiar? This is the exact problem Snowflake was founded to solve.
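The coupling can be shown with a little arithmetic. The sketch below assumes a hypothetical node shape (2 TB of local disk and 1 unit of compute per node); these figures are illustrative, not real Redshift node specifications.

```python
import math


def nodes_needed(data_tb: float, compute_units: float,
                 tb_per_node: float, units_per_node: float) -> int:
    """Nodes required when every node bundles a fixed amount of disk and compute."""
    for_storage = math.ceil(data_tb / tb_per_node)
    for_compute = math.ceil(compute_units / units_per_node)
    # Whichever dimension is larger dictates the cluster size.
    return max(for_storage, for_compute)


# Hypothetical workload: lots of data, modest query load.
n = nodes_needed(data_tb=40, compute_units=4, tb_per_node=2, units_per_node=1)
idle_compute = n * 1 - 4  # compute units paid for but not needed
print(n, idle_compute)  # prints: 20 16
```

In this made-up case storage alone forces a 20-node cluster, so 16 of the 20 compute units are bought only to get their disks.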
Era 2: RA3 and managed storage (2019). AWS launched RA3 node types, which separate storage from compute by using S3 as the backing store, with local SSDs acting as a cache. This was Redshift's "catch up to Snowflake" moment. You could finally scale compute and storage independently. Data Sharing (copy-free sharing between clusters) and cross-cluster queries followed.
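The "local SSD as a cache in front of S3" idea can be sketched as a small read-through cache. This is a toy model: the `BlockCache` class is hypothetical, a plain dictionary stands in for S3, and least-recently-used eviction is an assumption chosen for simplicity rather than a statement about how Redshift manages its cache.

```python
from collections import OrderedDict


class BlockCache:
    """Toy read-through LRU cache: fast local tier in front of a slower backing store."""

    def __init__(self, backing_store: dict, capacity: int):
        self.backing_store = backing_store  # stands in for the object store
        self.capacity = capacity            # number of blocks the local tier can hold
        self.cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1
        data = self.backing_store[block_id]   # slow path: fetch from backing store
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict the least recently used block
        return data


store = {i: f"block-{i}" for i in range(10)}
cache = BlockCache(store, capacity=3)
for block_id in [0, 1, 2, 0, 3, 0, 1]:
    cache.read(block_id)
print(cache.hits, cache.misses)  # prints: 2 5
```

The point of the design is that storage capacity is bounded by the backing store while query speed on hot data is bounded by the local tier, so the two can be sized independently.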
Era 3: Redshift Serverless (2021, GA 2022). AWS launched Redshift Serverless, which drops the concept of a "cluster" entirely. You specify a base capacity in RPUs (Redshift Processing Units), and AWS auto-scales compute up and down from there. You pay only while the warehouse is actively running queries. This is Redshift's answer to BigQuery's zero-ops pitch and Snowflake's auto-suspend, and for many workloads it's now the recommended starting point.
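A rough sketch of what pay-only-when-active billing looks like in RPU-hours. Everything numeric here is an assumption for illustration: the price per RPU-hour, the 60-second minimum per burst of activity, and the simplification that bursts never overlap and capacity stays at the base RPU level. Check the current AWS pricing page before relying on any of it.

```python
def serverless_cost(active_seconds, rpus, price_per_rpu_hour, min_billed_seconds=60):
    """Estimate cost for a list of non-overlapping bursts of query activity.

    active_seconds      -- duration of each burst, in seconds
    rpus                -- capacity in use during the bursts (held constant here)
    price_per_rpu_hour  -- hypothetical price, not a quoted AWS rate
    min_billed_seconds  -- assumed minimum charge per burst
    """
    billed = sum(max(s, min_billed_seconds) for s in active_seconds)
    rpu_hours = rpus * billed / 3600
    return rpu_hours * price_per_rpu_hour


# Three bursts in a day: 30 s, 10 min, and 19 min of activity at 8 RPUs.
cost = serverless_cost([30, 600, 1140], rpus=8, price_per_rpu_hour=0.5)
print(cost)  # prints: 2.0
```

With these made-up numbers the day costs 4 RPU-hours, whereas keeping the same capacity running around the clock would be 192 RPU-hours; the gap is the whole argument for serverless on spiky workloads.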
Under the hood, Redshift is still a columnar MPP database: data is distributed across slices (each compute node is divided into several, roughly one per CPU core), queries are compiled to native code with the compiled segments cached for reuse, and the AQUA (Advanced Query Accelerator) hardware layer uses FPGAs in some node types to push filters down toward storage. Redshift's raw performance is genuinely competitive with Snowflake and BigQuery on most workloads; the difference is almost always operational experience, not speed.
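Distribution across slices can be illustrated with a toy hash partitioner. The sketch assumes four slices and uses `zlib.crc32` as a stand-in hash (Redshift's actual hash function is internal and not shown here); the `users` and `orders` tables are invented. It demonstrates why distributing two tables on the same key lets each slice join its own rows without moving data between nodes.

```python
import zlib
from collections import defaultdict


def slice_for(key, num_slices):
    """Map a key to a slice with a stable hash (crc32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode()) % num_slices


def distribute(rows, key_column, num_slices):
    """Group rows by the slice their distribution key hashes to."""
    slices = defaultdict(list)
    for row in rows:
        slices[slice_for(row[key_column], num_slices)].append(row)
    return slices


def local_join(user_slices, order_slices):
    """Join orders to users slice by slice, never looking outside the local slice."""
    joined = []
    for slice_id, slice_orders in order_slices.items():
        local_users = {u["user_id"]: u for u in user_slices.get(slice_id, [])}
        for order in slice_orders:
            joined.append((order["order_id"], local_users[order["user_id"]]["name"]))
    return joined


users = [{"user_id": i, "name": f"user{i}"} for i in range(8)]
orders = [{"order_id": 100 + i, "user_id": i % 8} for i in range(20)]

user_slices = distribute(users, "user_id", num_slices=4)
order_slices = distribute(orders, "user_id", num_slices=4)
joined = local_join(user_slices, order_slices)
print(len(joined))  # prints: 20
```

Every order finds its user locally because both tables were hashed on `user_id`. Had `orders` been distributed on `order_id` instead, the lookup would fail inside the slice and a real engine would have to redistribute rows across the network first.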
Redshift pricing is simpler than BigQuery's but has more dials than Snowflake's:
AWS heavily pushes reserved instances to enterprise customers because it locks in revenue. The effective discount vs Snowflake credits depends enormously on utilization. A well-utilized reserved Redshift cluster can be meaningfully cheaper than Snowflake; a poorly-utilized one is meaningfully more expensive.
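The utilization argument reduces to a break-even calculation. The sketch below compares an always-billed reserved cluster with a pay-while-running alternative; both hourly rates are hypothetical placeholders, not quoted Redshift or Snowflake prices.

```python
def breakeven_utilization(reserved_hourly: float, usage_hourly: float) -> float:
    """Fraction of hours the warehouse must be busy for the reservation to win.

    Reserved cost  = reserved_hourly * hours            (billed every hour)
    Usage cost     = usage_hourly * utilization * hours (billed only while busy)
    Setting the two equal gives utilization = reserved_hourly / usage_hourly.
    """
    return reserved_hourly / usage_hourly


def monthly_costs(reserved_hourly, usage_hourly, utilization, hours=730):
    """Return (reserved_cost, usage_cost) for one month at a given utilization."""
    return reserved_hourly * hours, usage_hourly * utilization * hours


# Hypothetical rates: 2.00/hour reserved vs 5.00/hour pay-while-running.
threshold = breakeven_utilization(2.0, 5.0)
low = monthly_costs(2.0, 5.0, utilization=0.25)
high = monthly_costs(2.0, 5.0, utilization=0.60)
print(threshold, low, high)
```

Under these assumed rates the reservation only pays off above 40% utilization: at 25% busy the pay-while-running option is cheaper, and at 60% busy the reserved cluster is.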
AWS's pitch is straightforward: Redshift is the warehouse for AWS-native data estates. Their argument has three legs.
Where AWS's pitch is biased: they rarely talk about developer experience, ecosystem momentum, or multi-cloud. Snowflake has a much richer partner ecosystem, a much simpler mental model, and can run on AWS, Azure, or GCP. Databricks has a stronger data science story. BigQuery has a better serverless story. Redshift's honest answer to all of these is "yes, but you're already on AWS, so who cares?" For a large minority of customers, that answer is sufficient. For everyone else, it's why Redshift has quietly lost market share to Snowflake since ~2018.
The internal AWS story is also more complicated than the marketing. AWS has several overlapping analytical products — Athena (serverless Presto on S3), EMR (managed Spark/Hadoop), OpenSearch, and Redshift — and customers regularly struggle to pick between them. The recent SageMaker Lakehouse and Amazon S3 Tables launches (the latter offering managed Iceberg tables in S3) suggest AWS is nudging its lake/warehouse story toward a more unified architecture, but the lines are still blurry.
Redshift is good at:
Redshift is bad at:
The honest read: Redshift is consolidating into the AWS-native segment rather than competing head-on with Snowflake or Databricks for net-new logos. AWS's strategy appears to be:
Expect Redshift to remain the warehouse of choice for tens of thousands of AWS-centric enterprises for years, while ceding the innovation narrative to others.
TextQL Ana connects to Redshift (both provisioned and Serverless) via the standard JDBC/ODBC interface and respects Redshift's role-based access and row/column security. Because Redshift is PostgreSQL-compatible, TextQL's SQL generation handles it cleanly without dialect-specific workarounds. For AWS-native customers, TextQL runs inside the customer's VPC and queries Redshift using a least-privileged IAM role — meaning business users get natural-language analytics without data leaving the AWS security perimeter.