Snowpipe
Snowpipe is Snowflake's continuous, file-based ingestion service: the way most Snowflake customers get data into the warehouse in near real time.
Snowpipe is the conveyor belt that drops new files into Snowflake automatically. You point it at a cloud storage location (an S3 bucket, an Azure container, a GCS bucket), tell it which target table to load into, and from then on, every new file that lands in that location gets loaded into Snowflake within a minute or two — without you running a COPY INTO statement, scheduling a job, or sizing a virtual warehouse.
The simple metaphor: Snowpipe is the mailroom of the data warehouse. Files arrive at the loading dock, the mailroom picks them up, and they end up sorted into the right table on the right shelf. You don't have to walk down to the dock and check.
Snowpipe became generally available in 2017, three years after Snowflake's core warehouse went GA. It was launched to solve a specific, embarrassing problem with the original Snowflake design: a "modern" cloud warehouse still required customers to run manual COPY INTO commands on a schedule to load data. That was acceptable in 2014 when Snowflake was selling against Teradata and the bar was low, but by 2017 the competitive landscape included streaming-native systems and the modern data stack was forming around the idea that data should just arrive. Customers wanted "set it and forget it" ingestion; Snowflake had nothing to offer them.
Snowpipe was the answer. Architecturally, it was a small but important shift: Snowflake would maintain its own serverless ingestion fleet (separate from customer virtual warehouses), watch cloud storage for new files via cloud-native event notifications (S3 Event Notifications, Azure Event Grid, GCS Pub/Sub), and load files within ~1 minute of arrival. Customers didn't have to size warehouses, didn't have to schedule jobs, and got billed in tiny per-file increments.
In 2023, Snowflake added Snowpipe Streaming, a row-level streaming API that bypasses the file-staging step entirely. That second iteration was the response to a different competitor: anyone using Kafka, Kinesis, or a CDC tool who wanted true low-latency streaming and was getting impatient with the file-based model. Snowpipe Streaming pushes the latency floor down from ~1 minute to a few seconds.
Classic Snowpipe is conceptually three pieces:
1. A stage. A pointer to a cloud storage location. Snowflake calls these "external stages" and they're just metadata that tells the warehouse "watch this S3 prefix."
2. A pipe. A small object that wraps a COPY INTO target_table FROM @stage statement. The pipe defines what gets loaded, into which table, with which file format and transformations.
3. An event hook. Either (a) the cloud provider notifies Snowflake automatically when a new file lands (event-based, the default and recommended path), or (b) your code calls Snowpipe's REST API to say "here are some new files." Snowflake then queues those files, picks them up on its serverless ingestion fleet, and loads them.
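The three pieces above can be sketched in Snowflake SQL. The bucket path, storage integration, table, and pipe names below are all illustrative, not from any particular deployment:

```sql
-- 1. A stage: metadata pointing at a cloud storage location.
--    (my_s3_integration and the bucket path are hypothetical.)
CREATE STAGE raw_events_stage
  URL = 's3://my-bucket/events/'
  STORAGE_INTEGRATION = my_s3_integration;

-- 2. A pipe: wraps the COPY INTO that defines what gets loaded,
--    into which table, with which file format.
--    AUTO_INGEST = TRUE selects the event-notification path.
CREATE PIPE raw_events_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO raw_events
  FROM @raw_events_stage
  FILE_FORMAT = (TYPE = 'JSON');

-- 3. The event hook: SHOW PIPES exposes each pipe's
--    notification_channel (an SQS ARN on AWS), which you wire
--    into the bucket's S3 Event Notifications.
SHOW PIPES;
```

From there, every file landing under the prefix is queued and loaded on Snowflake's serverless fleet without any warehouse of yours running.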
You pay per file loaded, not per warehouse-hour. There is no virtual warehouse to size for ingestion — Snowflake handles compute on its end. This pricing model is one of the reasons Snowpipe took off: for spiky, file-based workloads, it's dramatically cheaper than running a dedicated warehouse just to handle COPY INTO.
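Because billing is per-file rather than per-warehouse-hour, the natural question is "what is each pipe costing me?" One way to answer it is the INFORMATION_SCHEMA.PIPE_USAGE_HISTORY table function; the one-day window here is just an example:

```sql
-- Per-pipe credit and file volume over the last 24 hours.
SELECT pipe_name,
       SUM(credits_used)   AS credits,
       SUM(files_inserted) AS files,
       SUM(bytes_inserted) AS bytes
FROM TABLE(INFORMATION_SCHEMA.PIPE_USAGE_HISTORY(
       DATE_RANGE_START => DATEADD('day', -1, CURRENT_TIMESTAMP())))
GROUP BY pipe_name
ORDER BY credits DESC;
```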
Snowpipe Streaming is a different beast. Instead of files, you push individual rows (or small batches of rows) through a Java SDK or a Snowflake Connector for Kafka. Rows land in the target table within seconds, not minutes. There's no file staging, no event notification, no COPY INTO. The tradeoff: you have to integrate the SDK into your producer, and the per-row pricing model is different. For Kafka users especially, Snowpipe Streaming is now the default path — it's cheaper, faster, and simpler than the file-based approach.
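For the Kafka path specifically, choosing Snowpipe Streaming over file-based ingestion is a single setting on the Snowflake Connector for Kafka. A minimal Kafka Connect config sketch, with account URL, credentials, topic, and role names as placeholders:

```json
{
  "name": "snowflake-sink",
  "config": {
    "connector.class": "com.snowflake.kafka.connector.SnowflakeSinkConnector",
    "topics": "events",
    "snowflake.url.name": "myaccount.snowflakecomputing.com:443",
    "snowflake.user.name": "KAFKA_CONNECTOR_USER",
    "snowflake.private.key": "<private-key>",
    "snowflake.database.name": "RAW",
    "snowflake.schema.name": "PUBLIC",
    "snowflake.role.name": "KAFKA_CONNECTOR_ROLE",
    "snowflake.ingestion.method": "SNOWPIPE_STREAMING"
  }
}
```

With `snowflake.ingestion.method` set to `SNOWPIPE_STREAMING`, rows flow through the streaming API instead of being staged as files first.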
A few practical notes. Semi-structured files (JSON, Avro, Parquet) can be loaded straight into a VARIANT column. Anywhere you used to run a cron job that called COPY INTO, Snowpipe is almost always cheaper and lower-latency. Light transformations can live in the pipe's COPY INTO statement, but anything substantial belongs in a downstream transformation layer (dbt, Snowpark, scheduled tasks).

Snowpipe is one of the least glamorous and most important products in the Snowflake stack. Nobody buys Snowflake for Snowpipe, but almost every production Snowflake deployment depends on it. It quietly removed an entire category of customer pain (manual file loading, warehouse sizing for ingestion) and did so with a pricing model that aligned with how customers actually used it.
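The semi-structured path can be sketched as a pipe that lands raw JSON in a VARIANT column with only a light inline transformation; table, pipe, and stage names here are hypothetical:

```sql
-- Landing table: raw payload plus lightweight lineage columns.
CREATE TABLE raw_json (
  payload     VARIANT,
  loaded_at   TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
  source_file STRING
);

-- The pipe's COPY INTO may select and lightly reshape columns;
-- heavier logic should live downstream (dbt, Snowpark, tasks).
CREATE PIPE raw_json_pipe AUTO_INGEST = TRUE AS
  COPY INTO raw_json (payload, source_file)
  FROM (SELECT $1, METADATA$FILENAME FROM @raw_json_stage)
  FILE_FORMAT = (TYPE = 'JSON');
```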
The interesting strategic shift is the move from file-based Snowpipe to row-based Snowpipe Streaming. File-based ingestion was a concession to the world Snowflake grew up in, where everyone already had S3 and the Modern Data Stack was file-batch underneath. Snowpipe Streaming is Snowflake admitting that the next decade is row-based and event-driven, and that if they don't own that ingestion path, Confluent or a streaming-native warehouse will. The two coexist, but the center of gravity is moving toward streaming.
The honest comparison to Databricks: Databricks Autoloader is the structural equivalent of Snowpipe (file-based, event-triggered, schema-evolving), and Delta Live Tables plus structured streaming sit in the same conceptual neighborhood as Snowpipe Streaming. Both companies arrived at similar designs for the same reason — the customer wants files to disappear from the dock and reappear in tables, automatically, with low latency, without managing a cluster.
Snowpipe is mostly invisible to TextQL: by the time TextQL Ana queries Snowflake, Snowpipe has already done its job. But it matters in one specific way — Snowpipe is what makes Snowflake feel "fresh enough" for AI-driven exploration. When a business user asks Ana about today's data, the reason that data exists is almost always because Snowpipe (or its streaming sibling) loaded it within the last few minutes. A well-configured Snowpipe pipeline is the difference between an AI analyst that feels live and one that feels stale.