Dagster
Dagster is the modern, asset-oriented workflow orchestrator. Created by Nick Schrock in 2018, it reframes orchestration around data assets rather than tasks and is the recommended choice for new data platforms.
Dagster is a modern workflow orchestrator built around the radical idea that the right primitive for a data pipeline is not a task but an asset. Where Apache Airflow thinks in terms of "this script ran successfully," Dagster thinks in terms of "this table is now fresh." That sounds like a small distinction. It is not. It changes how you build, debug, and grow a data platform. For new data infrastructure projects in 2026, Dagster is generally the recommended choice.
Dagster was created by Nick Schrock, who is best known in the broader engineering world as one of the co-creators of GraphQL at Facebook (along with Lee Byron and Dan Schafer). At Facebook, Schrock spent years on the Product Infrastructure team building tools that helped product engineers ship features quickly — the same team that produced React and GraphQL. After eight years at Facebook, he left in 2017 to start a new company, Elementl (later renamed Dagster Labs), with the explicit goal of fixing what he saw as the broken state of data engineering tooling.
Schrock's diagnosis: data engineers were stuck in a 2014-era abstraction. Airflow had won the orchestration market by being a Python-defined DAG runner, but its model — "tasks that produce side effects" — was the wrong primitive. In real life, what data engineers care about is whether the table in the warehouse is correct and current. Tasks are an implementation detail. Whatever framework you use should let you talk about the things you actually care about, not the scripts that produce them.
Dagster was first announced publicly in 2018, hit 1.0 in August 2022, and the company has raised over $50 million from Sequoia and Index Ventures. The managed product, Dagster+ (originally Dagster Cloud), launched in 2022 and became the company's primary commercial offering.
In Airflow you write a task: def load_orders(): .... The task runs. Whether the orders table actually got updated, and whether downstream consumers should now run, is up to you to wire together.
In Dagster, you write a software-defined asset:
from dagster import asset

@asset
def orders():
    return read_from_source()

@asset
def daily_revenue(orders):
    return orders.groupby("date").sum()
You declared two assets. Dagster figured out the DAG (because daily_revenue takes orders as an argument). It knows that if orders is updated, daily_revenue is now stale and can be re-materialized. It automatically maintains lineage. It can show you in the UI exactly which assets are fresh, which are stale, and which failed.
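The inference mechanism can be illustrated in plain Python. This is a hypothetical sketch, not Dagster's actual internals: a decorator reads each function's parameter names via inspect and treats them as upstream asset dependencies, then a recursive materialize walks the inferred DAG.

```python
import inspect

# Hypothetical mini-registry showing how an orchestrator can infer a DAG
# from function signatures: each parameter name is read as an upstream asset.
ASSETS = {}

def asset(fn):
    """Register fn as an asset; its dependencies are its parameter names."""
    deps = list(inspect.signature(fn).parameters)
    ASSETS[fn.__name__] = (fn, deps)
    return fn

def materialize(name, cache=None):
    """Compute an asset, recursively materializing its upstream assets first."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn, deps = ASSETS[name]
        cache[name] = fn(*(materialize(d, cache) for d in deps))
    return cache[name]

@asset
def orders():
    return [("2026-01-01", 10), ("2026-01-01", 5), ("2026-01-02", 7)]

@asset
def daily_revenue(orders):
    totals = {}
    for date, amount in orders:
        totals[date] = totals.get(date, 0) + amount
    return totals
```

Calling materialize("daily_revenue") computes orders first, then aggregates it, without any explicit wiring between the two functions.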
Kitchen analogy: Airflow is a chef's task list — "chop onions, sauté onions, add stock." Dagster is a recipe with ingredients — "stock requires sautéed onions, which requires chopped onions." If you ask Dagster for stock, it figures out the rest. If you tell Dagster the onions are already chopped, it skips that step. The asset is the noun, not the verb.
This sounds like a syntactic difference but it has deep consequences. Lineage is automatic — you don't bolt on a separate catalog. Backfills are asset-aware — "rebuild this table for the last 30 days" is a first-class operation. Partitioning is built in — Dagster knows that orders has a date partition and that each day can be materialized independently. Data quality checks attach to assets, not to tasks, and can block downstream materialization if they fail.
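The asset-aware backfill logic above can be sketched under a simple assumption: each date partition carries the timestamp of its last materialization, and a downstream partition is stale if its upstream partition was materialized more recently (or if it was never materialized at all). This is an illustration of the idea, not Dagster's actual staleness algorithm.

```python
from datetime import datetime

# Hypothetical sketch of partition-aware staleness: compare when each date
# partition of an upstream and downstream asset was last materialized, and
# return the downstream partitions that need rebuilding.
def stale_partitions(upstream, downstream):
    """Partitions where upstream ran after downstream, or downstream is missing."""
    return sorted(
        day for day, up_time in upstream.items()
        if day not in downstream or downstream[day] < up_time
    )

orders_runs = {
    "2026-01-01": datetime(2026, 1, 2, 6, 0),
    "2026-01-02": datetime(2026, 1, 3, 6, 0),
    "2026-01-03": datetime(2026, 1, 4, 6, 0),
}
revenue_runs = {
    "2026-01-01": datetime(2026, 1, 2, 7, 0),  # fresh: ran after upstream
    "2026-01-02": datetime(2026, 1, 3, 5, 0),  # stale: upstream ran later
    # 2026-01-03 was never materialized
}
```

Here stale_partitions(orders_runs, revenue_runs) identifies the two partitions that "rebuild this table for the last 30 days" would re-materialize, leaving the fresh one alone.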
The asset abstraction. This is the headline and it deserves the headline. After working in Dagster for a few weeks, going back to Airflow feels like writing assembly after writing Python. You stop thinking about "what runs" and start thinking about "what data exists."
dbt integration is best in class. Every dbt model becomes a Dagster asset automatically via load_assets_from_dbt_project. Dagster understands the dbt DAG, can run subsets of it, and unifies it with non-dbt assets in a single lineage graph. For teams whose center of gravity is dbt, this alone is a reason to use Dagster.
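One way to see why this mapping is natural: dbt already writes out a manifest.json recording every model and its upstream dependencies, which is the artifact Dagster's dbt integration consumes. A minimal sketch of extracting that dependency graph (illustrative only; load_assets_from_dbt_project does far more, and the manifest fragment below is hand-written):

```python
# Minimal sketch: pull model-to-model dependencies out of a dbt manifest,
# the same artifact Dagster's dbt integration reads to build its asset graph.
def model_deps(manifest):
    deps = {}
    for unique_id, node in manifest["nodes"].items():
        if node["resource_type"] == "model":
            deps[node["name"]] = [
                parent.split(".")[-1]
                for parent in node["depends_on"]["nodes"]
                if parent.startswith("model.")  # ignore sources, seeds, etc.
            ]
    return deps

# Tiny hand-written manifest fragment (real manifests are generated by dbt).
manifest = {
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model", "name": "stg_orders",
            "depends_on": {"nodes": ["source.shop.raw_orders"]},
        },
        "model.shop.daily_revenue": {
            "resource_type": "model", "name": "daily_revenue",
            "depends_on": {"nodes": ["model.shop.stg_orders"]},
        },
    }
}
```

From this graph each model becomes a node in the same lineage view as non-dbt assets, which is what "unifies it in a single lineage graph" means in practice.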
Local development actually works. dagster dev spins up the full UI, scheduler, and a working environment in one command. No metadata database to provision. No Docker compose file with seven services. You can iterate on a DAG the same way you iterate on a Python script.
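The whole local loop is roughly two commands. A sketch based on current Dagster packaging (package names and flags as documented at time of writing; check the Dagster install docs for your version):

```shell
pip install dagster dagster-webserver   # core library plus the local UI
dagster dev -f assets.py                # UI, scheduler, and daemon in one process
```

By default the UI comes up on localhost:3000, backed by ephemeral local storage, so there is no metadata database to provision before you can click around the asset graph.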
Types and IO managers. Dagster lets you declare the type of an asset (a DataFrame, a Snowflake table, a Parquet file) and define IO managers that handle reading and writing. This means your @asset function returns a Pandas DataFrame and Dagster handles writing it to S3 / Snowflake / wherever — you stop manually writing boilerplate persistence code in every task.
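The pattern is easy to sketch in plain Python: a hypothetical IO manager that pickles return values to local files, standing in for an S3 or Snowflake writer. This mirrors the shape of Dagster's IOManager interface (handle_output / load_input) but is a toy, not the real API.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical sketch of the IO-manager pattern: the asset function returns a
# plain value, and the IO manager alone decides how and where it is persisted.
class LocalPickleIOManager:
    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def handle_output(self, asset_name, value):
        """Persist an asset's return value (stand-in for an S3/Snowflake write)."""
        (self.base_dir / f"{asset_name}.pkl").write_bytes(pickle.dumps(value))

    def load_input(self, asset_name):
        """Load an upstream asset's stored value for a downstream computation."""
        return pickle.loads((self.base_dir / f"{asset_name}.pkl").read_bytes())

# The asset body stays pure: no persistence boilerplate inside it.
def orders():
    return {"2026-01-01": 15, "2026-01-02": 7}

io = LocalPickleIOManager(tempfile.mkdtemp())
io.handle_output("orders", orders())
```

Swapping local pickle files for a warehouse writer changes only the IO manager, which is exactly the separation that lets the same asset code run against different storage in dev and prod.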
The UI. The asset graph view is genuinely better than Airflow's. You see freshness, lineage, partitions, and run history in one place. You can click an asset and see its definition, its upstream dependencies, and its materialization history without leaving the screen.
Mindshare lags Airflow by an order of magnitude. Job listings overwhelmingly ask for Airflow. New hires need to learn Dagster from scratch. This is the single biggest reason teams stay on Airflow even when they prefer Dagster's design.
Migration from Airflow is painful. There's no automatic translator. You rewrite. Dagster has shipped tooling to make this easier (including some Airflow-compatibility shims) but in practice migrations are months-long projects. For brownfield environments, this often kills the conversation before it starts.
The asset model has a learning curve for people who think in tasks. Engineers coming from Airflow sometimes spend their first week trying to translate "tasks" 1:1 into "assets" and getting frustrated. The mental shift — "stop thinking about scripts, start thinking about tables" — takes a few weeks to internalize.
Open source vs. Dagster+ tension. As with most open-core companies, there are features (some auth, some advanced scheduling, the cloud-hosted plane) that live only in the paid product. This is fine and reasonable, but it's something to know going in.
Dagster is, technically, the better orchestrator. Almost every architectural choice the team made — assets over tasks, IO managers, types, partitioning — has aged well, and the gap relative to Airflow has only widened over time. For a brand-new data platform in 2026, Dagster is the recommended choice. For an existing Airflow environment with hundreds of DAGs, the migration cost usually outweighs the benefit, and the right move is to stay on Airflow (likely managed by Astronomer).
Dagster's asset metadata is one of the richest sources of pipeline truth in the modern data stack. TextQL Ana reads from the warehouses Dagster materializes and can use Dagster's lineage and freshness signals to answer questions like "is this table fresh?" or "which upstream asset failed last night?" Dagster's built-in lineage means TextQL doesn't have to guess at provenance — the orchestrator already knows.
See TextQL in action