NEW: Scale AI Case Study — ~1,900 data requests per week across 4 business units Read now →

ETL & Data Integration

ETL and data integration tools move data from source systems into the warehouse and transform it into analytics-ready models. The category was reshaped by Fivetran (managed ELT connectors) and dbt (in-warehouse SQL transformation).

ETL (Extract, Transform, Load) and its modern descendant ELT (Extract, Load, Transform) describe how raw data gets from where it lives — your CRM, your production database, your ad platforms, your event logs — into the data warehouse where analysts can actually use it. This is the plumbing layer of the data stack. It is unglamorous, mission-critical, and the place where most data teams spend more money than they expected to.

Think of it this way: if a warehouse is the kitchen, ETL is the supply chain. The chef cannot cook without ingredients arriving on time, in the right quantity, in usable condition. A great kitchen with a broken supply chain produces nothing. That is why integration vendors — as boring as they sound — have produced some of the largest exits in data infrastructure history.

A Brief History: How ETL Became ELT

For roughly thirty years, "ETL" meant one specific thing: a heavyweight, on-premise tool (Informatica PowerCenter, IBM DataStage, Ab Initio, Microsoft SSIS) that pulled data out of source systems, ran transformations on a dedicated middle-tier server, and then loaded clean, modeled data into a warehouse like Teradata or Oracle. The reason transformation happened in the middle was simple economics: warehouse compute was expensive. Every CPU cycle inside Teradata cost real money, so you did as much work as possible before the data ever touched the warehouse.

This worked, more or less, but it produced a particular kind of organization. You needed an "ETL team" — specialists who knew Informatica's drag-and-drop GUI, who maintained hundreds of mappings, who became the bottleneck for every analytics request. New data source? File a ticket. New column? File a ticket. Wait two weeks. The transformation logic lived in a proprietary tool that nobody outside the ETL team could read, version, or test.

Then in the early 2010s, two things happened that broke the model.

First, the cloud warehouses arrived. BigQuery launched in 2011, Amazon Redshift in 2012, and Snowflake in 2014. Suddenly warehouse compute was elastic, cheap, and billed by the second. The economic argument for transforming data outside the warehouse evaporated. Why pay an Informatica server to do what a Snowflake virtual warehouse could do faster, with the same SQL the analysts already knew?

Second, Fivetran (founded 2012) made a contrarian bet. Instead of building yet another ETL tool with transformation features, Fivetran built only the EL part — managed connectors that pulled data out of SaaS apps and dumped it raw into the warehouse. No transformations. No business logic. Just reliable, schema-aware, incremental replication. The pitch was outrageous at the time: "we will not transform your data." George Fraser, Fivetran's CEO, argued that transformation was a separate problem and that mixing it with extraction was the original sin of legacy ETL.
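What "reliable, incremental replication" means mechanically is simpler than the vendor marketing suggests: track a high-water mark per table, pull only rows past it, and land them untransformed. A minimal sketch, using SQLite as a stand-in for both the source system and the warehouse (the `customers` table, column names, and `raw_` prefix are all invented for illustration):

```python
import sqlite3

# Toy cursor-based incremental sync: the "EL" half of ELT.
# Pull only rows whose updated_at exceeds the last high-water mark,
# land them raw in the warehouse, and advance the cursor.

def incremental_sync(source, dest, state):
    last_seen = state.get("customers", "")
    rows = source.execute(
        "SELECT id, email, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row in rows:
        # Upsert untransformed rows into the raw landing table
        dest.execute("INSERT OR REPLACE INTO raw_customers VALUES (?, ?, ?)", row)
    if rows:
        state["customers"] = rows[-1][2]  # advance the high-water mark
    dest.commit()
    return len(rows)

source = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, email TEXT, updated_at TEXT)")
dest.execute("CREATE TABLE raw_customers (id INTEGER PRIMARY KEY, email TEXT, updated_at TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "a@example.com", "2026-01-01"), (2, "b@example.com", "2026-01-02")],
)
state = {}
first = incremental_sync(source, dest, state)   # initial load copies both rows
second = incremental_sync(source, dest, state)  # nothing new past the cursor
print(first, second)  # 2 0
```

The hard part Fivetran sells is everything this sketch omits: schema drift, deletes, API rate limits, and backfills across hundreds of source types.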

This is what people mean when they say Fivetran killed ETL. Not literally — ETL workloads still run all over the world — but Fivetran killed the category structure in which a single vendor owned both extraction and transformation. The new paradigm split the work: a connector vendor handled E and L, and a separate tool handled T inside the warehouse.

The dbt Revolution

The "T" in ELT was an open question for a few years. Teams loaded raw data into Redshift or Snowflake and then... wrote a tangled mess of stored procedures, scheduled SQL scripts, and Airflow DAGs to model it. There was no standard way to do this work.

In 2016, Tristan Handy (fresh off consulting firm RJMetrics, and newly running Fishtown Analytics, the consultancy later renamed dbt Labs) released dbt — short for "data build tool." dbt's idea was almost embarrassingly simple: what if SQL transformations were just files in a Git repo? You write a SELECT statement that defines a model. dbt wraps it in CREATE TABLE AS (or a view, or an incremental merge) and runs it against your warehouse. It handles dependencies, materializations, testing, and documentation. That's it.
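That core loop fits in a few lines. A toy sketch of it, using an in-memory SQLite database as the "warehouse" and hard-coded dependencies (real dbt parses ref() calls out of Jinja-templated SQL; the model names here are invented):

```python
import sqlite3
from graphlib import TopologicalSorter

# Each "model" is just a SELECT plus a list of upstream dependencies.
models = {
    "stg_orders": ("SELECT id, amount FROM raw_orders WHERE amount > 0", ["raw_orders"]),
    "fct_revenue": ("SELECT SUM(amount) AS revenue FROM stg_orders", ["stg_orders"]),
}

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL)")
con.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, 10.0), (2, -3.0), (3, 5.0)])

# Build the DAG and run each model in dependency order,
# wrapping its SELECT in CREATE TABLE AS (a "table materialization").
graph = {name: deps for name, (_, deps) in models.items()}
for name in TopologicalSorter(graph).static_order():
    if name in models:  # sources like raw_orders already exist in the warehouse
        sql, _ = models[name]
        con.execute(f"CREATE TABLE {name} AS {sql}")

revenue = con.execute("SELECT revenue FROM fct_revenue").fetchone()[0]
print(revenue)  # 15.0
```

Everything else dbt ships — tests, docs, incremental merges, snapshots — hangs off this same "SELECT in, DAG-ordered materialization out" skeleton.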

The impact was enormous, and not really because of the technology. dbt's real contribution was a job category: the analytics engineer. Before dbt, you had data engineers (who wrote Python and maintained pipelines) and data analysts (who wrote SQL and built dashboards). The middle layer — people who could model data, write production SQL, version it in Git, and treat it like software — did not have a name. dbt named them, gave them tools, and built a community of tens of thousands. Coalesce, dbt's annual conference, became the analytics engineering Davos.

By 2025, dbt was the default transformation layer in essentially every modern data stack. dbt Labs had raised a $4.2B-valuation round (early 2022) and was working on dbt Cloud, the dbt Semantic Layer (the Transform/MetricFlow acquisition), and dbt Mesh for multi-project workflows. The open-source dbt-core remained free; dbt Labs monetized hosted execution, IDE, and governance.

How the Modern Stack Splits the Work

The modern data integration stack typically looks like three layers stacked on top of the warehouse:

| Layer | Job | Typical tools |
| --- | --- | --- |
| Extract & Load (EL) | Pull data from sources, land it raw | Fivetran, Airbyte, Stitch, Meltano, custom Python |
| Transform (T) | Model raw data into clean, joined, business-ready tables | dbt, SQLMesh, Coalesce |
| Orchestrate | Schedule, monitor, and chain everything together | Airflow, Dagster, Prefect, dbt Cloud |
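The orchestration layer's job reduces to running a dependency graph of tasks in order. A deliberately tiny sketch of that idea (Airflow, Dagster, and Prefect do this with retries, scheduling, and observability on top; the task names below are invented):

```python
from graphlib import TopologicalSorter

# A three-task pipeline: ingest raw data, transform it, then refresh downstream.
log = []
tasks = {
    "extract_load": ([], lambda: log.append("landed raw tables")),
    "transform": (["extract_load"], lambda: log.append("built dbt models")),
    "notify": (["transform"], lambda: log.append("refreshed dashboards")),
}

# Topologically sort so each task runs only after its upstreams succeed.
order = list(TopologicalSorter({k: deps for k, (deps, _) in tasks.items()}).static_order())
for name in order:
    tasks[name][1]()

print(order)  # ['extract_load', 'transform', 'notify']
```

The reason this layer exists at all is the EL/T split: once ingestion and transformation live in separate tools, something has to guarantee the dbt run starts only after the Fivetran sync lands.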

Some vendors try to span multiple layers. Matillion is the canonical example of an "all-in-one" cloud-native ETL tool that does both ingest and transform inside a single GUI. Informatica (post-cloud-rebrand as IDMC) tries to do everything plus governance and master data. Stitch, after Talend acquired it in 2018, became the budget option for teams who wanted Fivetran-style ingestion at lower cost.

How Vendors Position Themselves Differently

Fivetran says: "We are the boring infrastructure layer. Buy connectors from us, transform with dbt, and never worry about pipeline maintenance again." Their pitch is reliability and connector breadth. Their weakness is price — Fivetran's monthly active rows (MAR) pricing has produced more sticker shock than any other invoice in the modern data stack.

Airbyte says: "Fivetran is too expensive and proprietary. Our open-source connector framework lets you self-host, contribute connectors, and avoid vendor lock-in." Their pitch is openness. Their weakness is connector quality variance — the long tail of community connectors is hit or miss compared to Fivetran's hand-built ones.

dbt Labs says: "Transformation belongs in the warehouse, written in SQL, version-controlled, and tested. We invented this category." Their pitch is correct, but they are now defending the category against SQLMesh, Coalesce, and the slow encroachment of warehouse-native features (Snowflake Dynamic Tables, Databricks DLT).

Informatica says: "You need governance, lineage, master data, data quality, and transformation in one platform. We have been doing this since 1993." Their pitch is completeness. Their weakness is that essentially no startup or scaleup chooses Informatica voluntarily — it is a Fortune 500 incumbent buy.

Matillion says: "You want a visual GUI for ETL but you also want it to push down into Snowflake/BigQuery/Databricks compute." Their pitch is the cloud-native middle ground between Informatica's complexity and dbt's code-first ethos. Their challenge is that the market has bifurcated — buyers either want a fully managed connector service (Fivetran) with a code-based modeling layer (dbt), or they want a heavyweight enterprise platform (Informatica). The visual middle is a shrinking niche.

Stitch is largely a story about its 2018 acquisition by Talend, and Talend's subsequent 2023 acquisition by Qlik. Stitch was originally Singer-based (the open-source connector spec it pioneered) and aimed at the budget end of the market. Today it exists, but few greenfield deployments choose it.

The Honest Market Take

The modern data integration market in 2026 is essentially a duopoly at the top — Fivetran for EL and dbt for T — with everything else fighting for the edges. Airbyte is the credible open-source alternative to Fivetran. Informatica still owns the regulated-enterprise installed base. Matillion and Stitch occupy uncomfortable middle positions. And the warehouse vendors themselves (Snowflake, Databricks, BigQuery) keep absorbing pieces of this stack into their native platforms.

Two trends matter most for the next few years. First, reverse ETL (Hightouch, Census) flipped the script by moving warehouse data back out into operational systems. Second, AI-generated transformations are starting to challenge whether human-written dbt models will remain the norm or become a checkpoint that LLMs author and analysts review.

How TextQL Works with the ETL Layer

TextQL Ana reads the dbt project graph — model definitions, tests, sources, and metadata — to understand what tables exist and what they mean. dbt's manifest.json is one of the richest semantic artifacts in the modern stack, and TextQL leans on it heavily to ground LLM-generated SQL in the actual shape of your warehouse. For ingestion tools like Fivetran and Airbyte, TextQL connects downstream of the warehouse, so the integration is implicit: if your data is in Snowflake, TextQL can query it regardless of how it got there.
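To make that concrete, here is a hedged sketch of pulling the model graph out of a manifest.json. The fields used (nodes, name, description, depends_on) are part of dbt's documented artifact schema; the toy manifest itself, with its shop project and model names, is made up:

```python
import json

# Minimal illustration of reading a dbt manifest to recover model lineage.
manifest = json.loads("""
{
  "nodes": {
    "model.shop.stg_orders": {
      "name": "stg_orders",
      "description": "One row per valid order",
      "depends_on": {"nodes": ["source.shop.raw_orders"]}
    },
    "model.shop.fct_revenue": {
      "name": "fct_revenue",
      "description": "Daily revenue rollup",
      "depends_on": {"nodes": ["model.shop.stg_orders"]}
    }
  }
}
""")

# Map each model to the short names of its upstream dependencies.
graph = {
    node["name"]: [dep.split(".")[-1] for dep in node["depends_on"]["nodes"]]
    for node in manifest["nodes"].values()
}
print(graph)  # {'stg_orders': ['raw_orders'], 'fct_revenue': ['stg_orders']}
```

Model descriptions, column docs, and test definitions ride along in the same file, which is why the manifest makes such a useful grounding artifact for generated SQL.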

See TextQL in action

ETL & Data Integration
Category Data movement & transformation
Also called ETL, ELT, data pipelines, ingestion
Modern leaders Fivetran, dbt, Airbyte
Legacy leaders Informatica, IBM DataStage, Talend, SAP Data Services
Sits between Source systems and the warehouse
Typical buyer Data engineering, analytics engineering
Monthly mindshare ~2M · every analytics team does some form of ETL/ELT; foundational layer