Amundsen | Data Ecosystem Wiki

Amundsen

Amundsen is an open-source data catalog built at Lyft and open-sourced in 2019. It pioneered the 'search-first' discovery UX that reshaped the entire category, but it is now essentially abandoned, and the community has migrated to DataHub and OpenMetadata.

Amundsen is the data catalog that changed the conversation and then, in one of the more bittersweet outcomes in the modern data stack, was quietly left behind. Built inside Lyft starting around 2018 and open-sourced in April 2019, it was named after the Norwegian polar explorer Roald Amundsen — "the explorer for your data" — and was the first catalog to treat search as the primary UX rather than as a bolt-on to a governance tree.

The Lyft data platform team — Mark Grover (previously at Cloudera), Tao Feng, and Jin Hyuk Chang among others — built Amundsen to solve a very specific internal problem: Lyft had thousands of Hive tables and Airflow DAGs, and analysts were spending hours a day asking "does anyone know which table has X?" in Slack. The team's insight was that discovery should feel like Google for your data warehouse: type a word, see ranked results, click through to context.

That insight is now the dominant assumption in the entire catalog category. Atlan, DataHub, Select Star, Secoda, OpenMetadata, and even the legacy vendors have all adopted search-first UIs. Amundsen is the reason they did.

What Amundsen Actually Did

At release, Amundsen was a clean, focused product with three components:

Frontend service — a React-based UI centered on a large search bar and a table detail view with descriptions, owners, top queries, and tags.
Metadata service — a Flask backend that served table metadata from a graph store (originally Neo4j, later Apache Atlas or AWS Neptune as alternatives).
Search service — Elasticsearch-backed search with relevance ranking tuned by usage signals from query logs.
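To make "relevance ranking tuned by usage signals" concrete, here is a minimal sketch of the idea in plain Python. It is not Amundsen's actual ranking logic or its Elasticsearch configuration: the `TableDoc` type, the `query_count` field, and the scoring weights are all hypothetical, chosen only to show how a text match can be boosted by how often a table is queried.

```python
import math
from dataclasses import dataclass


@dataclass
class TableDoc:
    name: str
    description: str
    query_count: int  # hypothetical usage signal, e.g. reads counted from query logs


def score(doc: TableDoc, term: str) -> float:
    """Combine a crude text match with a log-scaled usage boost."""
    term = term.lower()
    text_score = 0.0
    if term in doc.name.lower():
        text_score += 2.0  # a name match counts more than a description match
    if term in doc.description.lower():
        text_score += 1.0
    if text_score == 0.0:
        return 0.0  # usage alone never surfaces an unrelated table
    return text_score * (1.0 + math.log1p(doc.query_count))


def search(docs: list[TableDoc], term: str) -> list[TableDoc]:
    """Return matching tables, highest score first."""
    scored = [(score(d, term), d) for d in docs]
    hits = [(s, d) for s, d in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits]


catalog = [
    TableDoc("rides_raw", "Unprocessed ride events", query_count=3),
    TableDoc("rides_daily", "Daily ride aggregates", query_count=900),
    TableDoc("drivers", "Driver profiles", query_count=500),
]
results = search(catalog, "ride")
print([d.name for d in results])
```

Both ride tables match the term equally well on text, so the heavily queried `rides_daily` ranks ahead of `rides_raw`, while `drivers` is excluded despite its high usage because it does not match at all.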

Ingestion was done by a separate library called Databuilder, a set of Python extractors that connected to Hive, Presto, Redshift, Snowflake, BigQuery, Airflow, and a handful of BI tools to populate the graph.
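The extract-then-populate flow described above can be sketched as follows. This is an illustration of the pattern only and does not use Databuilder's real classes: `TableRecord`, `StaticExtractor`, `InMemoryGraph`, and `run_job` are hypothetical stand-ins, and the `database://cluster.schema/table` key format is an assumption modeled on the convention Amundsen uses for table keys, which should be checked against the version you run.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class TableRecord:
    database: str
    cluster: str
    schema: str
    name: str
    description: str = ""

    def key(self) -> str:
        # Assumed key convention: database://cluster.schema/table
        return f"{self.database}://{self.cluster}.{self.schema}/{self.name}"


class StaticExtractor:
    """Stand-in for a source-specific extractor (Hive, Snowflake, and so on)."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def extract(self) -> Iterator[TableRecord]:
        for row in self._rows:
            yield TableRecord(**row)


@dataclass
class InMemoryGraph:
    """Stand-in for the graph store; holds nodes and edges in plain containers."""

    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def load(self, record: TableRecord) -> None:
        schema_key = f"{record.database}://{record.cluster}.{record.schema}"
        self.nodes[schema_key] = {"label": "Schema"}
        self.nodes[record.key()] = {"label": "Table", "description": record.description}
        self.edges.add((schema_key, "TABLE", record.key()))


def run_job(extractor: StaticExtractor, graph: InMemoryGraph) -> int:
    """Pull every record from the extractor into the graph; return the count."""
    count = 0
    for record in extractor.extract():
        graph.load(record)
        count += 1
    return count


rows = [
    {"database": "hive", "cluster": "gold", "schema": "core", "name": "rides",
     "description": "One row per ride"},
    {"database": "hive", "cluster": "gold", "schema": "core", "name": "drivers"},
]
graph = InMemoryGraph()
loaded = run_job(StaticExtractor(rows), graph)
print(loaded, sorted(graph.nodes))
```

Keeping extraction separate from loading is what lets one loader serve many sources: supporting a new warehouse means writing a new extractor, not touching the graph code.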

The product was deliberately minimal. There was no business glossary. No workflow engine. No policy and access control. No column-level lineage (table-level only, and even that depended on external lineage sources). The Amundsen team's argument was that most of what legacy catalogs shipped was governance theater analysts never used, and that a ruthlessly focused search-first tool would deliver most of the real value.

They were right about the direction and wrong about the staying power of the minimalism.

Why Amundsen Mattered (and Why It Stalled)

Amundsen's historical importance is hard to overstate. It was the first open-source catalog anyone in the modern data world actually ran. It showed that catalogs could be lightweight, engineering-led, and built bottom-up rather than sold top-down to a governance committee. It shaped the design language of every modern catalog that followed. Mark Grover and the Lyft team did more speaking, evangelism, and community-building in 2019–2020 than any other catalog project of the era.

And then it stalled. A combination of factors slowly drained the project:

No lasting commercial steward. Unlike DataHub, which found a home with Acryl Data, Amundsen never gained a durable commercial backer. Mark Grover left Lyft in 2021 to co-found Stemma, which was explicitly positioned as the commercial company behind Amundsen, but Stemma was acquired by Teradata in 2023, at which point the commercial path for Amundsen effectively ended. Without a well-funded maintainer, the pace of development dropped sharply.

The feature gap widened. As DataHub, Atlan, and OpenMetadata shipped column-level lineage, data-mesh primitives, dbt-native integration, policy engines, and AI copilots, Amundsen's minimalism stopped feeling clean and started feeling behind. Contributors moved to the projects that were shipping.

Lyft's own investment shifted. Lyft continued to use Amundsen internally for some time, but the platform team's attention moved to other priorities, and the volume of Lyft-authored PRs declined.

Community migration. By 2023–2024, most of the open-source catalog energy had consolidated around DataHub (for teams wanting a commercial-backed project) and OpenMetadata (for teams wanting a lighter, pure-open alternative). New adopters picking an open-source catalog in 2025 rarely consider Amundsen.

The Opinionated Take

Amundsen is essentially abandoned as an active project, even though its ideas won. This is the honest 2026 take, and it is worth stating without hedging. The Linux Foundation AI & Data repo still exists, occasional PRs still land, and a handful of large companies (including Lyft itself) still run it because the cost of migration is higher than the cost of staying. But no serious data team in 2026 picks Amundsen as their new catalog. The product has not meaningfully closed the feature gap with DataHub in several years, there is no company paying a team of engineers to develop it, and the documentation and integrations have drifted out of date.

The Stemma-to-Teradata acquisition is the turning point most people cite, and it is a genuinely sad story. Stemma was the right commercial vehicle, Mark Grover was the right person to run it, and an Atlan-sized outcome was plausible if the timing had been different. Instead, the commercial path evaporated into a strategic acquihire, and the community's center of gravity moved on.

The legacy is real, though, and should not be forgotten. Every modern catalog that puts a search bar front and center — which is all of them — owes the design to Amundsen. Every catalog that treats the analyst as the primary user rather than the governance committee owes the philosophy to Amundsen. Every engineer who has argued that a catalog should be a tool people voluntarily open rather than a system they are forced into is continuing an argument Amundsen made first.

Practical recommendation for 2026: if you are still running Amundsen, plan a migration to DataHub, Atlan, or OpenMetadata on your own timeline. If you are evaluating Amundsen for a new deployment, don't — the ideas have propagated into better-maintained products. Read the 2019 Amundsen blog posts for the intellectual history, then go pick one of the catalogs that inherited its DNA.

TextQL Fit

TextQL can read from an Amundsen deployment via its metadata service API for customers who still run it, pulling table descriptions, tags, and ownership information. In practice, Amundsen customers doing a modern AI-analytics deployment are usually also planning to replace the catalog, and the typical path is to migrate metadata to DataHub or Atlan and point TextQL at the new system.
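As a rough illustration of what reading descriptions, tags, and ownership from the metadata service might look like, the sketch below builds a table-detail URL and reduces a response to those three fields. Everything specific here is an assumption to verify against your deployment: the `/table/<table_key>` route shape, the `localhost:5002` base URL, and the payload field names (`description`, `tags[].tag_name`, `owners[].email`). The sample response is hand-written for the example, not captured from a real service, and this is not TextQL's actual integration code.

```python
import json


def table_detail_url(base_url: str, table_key: str) -> str:
    # Assumed route shape: <base_url>/table/<table_key>; confirm for your version.
    return f"{base_url.rstrip('/')}/table/{table_key}"


def summarize_table(payload: dict) -> dict:
    """Keep only description, tag names, and owner emails from a table payload."""
    return {
        "description": payload.get("description") or "",
        "tags": [t["tag_name"] for t in (payload.get("tags") or [])],
        "owners": [o["email"] for o in (payload.get("owners") or [])],
    }


# Hand-written sample payload in the assumed shape (not real service output).
sample_response = json.loads("""
{
  "description": "One row per completed ride",
  "tags": [{"tag_name": "core", "tag_type": "default"}],
  "owners": [{"email": "data-platform@example.com"}]
}
""")

url = table_detail_url("http://localhost:5002/", "hive://gold.core/rides")
summary = summarize_table(sample_response)
print(url)
print(summary)
```

In a real integration the payload would come from an HTTP GET against the URL; it is parsed from a string here so the example runs without a live service.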

Amundsen

Open-sourced: April 2019

Origin: Lyft (internal project started ~2018)

Original authors: Mark Grover, Tao Feng, Jin Hyuk Chang, and the Lyft data platform team

Governance: Linux Foundation AI & Data (LF AI) since 2020

Named for: Roald Amundsen, Norwegian polar explorer ("explorer for your data")

Category: Data Catalog

License: Apache 2.0

Status: Effectively in maintenance; community migrating elsewhere

Monthly mindshare: ~5K. Lyft project; essentially abandoned; ~4K GitHub stars but inactive