Databricks ML
Databricks ML is the machine learning side of the Databricks lakehouse. It is not a separate product so much as a collection of features (notebooks, MLflow, AutoML, model serving, feature store, vector search, Mosaic AI) bundled into the same workspace where customers do their data engineering and SQL analytics. The strategic logic: if your data already lives in Databricks, training and serving models against that data is one click away, with no separate platform to wire up.
This proximity-to-data is the single biggest reason Databricks won the ML platform category for data-team-led organizations. Most other ML platforms (SageMaker included) require you to move or copy data into the platform before you can train. In Databricks, the data is already there, governed by Unity Catalog, and the same notebook you wrote yesterday for SQL analytics can train a model today.
Databricks was founded in 2013 by Ali Ghodsi, Matei Zaharia, and the rest of the Berkeley AMPLab team that created Apache Spark. The original company pitch was "Spark in the cloud" — a managed environment to run distributed data processing without managing clusters. ML was always part of the vision because Spark MLlib was a first-class library, but for the first few years Databricks was primarily a data engineering and data science notebook company.
The pivotal ML moment was MLflow, open-sourced in June 2018. Matei Zaharia and team built MLflow as an open project for experiment tracking, model packaging, and model registry — the three things every data scientist desperately needed and nobody had standardized. MLflow was deliberately framework-agnostic (it worked with TensorFlow, PyTorch, scikit-learn, XGBoost, anything) and deliberately open. Within two years, MLflow was the de facto standard for experiment tracking in the ML community.
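The three concepts MLflow standardized can be illustrated with a toy tracker. This is a hypothetical sketch, not MLflow's actual API (real code would call `mlflow.start_run()`, `mlflow.log_param()`, and `mlflow.log_metric()`); the point is that nothing in the tracking layer depends on which framework trained the model.

```python
# Toy illustration of MLflow's three pillars: experiment tracking,
# model packaging into runs, and a model registry. Hypothetical sketch,
# not MLflow's API.
import uuid


class Run:
    """One training run: parameters in, metrics out."""

    def __init__(self, experiment):
        self.id = uuid.uuid4().hex
        self.experiment = experiment
        self.params = {}
        self.metrics = {}

    def log_param(self, key, value):
        self.params[key] = value

    def log_metric(self, key, value):
        self.metrics[key] = value


class Registry:
    """Named, versioned models promoted from runs."""

    def __init__(self):
        self.models = {}

    def register(self, name, run):
        versions = self.models.setdefault(name, [])
        versions.append(run.id)
        return len(versions)  # version number of the new entry


# Framework-agnostic: nothing here cares whether the model was
# trained with scikit-learn, XGBoost, or PyTorch.
run = Run("churn-model")
run.log_param("max_depth", 6)
run.log_metric("auc", 0.91)

registry = Registry()
version = registry.register("churn-model", run)
print(version)  # first registered version -> 1
```

The design choice that mattered: because runs only store key-value params and metrics, any training library can plug in, which is exactly why MLflow could become a cross-framework standard.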
This was a brilliant strategic move. By owning the open-source standard, Databricks made every ML team that used MLflow a candidate to upgrade to the managed Databricks version. The OSS-to-commercial funnel that worked for Spark also worked for ML.
Through 2019-2022, Databricks layered on more ML features: AutoML (one-click model training for tabular data), Feature Store (centralized feature definitions, the first feature store integrated with a lakehouse), Model Serving (REST endpoints for trained models), and Lakehouse Monitoring for drift detection. By 2022, Databricks ML was a credible end-to-end alternative to SageMaker for any team already on Databricks.
Then 2023 happened, and Databricks bet the company on generative AI by acquiring MosaicML.
In June 2023, Databricks acquired MosaicML for $1.3 billion. MosaicML was a roughly 60-person startup founded by Naveen Rao (who previously sold his startup Nervana to Intel) that built optimized infrastructure for training and fine-tuning large language models. MosaicML had released MPT-7B and MPT-30B, two well-regarded open-source LLMs, and had a reputation for making LLM training dramatically cheaper and faster.
The strategic logic was straightforward: Databricks wanted to be the platform where enterprises built their own LLMs, fine-tuned foundation models on their proprietary data, and deployed them. MosaicML had the team and the infrastructure to make that real. The acquisition was unusually expensive for a 60-person startup, but Databricks (and its CEO Ali Ghodsi) clearly viewed it as existential — if Databricks did not become the LLM training platform, AWS Bedrock or OpenAI or some other competitor would.
The MosaicML team became the foundation of what Databricks now calls Mosaic AI, the GenAI side of the platform. It includes Foundation Model APIs, Model Training (fine-tune your own LLM on your own data), Model Serving (host any model behind a REST endpoint), Vector Search (built into the lakehouse for RAG), and AI Agent Framework. In 2024-2025, Databricks has shipped Mosaic AI features at a furious pace.
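The Vector Search piece of that stack can be sketched with plain cosine similarity. This is a toy stand-in, not the Databricks API: the managed service indexes embeddings stored in Delta tables at scale, and the document names and vectors below are invented for illustration.

```python
# Toy vector search for RAG: rank documents by cosine similarity to a
# query embedding. A stand-in for what a managed vector index does at
# scale; the embeddings here are made up for illustration.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


# Pretend these came from an embedding model run over enterprise docs.
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "onboarding-guide": [0.1, 0.8, 0.3],
    "security-faq": [0.0, 0.2, 0.9],
}


def search(query_vec, k=2):
    """Return the k document ids most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]


# A query whose embedding sits closest to the refund-policy document.
print(search([0.8, 0.2, 0.1]))  # -> ['refund-policy', 'onboarding-guide']
```

In a RAG pipeline, the top-k documents returned here would be stuffed into the LLM prompt as context; the lakehouse angle is that the source documents and their embeddings live in the same governed tables as everything else.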
A rough inventory of the platform as of 2026 spans both halves: classical ML (notebooks, MLflow tracking, AutoML, Feature Store, Model Serving, Lakehouse Monitoring) and Mosaic AI for GenAI (Foundation Model APIs, Model Training, Vector Search, Agent Framework).
Databricks ML has the best data-gravity story in the ML platform category. If your data is already in a Databricks lakehouse, there is no other ML platform that gets you from raw data to deployed model with less friction. The Unity Catalog governance, the Delta Lake storage, the MLflow tracking, and the model serving are all wired together by the same vendor with the same identity, audit, and lineage.
The weaknesses are real but improving. Databricks notebooks are not as polished as VS Code. The model serving infrastructure has historically been more expensive than rolling your own on Kubernetes. The Mosaic AI features are powerful but evolving fast, which means breaking changes and uneven documentation. And for teams whose primary cloud is not AWS or Azure, Databricks ML on GCP has historically lagged.
The single biggest threat to Databricks ML is the LLM-era shift that makes the classical ML platform less central. If most enterprise AI value moves toward fine-tuning foundation models and building agents, the "feature store + model registry + serving" stack becomes less differentiated, and the LLM-specific tooling (which Databricks is racing to build) is the new battleground. Databricks understands this — the MosaicML acquisition is exhibit A — but the race is genuinely contested with AWS, Google, Microsoft, and pure-play LLM platforms.
The honest prediction: Databricks ML stays a top-2 choice for any data team on the lakehouse, and Mosaic AI either becomes the dominant enterprise LLM training platform or gets out-shipped by AWS Bedrock. The next two years will decide.
Databricks ML lives inside the Databricks lakehouse.
A typical user is a data scientist or ML engineer at a company that has standardized on Databricks for its data platform.
TextQL Ana connects to Databricks SQL Warehouses to answer business questions in natural language. When customers also use Databricks ML, the outputs of their models (churn scores, fraud flags, customer segments) live in Delta tables that Ana can query directly. A business user can ask "show me high-churn customers in the West region" and Ana resolves the answer against the same model output table the data science team produced. TextQL is complementary to Databricks ML: Databricks builds and serves the models, Ana exposes their outputs to business users in natural language.
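The pattern is that model scores land in an ordinary table that any SQL layer can query. A minimal sketch, with `sqlite3` standing in for a Databricks SQL Warehouse over a Delta table; the table and column names are invented for illustration.

```python
# Sketch of the pattern: a scoring job writes model outputs (churn
# scores) to a table, and a natural-language question resolves to a
# plain SQL query against that table. sqlite3 stands in for a
# Databricks SQL Warehouse; table/column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE churn_scores (customer_id TEXT, region TEXT, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO churn_scores VALUES (?, ?, ?)",
    [
        ("c1", "West", 0.92),  # rows written by the model's scoring job
        ("c2", "West", 0.35),
        ("c3", "East", 0.88),
    ],
)

# "Show me high-churn customers in the West region" resolves to SQL like:
rows = conn.execute(
    "SELECT customer_id FROM churn_scores "
    "WHERE region = 'West' AND churn_score > 0.8"
).fetchall()
print(rows)  # -> [('c1',)]
```

Because the scores are just rows in a governed table, the business user's question and the data scientist's model share one source of truth, with no export step in between.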