Azure Blob Storage
Azure Blob Storage is the third major cloud object store, alongside Amazon S3 and Google Cloud Storage. It launched in 2010 as part of the original Microsoft Azure services and is the storage substrate for any data lake, lakehouse, or large-scale analytical workload that lives on Azure — which includes a meaningful share of Databricks deployments, since Databricks on Azure stores its tables in Azure Blob Storage / ADLS Gen2.
The most important thing to know about Azure Blob Storage in 2026 is the distinction between Blob Storage (the original, flat object store) and Azure Data Lake Storage Gen2 (ADLS Gen2). They are technically the same underlying service, but ADLS Gen2 is Blob Storage with a hierarchical namespace enabled — a flag you set when creating a storage account that turns the flat key namespace into a real directory structure with atomic rename semantics. ADLS Gen2 is what you want for analytical workloads. Plain Blob Storage is what you want for static asset hosting.
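As a sketch of what that flag looks like in practice: with the Azure CLI, an ADLS Gen2-capable account is an ordinary StorageV2 account created with the hierarchical namespace enabled (the account and resource-group names below are placeholders):

```shell
# Plain Blob Storage vs. ADLS Gen2 comes down to one flag at account-creation time.
# "mylakeaccount" and "my-rg" are placeholder names.
az storage account create \
  --name mylakeaccount \
  --resource-group my-rg \
  --kind StorageV2 \
  --sku Standard_LRS \
  --enable-hierarchical-namespace true   # omit or set false for a flat namespace
```

The choice is effectively one-way: a flat account can be migrated up to the hierarchical namespace later, but once enabled the flag cannot be turned off, which is why it is worth deciding up front whether an account will serve analytical workloads.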
Azure Blob Storage is an object store. The primitives are the same as S3 and GCS: containers (the analog of buckets), blobs (objects) addressed by key, per-blob metadata, and a REST API over HTTPS.
There are three blob types: block blobs (the default, used for general object data such as Parquet files), append blobs (optimized for append-only writes such as logs), and page blobs (random read/write, used to back VM disks).
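One concrete consequence of the two modes is the addressing: every account exposes a Blob REST endpoint, and accounts with the hierarchical namespace enabled are additionally addressed by analytics engines through the dfs endpoint with abfss:// URIs. A minimal sketch — the helper functions are illustrative, not part of any Azure SDK:

```python
# Two ways to address the same data in one storage account.
# blob_url: the Blob REST endpoint every account exposes.
# abfss_uri: the ADLS Gen2 (dfs) endpoint that Spark, Databricks, and
# Fabric use when the hierarchical namespace is enabled.
# Helper names are illustrative, not from any Azure SDK.

def blob_url(account: str, container: str, key: str) -> str:
    return f"https://{account}.blob.core.windows.net/{container}/{key}"

def abfss_uri(account: str, container: str, path: str) -> str:
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

print(blob_url("mylake", "raw", "2024/events.parquet"))
# https://mylake.blob.core.windows.net/raw/2024/events.parquet
print(abfss_uri("mylake", "raw", "2024/events.parquet"))
# abfss://raw@mylake.dfs.core.windows.net/2024/events.parquet
```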
ADLS Gen2 layers a hierarchical namespace on top of block blob storage. The difference is real: with the hierarchical namespace enabled, directory operations become atomic and metadata operations become much faster. This matters for big data engines (Spark, Hive, Trino) that historically relied on file-system semantics like rename and that performed poorly on flat object stores.
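To make the rename point concrete, here is a toy in-memory model — pure illustration, not the Azure SDK — of what a directory rename costs on a flat namespace, where a "directory" is only a shared key prefix:

```python
# Toy model of a flat object store: a "directory" is just a key prefix,
# so renaming it means copying and deleting every object under it.
# (Illustration only; not how you'd call Azure Blob Storage.)

def flat_rename(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Rename a 'directory' on a flat namespace. Returns the number of
    object operations performed (one copy + one delete per object)."""
    ops = 0
    for key in list(store):
        if key.startswith(old_prefix):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            ops += 2
    return ops

store = {"raw/2024/a.parquet": b"...", "raw/2024/b.parquet": b"..."}
print(flat_rename(store, "raw/", "staging/"))  # 4 -- cost grows with object count
# With the hierarchical namespace enabled, the same rename is a single
# atomic metadata operation, independent of how many files the directory holds.
```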
Azure Blob Storage launched in 2010 as part of the first wave of Microsoft Azure services. For most of the early 2010s, it was a perfectly fine object store that nobody outside the Microsoft ecosystem talked about, because the data lake conversation was happening on AWS.
In 2016, Microsoft launched Azure Data Lake Storage Gen1 — a separate service, with its own API, designed specifically for big data workloads. ADLS Gen1 had the hierarchical namespace and POSIX-like permissions that Hadoop/Spark wanted, but it was a totally separate product from Blob Storage, with its own pricing, its own SDK, and its own SLAs. This was confusing.
In 2018, Microsoft fixed the confusion: ADLS Gen2 was released as an option on top of Blob Storage. Same underlying service, same SDKs, same pricing model, plus a hierarchical namespace flag. Existing Blob Storage features (lifecycle policies, tiers, replication) carried over. ADLS Gen1 was deprecated and ultimately retired in 2024. The result, in 2026: Azure Blob Storage and ADLS Gen2 are the same thing, with ADLS Gen2 just being "Blob Storage with the hierarchical namespace flag turned on."
Azure Blob Storage matters because of two specific gravitational pulls:
1. Microsoft Fabric and the Microsoft analytics stack. Microsoft's modern analytics platform — Fabric, OneLake, Synapse, Power BI — all live on top of ADLS Gen2 internally. OneLake, Microsoft's "single SaaS data lake" launched with Fabric in 2023, is essentially a managed multi-tenant ADLS Gen2 that uses Delta Lake as the table format. If you are an enterprise standardizing on Microsoft data tools, you are using Azure Blob Storage whether you realize it or not.
2. Databricks on Azure. Databricks has a huge installed base on Azure (it's offered as a first-party service: "Azure Databricks"), and every Databricks-on-Azure deployment stores its Delta tables in ADLS Gen2. The Databricks/Microsoft partnership is one of the deepest in cloud, and ADLS Gen2 is the underlying storage layer that makes it work.
Azure Blob Storage exists to be the storage layer for the Microsoft and Databricks ecosystems on Azure, and that is enough. It is not the technical leader. It does not have the developer mindshare of S3 or the BigQuery integration of GCS. But it is the only answer for enterprises that have committed to Microsoft's data stack — and for the very large set of Databricks customers who run on Azure rather than AWS or GCP, ADLS Gen2 is the default.
The other honest take: the historical confusion between Blob Storage, ADLS Gen1, and ADLS Gen2 cost Microsoft real mindshare. For years, anyone trying to write a data lake tutorial on Azure had to disclaim "depending on which version of ADLS you're using..." That's now resolved (Gen2 won, Gen1 is gone), but the product naming is still messier than S3's, and Microsoft documentation still uses "Blob Storage" and "ADLS Gen2" somewhat interchangeably in ways that confuse newcomers.
For raw object storage needs, all three clouds are essentially equivalent in 2026. Pick the one that matches the rest of your stack.
Same model as S3 and GCS.
Azure Blob Storage / ADLS Gen2 sits at the bottom of the analytics stack on Azure, beneath warehouses, lakehouses, and query engines. The typical pattern:
Source data
↓
ADLS Gen2 (raw zone, often Parquet or Delta)
↓
Databricks / Synapse / Microsoft Fabric
↓
Power BI / Hex / dashboards

TextQL Ana connects to ADLS Gen2 indirectly through the warehouse or lakehouse engine that sits above it — typically Databricks or Microsoft Fabric. When a business user asks Ana a question, Ana sends it to the connected engine, which reads the underlying data from ADLS Gen2 and returns the result. Ana operates at the question-and-answer layer, not the storage layer.