Azure Blob Storage
Azure Blob Storage is the third major cloud object store, alongside Amazon S3 and Google Cloud Storage. It launched in 2010 as part of the original Microsoft Azure services and is the storage substrate for any data lake, lakehouse, or large-scale analytical workload that lives on Azure — which includes a meaningful share of Databricks deployments, since Databricks on Azure stores its tables in Azure Blob Storage / ADLS Gen2.
The most important thing to know about Azure Blob Storage in 2026 is the distinction between Blob Storage (the original, flat object store) and Azure Data Lake Storage Gen2 (ADLS Gen2). They are technically the same underlying service, but ADLS Gen2 is Blob Storage with a hierarchical namespace enabled — a flag you set when creating a storage account that turns the flat key namespace into a real directory structure with atomic rename semantics. ADLS Gen2 is what you want for analytical workloads. Plain Blob Storage is what you want for static asset hosting.
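As a sketch of what that flag looks like in practice: with the Azure CLI, an ADLS Gen2-capable account is an ordinary StorageV2 account created with the hierarchical namespace enabled (the account and resource-group names below are placeholders):

```shell
# Plain Blob Storage vs. ADLS Gen2 comes down to one flag at account-creation time.
# "mylakeaccount" and "my-rg" are placeholder names.
az storage account create \
  --name mylakeaccount \
  --resource-group my-rg \
  --kind StorageV2 \
  --sku Standard_LRS \
  --enable-hierarchical-namespace true   # omit or set false for a flat namespace
```

The choice is effectively one-way: a flat account can be migrated up to the hierarchical namespace later, but once enabled the flag cannot be turned off, which is why it is worth deciding up front whether an account will serve analytical workloads.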
Azure Blob Storage is an object store. The primitives are the same as S3 and GCS: containers (the analog of buckets), blobs (objects) addressed by key, per-blob metadata, and a REST API over HTTPS.
There are three blob types: block blobs (the default, used for general object data such as Parquet files), append blobs (optimized for append-only writes such as logs), and page blobs (random read/write, used to back VM disks).
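One concrete consequence of the two modes is the addressing: every account exposes a Blob REST endpoint, and accounts with the hierarchical namespace enabled are additionally addressed by analytics engines through the dfs endpoint with abfss:// URIs. A minimal sketch — the helper functions are illustrative, not part of any Azure SDK:

```python
# Two ways to address the same data in one storage account.
# blob_url: the Blob REST endpoint every account exposes.
# abfss_uri: the ADLS Gen2 (dfs) endpoint that Spark, Databricks, and
# Fabric use when the hierarchical namespace is enabled.
# Helper names are illustrative, not from any Azure SDK.

def blob_url(account: str, container: str, key: str) -> str:
    return f"https://{account}.blob.core.windows.net/{container}/{key}"

def abfss_uri(account: str, container: str, path: str) -> str:
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"

print(blob_url("mylake", "raw", "2024/events.parquet"))
# https://mylake.blob.core.windows.net/raw/2024/events.parquet
print(abfss_uri("mylake", "raw", "2024/events.parquet"))
# abfss://raw@mylake.dfs.core.windows.net/2024/events.parquet
```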
ADLS Gen2 layers a hierarchical namespace on top of block blob storage. The difference is real: with the hierarchical namespace enabled, directory operations become atomic and metadata operations become much faster. This matters for big data engines (Spark, Hive, Trino) that historically relied on file-system semantics like rename and that performed poorly on flat object stores.
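To make the rename point concrete, here is a toy in-memory model — pure illustration, not the Azure SDK — of what a directory rename costs on a flat namespace, where a "directory" is only a shared key prefix:

```python
# Toy model of a flat object store: a "directory" is just a key prefix,
# so renaming it means copying and deleting every object under it.
# (Illustration only; not how you'd call Azure Blob Storage.)

def flat_rename(store: dict, old_prefix: str, new_prefix: str) -> int:
    """Rename a 'directory' on a flat namespace. Returns the number of
    object operations performed (one copy + one delete per object)."""
    ops = 0
    for key in list(store):
        if key.startswith(old_prefix):
            store[new_prefix + key[len(old_prefix):]] = store.pop(key)
            ops += 2
    return ops

store = {"raw/2024/a.parquet": b"...", "raw/2024/b.parquet": b"..."}
print(flat_rename(store, "raw/", "staging/"))  # 4 -- cost grows with object count
# With the hierarchical namespace enabled, the same rename is a single
# atomic metadata operation, independent of how many files the directory holds.
```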
Azure Blob Storage launched in 2010 as part of the first wave of Microsoft Azure services. For most of the early 2010s, it was a perfectly fine object store that nobody outside the Microsoft ecosystem talked about, because the data lake conversation was happening on AWS.
In 2016, Microsoft launched Azure Data Lake Storage Gen1 — a separate service, with its own API, designed specifically for big data workloads. ADLS Gen1 had the hierarchical namespace and POSIX-like permissions that Hadoop/Spark wanted, but it was a totally separate product from Blob Storage, with its own pricing, its own SDK, and its own SLAs. This was confusing.
In 2018, Microsoft fixed the confusion: ADLS Gen2 was released as an option on top of Blob Storage. Same underlying service, same SDKs, same pricing model, plus a hierarchical namespace flag. Existing Blob Storage features (lifecycle policies, tiers, replication) carried over. ADLS Gen1 was deprecated and ultimately retired in 2024. The result, in 2026: Azure Blob Storage and ADLS Gen2 are the same thing, with ADLS Gen2 just being "Blob Storage with the hierarchical namespace flag turned on."
Azure Blob Storage matters because of two specific gravitational pulls:
1. Microsoft Fabric and the Microsoft analytics stack. Microsoft's modern analytics platform — Fabric, OneLake, Synapse, Power BI — all live on top of ADLS Gen2 internally. OneLake, Microsoft's "single SaaS data lake" launched with Fabric in 2023, is essentially a managed multi-tenant ADLS Gen2 that uses Delta Lake as the table format. If you are an enterprise standardizing on Microsoft data tools, you are using Azure Blob Storage whether you realize it or not.
2. Databricks on Azure. Databricks has a huge installed base on Azure (it's offered as a first-party service: "Azure Databricks"), and every Databricks-on-Azure deployment stores its Delta tables in ADLS Gen2. The Databricks/Microsoft partnership is one of the deepest in cloud, and ADLS Gen2 is the underlying storage layer that makes it work.
Azure Blob Storage exists to be the storage layer for the Microsoft and Databricks ecosystems on Azure, and that is enough. It is not the technical leader. It does not have the developer mindshare of S3 or the BigQuery integration of GCS. But it is the only answer for enterprises that have committed to Microsoft's data stack — and for the very large set of Databricks customers who run on Azure rather than AWS or GCP, ADLS Gen2 is the default.
The other honest take: the historical confusion between Blob Storage, ADLS Gen1, and ADLS Gen2 cost Microsoft real mindshare. For years, anyone trying to write a data lake tutorial on Azure had to disclaim "depending on which version of ADLS you're using..." That's now resolved (Gen2 won, Gen1 is gone), but the product naming is still messier than S3's, and Microsoft documentation still uses "Blob Storage" and "ADLS Gen2" somewhat interchangeably in ways that confuse newcomers.
For raw object storage needs, all three clouds are essentially equivalent in 2026. Pick the one that matches the rest of your stack.
Same model as S3 and GCS.
Azure Blob Storage / ADLS Gen2 sits at the bottom of the analytics stack on Azure, beneath warehouses, lakehouses, and query engines. The typical pattern:
Source data
↓
ADLS Gen2 (raw zone, often Parquet or Delta)
↓
Databricks / Synapse / Microsoft Fabric
↓
Power BI / Hex / dashboards

TextQL Ana connects to ADLS Gen2 indirectly through the warehouse or lakehouse engine that sits above it — typically Databricks or Microsoft Fabric. When a business user asks Ana a question, Ana sends it to the connected engine, which reads the underlying data from ADLS Gen2 and returns the result. Ana operates at the question-and-answer layer, not the storage layer.