The Storage Pool Column You Never Configured

Nick Thompson

Summary

Storage pool column count is locked at creation and invisible in the Azure portal. A single-disk deployment means a single-column pool — no striping, a fraction of the IOPS you're paying for, and no way to fix it without a full rebuild. Most admins don't know this until they're already troubleshooting.

You're on Premium SSDs. It should be fast. It isn't.

A SQL Server on an Azure VM is running slow. The application team says queries are timing out. You check the VM — CPU is fine, memory is fine. You pull up the disk metrics and see IOPS consistently hitting the ceiling. The obvious move: bigger disks, more throughput. So you resize the disks in the storage pool, and nothing changes. You add another disk to the pool, extend the volume, and the IOPS ceiling barely moves. You're throwing money at the problem and the problem doesn't care.

Then someone who's been through this before asks you a question you've never considered: how many columns does your storage pool have?

You open PowerShell, run Get-VirtualDisk | Format-List FriendlyName,NumberOfColumns, and see the answer. One. One column. One disk's worth of I/O performance, regardless of how many disks you've added since.

That number was set when the storage pool was created — probably during initial deployment, probably by the Azure portal, probably without anyone thinking about it. And it can never be changed.

What Columns Actually Do

Azure SQL Server VMs use Windows Storage Spaces to combine multiple managed disks into a single logical volume. The critical parameter is the column count — the number of physical disks that data is striped across simultaneously.

When you write data to a storage pool with four columns, the data is interleaved across all four disks in 64KB stripes. Each I/O operation gets distributed, and the aggregate IOPS and throughput scale linearly with the number of columns. Four disks, four columns, four times the I/O bandwidth of a single disk.

When you write data to a storage pool with one column, all data goes to one disk. You get exactly that disk's IOPS and throughput ceiling — regardless of how many other disks exist in the pool. Additional disks provide capacity. They do not provide additional striping.

The math is not subtle. Take four terabytes of storage configured two ways:

A single P50 disk (4TB): 7,500 IOPS, 250 MB/s throughput.

Four P30 disks striped with four columns (4TB total): 20,000 IOPS, 800 MB/s throughput.

Same capacity. The striped configuration delivers 2.7 times the IOPS and 3.2 times the throughput. And the four P30s cost less than the single P50 — before you factor in the performance difference.
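The arithmetic above is simple enough to script. A minimal sketch, using the published per-disk limits quoted in the comparison:

```powershell
# Two ways to buy 4TB, using the per-disk limits quoted above.
$p50     = @{ Iops = 7500;     MBps = 250 }      # 1 x P50, one column
$p30Pool = @{ Iops = 4 * 5000; MBps = 4 * 200 }  # 4 x P30, four columns

"Striped pool: {0} IOPS, {1} MB/s" -f $p30Pool.Iops, $p30Pool.MBps
"Ratio: {0:N1}x IOPS, {1:N1}x throughput" -f ($p30Pool.Iops / $p50.Iops), ($p30Pool.MBps / $p50.MBps)
```

Same inputs, same 2.7x and 3.2x ratios. The point of writing it out is that the striped configuration wins on published limits alone, before caching enters the picture.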

The Portal Doesn't Teach You This

When you deploy a SQL Server VM from the Azure Marketplace, the portal asks you how many data disks you want. Pick one disk, you get a one-column storage pool. Pick four disks, you get four columns. The column count is set to match the disk count at creation time, which is the correct behavior — the problem is that nothing in the deployment flow tells you this is happening, why it matters, or that it's permanent.

An admin deploying their first SQL VM sees a disk count slider, picks a reasonable starting point — one or two disks, they'll add more later if needed — and moves on. The word "column" doesn't appear in the portal. Storage Spaces configuration happens behind the scenes. The pool is created, the volume is formatted with 64KB allocation units, and the deployment completes successfully. Everything looks right.

The decision that locked in their I/O ceiling for the life of that workload happened in a dropdown they spent three seconds on.

Why Bigger Disks Make It Worse

The instinctive response to I/O problems is bigger disks. This is where Azure's disk tier architecture works against you in a way that isn't obvious.

Premium SSD disks at P40 (2TB) and below support host-level ReadOnly caching. This is critical for SQL Server data files — cached reads bypass the disk entirely and serve from the VM's local SSD cache, delivering dramatically lower latency. Microsoft's own best practices checklist explicitly recommends P30 and P40 disks for data files to ensure cache support.

Disks at P50 (4TB) and above do not support ReadOnly caching. The feature is silently unavailable. So the admin who upgrades from multiple P30s to a single P50 for more capacity has done two things: eliminated their striping benefit (one disk, one column) and lost their read cache. They're paying more for objectively worse performance characteristics.
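You can verify what caching each data disk actually has from the Az PowerShell module. A sketch; the resource group and VM names are placeholders:

```powershell
# Inspect the host-caching setting on each data disk.
# 'rg-sql' and 'sqlvm01' are placeholder names.
$vm = Get-AzVM -ResourceGroupName 'rg-sql' -Name 'sqlvm01'
$vm.StorageProfile.DataDisks |
    Select-Object Name, DiskSizeGB, Caching   # data-file disks should report 'ReadOnly'
```

A data disk showing 'None' where you expected 'ReadOnly' is worth chasing down before you touch disk sizes.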

This is the kind of platform behavior that doesn't show up in a dashboard. The metrics will show you're hitting IOPS limits. They won't tell you why the limits are where they are, or that the fix isn't more disk — it's more columns.

You Can't Fix This in Place

Here's the part that makes the planning conversation matter: column count is permanent. Microsoft's documentation is explicit — after the storage pool is created, you cannot alter the number of columns.

Adding disks to an existing pool gives you more capacity but not more stripe width. If you created a pool with one column, every disk you add writes data through that same single-column path. The pool gets bigger. It doesn't get faster.

To actually increase the column count, you have to rebuild from scratch. Back up all data. Delete the existing storage pool and virtual disk. Create a new pool with the correct number of disks and columns. Restore the data. For a production SQL workload, that's a maintenance window, a migration plan, and a change request — not a quick fix.
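The rebuild itself is short once the backup and the maintenance window exist. A sketch with illustrative names, run only after the old virtual disk and pool have been deleted:

```powershell
# All names here are illustrative. Run only after data is backed up
# and the old virtual disk and storage pool have been removed.
$disks = Get-PhysicalDisk -CanPool $true   # the unpooled data disks

New-StoragePool -FriendlyName 'SQLVMStoragePool1' `
    -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName `
    -PhysicalDisks $disks

# The permanent decision: one column per physical disk, 64KB interleave.
New-VirtualDisk -StoragePoolFriendlyName 'SQLVMStoragePool1' `
    -FriendlyName 'SQLVMVirtualDisk1' `
    -ResiliencySettingName Simple `
    -NumberOfColumns $disks.Count `
    -Interleave 65536 `
    -UseMaximumSize

# Format with 64KB allocation units, matching the original deployment's layout.
Get-VirtualDisk -FriendlyName 'SQLVMVirtualDisk1' | Get-Disk |
    Initialize-Disk -PassThru |
    New-Partition -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -AllocationUnitSize 65536 -NewFileSystemLabel 'SQLData'
```

NumberOfColumns is the line that matters. Everything else in the script is recoverable later; that parameter is not.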

It gets worse. If you deployed from the Azure Marketplace, the storage pool is managed by the SQL IaaS Agent extension. If anyone renamed the default pool, virtual disk, or volume names from their defaults — SQLVMStoragePool1, SQLVMVirtualDisk1 — the extension breaks. The portal's storage management options grey out. You've lost the ability to manage the storage through Azure entirely and you're doing everything in PowerShell.

And one more gotcha that catches people: resizing disks that are already in a storage pool does not reclaim the new space for the volume. Microsoft documents this directly — the increased disk space is unused and wasted inside the pool. You have to add new disks instead. So the admin who resized their P30s to P40s wondering where their extra terabytes went? They're there. They're just inaccessible.
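The correct move is attaching new disks rather than resizing existing members. A sketch, pool name illustrative:

```powershell
# Attach NEW disks; resizing disks already in the pool strands the added space.
# Per the SQL IaaS extension's rule, add them in multiples of the column count.
$new = Get-PhysicalDisk -CanPool $true
Add-PhysicalDisk -StoragePoolFriendlyName 'SQLVMStoragePool1' -PhysicalDisks $new

# Confirm the pool sees the extra capacity before extending the virtual disk.
Get-StoragePool -FriendlyName 'SQLVMStoragePool1' |
    Select-Object FriendlyName, Size, AllocatedSize
```

The gap between Size and AllocatedSize is the headroom you actually gained, which a resize of in-pool disks would not have shown.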

When You Extend the Pool, Think About What You're Extending With

Say you did the right thing at deployment — four P30s, four columns, proper striping. The workload grows. You need to extend. The SQL IaaS extension requires you to add disks in multiples of your column count, so you're adding four more disks. Good. But which four?

Nothing in the portal stops you from adding four P20s (500GB, 2,300 IOPS each) to a pool that was built on P30s (1TB, 5,000 IOPS each). Maybe someone picked the P20s because they were cheaper, or because 500GB was "closest to what the customer asked for." The extend completes successfully. The volume grows. And now you have a performance split in your data that doesn't show up in any monitoring dashboard.

Storage Spaces creates new allocation slabs on the new disks. Existing data stays on the original disk set. New data writes to the new disks. So your older data — likely your most queried historical data — is striping across P30s at 5,000 IOPS per disk. Your newer data is striping across P20s at 2,300 IOPS per disk. Some queries are fast. Some are slow. The difference depends entirely on which physical stripe set the data landed on, and nothing in the SQL Server metrics will tell you that's why.

Even if you add four matching P30s, the same data distribution pattern applies. Old data lives on the original four disks. New data lives on the new four. You're striping across two independent sets of four — not across all eight. The aggregate IOPS of the pool increased, but the I/O isn't evenly distributed across all disks. A query hitting old data is still constrained to the original four-disk stripe, and a query hitting new data is constrained to the new one.

Storage Spaces does have an Optimize-StoragePool cmdlet that can rebalance data across all physical disks, but it's I/O intensive, it's not commonly known, and running it on a production SQL workload isn't something you do casually. Most admins don't know the rebalance option exists, so the uneven distribution just sits there — technically working, silently underperforming.
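If you do decide to rebalance, the invocation is small even though the impact is not. Pool name illustrative:

```powershell
# Redistribute existing allocation slabs across ALL physical disks in the pool.
# I/O intensive: schedule a maintenance window for a production SQL workload.
Optimize-StoragePool -FriendlyName 'SQLVMStoragePool1'

# The rebalance runs as a background storage job; watch it to completion.
Get-StorageJob | Select-Object Name, JobState, PercentComplete
```

Until that job finishes, the pool is doing heavy background I/O on top of the workload, which is exactly why this isn't a casual weekday-afternoon command.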

This is another place where the considerations discipline pays for itself. Before extending, ask: what disk tier am I adding, and does it match what's already there? What happens to data distribution? Do I need to rebalance? Is the performance I'm expecting from "eight disks" actually being delivered as "two sets of four"?

Resize Is Re-evaluate

Resizing storage in the lifecycle of an application is normal. It's expected. The workload grows, the data accumulates, the performance requirements shift. None of that is a problem.

What's a problem is resizing because you didn't evaluate the storage layout at deployment time. Every resize is a re-evaluation — a moment to check whether your original assumptions still hold. If you're resizing quickly after initial deployment or doing it repeatedly, that's a signal that the initial planning didn't account for growth, and the storage pool's permanent column lock means the initial planning has to be right.

This is where the "considerations" discipline shows up in a concrete, technical scenario. Before deploying a SQL Server VM, the questions that prevent the rebuild six months later take about ten minutes to ask: What's the expected data growth over the next two years? What are the IOPS and throughput requirements under peak load? How many disks do I need to stripe across to meet those numbers? What happens when we need to scale beyond this?

Microsoft published a best practices checklist for SQL Server on Azure VMs that I genuinely wish existed for more resource types. It covers VM sizing, storage configuration, security, HADR — it's thorough. But even that checklist assumes you know what a storage pool column is, why the count matters, and that it's permanent. If you don't know those things, the checklist's recommendation to "stripe multiple Azure data disks using Storage Spaces" reads like general guidance, not the critical deployment decision it actually is.

What to Actually Do

If you're managing SQL Server VMs on Azure, check your storage pools. Open PowerShell and run Get-VirtualDisk | Format-List FriendlyName,NumberOfColumns. If any virtual disk shows a column count of one, or a column count lower than the number of physical disks in its pool, you have a striping gap — capacity that exists but can't deliver the I/O performance it should.
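A slightly fuller version of that check puts each virtual disk's column count next to the number of physical disks actually backing its pool. A sketch; output names vary by deployment:

```powershell
# Flag virtual disks whose column count is below the pool's physical disk count.
Get-VirtualDisk | ForEach-Object {
    $pool     = Get-StoragePool -VirtualDisk $_
    $physical = ($pool | Get-PhysicalDisk | Measure-Object).Count
    [pscustomobject]@{
        VirtualDisk   = $_.FriendlyName
        Columns       = $_.NumberOfColumns
        PhysicalDisks = $physical
        StripingGap   = $_.NumberOfColumns -lt $physical
    }
}
```

Any row where StripingGap is True is a pool that grew in capacity without growing in stripe width.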

For new deployments, decide the column count before you deploy. Determine IOPS and throughput requirements under peak load, add a 20% buffer for growth, and deploy with enough disks to meet that target from day one. The column count equals the disk count at creation — there's no opportunity to adjust it later.
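That sizing arithmetic can be sketched directly. The 5,000 IOPS figure is the published P30 limit; the 14,000 IOPS peak is a placeholder for whatever your own load testing shows:

```powershell
$peakIops   = 14000                          # measured peak for the workload (placeholder)
$iopsPerP30 = 5000                           # published P30 limit
$target     = $peakIops * 1.2                # 20% growth buffer
$diskCount  = [math]::Ceiling($target / $iopsPerP30)
"Deploy {0} disks: a {0}-column pool from day one" -f $diskCount
```

Ten minutes with numbers like these at deployment time is what replaces the rebuild six months later.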

Prefer multiple smaller disks over fewer large ones. Four P30s outperform a single P50 in every dimension that matters for SQL Server: aggregate IOPS, aggregate throughput, cache support, and cost. The P30/P40 tier is the sweet spot — large enough to be practical, small enough to support ReadOnly caching.

Separate your pools. Data files on one pool with ReadOnly caching, log files on another with no caching, tempdb on the local ephemeral SSD when possible. Each pool gets its own column count and caching policy.

And when it comes time to resize — and it will — treat it as a re-evaluation, not just a capacity bump. Check the column count. Check the cache configuration. Check whether your original deployment assumptions still hold. If they don't, plan the rebuild deliberately. The alternative is throwing money at bigger disks and wondering why the performance doesn't move.
