Bugs

Tag: bugs

All Eyes on the Wrong Problem: How Mitigations Distract from Real Performance Pain

All Eyes on the Wrong Problem: How Mitigations Distract from Real Performance Pain

๐Ÿ”ฅ UPDATE: Microsoft has announced the general availability of the Next-gen General Purpose service tier for Azure SQL Managed Instance, which includes improvements to I/O latency, IOPS, and transaction log throughput. This post describes the original General Purpose blob storage. You don't want that.

The biggest lesson I’ve learned from helping folks manage data in Azure is this: if you’ve got a truly terrible problem you’d rather people didn’t notice, a great way to hide it is by educating your support staff and users about something bad but not AS terribleโ€” something with a small mitigationโ€”and constantly refocusing them on that.

The user baseโ€” and even your own support staffโ€” will think that anyone who talks about the bigger issue just doesnโ€™t understand how to fix the โ€œknownโ€ problem.

This is the story of Azure General Purpose storage for Azure SQL Managed Instance and Azure SQL Database.

Continue reading

What the Decline of SQL Server Quality Means for Developers and DBAs

What the Decline of SQL Server Quality Means for Developers and DBAs

‘Is it just me, or is SQL Server quality slipping?’

I asked myself that question for couple/few years until I faced up to it: SQL Server is well into a period where Microsoft investment is waning, and Microsoft regularly isn’t able to deliver the features they promise.

Continue reading

Buyer Beware: Azure SQL Managed Instance Storage Is Regularly as Slow as 60 Seconds

Buyer Beware: Azure SQL Managed Instance Storage Is Regularly as Slow as 60 Seconds

๐Ÿ”ฅ UPDATE: Microsoft has announced the general availability of the Next-gen General Purpose service tier for Azure SQL Managed Instance, which includes improvements to I/O latency, IOPS, and transaction log throughput. This post describes the original General Purpose blob storage. You don't want that.

What are your stories of unbelievably bad performance from cloud vendors? I’ll go first. For years, Azure SQL Managed Instance’s General Purpose Tier has documented ‘approximate’ storage latency as being “5-10 ms.” This week they added a footnote: “This is an average range. Although the vast majority of IO request durations will fall under the top of the range, outliers which exceed the range are possible.”

Continue reading

Query Hash Values Are Meaningless in SQL Server: They May Be Reset to Be the Same Value as the Query Plan Hash

Query Hash Values Are Meaningless in SQL Server: They May Be Reset to Be the Same Value as the Query Plan Hash

This is the worst bug I’ve found in SQL Server to date. Previously, my top find was SQL Server Online Index Rebuild sometimes happens offline without warning. This one has taken top slot because it makes my life more difficult on a daily basis.

Background: SQL Server generates a query_hash for each query. This is stored in sys.query_store_query and it’s one of the primary ways you can identify what a query is across different Query Stores, or even the same Query Store over time, as surrogate query_id values get reset if Query Store is cleared or data ages on. The query_hash is a ‘Zobrist hash over the shape of the individual query, based on the bound (input) logical query tree. Query hints aren’t included as part of the hash.’ (Source)

Continue reading

General Failure Failed Forced Plans in Query Store Cause Even Slower Compile Times

General Failure Failed Forced Plans in Query Store Cause Even Slower Compile Times

๐Ÿ”ฅ UPDATE: Microsoft has announced the general availability of the Next-gen General Purpose service tier for Azure SQL Managed Instance, which includes improvements to I/O latency, IOPS, and transaction log throughput. This post describes the original General Purpose blob storage. You don't want that.

This post demonstrates two related bugs with plan forcing in Query Store which increase the likelihood of slower query execution and application timeouts in SQL Server environments.

These bugs are most likely to impact you if:

  • You use the Automatic Plan Correction feature in SQL Server, which automatically forces query plans.
  • Anyone manually forces query plans with Query Store.
  • You have slow storage, which can increase your likelihood of having longer compilation times.

The General Purpose tier of Azure SQL Managed Instance and Azure SQL Database feature both slow storage and Automatic Plan Correction enabled by default. So, weirdly enough, your risks of suffering from this problem are high if you are an Azure SQL customer.

Thanks to Erik Darling for his help in diagnosing and reproducing these issues– and his ‘slow compiler’ query used in this post was incredibly helpful to isolate and narrow down these problems.

Continue reading

Should You Use SQL Server Readable Secondaries If Queries Can Fail Repeatedly at Any Time?

Should You Use SQL Server Readable Secondaries If Queries Can Fail Repeatedly at Any Time?

If you use readable secondaries in Availability Groups or Read-Scale out instances in Azure SQL Managed Instance, you may have queries fail repeatedly if there is a glitch and statistics are not successfully ‘refreshed’ on the secondary replica. Those queries may keep failing until you manually intervene.

It’s unclear if Microsoft will ever fix this. There is a well established support deflection article which documents the issue and provides ‘workarounds’.

Continue reading

Ugly Bug: SQL Server Online Index Rebuild Sometimes Happens Offline Without Warning

Ugly Bug: SQL Server Online Index Rebuild Sometimes Happens Offline Without Warning

๐Ÿ”ฅ UPDATE: This issue has now been documented. A note has been added to the Perform index operations online documentation page, stating: "Index rebuild commands might hold exclusive locks on clustered indexes after a large object column is dropped from a table, even when performed online."

I found a nasty bug in SQL Server and Azure SQL Managed Instance recently: sometimes an ‘online’ index rebuild of a disk-based rowstore clustered index (basically a normal, everyday table) isn’t actually ‘online". In fact, it’s very OFFLINE, and it blocks both read and write queries against the table for long periods.

If you manage to make it through a rebuild successfully, the problem goes away for future rebuilds of that clustered index – likely leaving you bruised and bewildered.

Continue reading