The Case of the Blocking Online Index Create (A Shared Lock that Wouldn't Quit)

Page content

I recently got an interesting question from a reader about running a CREATE INDEX statement with DROP_EXISTING

Here’s the question:

I always take advantage of creating/modifying nonclustered indexes online since we are running SQL Server 2014 Enterprise Edition. This weekend I ran into an issue while modifying an existing nonclustered index (added a new column to INCLUDE): I was blocking any write operations on the table. After digging into this issue, I noticed that there was a Shared lock on the table.

From my knowledge, at the very beginning a shared lock needs to be acquired but it’s should be very fast one – https://msdn.microsoft.com/en-us/library/ms191261.aspx.

I ran the index creation for more than an hour and had to kill it because it started to significantly affect clients. I just don’t understand why the shared lock would be held for such a long time.

I can reproduce it on my test server with no activities on this database.

I thought this was a really interesting question, because our reader is correct: an online index creation statement should only need a shared lock, briefly, at the beginning of execution.

Ruling out the simple things first: is the shared lock getting blocked?

My first thought was that perhaps there is some process that runs against the production system and the test system that goes to sleep with an open transaction, holding an X or an IX lock against this table. If the index create can’t get its shared lock, then it could be part of a blocking chain.

So I asked first if the index create was the head of the blocking chain, or if it was perhaps blocked by something else. The answer came back that no, the index create was NOT blocked. It was holding the shared lock for a long time.

My new friend even sent a screenshot of the index create running against the test instance in sp_WhoIsActive with blocking_session_id null.

And it wasn’t just one shared lock. It was 120!

To deepen the mystery, he ran sp_WhoIsActive with @get_locks=1,  and the locks for the running, non-blocked index create showed:

<Lock resource_type="OBJECT" request_mode="IS" request_status="GRANT" request_count="1" />
<Lock resource_type="OBJECT" request_mode="S" request_status="GRANT" request_count="120" />
<Lock resource_type="OBJECT" request_mode="Sch-S" request_status="GRANT" request_count="1" />

Yep, that’s 120 OBJECT level shared locks, just hanging out while the online create is running.

I was curious if the number 120 was meaningful to this table somehow. Were there 120 partitions? 120 indexes? 120 triggers? (I know it sounds weird, but I’ve found weirder.)

Nope, there weren’t 120 of anything identifiable.

We did a little tracing, and a little discussion in email. And just as soon as I sent an email that said, “I can’t reproduce this on SQL Server 2016 at all, I think you have a bug!” I got an email back that said, “This is a bug!”

The bug: Rebuilding a nonclustered index to add columns by using CREATE INDEX with DROP_EXISTING=ON and ONLINE=ON causes blocking (KB 3194365)

I’d actually done a little bit of searching when I first got this question by email, and I didn’t find this bug at all. But my keywords weren’t perfect.

And if you’ve ever searched for SQL Server bugs, you know that your keywords have to be exactly perfect, and it has to be your lucky day for you to find the bug.

The bug is here. If you’re running SQL Server 2012, 2014, or 2016 Enterprise Edition, check your patch level if you ever run CREATE WITH DROP_EXISTING: because you probably would like to have the option for it to be at least mostly online.

Thanks very much to Denis for emailing about this issue! The story does have a happy ending: Denis found that the issue was resolved by patching the instance, as described in the KB.