A common error with NOLOCK - and more alternatives (6 minutes)

Notes

Regarding Error 601, when I’m talking about modification to page “structure”, I mean which pages are in use by the index or heap. It’s always 8K pages in these disk based indexes that we’re looking at, I don’t mean to imply that the page structure itself changes.

Transcript

There is a common error you can get with allocation order scans: error 601

This error says: I could not continue this scan with NOLOCK due to data movement.

Back in the SQL Server 2005 days, I used to get this error all the time. It has become much less common, and I was doing some searching on it at one point. I came across this Stack Exchange question. In this Stack Exchange question, in a comment, Remus Rusanu, who at the time was working on the SQL Server engine group, explained that there are changes that the SQL Server team made in SQL Server 2012 that make this error much less common. It is still possible for it to happen, but it’s much less common in SQL Server 2012 and higher.

If you think about what we were talking about when we looked at that IM page, it has to do with: okay, even if I’m saying it’s okay if the data is garbage, there could be changes that are being made that impact {which pages the index is using} that make me confused, and okay, I don’t even know what page to go to next from the IM page. I might lose my place, for example, which is what that means.

When you are NOLOCKing a query, you have to consider that you potentially may need to catch and handle error 601

You may need to retry your statement if SQL Server just loses its place.

A recap of what we have talked about

When do we use NOLOCK? Well, ask yourself, is it okay if the data is just crap? And I actually use the crass term in this, because I think NOLOCK or read uncommitted is a pretty crass thing. Sometimes we need to be crass, and I think about, okay.

What if the CTO or the CIO of this company, what if they see wrong data in this and they ask me about it? Is that going to be okay there? Also, does using NOLOCK here solve a specific, important problem? Does it have a reason? What often happens is that, in cases where NOLOCK solves one problem, it starts being added to lots of things as preventive medicine. And it seems like, okay, what’s the big deal? Because people don’t understand that the data can just be wrong. Then later, when weird data starts showing up, it’s very hard to troubleshoot and there’s a lot of code that has this in it.

Also, there’s an alternative I’m going to talk about called read committed snapshot isolation

Read committed snapshot isolation affects your default database isolation level and can help fight blocking.

There are certain race conditions you have to worry about, but the thing is, even if you enable this, NOLOCK hints are still honored, and you’ll still get dirty reads because SQL Server says: oh, well you want to do a dirty read! Here, I’ll do it. You’ve hinted it. Yeah, your default isolation level is optimistic, but you’ve said you want dirty reads, so I’ll do that for you.

So don’t use NOLOCK “just in case.”

I always think about is there a cleaner solution to the problem?

Here is a list of cleaner solutions.

Instead of using NOLOCK, instead of using dirty reads, I would rather, if my problem is blocking in deadlocks, I would rather solve or prevent the problem with indexes that help the queries get to the data in a targeted efficient way. The faster our queries are, the less surface area there is for blocking. If I hold my locks for less time, or if I need to read for less time, potentially I can reduce locking.

Optimistic locking, with either read committed snapshot isolation, or snapshot isolation. There is versioning overhead for these. For snapshot isolation, you do have to explicitly change isolation level for, say, your reports, and say: I want you to use snapshot isolation level. For read committed snapshot isolation, you have to make sure you don’t hit race conditions. That’s a whole ‘nother course on okay, how do I use these? It’s more of a larger, longer term solution, but it can- because in optimistic locking, readers don’t block writers and writers don’t block readers- we can get consistent results without all this blocking. So it can absolutely be worth the work to get that in there instead of just NOLOCKing everything.

If I want the allocation order scans, I would much rather use a TABLOCK hint. If you are using optimistic locking, if you are using RCSI, you may need to use a TABLOCKX hint. There are details on why that is in Paul White’s blog on the allocation order scans, it’s a great post.

Whenever it is customer data that we are using NOLOCK on, I always want something signed off from a very important person saying yep, it’s okay if this data is just plain wrong. It’s okay for that situation. And I have actually gotten that signature before in some cases, where it really legitimately is okay. That very important person will review it and say, and they’re usually glad to be asked, they’re very glad that if there is a question about returning inconsistent data, that you’re having it validated at higher levels of the company, because it isn’t trivial when your customers are seeing it.

Thanks for taking this course about the dirty secrets of NOLOCK. If you’ve got questions or comments, I would love to get them on the course lessons, and I would love to hear feedback from the course in the survey as well. Hope to see you in another course soon!