Automatic Plan Correction Can Re-Correct! Plus: Our Correction Survives a Plan Cache Flush (10 minutes)

WARNING: I do NOT currently recommend enabling the Automatic Plan Correction feature due to issues described in my post, Automatic Plan Correction Could be a Great Auto Tuning Feature for SQL Server. Here’s why it Isn’t.

Instance restarts and statistics updates may invalidate automatic plan corrections (at least temporarily)– but automatic plan corrections DO survive if the execution plan cache gets flushed.

Transcript

The current state that we’re in is: we had a plan be auto-tuned.  This was the “last good plan I’m going to stick to this plan.”

Then I ran that statistics update that invalidated it, and we’re back to having slow plan in cache.

What I’m doing right now is: I have slow plan in place, and I’m running the procedure twenty more times, slowly.

At a certain point, not right away, but at a certain point auto-tuning is going to kick in and is going to say “Hey, I can make this faster again.”

Now, it hasn’t happened yet. I don’t see any check marks.

Notice that I now have a blue square for plan 6

That means that there was a cancellation. When you see the squares on there that means that I stopped it because I had those actual plans enabled. It’s like, okay this one actually didn’t complete, so now I get a square.

Notice that I have this period of time where I have the procedure being slow again

For those of you who might worry about “Is this feature going to replace DBAs?” what this means, against a really critical system, we don’t want any of these periods where execution is slow.

This auto-tuning feature, it is a safety net that can kick in after a while, after a certain number, a certain sample size is collected. It can’t prevent this thing from being slow.

You, the DBA, possibly, if we analyze what this query is and ways to stabilize the plan – when I see a query being auto-tuned, what I want to start thinking about is, why are you getting such a variation in plans, and what can I do to stabilize you? Do you need a different index? Do you need a T-SQL rewrite? Why are we ever getting a slow plan, and how can I prevent it?

I have great information that’s been collected in Query Store for me where I can see.

I’m going to open up a different report.

I’m going to open up Top Resource Consuming Queries, which is also useful

You could have T-SQL that does something similar. On Top Resource Consuming Queries, I’m going to change it to be by CPU time and then by average CPU time. And again, I’m going to click this top button and say I want to see it as a grid on the top left.

So, our number one query on this is also Popular Names and on this graph I can see much more clearly, here is where my CPU time averages higher, here is where my CPU time averages lower and I can very easily click around and say, okay, when you’re slower, what is the parameter that you are compiled with? I can go into the parameter list and say oh, you’re slow when your compiled for threshold = null.

Whereas, here’s the one where the average CPU time is much lower. Oh, when the average CPU time is much lower, let’s actually click on the plan here and make sure I’m seeing the, okay, plan id 15 has an average CPU time of 2 seconds.

Which plan shape is this here? This is the fast plan. Okay. Okay, this is fun! This is actually a gotcha. This is something to look out for.

This fast plan was compiled for at threshold equals null… Wait, it was compiled for this when it was slow.

The thing we have to also look for is, oh, use plan was true.

When use plan is true, it didn’t get this plan shape because of what it was compiled for

What we need to find is a case where it was faster and use plan wasn’t true.

So, let’s go back.

DBAs will definitely be needed just to understand what happened with the auto-tuning and figure out what to do from there because it is so confusing.

I’m going to change this report to the last 12 hours. Again, I can’t see the okay button so I’m taking it on faith and I’m going back farther.

We have some create index statements in here now. I’m going to look at the query that we’re looking for and here is, this is, it changes the colors on me, which unfortunately changes my memory. So now we have this orange one.

I have plan id 5 highlighted. I’m looking at its properties on the select and this one, notice there is no use plan down here. This is one whose average time was fast and it wasn’t forced and this one was compiled for threshold equals 500,000.

So when I’m trying to figure out how do I compile fast plan I have to look for and make sure that if there’s an absence, if use plan equals true and then pick up the parameter compiled value. So my job gets extra, extra fun.

Alright, well let’s refresh this and see I still don’t have any check marks here. I still haven’t gotten to a point where that auto-tuning has kicked in again. I’m still running that slow query. It’s only on one thread so it doesn’t overwhelm my laptop. And it’s going and going and going.

John asks, “How can the plan get applied if the server is rebooted, statistics updated, etc?”

Things get invalidated.

The statistics updated did, well it didn’t, it said hey, I’m trying to remember the exact wording that it did with the statistics update. It was like “Hey, no, statistics have updated. I can’t just apply this plan.”

So it will have these periods where we’ve backed off and auto-tuning isn’t happening, so it will be sporadic.

Great question! Yes, even if I clear my procedure cache, that isn’t just going to make the auto-tuning go away.

We can demonstrate that once I can get the auto-tuning back on. I think that would be a really fun demo.

Let’s see. Let’s just look for a check. Yes! So, one of the reasons I like the graphical report is I can easily just see if a check mark is in place. I could also query that recommendations DMV and just see when the status changes too. So I’m going to go back to our demo and it finished here. It does take 18 to 20 times for it to kick in.

What we’re going to do now is look at the recommendations view again. So I had updated statistics, which it said “Oh, oh things have changed. This might not be faster anymore. I can’t auto-tune it.” And then I ran in enough again that it was like “Oh no, you’re slow. I’ve had a better plan in the past “I’m going to go ahead and verify this again.”

So we are now, last good plan is on and it’s in the verification phase now, making sure, okay, I am going to make sure this is faster, looking out for maybe it’s more complicated than I thought but I’m going to go ahead and auto-tune this.

And let’s go back up and let’s run this guy just once, not twenty times.

We’re going to run it once and then I’m going to clear the procedure cache.

I’m going to turn on actual plans. I’m going to give it one run. Make sure we get our fast plan right? I

t is being auto-tuned so, yes, in fact we are getting our fast plan with the window aggregate operator. And now, let’s just go nuclear, let’s to DBCC FREEPROCCACHE. We’re going to clear not just the cache for this database, but we are just, we can’t be stopped. Nobody else is using this instance, it’s just me. We’re clearing out the whole procedure cache, right?

But, will auto-tuning still work? We’re going to go ahead and make sure we highlight only the plan and I’m going to go ahead and run this and dun-du-du-dun, yes! The fact that Query Store is separate than the procedure cache is part of this magic. It’s saying, based on your aggregates in Query Store, I am auto-tuning this plan and that isn’t just grabbing the plan in the cache.

It’s its own thing. So just clearing out the cache doesn’t make the auto-tuning go away.

I got my fast plan here and if I look at the properties use plan is true. If you don’t want auto-tuning just don’t turn on the auto-tuning. You can still look at these suggestions. And I actually didn’t realize until very recently that the suggestions would still show up if you didn’t have the auto-tuning turned on in Enterprise Edition. And I think that’s just awesome that even if I don’t have the auto-tuning turned on I can see those suggestions and act on them myself because that is definitely a feature that I am excited about using.

John says “Pay attention to the join type and force it in the SQL Server query.”

That is an option that you have of using a join hint. Yourself, I do use join hints sometimes. They are– join hints I kind of see as my last resort, like a recompile hint. It’s kind of on the end of my, because if I use a join hint in the T-SQL it limits my flexibility over time. Like, maybe a nested loop is better now but if my data set changes significantly maybe it won’t be in the future. So I usually try to look for other options first but that is one of the options on my list to consider.