Comparing execution plans (7 minutes)

How can I see that IO information in actual plans?

The per-operator performance statistics I show in this video, focusing on IO statistics information, are available in SQL Server 2014 SP2 and higher, using current versions of SQL Server Management Studio. Read more about it here.

Transcript

When it comes to performance tuning, one of my questions is always, “What can I learn from the execution plans for this query?”

Let’s take a look now at what plans we get when this is slow and when it’s fast, and what we can learn from them.

We’re going to first dig in to the execution plan for the query using the user database table.

I’ve got actual execution plans turned on using this button. You can also press Ctrl+M to enable that.

Now I’m going to execute the query that loads a million rows into a user database table and then updates 800,000 of them based on the parameter I’ve passed in. This ran in the expected duration: about 1.3 to 1.4 seconds for the update statement this time.
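The procedure itself isn’t shown in this excerpt, but its shape is roughly this. Treat it as a hypothetical sketch: the procedure, table, and column names are my placeholders (only @RowID comes from the video), and it assumes a pre-created dbo.UserTable with a clustered primary key on RowID.

```sql
-- Hypothetical sketch only: object and column names are placeholders,
-- not the actual demo code from the course.
CREATE OR ALTER PROCEDURE dbo.UserTableTest
    @RowID INT
AS
BEGIN
    SET NOCOUNT ON;

    TRUNCATE TABLE dbo.UserTable;

    -- Load a million rows into the user database table
    INSERT dbo.UserTable (RowID, Col1)
    SELECT TOP (1000000)
        ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),
        0
    FROM sys.all_columns AS a
        CROSS JOIN sys.all_columns AS b;

    -- Update 800,000 of them, driven by the parameter
    UPDATE dbo.UserTable
    SET Col1 = Col1 + 1
    WHERE RowID <= @RowID;   -- e.g. @RowID = 800000
END;
GO
```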

Looking at the execution plan for the procedure, at the top I’ve got a plan for the insert, the statement loading the million rows. That makes sense, but that’s not really what we’re concerned with.

For the bottom query, the update of the rows, I got what’s called a wide plan. In a wide plan I have one operator here doing a clustered index seek to find the rows it’s going to update.

When I look at the properties of this (in my actual plan I’ve got the clustered index seek highlighted), I can see actual IO statistics for it: about 53 and a half thousand logical reads on that operator doing the clustered index seek. Okay, that finds the rows.

Here’s the clustered index update. Looking at the actual IO statistics for that operator: the rows have already been found, so it did not have to do any additional logical reads. On this fast plan for the user database table, all of my IO shows up on that clustered index seek.
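If you’d rather pull those per-operator numbers out with a query instead of clicking through properties, the actual-plan XML carries them in RunTimeInformation elements. This is an assumption-laden sketch, not something shown in the video: it expects you to paste your saved actual plan XML into @plan, and it sums ActualLogicalReads across threads per operator.

```sql
-- Sketch: read per-operator actual logical reads from a saved
-- actual execution plan (.sqlplan XML pasted into @plan).
DECLARE @plan XML = N'...';  -- paste your actual plan XML here

;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
    op.n.value('@NodeId', 'int')                     AS node_id,
    op.n.value('@PhysicalOp', 'varchar(60)')         AS operator,
    SUM(rt.t.value('@ActualLogicalReads', 'bigint')) AS actual_logical_reads
FROM @plan.nodes('//RelOp') AS op(n)
CROSS APPLY op.n.nodes('./RunTimeInformation/RunTimeCountersPerThread') AS rt(t)
GROUP BY
    op.n.value('@NodeId', 'int'),
    op.n.value('@PhysicalOp', 'varchar(60)')
ORDER BY actual_logical_reads DESC;
```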

Do we see anything different when we use the temporary table?

Well let’s find out! I’ve still got my actual plans on, and I run the procedure again.

It inserts a million rows into the temp table then updates 800,000 based on the @RowID value I passed in.

And once again the duration of the update of the temp table is much slower: about five and a half seconds there.

Looking at the execution plan for this, I do have a similar looking plan. I have a clustered index seek broken out here as its own operator. Well let’s look at how many logical reads that operator had to do!

Its actual IO statistics aren’t the same at all. We did almost 2.5 million logical reads. Wow, we did way more logical reads finding the rows to update in the temp table.

I did run some additional queries to check: does my temp table simply have way more pages? Is it somehow getting a lot of empty pages? It wasn’t. I found that my temp table had the expected number of pages, very similar to the number of pages in the user database table.
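The video doesn’t show those checks, but a sanity check of this kind can be sketched with sys.dm_db_partition_stats. The temp table name here is a placeholder, and this has to run in the session that created the temp table:

```sql
-- Sketch of a page-count sanity check for a session's temp table.
-- #TempTable is a placeholder name.
SELECT
    ps.row_count,
    ps.in_row_used_page_count,
    ps.in_row_reserved_page_count
FROM tempdb.sys.dm_db_partition_stats AS ps
WHERE ps.object_id = OBJECT_ID('tempdb..#TempTable')
  AND ps.index_id IN (0, 1);   -- heap or clustered index
```

Comparing used versus reserved page counts against the user database table is one way to rule out “lots of empty pages” as the explanation.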

But wait, there’s more. If I click on the clustered index update, and I look at its IO, this is different as well.

The clustered index update operator is also showing that it had to do logical reads. And not just a few: millions of logical reads. We already found the rows, so why does it have to read them again? There’s definitely something different when we dig into the IO characteristics of the plan.

The table variable is also fast

Let’s look at the table variable, which, as long as we don’t use a recompile hint, is just as fast as the user database table.

And its update execution plan is different. It has a narrow plan. Looking at this you’re like, “The table variable is so fast, it doesn’t even have to find the rows!” It just mystically knows where they are.

Well, what it has actually done is choose a narrow plan and collapse the step where it finds the rows into the clustered index update operator. If I look at the tooltip on this, notice that I do have a seek predicate on my clustered index update operator.

It does still have to find the rows. It has simply folded that operation in, rather than breaking it out as a separate operator with a compute scalar in the middle.

Instead it’s like, “No, I’ll just seek in the update operator itself.” It’s chosen a different style of execution plan based on different row estimates because it’s a table variable.

What about the IO? Looking at the properties of this operator, the actual IO statistics are very similar to what we get for the user table. We saw that before in the statistics IO output, but now that I’m looking at the actual plans I’m able to see it on a per operator basis, which is really cool.

Once I add that recompile hint in…

and my update of the table variable says “with recompile,” this gives SQL Server more insight into how many rows are actually in that table variable, and that changes its decision. It decides: I am going to go to a wide plan.
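As a hypothetical sketch of the table-variable version (the names and row-generation trick are my placeholders; only @RowID and the recompile hint come from the video):

```sql
-- Hypothetical sketch of the table-variable version of the demo.
DECLARE @RowID INT = 800000;
DECLARE @t TABLE (RowID INT NOT NULL PRIMARY KEY, Col1 INT NOT NULL);

INSERT @t (RowID, Col1)
SELECT TOP (1000000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 0
FROM sys.all_columns AS a
    CROSS JOIN sys.all_columns AS b;

-- Without the hint, SQL Server estimates one row for @t and picks the
-- fast narrow plan. With it, the optimizer sees the real row count at
-- recompile time and switches to the wide plan, extra IO and all.
UPDATE @t
SET Col1 = Col1 + 1
WHERE RowID <= @RowID
OPTION (RECOMPILE);
```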

Our clustered index seek does about 2.5 million logical reads, 2.4 million and change. When I go back and look at my clustered index update, yeah, we’re getting that extra IO. Why is it doing all this logical read IO again in this case?

We can learn a lot from these execution plans. What’s interesting is that this wide plan is sometimes doing extra IO, while the narrow plan we saw with the table variable is very fast.

What we’ve got here is a complicated problem.