The simplest query isn’t always the least work
I know, it sounds crazy.
Let’s dig in a little more to Query Time Stats, and I think this is a really fun example, because it shows how a query that looks like it would be more work might actually be a little more efficient.
I have two queries here
The first one is just saying: I want to get a list of the StateCodes, and I am throwing them away
I don’t want to have the overhead of Management Studio actually displaying results, so I’m declaring a local variable of StateCode and then assigning the values to it, which is essentially the same as saying, “throw away the results.”
I do still want to get things like my STATISTICS TIME information on those, so I’m letting Management Studio still output things.
So, my first query says, just get the StateCode and group by StateCode.
This is effectively the same as a distinct type operation.
My second query’s very similar, and I’m still throwing away the results using local variables, but I’m saying I want the StateCode as well as the count per state
So, really the difference is in the second query I also added a count star that I just didn’t even ask for in the first query.
I’ve got my actual execution plans on, and I’m going to go ahead and run this
What you’ll see pretty consistently with this is:
- My first query that didn’t ask for the count star, I used about 1.5 seconds of CPU time and it took about 580 milliseconds.
- My second query, which did ask for the count star, used only 578 milliseconds of CPU time compared to 1.5 seconds.So, it used way less CPU time and the elapsed time was faster.
This is, if we look at the execution plan, I’m going to show you the Query Time Stats for them.
- The first query– I click around so it shows me what I want– my Query Time Stats for the first query, they match up, right. It sees and agrees with the STATISTICS TIME output. Elapsed, it’s close. So it agrees-ish. My CPU time is 1.688 seconds and my elapsed time is about a little over half a second for the first one.
- For the second one, we have just 605 seconds of CPU time, so “ish”, and then 287 seconds elapsed time.
Why is the second query faster? This second plan was even estimated to be more expensive, right?
Our first plan was estimated to be cheaper than our second plan. It just didn’t quite turn out that way. Well, why is that?
Looking at the properties of the columnstore index scan, this columnstore index scan, if we go up there, we’ve got our funky Actual Partition Count of zero.
Notice that there isn’t any magical predicate pushdown, there’s no locally aggregated rows on the first query. It just pushed all the rows that it had over into the hash match aggregate.
Now, the hash match aggregate, it is in batch mode, and it did a great job. I mean, you can’t fault the hash match aggregate! But, it did take up some elapsed time and some CPU because it had to do some work. In the second query, the columnstore index scan, when we look at the properties of it, because we added that count star, that triggered SQL server to be like: oh you’re doing a count!
We are going to do some locally aggregated row magic, and it only had to push the rows in the delta store over to the hash match operator. So, the hash match operator actually had less to do this time. Our hash match operator, hey, you know, we only have to count the 800,000 of them.
And when this all added up, the query– the second query who did the count star– actually because it had the magic conjured of doing the locally aggregated rows, it was able to offload more work into the columnstore index and have less done by the hash match operator– which even though it’s in super cool batch mode, if you actually have to have it do less, it turns out that’s a little bit faster.
So, we have a case here where the more complex query, or the query that asked for a little more, was faster than the query that didn’t ask for as much
It’s because just the syntax of count star triggered that aggregate predicate pushdown.
So, I think that’s very interesting, and I do love that we now have Query Time Stats available in our actual execution plans. I think it’s a great, great way to see: how much time did you take in this case?