Tuning the aggregator query (12 minutes)

Let’s speed up that partitioned query

There’s a couple of different ways to get the speed of the original query.

Transcript

I’ve got a couple of different options for fixing this. There are some trade-offs.

Option 1: a non-aligned index

The first option is to create an index on my partitioned table and make that index not partitioned. I have already run the command that does this, because it takes about a minute and a half. The index has the name “nonaligned”, which is not really a good way to name an index, but it describes it in that this index is going to be a nice, single rowstore B tree that is not aligned with the base table, the base table is partitioned by year.

It has the exact same key column as our partitioned nonclustered index.

The way that I made it not partitioned is I told it, told SQL Server, “hey, go create this on a filegroup”. Doesn’t have to be primary, putting all your stuff in primary is NOT a best practice, but I said, “don’t put it on the partition scheme that the base table is on.”

Now that I’ve done that, if I run my query again, and I’m not even going to put in the ‘IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX’ hint. Let’s see what it just does when left to its own decision now against the partitioned table. I didn’t even highlight that part, let’s see what it decides to do.

It’s executing, it’s executing, oh, I didn’t turn on actual execution plans.

Let’s click that button, turn the plans on, say “will you please show your work, SQL Server? Show me what you did.” And let’s [run it again and] take a look.

It is really excited about that columnstore index

It’s like “really? I want to use it and scan it and figure out that value for every single row, it’ll be great!”

[But it’s slow again.]

Again, we hint away the nonclustered columnstore index

So let’s run it again, and say, “I know you love your columnstore, but maybe just don’t use it right now. Maybe just not right now.”

Looking at our plan, I got an index seek. I’ll click on this guy, look at the properties pane and let’s see if we can find the name.

Here is the object, and sure enough, it’s the nonaligned index.

The other way I can tell is that I got my stream aggregate operator.

Well how long did the query take? Let’s look at our select operator, look at the properties and go into our query time stats. It took just over three seconds. Let’s give it another run, give it another chance, because I’m not totally sure how warm the cache was for it. It’s doing the exact same operation that our query was against the nonpartitioned table, and sure enough, it either was an IO issue or just it had to wait to get to some of the CPUs on my VM. On my second run I’m faster than my original, nonpartitioned table query was.

We’re within margin of error here, this is also the same plan I was getting against the nonpartitioned table, because I’m using a nonpartitioned index on my partitioned table.

There are downsides to adding non-aligned indexes

I don’t actually love this fix, because as soon as I put a nonaligned index on my partitioned table, while that index is there, I cannot switch partitions in, switch partitions out, or truncate any partitions.

That makes sense because when I have all aligned indexes, when everything is, every index on the table is divided into chunks in the exact same way, I can do things like say, transfer this chunk of data out to somewhere else, or transfer a chunk of data in. It’s all lined up. But now that I have this nonaligned index, let’s say I try to run this command, that says “I want to truncate just partitions one to four,” and I did like one to two on my hands instead of one to four the ability to truncate specific partitions, we got that in SQL Server 2016. Very cool feature.

When I try to say, “hey, let’s truncate some partitions,” it says, “no can do.”

My nonaligned index, which I so beautifully named, is not partitioned. But the table uses this partition function. And it’s kind of nice that it tells me what it is, you know, just in case I didn’t know. They have to use an equivalent partition function. Technically it doesn’t have to be the exact same function, but it’s got to have the same boundary points in order to be able to do these partition level operations.

I don’t love adding nonaligned indexes, because it means that while that index is enabled you can’t do this cool stuff, like switching, that is one of the biggest reasons we want table partitioning. I would rather have a fix that doesn’t require this, if at all possible.

Fix one, don’t love it. In some situations, it might work, good to know about, but I would rather fix the query if I can.

Option 2: Can we rewrite this query to get partition elimination?

I was using that BirthYear column before in my where clause, and we weren’t getting partition elimination.

What if I rewrite my query, and say, “I’m just going to go ahead and translate this into the column that is the partitioning key.” I want the rows that are on or after January 1, 2001 and before January 1, 2016.

And I’m kind of lazy, so I haven’t even changed my BirthYear in other places.

I really just want to show you that this where clause – adjusting the predicate is important. I’m going to say, on this first one, “What performance can I get using the partitioned rowstore index on BirthYear?”

Because I dropped that nonaligned index. I got rid of it.

And I’m telling it to ignore the nonclustered columnstore index. Let’s see how well that partitioned rowstore index does if I try to get partition elimination in the where clause.

My query’s still running… my query’s still running… That wasn’t so awesome.

If I look at my properties here, and I look at my elapsed time, we’re at six seconds.

If I go to the right in my query, we are still doing that hash match aggregate.

If I highlight my index scan here, I can see that my predicate rewrite worked in that I did get partition elimination now.

My where clause rewrite. I only touched 15 of the partitions in the table and it lets me know that they were numbers 37 to 51, but this is still slower. You know, I didn’t fix the overall sorting issue in the index, where it’s like, yeah, I’m not confident that the data between all the partitions is in the same order as BirthYear, so we still are using the hash match aggregate which, in this plan, is slower.

What if we let it use the nonclustered columnstore with the rewrite that eliminates partitions?

What if I adjust the query so that I am getting partition elimination, and I let it use the columnstore index?

It’s still going to have to compute BirthYear, for every row, but there’s going to be a LOT LESS rows because it’s going to eliminate a bunch of partitions. Let’s see how this works.

Execute this guy, and that came back awfully fast. Let’s look at our properties here. Scrollin' on down.

Our ElapsedTime, this is the fastest one that we have seen to date. We are just over two seconds, our CPU time is around eight seconds, and going to the right here, we did get a columnstore index scan.

However, we now get information about partition elimination on it. It only touched 15 partitions, 37 to 51, and it’s fast.

Columnstore indexes are really cool. It makes up so much speed on scanning the data out of the columnstore, and now really only scanning the data that it needs, that in this case it’s still overall faster although it’s got to do the hash match aggregate operator.

That’s why I don’t want to make it sound like “ooh, if you see a hash match operator, it’s always bad,” It’s not! In the case of our original query, the stream aggregate operator was just really awesome.

In the case of the query we’re looking at here, our nonclustered columnstore index with partition elimination is really awesome. We are actually a little bit faster than our original query, but it did take rewriting our query to help it get partition elimination. Part of this is also to consider that we currently, in columnstore indexes, we can’t put the BirthYear column in there.

It would be even faster if we could put our computed column into that nonclustered columnstore index of course too, we don’t have that magic with computed columns at this point. Quick summary of the options for tuning the problem aggregator: whenever you can look at the query and say, “Did it get partition elimination?” it is absolutely worth asking that question, because if your query can get partition elimination– and it’s not always suitable for every query, but if it can– you will often get a nice speed boost for that query.

I like this option better than the nonaligned index option, just because nonaligned indexes can literally be a drag.

If you want to switch in or switch out or truncate a partition, you have to drop or disable all the nonaligned indexes. That can make things slow for anything who depends on those indexes who might run before you can recreate or rebuild them, and the process of recreating them or rebuilding them can be slow and painful, use a lot of CPU and IO resources on the instance.

Still, it’s good to have multiple options, because sometimes you can’t rewrite the query.