Let's draw out the problem with this partitioned index (6 minutes)

The physical structure of the indexes are different

Some drawings and dinosaurs explain why the partitioned index is used differently in our execution plan.

Transcript

A breakdown of how the non-partitioned index on BirthYear works

Here’s a drawing of our nonclustered index on BirthYear on the table that doesn’t have any partitioning. Just a plain old rowstore index with one key column on BirthYear.

The drawing is simplified. There are way more pages in the actual index than I can draw here. I’ve drawn little pages to represent 8KB pages, but this isn’t the exact number of pages because I’ve got high definition video, but you know it’s not that great ;)

It does have this number of levels though, it has two intermediate levels between the root index page and that leaf of the index that has the largest amount of pages in it.

Since this index is sorted by BirthYear, when we run our query and say we want BirthYear between 2001 and 2015 SQL Server can very efficiently go to where the data it needs starts. It was enough pages that it decided to go parallel, it can very efficiently divide the work up between those threads.

We’ve got a Stream Aggregate operator for the non-partitioned index

One really cool thing about our query is we’re grouping by BirthYear and we want to do a count. SQL Server figures out that the way this index is laid out, it is always going to go from low to high. So, if I am reading the year 2003, if I’m a thread and I’m picking up all the data for 2003, as soon as I start to hit data for 2004 I am done counting for 2003, it’s complete.

That information that the data is physically sorted in the index is extremely useful. SQL Server takes advantage of it for counting and it counts the data as it streams through and it flows through the query plan. The stream aggregate operator counts and data can pass through it, forward in the plan, because it knows when it has finished up certain years.

SQL Server can only use the stream aggregate operator for things like this when it knows the data coming in to it is sorted.

We changed that sorting when we partitioned the index

On our partitioned table, our index is on BirthYear, and this was created as an aligned index.

When you create an index against a partitioned table if you don’t specify differently – if you don’t specify, “hey go create this on a specific filegroup or a different partition scheme,” SQL Server will just assume you want it on the same partition scheme that the table is on and it will align and partition the index so that it matches the base table.

This is really useful because then we can do things like switch [partitions] in, switch [partitions] out. Great. Our index is keyed on BirthYear but it is partitioned on a different column.

The different column is FakeBirthDateStamp.

Essentially every partition is a little balanced tree of pages. It’s no longer one big tree. We’ve got a lot of little trees and our partitioning column is different from our index key column. BirthYear is a different column and even though BirthYear is a computed column that is based on FakeBirthDateStamp– who is the partitioning column– SQL Server doesn’t understand the relationship between them in this context.

When it is querying this table we did not get partition elimination. It did not figure out, you’re a computed column so you’re only going to read certain partitions. It does not know that.

It just sees… Okay well your index is by BirthYear, so each of these little B-trees is sorted by BirthYear, but it’s partitioned by a different column. It thinks it might find data in any partition.

Luckily, it is pretty efficient when it goes into a partition and it doesn’t have data that qualifies, it doesn’t have to seek in this case. It doesn’t have to look very far. It very quickly is like, you are not the data I am looking for. But it’s still accessing that partition and it thinks there could be data there.

We can’t use that stream aggregate operator against this partitioned index on BirthYear

Because it’s not sure of the overall sort order of the index, it can’t stream the counting process into that stream aggregator.

It’s not guaranteed a perfect sort because it just doesn’t get it. In this case, because it’s like, “I could find data in any partition,” the individual partitions are sorted but there could be data over there, it puts data in to buckets in memory to be counted.

Once we have the data all bucketed out and we’ve sort of put the data in the buckets then we can proceed to count it but we can’t just let the data flow through because since we might find data for these buckets anywhere in the index. We can’t just pass that information on until we’re all done. This aggregate looks a little different in the plan. It’s the Hash Match partial aggregate.

We’ve got to fill all of the buckets before we let any data pass through the operator further into the plan. The Hash Match partial aggregate operator is a blocking operator.

We’ve got to finish it before we can move on and that’s part of why our query isn’t as efficient as it was when we had that guaranteed sort coming in to the stream aggregate operator.

All hope isn’t lost for our query against the partitioned table

We can tune this guy and we can make it faster. We’ve got a couple different options. Let’s take a look next.