Partitioning and columnstore concepts (10 minutes)

Two important columnstore facts to go along with the video

  1. The maximum rowgroup size is 1,048,575 rows.
  2. The threshold for bulk inserts is 102,400 rows. When a bulk insert is smaller than that, the rows go to the delta store. A bulk load this size or larger may be compressed right away.

Learn more from Books Online at https://msdn.microsoft.com/en-us/library/gg492088.aspx

Transcript

Welcome to the Weird, Wonderful World of Execution Plans: Partitioned Tables and Columnstore Indexes. I’m Kendra Little from SQLWorkbooks.com.

In this session we’ll be talking about weird little things in execution plans that can make a big difference in understanding what’s going on when you’re using table partitioning– whether it be a disk-based rowstore partitioned table, or perhaps if you are using a columnstore index on a partitioned table: maybe it’s clustered, maybe it’s nonclustered.

Here’s what we’ll cover

  • I’ll show you how to look at an execution plan and tell: “Did I get partition elimination in this query?”
  • I’ll show you weird cases where you may look at the partition count reported and say, “That can’t possibly be true,” and how to decode what’s really happening there.
  • We’ll talk about rowgroup elimination in columnstore indexes and some cases where you might see 0 rows passing around in an execution plan and it’s not really 0 rows: why that is, and why it’s showing you 0 rows.
  • We’ll also talk about batch mode operators versus row mode operators in Query Time Stats in execution plans: how it appears differently for these two kinds of operators, and when batch mode operators can give you really big advantages.

The sample database that we’re working with in this session is the BabbyNames sample database

It contains information on the names of babies born in the United States. We are using a larger version of the downloaded database. There’s a couple versions available online, and the demo scripts point you to the larger one because it comes with a pre-built large partitioned table that we can play around with and explore these execution plans.

Some quick background on partitioning and columnstore

When we partition a table, we are dividing it into chunks.

The table that we’ll be working with in our demos has a partition function that is ‘right based’

In other words, the boundary points defined in the partition function use ‘lower’ boundaries. You decide this when you create the partition function: whether you are using lower or upper, or right or left in SQL Server’s terminology. We have a right based function in this case, in other words, a lower boundary.

Our partitioning column is a datetime2 column

It’s called FakeBirthDateStamp, because our partitioned table is actually faked out data. I don’t have the real birth dates for all the baby names, but I created a table where there is one row for every baby born in the given range. Since I faked out their birth dates, this column is named FakeBirthDateStamp.

This table is partitioned into years

The boundary points are January 1st, 1966, January 1st, 1967, and so on, up through 2015. The table’s clustered index is disk-based rowstore, so each chunk of the clustered index is a b-tree. Now these are all made up pages, and of course the number of pages you see in the diagram on here is not representative – the table is much larger, but each chunk is a nice, balanced b-tree.

If we are querying this table based on the FakeBirthDateStamp column and we are able to get partition elimination, that means that SQL Server is able to say: I don’t even need to look at some of these partitions, I only need to look at the ones that are relevant to my query, when we are able to eliminate partitions.

We have aligned an nonclustered index

I have in our example created a nonclustered columnstore index on the table and it is partitioned using the same partition function. It gets aligned with the base table.

I’m using SQL Server 2016, so this means my nonclustered columnstore index has some really cool features: it’s writable

When data comes in– let’s say in our example, let’s say that we did get some corrected data from 1966. There’s a whole bunch of babies from 1966 that we didn’t have rows for, and we’ve just inserted them into the table. SQL Server is able to use something called the delta store.

If we’re just inserting a few rows, it will put them in a little b-tree structure. It will start out little, a little b-tree structure, and that will hold the rows until it has enough rows – if it hits the maximum size for the rowgroup, it will then say, oh I’ve got so much in this delta store that I’m going to go ahead and compress it into a compressed rowgroup.

These rowgroups– this is columnstore data, it’s based on nice, compressed columns, so that if we want to scan an entire column we have this compressed chunk of data that we can consume. Things won’t necessarily always be in the delta store. When we get a lot in the delta store, it will be compressed down into a rowgroup, and rowgroups are made up of column segments.

We’ve got a lot of terminology going on here.

Also, by the way, if we do insert– if we bulk load a big chunk of data, more than about a 102,000 rows {102,400 to be specific}, if we are over a threshold when we are bulk inserting data we may skip that delta store altogether. The data we put in is a big enough chunk that it gets compressed into a rowgroup right away. But if we’re just inserting a few rows at a time, that wouldn’t be that efficient to have this tiny compressed rowgroup. We don’t want tons of tiny rowgroups, we want nice, compressed chunks of data that are efficient to consume.

So we’ve got a delta store, we’ve got rowgroups, we’ve got segments, and it gets even more complicated. There is also, if we are changing data that exists already in the table– for updating data or deleting data: we can think of an update as being an insert and a delete combination.

When we are deleting, we have to track those too. If we have things in the columns being deleted, it doesn’t just go in and actually delete them from the columnstore right away. It will track those in a deleted row b-tree, the identifier of the rows that have been deleted. Now I haven’t pictured that on here, the drawing was already getting so complicated.

In our demos I’ll be talking about inserts…

I’ve done some inserts into the table, so we are not making it quite as complicated as we could make it in the demos today. I think it’s still complex and magical enough.

If you want to take the demo even further and start experimenting with, “how do all these queries look after I add in a bunch of updates and deletes?”, that is absolutely a fun way that you can extend the demo scripts you are going to get with this session.

We’ll also be looking at operators and talking about: are they in batch mode or are they in row mode?

Batch mode is a cool thing, a cool feature that we got when columnstore was introduced. Not all operators support batch mode, but the ones that do work on bigger bites of data, they can consume bigger chunks of data.

Batch mode operators can work with batches that are up to 900 values at a time, and they make more efficient use of metadata and can improve CPU utilization when we’re working with lots of data. When we want to scan, when we want do aggregates, when we are working on big chunks of data.

These operators, currently, right now my demo will be on SQL Server 2016 Service Pack 1. Right now batch mode is only available for operators in queries that reference a table that have a columnstore index on it.

It’s possible that SQL might choose not to use the columnsotre index, but still use batch mode for the operators, and we’ll have some examples when that happens. It’s really weird but that can actually happen. But there has to be a table with a columnstore index referenced in the query, currently.

Maybe that won’t always be the case in the future, we’ll have ways we can have batch mode more easily even if we don’t have a columnstore index around. We shall see. For now though we are going to dive right in and start looking at execution plans in this weird, magical world of columnstore indexes and partitioned tables.