Why hints aren't "suggestions", and a tour of our slow query (12 minutes)

Hints aren’t suggestions

“Hint” sounds like a suggestion, but usually SQL Server can’t just ignore your hint if it’s not a good idea.

We’ll be using hints to tune a slow query. In this video you’ll get to know our problem query and its execution plan.

Transcript

SQL Server offers a wide variety of query and table hints for you to use to influence how your query execution plans are generated.

The word hint sounds kind of squishy, and when I say influence, that sounds kind of squishy too.

Make no mistake: when you use hints, you are taking a very strong stance

Let’s say you use a join hint or an index hint, you’re really directing the SQL Server about what its options are. And let’s say I put an index hint in my code today, and everything works fine, it improves performance.

But, tomorrow, someone else comes along and they’re tuning indexes, and they drop the index that I had hinted. It’s not there anymore. The next time SQL Server goes to run my query, it’ll see, hey, you directed me to use an index with this name but I don’t see an index with this name. Query is going to fail. Because SQL Server cannot do what I directed with that index hint.

So I’ve got to be really careful when I put hints in place and make sure that something isn’t going to cause a problem with my hint down the road.

Be it someone dropping an index, or the data changing, so that the hint doesn’t make the query faster anymore. Now it makes it slower, or if I have a parameterized query, just running the query with different parameters, that hint may not do the same thing, or have the exact same effects as the way that I tested it.

So I’ve got to be very careful and test very carefully to make sure that I’m not hinting something that may become impossible in the future, or something that may not perform as well in the future.

These risks being known, even so, I think using query hints and being familiar with them is extremely valuable, especially as part of your query tuning process

Using hints in SQL Server can teach you a lot about how your query is being processed, and why SQL Server is making the decisions that it’s making in your query execution plans.

Let’s take a look at using hints while tuning a query. We’re going to get to know our slow query as well as the execution plan that’s being used when the query is slow.

The first thing I’m going to do is turn on some diagnostics

I’m going to make sure I’m using the right database, and then also, run this statement, set statistics time on, that tells SQL Server for my session, after I run a query, put information in the Messages tab about how much CPU is burned, and how long the query took. So let’s go ahead and execute that, so that I’ll get this information in Messages. I also have pushed this button to include actual execution plans, so that after I run queries I’ll see a map of how the query was processed, along with actual counts of rows, how much data actually flowed through different operators in the plan.

Here is our slow query…

At the top of the query is a CTE called NameRank, and this uses a windowing function, it uses the rank windowing function and it runs against the table agg.FirstNameByYearStateWide. This is a table with aggregate level data, broken down by report year, state code, and gender. For every given first name ID, it lists the total name count of babies with that name for that report year, state code, gender and first name ID.

And what we’re doing with rank here is we’re saying, we want, for those groups, for a given report year, state code, and gender, we want to rank the babies based on the name count column descending. So the babies who have the most names will be ranked number one. The most popular baby names for that given year, state in the United States, and gender. Later on in our query, we are pulling from the NameRank CTE and we’re saying, we only want the babies who are ranked number one. We only want that number one most popular name for each of this combination.

We’re also joining back to the agg.FirstNameByYearStateWide table. And we’re doing that because we have a requirement in this query to pull back a bunch of reporting columns. Now, these columns are just fake data that I put in this table to simulate a wide table with a bunch of descriptive columns, some of which are pretty big. S

o, we not only need to see what are these most popular names, we need to pull back all these descriptive data on them as well. We are also joining to the ref.FirstName table so that we can get the actual first name to display instead of that first name ID, so we have more meaningful information.

So, let’s get our query running, and we can see the data as it streams out, while it executes. We start getting data back, but this takes awhile to complete, but while it’s running we can browse the data a little bit and see that we have some kind of interesting data here. And it does allow for ties. So let’s say, in 1910 in Nevada, we didn’t have a lot of people in Nevada in 1910 in the United States, but there were some names reported. And for the boys, William was reported as having 10 names. That was the number one name. And there was a tie with John, who also had 10 names.

For the girls, there weren’t any ties. Mary had 10 names.

But you can see it’s kind of interesting looking at the different states, you know, there were a lot of similarities in the popular names with some exception, and then they change over the years. Looking at our Messages tab, we can see that we got 10,903 rows back, and that our query used about nine seconds of CPU time, and about 12, just under 13 seconds of elapsed time to execute this query. So, it was pretty slow.

Looking at our execution plan…

The first thing that I notice here with this shape is that I have a warning sign on my Select operator. We’re going to dig into that, but I have a Hash Match here, and then I have a Merge Join. So I have two kind of, two joins that are going to show up if SQL Server is handling kind of larger amounts of data, and needs to do a little bit of heavier lifting.

Looking over the right side of my plan, I can see that there are Segment operators, and a Sequence Project operator over here. Those are operators that I get with the windowing function. So my rank of the babies is being handled in this region, or zone of the plan, it’s feeding it into a Filter operator, and then a Sort, and when I join back to get those reporting columns, it’s decided to do a Clustered Index Scan of the whole table, and then pull that back in and do a Merge Join with the results of the windowing function. Let’s look at the Properties on this Select operator.

Now, I’ll have to click around to make sure that I’m seeing the properties of the Select. And if I scroll down here in my Properties pane, I can see what that warning is all about. Let’s pop up the warning and see what was happening. SQL Server says, hey, I notice, since this is an actual execution plan, I notice that I gave this query a really pretty big memory allocation, more than 600 megabytes, but I only ended up using about eight megabytes of that memory. So, I gave it more memory than I actually needed to in the end, and why is that?

Well, if I hover over this pipe, I can see that the estimated number of rows, at the end of the query, when it was about to return the data, it thought at that point that it was going to be returning about 5.7 million rows. But actually it was only like 10,903 rows. Where did that misestimate come from? If I look at this pipe, we still have the misestimate coming out of the Merge Join.

And if I go over here, coming out of the Sort, okay, we’ve still got the misestimate there, coming out of the Filter, we have the exact same problem, but if I go to the right of the Filter, it did have a really good idea of how many rows it was going to feed into that Filter operator. So, this Filter is around where things started going weird. And if I look at the Properties of my Filter, and scroll down, I can see the Predicate on that Filter. What it was filtering on. It was filtering on expression 1,001 equals one.

Well, who’s expression 1,001? I can click around and find the operator who has that name, but if we just think back to our query, our query is ranking the baby names based on a group of columns it’s partitioning by, and then ordering by name count descending, but it says we only want the names who were ranked number one.

This is where we calculate over here on the right, in this branch of the plan, we’re doing the windowing function, and then, in the Filter, it’s ranked everyone, and now it’s saying, I want the names who are all ranked number one. So, SQL Server is having a hard time estimating how many rows are going to get that number one rank, and that’s not a simple thing to estimate because we have to actually break things into those groups, we have to say, okay for a given report year, state code, and gender, and then name count descending, we’re going to do this rank, we have to perform a lot of gymnastics to do this rank.

So, guessing how many number ones we’re going to have, since we do allow for ties, is not a simple matter.

And SQL Server is guessing that way more things are going to be ranked number one than actually are ranked number one, resulting in a significant over allocation of memory, and also, a pretty darn slow execution.

Now, we’re going to use bunch of hints, to see how we can speed up this query and we will start digging into that next.