Understanding avg_total_user_cost and avg_user_impact in Missing Index Requests

on February 22, 2017

The hardest thing about looking at index requests in SQL Server is understanding what the column names mean.

Here’s a simple query that shows missing index requests for a database

When SQL Server optimizes a query, sometimes it guesses that an index might speed it up. A little green hint shows up in the execution plan for the query, and the information is recorded in the dynamic management views for the database

You can get to the aggregate information with a query like this:

SELECT 
    d.statement as table_name,
    d.equality_columns,
    d.inequality_columns,
    d.included_columns,
    s.avg_total_user_cost as avg_est_cost_of_requesting_plan,
    s.avg_user_impact as avg_est_cost_reduction,
    s.user_scans + s.user_seeks as times_requested
FROM sys.dm_db_missing_index_groups AS g
JOIN sys.dm_db_missing_index_group_stats as s on
    g.index_group_handle=s.group_handle
JOIN sys.dm_db_missing_index_details as d on
    g.index_handle=d.index_handle
JOIN sys.databases as db on 
    d.database_id=db.database_id
WHERE db.database_id=DB_ID();
GO

But notice that I renamed some columns, and added others together.

Here’s why.

avg_total_user_cost tells you how “expensive” SQL Server thought the queries were who wanted this index

When SQL Server optimizes a query, it does a quick estimate of how much work it’s going to be. This is called the query’s “cost”.

Cost is always an estimate in SQL Server. The cost isn’t updated after the query runs to reflect how expensive anything actually was. Even if you’re looking at an “actual” plan, the “actual” parts are just rowcounts.

And “cost” isn’t a measurement of anything we’re familiar with, like seconds. I like to think of it as just “QueryBucks”, the weird currency of the SQL Server optimizer.

The avg_total_user_cost column tells you whether SQL Server guessed that these queries were very cheap (say .0009 querybucks), or more expensive (say 500 querybucks).

It doesn’t tell you anything about whether or not that guess was correct. The optimizer sometimes vastly under or over guesses cost.

Hey, optimizin' ain’t easy! And it’s also got to get done fast.

avg_user_impact tells you a percentage that SQL Server guesses the index would reduce that estimated cost

This column is a guess about a guess!

Let’s say the query plan that SQL Server settled on for my query had an estimated cost of 500 querybucks. But SQL Server thought an index might make the query faster. It does a quick guess about how much faster it will make.

So if you see 99 for avg_user_impact, SQL Server thought that the index would drop the cost by 99%. Usually a high number like 99 in this column means that the query was scanning a large index, and the optimizer thought an index could allow it to do a very targeted seek operation instead.

I just add together user_seeks and user_scans

It’s not particularly meaningful to me whether the requesting query was doing a seek or a scan operation.

It sounds like a big deal, but you can have a seek operation that actually looks at every row in the index. You can also have a scan operator that looks at only a few rows. (It may feed into a TOP operation that stops after just a couple of rows.)

You can’t judge a book by it’s cover, and you can’t judge an index operation by its type.

However, the combination of seeks+scans is an interesting number! That’s the number of times a query ran that SQL Server thinks user queries would have benefitted from this index.

I don’t rely exclusively on the missing index DMVs, for good reason!

The missing index DMVs can give you a rough idea of the tables that may benefit most from index tuning, but you shouldn’t rely on them completely for designing indexes against a live database. The feature is just too limited.

The missing index feature in SQL Server does NOT:

  • Recommend key column order in an index (this is a big deal!)
  • Make any recommendation for queries that qualify for ‘trivial’ optimization - which means you could miss some dead simple opportunities to speed up frequent queries
  • Suggest filtered indexes, clustered indexes, columnstore indexes

You don’t have to take my word for this - all of this is right in the documentation on the feature.

For this reason, I don’t make decisions about index tuning just from the missing index DMVs. I always look at top execution plans before making recommendations.