Archive | Execution Plans RSS feed for this section

There’s more than one way to skin an eggplant: Using APPLY for calculations

Choices, choices

Here’s a little TSQL snack. I picked this up in a presentation by Itzik Ben-Gan at the PNWSQL user group recently, and it’s become a fast favorite.

CROSS APPLY and OUTER APPLY- another use

The APPLY operator is perhaps more flexible than  you think. You may already know that you can use it to inline a function, or to replace a join.

But wait, there’s more! You can also use APPLY to perform calculations and simplify your query syntax– this is because the APPLY operator allows you to  express a calculation that can be referred to:

  • in further joins (which may or may not use APPLY)
  • by columns
  • in the where clause
  • in the group by

This is really helpful, because you can’t refer to the results of a computation in one column from anywhere but the ORDER BY.  This is because of the order of evaluation of parts of the statement.

I know this sounds confusing. It’ll make more sense in an example.

A sample query– the ‘before’ version

Here is a query written for the AdventureWorks sample database. There’s all sorts of examples that are possible for this, but I decided to go with one grouping data by month, using my favorite formula to round dates.

It shows the total quantity of orders by Product for an entire order month, for orders placed on or after 2004-07-01.

SELECT  DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth,
        p.Name AS ProductName,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh.OrderDate >= '2004-07-01'
GROUP BY DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0),
        p.Name
ORDER BY OrderDateMonth,
        p.Name

Notice that to group the date at the month level, we need to include the calculation in the column in the column list, as well as in the group by clause.

The query rewritten using APPLY for the calculation

This can be rewritten with CROSS apply to move the calculation into the JOIN area and only specify it once.

The benefits: this will simplify your syntax and reduce the chance of typos and errors, particularly when you need to go in and change the calculation. In cases when you’re displaying a sum in one column and showing a percentage using it in another column, this trick is *fantastic*. (Query numbers from the DMVs a lot? you’ll love this.)

Here, the calculation on the date is moved into the cross apply. It can be referenced as oh1.OrderDateMonth in both the list of columns, and in the GROUP BY portion of the query without rewriting the calculation.

SELECT  oh1.OrderDateMonth,
        p.Name AS ProductName,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
CROSS APPLY ( SELECT    DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth ) AS oh1
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh.OrderDate >= '2004-07-01'
GROUP BY oh1.OrderDateMonth,
        p.Name
ORDER BY OrderDateMonth,
        p.Name

What does the execution plan look like?

Click for a larger image

The execution plan for these two queries are identical.

In this case, the optimizer looks at these two queries and realizes the activities it needs to do will be the same.

Other options

You can create further CROSS APPLY or OUTER APPLY joins that refer to computations in prior joins.

You can also refer to the resulting computation in the where clause.

But be careful….

As with anything, you want to make sure you’re getting a good execution plan, and not shooting yourself in the foot with a new trick.

One big area to watch: although you can refer to these computations conveniently in the WHERE clause, you still want to be careful you’re using appropriate criteria.

For instance, if we were to change the example above to refer to the result from the CROSS APPLY oh1 in the where clause like this:

SELECT  oh1.OrderDateMonth ,
        p.Name AS ProductName ,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
CROSS APPLY ( SELECT    DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth ) AS oh1
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh1.OrderDateMonth >= '2004-07-01'  ---Don't do this!
GROUP BY oh1.OrderDateMonth ,
        p.Name
ORDER BY OrderDateMonth ,
        p.Name

… then in this case the query would not be able to use an index on OrderDate on the sales.SalesOrderHeader table, if one exists.

This is not specifically because of the CROSS APPLY, but because we are forcing SQL Server to apply the functions to every value to identify if it satisfies the criteria. That prevents a seek.

Comments { 2 }

Read from the Right End of the Index: BACKWARD Scans

Optimizing queries is the most fun when you don’t need to add indexes. There’s nothing quite so nice as finding a way to make reading data faster, without slowing down writes or creating new data structures that need to be maintained.

Here’s one way you can use BACKWARD scans to do this.

The Scenario: Clustered index on an increasing integer, and you’d like recently created rows

This is a common enough situation: you have a table with a clustered index on an integer value which increases with each row. You have another column which records the date the row was created.

You’d like frequently query the most recently created rows over some period of time.

The table has very frequent inserts, so for performance reasons you want to use the minimal indexes required. (And in general, this is the best practice.)

Question: Do you need to add a nonclustered index on the column containing the date the row was created?

Answer: Maybe not!

Getting the right clustered index scan

Say we’re working with the following table, which we have filled with five million rows of Tweetie birds. (Note: This generation technique is a tally table population technique which I found on Stack Overflow, which is attributed to Itzik Ben-Gan.)

CREATE TABLE dbo.Birds (
    birdId INT NOT NULL ,
    birdName NVARCHAR(256) NOT NULL,
    rowCreatedDate DATETIME2(0) NOT NULL )
GO	

--Insert 5 million Tweetie birds
--Make them as if they were all created a minute apart.
;WITH
  Pass0 as (select 1 as C union all select 1),
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT dbo.Birds (birdId, birdName, rowCreatedDate)
SELECT Number AS birdId ,
    'Tweetie' AS birdName ,
    DATEADD(mi, number, '2000-01-01')
FROM Tally
WHERE Number <= 5000000

--Cluster on BirdId. We won't add any other indexes.
CREATE UNIQUE CLUSTERED INDEX cxBirdsBirdId ON dbo.Birds(BirdId)

Say we would just like to see the maximum value in the rowCreatedDate column.

The most basic way to get this row is with this query:

SELECT MAX(rowCreatedDate)
FROM dbo.Birds

However, that leads to a table scan. We get lots of reads: 22,975 logical reads and 201 physical reads.

If we know we have a strong association between the BirdId column and the RowCreatedDate column, and that the highest ID in the table is the most recent row, we can rewrite the query like this:

SELECT MAX(rowCreatedDate)
FROM dbo.Birds
WHERE birdId = (SELECT MAX(birdId) FROM dbo.Birds)

This query still does a clustered index scan. But yet it does only 3 logical reads and 2 physical reads.

Looking in the execution plan, our query was able to use the extra information we provided it to scan the index backwards. It stopped when it had everything it needed, which was after a short distance– after all, it only needed recent rows, and those are all at one end of the table.

This backwards scan can be very useful, and can make using the MAX aggregate very useful.

But you usually need more than just the max value…

To see a bit more about how you extend this logic, compare these three queries:

Query A

This makes you think you need that non-clustered index: it does 22,975 logical reads, 305 physical reads, and 22968 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE rowCreatedDate >= '2009-07-01 05:00:00'

Query B

We can introduce the backwards scan by adding an ORDER BY BIrdId DESC to the query. Now we get 23019 logical reads, 47 physical reads, and 22960 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE rowCreatedDate >= '2009-07-01 05:00:00'
ORDER BY birdid desc

Query C

The this last query gives the optimizer extra information about using BirdId to do a BACKWARD scan to grab the maximum BirdId, and then use that to do a BACKWARD seek of the clustered index in nested loops to get the data. It does only 50 logical reads, 4 physical reads, and 817 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE birdId >=
	(SELECT MAX(birdId)
	FROM dbo.Birds
	WHERE rowCreatedDate <= '2009-07-01 05:00:00')
AND rowCreatedDate >= '2009-07-01 05:00:00'
ORDER BY birdId DESC

Be Careful Out There

The examples I’m using work because there is a correlation between the integer field and the date field. Not all tables may be like this. As with all queries, you need to be familiar with your data.

Consider Your Options– Even the Ones You Don’t Think Are Great

I’m quite sure BACKWARD index reads are covered in some talks and publications on tuning.  But I learned about this by considering multiple approaches, even those I didn’t think would work at first. It pays to try things out, and you can look a lot by looking carefully at execution plans (including the properties) and your Statistics IO output.

What this means to me: it’s good to keep an open mind.

Comments { 4 }

The 9th Day of SQL: Things Aren't as Simple as They Seem

The 12 days of SQL

Brent Ozar (blogtwitter) had an idea: a group of people should blog about writing which they’ve loved this year by people in the SQL community. For each “day of SQL,” someone picks a blog which they thought was great and writes about it.

Yesterday was Day 8, when Karen Lopez talked about a post by Louis Davidson and asked “What is your over/under?”  Karen is a great speaker, an inspiring writer, and just an incredibly interesting person. Check out her post!

On the 9th Day of SQL the engine gave to me: Something a little different than I expected.

Day 9: The Day of Paul White

This day of SQL is not about nine ladies dancing. (Sorry Karen!) Instead, it’s devoted to one New Zealander writing: his name is Paul White (blogtwitter).

First off, let me say that Paul White’s blog, “Page Free Space,” is just plain awesome. When I see Paul’s written a new post I know to allocate some time for it and read it through slowly, and that I should expect to have to think about what I’m reading to understand it.

I swear I can sometimes feel things moving around in my head when I read Paul’s posts. Apply the warning about overhead bins during flight: be careful, contents may shift while you’re reading Paul White’s blog.

So What’s My Favorite Post of the Year?

I picked Paul’s post, The Case of the Missing Shared Locks.

There’s a lot to love about this post. It is a great demonstration that things aren’t as simple as they seem.

Paul starts the post with the question:

If I hold an exclusive lock on a row, can another transaction running at the default read committed isolation level read it?

The answer to that would seem fairly straightforward. But in fact, things are pretty complicated. However, if you go through it slowly and really look at the examples, it can help you understand a lot about locking.

This is good.

Why is it Good that Things Are So Complicated? It’s CONFUSING.

Have you ever said something along these lines? “I’d like to give a presentation sometime, but I don’t have anything to talk about.”

Or, “I’m not sure that I have anything that interesting to fill a whole hour.”

Well, take a look at Paul’s post. He took something small, and he looked very closely at it. He played with it a couple of different ways, and he worked on it to see how it behaved. He stepped through it in a series of short, straightforward steps.

You can do the same thing with many things you’re familiar with. You can take a topic, or a feature, or a method of doing something and distill it into an interesting question. You can then look closely at the question and work with it carefully. Use it as a chance to explore something. You’re probably familiar with it, but by taking the time to write about it or present it, you’ll have the opportunity to get to know it better than you ever thought you could.

Who’s Next?

I’m handing the dreidl off to Crys Manson (blogtwitter) for Day 10.

Crys is a seriously great DBA, a fantastic friend, and she sometimes makes me snort liquid through my nose laughing.

Tag, Crys, you’re it!

How’d We Get Here?

If you want to check out where we’ve been so far, we’ve had:

A Little Present

You don’t need to be Jewish for this to be your favorite holiday song this year. Rock  on with the Maccabeats, y’all. (You will need to click the “watch on YouTube” link.)

[youtube=http://www.youtube.com/watch?v=qSJCSR4MuhU]

Comments { 13 }

SQLPASS Day 2- Optimization Timeouts and All about TLogs

SQLPass unfortunately can’t last forever, but happily it’s still going strong. Here’s some highlights from my Day #2.

Paul Randal Knows Exactly What’s Going on in Your Transaction Log…

A definite highlight of day 2 was attending Paul Randal‘s session on Logging and Recovery in SQL Server. I’ve read Pauls’ blog posts on this topic and attended his classes before, but even being familiar with the material I find I always learn something from his talks. You just can’t beat being strong on the basics!

I took a lot of notes in the session, this is my favorite excerpt from my notes:

  • SQL Server must reserve space in the TLOG so that it can roll back the active transactions, if needed.
  • Once a VLF no longer contains log records that are required, it can be cleared
  • This is done by a log backup in full or bulk_logged recovery models, or by checkpiont in simple
  • All that happens when a VLF is “cleared” is that it is marked as inactive
    • Nothing is cleared at that time
    • Nothing is truncated
    • Nothing is overwritten
    • The log file size does not change
    • The only thing that happens is that whole VLFs are marked inactive if possible (no active transactions)

Ben Nevarz asks, “How You Doing, Optimizer?”

One of my favorite pieces of information on day 2 was in Ben Nevarez‘s talk on how the query optimizer works. He mentioned this DMV, which I hadn’t used before yesterday:

Sys.dm_exec_query_optimizer_info Check me out!

The other useful bit of info is that the timeout flag is recorded in the xml for the sql plans, so plans which the optimizer finds so complicated that it times out on compilation can be queried from the cache!

SQLPASS homework assignment: Write and test this query, determine how to automate running it and collecting the information.

Sample Queries

This sample from BOL  to find excessive compiles/recompiles:

select *
from sys.dm_exec_query_optimizer_info
where counter = 'optimizations'
or counter = 'elapsed time'

See Also…

Ben Nevarez on the Phases of Query Optimization

Conor Cunningham’s Blog on sys.dm_exec_query_optimizer_info– excerpt:

“The other use of the DMV is to get a good statistical picture of a running system.  Say that I’m a DBA and I want to know how many queries in my application have hints or _need_ hints to work well.  Well, this will tell you.  Granted, it doesn’t separate recompiles from compiles, and if you have a system where plans are getting kicked out of the cache things may be a bit skewed, but I can tell you that this is far better than simply guessing.  Often the DB application developer doesn’t realize that they’ve built an application that requires a lot of hinting or a lot of compilations, and you can see this in more detail than you get with the performance counters”

Comments { 0 }

SQL PASS Day 1: To Free or Not To Free the Proc Cache?

Yesterday was day 1 of SQL PASS 2009. I am attending a variety of sessions on execution plans this year, and along the way I heard three very different opinions yesterday on managing the procedure cache in presentations.

Rule of Thumb: The “it depends” answer is usually right.

Opinion 1: Never Ever Clear the Proc Cache on a Production Server

This first opinion came in a good, solid presentation on using execution plans for troubleshooting. There were some good examples of when you want sql to look at the statistics and trigger generating a new plan, and when you don’t. (AKA when parameter sniffing is a good or a bad thing.) But the speaker was wholeheartedly against clearing the proc cache in production.

While I can definitely see this being true for some systems, I have definitely seen advantages of clearing the proc cache on others (more to come below), so I already knew this was too simple an answer for me– at least until I’ve solved the problems I have with out of date statistics on frequently modified large tables.

(Thanks to Grant Fritchey for a great presentation.)

Opinion 2: Be Free, Procedure Cache, be Free!

This second opinion came in a session on using DMVs to troubleshoot performance. This session was even geared toward OLTP systems, and the speaker said he regularly frees the procedure cache on his production sql servers at a given interval. He sees slight CPU pressure after doing so, but has the benefit of being able to capture and trend exactly what procedures go into the cache using the DMVs afterward (with the benefit of clean timestamps).

So in his environment, he has no issues clearing the proc cache.

(Thanks to Dr.DMV for a great talk!)

Opinion 3: It Depends: Check the Size of Your Proc Cache, Free if You Need To (and can handle the CPU for Recompilations)

The third speaker (Maciej Pilecki) talked about looking at the total size of the proc cache, and stressed that as this cache grows, it can steal space from the buffer pool. For each system, you should look at the size of the procedure cache and the amount of execution plan reuse you are getting on the system.

There are two main performance benefits to plan reuse (whether parameterized adhoc queries or procedure queries):

  • Speed: (recompiling takes time and CPU resources)
  • Smaller proc cache / More room for buffer pool to hold data in memory

Bonus: Maciej also mentioned how the ‘Optimize for Adhoc Workload’ option in sql 2008 can help alleviate bloat of the adhoc procedure cache. When enabled, this will only cache adhoc plans on their second run– for the first run sql will just store a small record that the query was executed once.

I really enjoyed these sessions, and one of the great things about PASS is the opportunity to hear and synthesize different perspectives on these topics.

Love it!

See also: Maciej Pilecki’s post on clearing only the adhoc part of the cache
Lara Rubbelke’s post on memory pressure and the proc cache

Comments { 0 }

Who's Using All that Space in TempDb, and What's their Plan?

Whatcha Doing in My TempDb???

This post contains a script that I adapted from the fantastic SQL Server Storage Engine Blog.

It comes in handy in my job all the time! Sometimes tempdb is filling up, but sometimes I just want to monitor the amount of tempdb and check out execution plans of heavy tempdb users while watching performance on a server. It just really comes in handy more frequently than I would have thought before I started using it.

Note: This script returns space used in tempdb only, regardless of the db context it’s run in, and it only works for tempdb.

Continue Reading →

Comments { 0 }

Finding Plans and Stats for Queries like '%something%'

I often need to find a query plan in the cache for a process that has run long overnight. Typically I’ll be able to figure out from our logging some of the tables involved in the query. Sometimes I will have most of the executing text but won’t know exactly what dates or reference points were included.

Even when I have enough information to get an estimated plan, it’s usually really helpful if I can pull the actual plan out of the cache along with runtime statistics.

The query below is what I use at this point to try to find these plans– I also sometimes use it just to look for long running queries in general.

One note to remember– the last_execution_time field is the time of the plan activities at the last execution. So if you’re looking for a query that ran for an hour, this time would show at the beginning of that execution. (The logging on my systems is done after a batch of activities complete, so I always have to do a bit of work to figure out approximately when the activity would have started and look around that time for the plan.)

Continue Reading →

Comments { 0 }

The DBA Sees Your Blocking Spids… A Bird's Eye Summary of Blocking

Maybe you’re a user in a reporting database running a long query in the read committed isolation level, merrily blocking a process which needs to load data.

Maybe you’re an application misbehaving in a transactional front end database, blocking other applications.

Or possibly you’re a middle tier application who is behaving well and trying to do a small update, but the table you’re updating holds a legacy trigger designed when it was a much smaller table, and that trigger is doing data validation with long running locks.

With there being a lot of users, a lot of applications, and in general a lot of blocking, it can be useful to get a bird’s eye view.

I manage a system where we frequently used to see lots of blocking related to triggers. We’d see one spid block another, and then that spid block a few more, and so on. We would also occasionally see a middle tier application open a connection and start doing work, then stop activity but leave a transaction open. In that case it could hold locks until the connection terminated, even though the spid was sleeping.

So it was extremely useful to be able to see quickly who the spids were at the root of the blocking trees. There would often be a very large number of connections open sp_who2 wasn’t the easiest method.

I also like to collect the execution plans for blockers, which makes it much easier to track down the cause.

Hence, this script…
Continue Reading →

Comments { 0 }