Archive | Performance RSS feed for this section

There’s more than one way to skin an eggplant: Using APPLY for calculations

Choices, choices

Here’s a little TSQL snack. I picked this up in a presentation by Itzik Ben-Gan at the PNWSQL user group recently, and it’s become a fast favorite.

CROSS APPLY and OUTER APPLY- another use

The APPLY operator is perhaps more flexible than  you think. You may already know that you can use it to inline a function, or to replace a join.

But wait, there’s more! You can also use APPLY to perform calculations and simplify your query syntax– this is because the APPLY operator allows you to  express a calculation that can be referred to:

  • in further joins (which may or may not use APPLY)
  • by columns
  • in the where clause
  • in the group by

This is really helpful, because you can’t refer to the results of a computation in one column from anywhere but the ORDER BY.  This is because of the order of evaluation of parts of the statement.

I know this sounds confusing. It’ll make more sense in an example.

A sample query– the ‘before’ version

Here is a query written for the AdventureWorks sample database. There’s all sorts of examples that are possible for this, but I decided to go with one grouping data by month, using my favorite formula to round dates.

It shows the total quantity of orders by Product for an entire order month, for orders placed on or after 2004-07-01.

SELECT  DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth,
        p.Name AS ProductName,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh.OrderDate >= '2004-07-01'
GROUP BY DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0),
        p.Name
ORDER BY OrderDateMonth,
        p.Name

Notice that to group the date at the month level, we need to include the calculation in the column in the column list, as well as in the group by clause.

The query rewritten using APPLY for the calculation

This can be rewritten with CROSS apply to move the calculation into the JOIN area and only specify it once.

The benefits: this will simplify your syntax and reduce the chance of typos and errors, particularly when you need to go in and change the calculation. In cases when you’re displaying a sum in one column and showing a percentage using it in another column, this trick is *fantastic*. (Query numbers from the DMVs a lot? you’ll love this.)

Here, the calculation on the date is moved into the cross apply. It can be referenced as oh1.OrderDateMonth in both the list of columns, and in the GROUP BY portion of the query without rewriting the calculation.

SELECT  oh1.OrderDateMonth,
        p.Name AS ProductName,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
CROSS APPLY ( SELECT    DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth ) AS oh1
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh.OrderDate >= '2004-07-01'
GROUP BY oh1.OrderDateMonth,
        p.Name
ORDER BY OrderDateMonth,
        p.Name

What does the execution plan look like?

Click for a larger image

The execution plan for these two queries are identical.

In this case, the optimizer looks at these two queries and realizes the activities it needs to do will be the same.

Other options

You can create further CROSS APPLY or OUTER APPLY joins that refer to computations in prior joins.

You can also refer to the resulting computation in the where clause.

But be careful….

As with anything, you want to make sure you’re getting a good execution plan, and not shooting yourself in the foot with a new trick.

One big area to watch: although you can refer to these computations conveniently in the WHERE clause, you still want to be careful you’re using appropriate criteria.

For instance, if we were to change the example above to refer to the result from the CROSS APPLY oh1 in the where clause like this:

SELECT  oh1.OrderDateMonth ,
        p.Name AS ProductName ,
        SUM(orderQty) AS totalQuantity
FROM    sales.SalesOrderHeader oh
CROSS APPLY ( SELECT    DATEADD(MM, DATEDIFF(MM, 0, oh.OrderDate), 0) AS OrderDateMonth ) AS oh1
JOIN    Sales.SalesOrderDetail od
        ON oh.SalesOrderID = od.SalesOrderID
JOIN    production.Product p
        ON od.ProductID = p.ProductID
WHERE   oh1.OrderDateMonth >= '2004-07-01'  ---Don't do this!
GROUP BY oh1.OrderDateMonth ,
        p.Name
ORDER BY OrderDateMonth ,
        p.Name

… then in this case the query would not be able to use an index on OrderDate on the sales.SalesOrderHeader table, if one exists.

This is not specifically because of the CROSS APPLY, but because we are forcing SQL Server to apply the functions to every value to identify if it satisfies the criteria. That prevents a seek.

Comments { 2 }

How To: Automate Continuous SQLServer Activity with Stored Procedures and Powershell Jobs

The Goal

It’s often useful to be able to run a bunch of stored procedures in the background over a period of time against a test instance.

This can be nice for:

  • Demos and presentations.
  • Populating DMVs with data you can slice and dice.
  • Learning to use things like extended events and server side trace (which are much more interesting with something to look at).
  • Testing a variety of automation scripts.

This post shows you how to create several stored procedures for AdventureWorks2008R2 which will provide different result sets and have slightly different run times when run with a variety of parameters– in this case, individual letters of the alphabet.

You can then run PowerShell commands which start jobs in the background. Each job runs a stored procedure and loops through all letters of the alphabet, providing each one as a parameter. You can set the job to do that loop a configurable amount of times (the commands are set to 100). In other words, as given, each stored procedure will be run 2600 times. Since you’re running multiple jobs and they’re all going asynchronously in their own threads, you’ll have a variety of commands trying to run at the same time.

Optional: you can start the PowerShell jobs under different credentials if you need.

Alternatives: In the past, I’ve typically done things like this with T-SQL loops (often with dynamic SQL) and multiple Management Studio windows. This works OK, but it’s a little time consuming to open each window, paste everything in (or open multiple files), and start them all up. I find it much more convenient now to use scripts.

Step 1: Create Stored Procedures with a single alphabet-based parameter

Let’s get one thing clear: these procedures aren’t designed to run optimally, and they aren’t coded nicely.

You’ll notice these procedures have all sorts of problems. And that’s by design– my goals are for testing things around these, so it’s really a little better for me if they don’t play perfectly nice.

In other words, these sure ain’t for production. :)

/****************
Jump in the kiddie pool
********************/
USE AdventureWorks2008R2;
go

/****************
CREATE THE SCHEMA
********************/
IF SCHEMA_ID(N'test')  IS NULL
	EXEC sp_executesql N'CREATE SCHEMA test AUTHORIZATION dbo'
GO

/****************
CREATE Silly Stored Procedures in the Schema
********************/
IF OBJECT_ID(N'test.EmployeeByLastName', 'P') IS NULL
	EXEC sp_executesql N'CREATE PROCEDURE test.EmployeeByLastName as return 0'
GO
ALTER PROCEDURE test.EmployeeByLastName
	@lName nvarchar(255)
AS
	SELECT @lName = N'%' + @lName + N'%'

	select *
	FROM HumanResources.vEmployee
	WHERE LastName LIKE @lName
GO

IF OBJECT_ID(N'test.EmployeeByFirstName', 'P') IS NULL
	EXEC sp_executesql N'CREATE PROCEDURE test.EmployeeByFirstName as return 0'
GO
ALTER PROCEDURE test.EmployeeByFirstName
	@fName nvarchar(255)
AS
	SELECT @fName = '%' + @fName + '%'

	select *
	FROM HumanResources.vEmployee
	WHERE FirstName LIKE @fName
GO

IF OBJECT_ID(N'test.EmployeeDepartmentHistoryByLastName', 'P') IS NULL
	EXEC sp_executesql N'CREATE PROCEDURE test.EmployeeDepartmentHistoryByLastName as return 0'
GO
ALTER PROCEDURE test.EmployeeDepartmentHistoryByLastName
	@lName nvarchar(255)
AS
	SELECT @lName = N'%' + @lName + N'%'

	select *
	FROM HumanResources.vEmployeeDepartmentHistory
	WHERE LastName LIKE @lName
GO

IF OBJECT_ID(N'test.EmployeeDepartmentHistoryByFirstName', 'P') IS NULL
	EXEC sp_executesql N'CREATE PROCEDURE test.EmployeeDepartmentHistoryByFirstName as return 0'
GO
ALTER PROCEDURE test.EmployeeDepartmentHistoryByFirstName
	@fName nvarchar(255)
AS
	SELECT @fName = '%' + @fName + '%'

	select *
	FROM HumanResources.vEmployeeDepartmentHistory
	WHERE FirstName LIKE @fName
GO

IF OBJECT_ID(N'test.ProductAndDescriptionByKeyword', 'P') IS NULL
	EXEC sp_executesql N'CREATE PROCEDURE test.ProductAndDescriptionByKeyword as return 0'
GO
ALTER PROCEDURE test.ProductAndDescriptionByKeyword
	@keyword nvarchar(255)
AS
	SELECT @keyword = '%' + @keyword + '%'

	select *
	FROM Production.vProductAndDescription
	WHERE Name LIKE @keyword OR ProductModel like @keyword OR description LIKE @keyword
GO

Once you’ve got the procedures written, you just need to set up your PowerShell commands.

Step 2: Create PowerShell Jobs to Run the Procedures in Loops

These commands use PowerShell background jobs.

Even if you don’t know PowerShell, if you look at these commands you can pretty easily pick out where the 1 to 100 loop is, where the a to z loop is, and what commands are being run.

Since the jobs are running to create load in the background and I don’t care about collecting query results, I pipe the output all to Out-Null.

#test.EmployeeByLastName
Start-Job -ScriptBlock {Import-Module sqlps; foreach($_ in 1..100) {foreach ($_ in [char]"a"..[char]"z") {Invoke-Sqlcmd -Query "exec test.EmployeeByLastName '$([char]$_)'" -ServerInstance "YOURMACHINE\YOURINSTANCE" -Database AdventureWorks2008R2 | Out-Null }}}

#"test.EmployeeByFirstName"
Start-Job -ScriptBlock {Import-Module sqlps; foreach($_ in 1..100) {foreach ($_ in [char]"a"..[char]"z") {Invoke-Sqlcmd -Query "exec test.EmployeeByFirstName '$([char]$_)'" -ServerInstance "YOURMACHINE\YOURINSTANCE" -Database AdventureWorks2008R2 | Out-Null }}}

#"test.EmployeeDepartmentHistoryByFirstName"
Start-Job -ScriptBlock {Import-Module sqlps; foreach($_ in 1..100) {foreach ($_ in [char]"a"..[char]"z") {Invoke-Sqlcmd -Query "exec test.EmployeeDepartmentHistoryByFirstName '$([char]$_)'" -ServerInstance "YOURMACHINE\YOURINSTANCE" -Database AdventureWorks2008R2 | Out-Null }}}

#"test.EmployeeDepartmentHistoryByLastName"
Start-Job -ScriptBlock {Import-Module sqlps; foreach($_ in 1..100) {foreach ($_ in [char]"a"..[char]"z") {Invoke-Sqlcmd -Query "exec test.EmployeeDepartmentHistoryByLastName '$([char]$_)'" -ServerInstance "YOURMACHINE\YOURINSTANCE" -Database AdventureWorks2008R2 | Out-Null }}}

#"test.ProductAndDescriptionByKeyword"
Start-Job -ScriptBlock {Import-Module sqlps; foreach($_ in 1..100) {foreach ($_ in [char]"a"..[char]"z") {Invoke-Sqlcmd -Query "exec test.ProductAndDescriptionByKeyword '$([char]$_)'" -ServerInstance "YOURMACHINE\YOURINSTANCE" -Database AdventureWorks2008R2 | Out-Null }}}

Each command will start an asynchronous background job.

Step 3: Manage Jobs (if needed)

Once the jobs are running in the background, you may want to check on their status. You can do so by running:

get-job

if you want to remove a job from the list, you can use Remove-Job with the job number, or you can remove all jobs (whether or not they are running) with:

Remove-Job * -Force

If you want to see the output of a job, you can use Receive-Job– supply the jobnumber. If you’re troubleshooting and want to see errors, you probably want to remove | Out-Null from the command that starts the job, and use a fewer number of loops. Then you can can receive the job’s output and see any errors.

Receive-Job JOBNUMBER
Comments { 6 }

The Magic of the Self-Assigned Lab (SQLSkills Immersion Training Days 2 and 3)

The duck has obtained scuba gear-- it can go to new depths on its own.

Today, more blog from SQLSkills Immersion Training on Internals and Performance in Dallas, TX. For more about the training, see my prior post.

Where We’ve Been

We’ve moved through most of Module 8. We’ve been through the land of the transaction log, locking and blocking, isolation and versioning, table design and partitioning, and many neighborhoods of index internals.

A Sample Case

I love learning about how the transaction log works. I know a bit about how the log operates, and as a DBA I keep this knowledge pretty fresh. So I was looking forward to yesterday’s session on transaction logging quite a bit.

This week, I’ve learned a lot about the internals of how the log works with crash recovery, as well as how log buffers work in general. (And I mean a lot. Pages and pages of notes.)

Looking over my notes, what makes me happy is how much I was able to note down and absorb much more than I have been in trainings previously. This is attributable to three things:

  • A lot of time devoted to the topic;
  • Plenty of room for questions (and there were lots!) with lots of rich data in the answer;
  • A good context for the information.

I’m really proud that I know enough now that I’m able to understand and note this level of detail. That’s a great feeling.

The magic of the Self Assigned Lab

When I’m in a great session or a good conversation and I learn something that works differently than I thought, or something very interesting, I made a note to myself with a “To Do”. These are basically self-assigned lab assignments: I’ve learned from blogging and presentations that I learn a ton by setting up my own scenarios and working to show that something works (and look at how it works), or the opposite.

I already have enough self-driven labs to keep me learning, and blogging about the best parts, for several months. Here are a few:

  1. ToDo: find an automation opportunity with DBCC PAGE WITH TABLERESULTS
  2. ToDo: create a scenario where you can’t get rid of a secondary filegroup without unusual operations.
  3. ToDo: Look at/experiment with transaction savepoints

There are lots more good ones, but I’m hoarding quite a few of them. (I have 41 at the time of this writing.)

That’s the magic of great training– not only do you pick up a lot and you receive pre-designed labs you can learn from, but you also find paths you hadn’t imagined to explore and create tools on your own. And you’re inspired to go there.

Comments { 7 }

Dirty Pages and Statistics IO

Warning: The DROPCLEANBUFFERS command referenced in this post is appropriate for test systems only and impacts the entire SQL Instance. If you are new to SQL Server, please use this commands with care, and be careful to read the linked Books Online documentation. Happy testing!

You were hoping for a picture, right?

The other day I was running some test queries and looking at the number of reads, and I noticed something funny.

I was dropping clean buffers prior to running a query, but I would sometimes see that there had been no physical reads.

No physical reads? Where was the data coming from?

I was working on a small number of rows, but it still bothered me.

The output looked like this:

The Set-Up

Here’s a simple simulation of what I was doing. First, create a database and insert some values.

SET NOCOUNT ON;
SET STATISTICS IO OFF;
create database dirtyBuffers
GO
USE dirtyBuffers
GO
--Create a table and insert some values
create table dbo.testme (
	i int identity,
	j char(2000) default 'baroo'
)
GO
insert dbo.testme default values
GO 20

Then, turn on Statistics IO so we can see read information. Drop clean buffers, so data isn’t in memory. Then run a query.

SET STATISTICS IO ON;

DBCC DROPCLEANBUFFERS
GO

--Select some rows
select * from dbo.testme

It should read it from disk, right?

 

What I Forgot

I was forgetting about dirty pages. In  order to get a “cold cache”, you need to first run a CHECKPOINT command to flush dirty pages to disk, then run DBCC DROPCLEANBUFFERS to remove everything from the bufferpool. This is very well documented in Books Online.

This was easy to forget because typically I test execution of queries against a restored copy of a production database, or a dataset which isn’t changing.

What I Hadn’t Realized

I don’t think I ever specifically realized that dirty pages could be immediately re-used for query results– but it makes perfect sense. I had only thought about clean pages, which were read in for one query, to be available for re-use.

I felt a little silly when I realized this. Shouldn’t I have known this? But after thinking about it I realized: there’s little gaps like this in most everyone’s knowledge. Sometimes it takes a little bit of extra experience to notice the gap and fill it in. It happens to us all.

After rerunning the commands and including a CHECKPOINT with DBCC DROPCLEANBUFFERS, I see the expected output– a physical read.

Comments { 3 }

Read from the Right End of the Index: BACKWARD Scans

Optimizing queries is the most fun when you don’t need to add indexes. There’s nothing quite so nice as finding a way to make reading data faster, without slowing down writes or creating new data structures that need to be maintained.

Here’s one way you can use BACKWARD scans to do this.

The Scenario: Clustered index on an increasing integer, and you’d like recently created rows

This is a common enough situation: you have a table with a clustered index on an integer value which increases with each row. You have another column which records the date the row was created.

You’d like frequently query the most recently created rows over some period of time.

The table has very frequent inserts, so for performance reasons you want to use the minimal indexes required. (And in general, this is the best practice.)

Question: Do you need to add a nonclustered index on the column containing the date the row was created?

Answer: Maybe not!

Getting the right clustered index scan

Say we’re working with the following table, which we have filled with five million rows of Tweetie birds. (Note: This generation technique is a tally table population technique which I found on Stack Overflow, which is attributed to Itzik Ben-Gan.)

CREATE TABLE dbo.Birds (
    birdId INT NOT NULL ,
    birdName NVARCHAR(256) NOT NULL,
    rowCreatedDate DATETIME2(0) NOT NULL )
GO	

--Insert 5 million Tweetie birds
--Make them as if they were all created a minute apart.
;WITH
  Pass0 as (select 1 as C union all select 1),
  Pass1 as (select 1 as C from Pass0 as A, Pass0 as B),
  Pass2 as (select 1 as C from Pass1 as A, Pass1 as B),
  Pass3 as (select 1 as C from Pass2 as A, Pass2 as B),
  Pass4 as (select 1 as C from Pass3 as A, Pass3 as B),
  Pass5 as (select 1 as C from Pass4 as A, Pass4 as B),
  Tally as (select row_number() over(order by C) as Number from Pass5)
INSERT dbo.Birds (birdId, birdName, rowCreatedDate)
SELECT Number AS birdId ,
    'Tweetie' AS birdName ,
    DATEADD(mi, number, '2000-01-01')
FROM Tally
WHERE Number <= 5000000

--Cluster on BirdId. We won't add any other indexes.
CREATE UNIQUE CLUSTERED INDEX cxBirdsBirdId ON dbo.Birds(BirdId)

Say we would just like to see the maximum value in the rowCreatedDate column.

The most basic way to get this row is with this query:

SELECT MAX(rowCreatedDate)
FROM dbo.Birds

However, that leads to a table scan. We get lots of reads: 22,975 logical reads and 201 physical reads.

If we know we have a strong association between the BirdId column and the RowCreatedDate column, and that the highest ID in the table is the most recent row, we can rewrite the query like this:

SELECT MAX(rowCreatedDate)
FROM dbo.Birds
WHERE birdId = (SELECT MAX(birdId) FROM dbo.Birds)

This query still does a clustered index scan. But yet it does only 3 logical reads and 2 physical reads.

Looking in the execution plan, our query was able to use the extra information we provided it to scan the index backwards. It stopped when it had everything it needed, which was after a short distance– after all, it only needed recent rows, and those are all at one end of the table.

This backwards scan can be very useful, and can make using the MAX aggregate very useful.

But you usually need more than just the max value…

To see a bit more about how you extend this logic, compare these three queries:

Query A

This makes you think you need that non-clustered index: it does 22,975 logical reads, 305 physical reads, and 22968 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE rowCreatedDate >= '2009-07-01 05:00:00'

Query B

We can introduce the backwards scan by adding an ORDER BY BIrdId DESC to the query. Now we get 23019 logical reads, 47 physical reads, and 22960 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE rowCreatedDate >= '2009-07-01 05:00:00'
ORDER BY birdid desc

Query C

The this last query gives the optimizer extra information about using BirdId to do a BACKWARD scan to grab the maximum BirdId, and then use that to do a BACKWARD seek of the clustered index in nested loops to get the data. It does only 50 logical reads, 4 physical reads, and 817 read-ahead reads.

--Only run against a test server, not good for production
DBCC DROPCLEANBUFFERS

SELECT birdId, birdName, rowCreatedDate
FROM dbo.Birds
WHERE birdId >=
	(SELECT MAX(birdId)
	FROM dbo.Birds
	WHERE rowCreatedDate <= '2009-07-01 05:00:00')
AND rowCreatedDate >= '2009-07-01 05:00:00'
ORDER BY birdId DESC

Be Careful Out There

The examples I’m using work because there is a correlation between the integer field and the date field. Not all tables may be like this. As with all queries, you need to be familiar with your data.

Consider Your Options– Even the Ones You Don’t Think Are Great

I’m quite sure BACKWARD index reads are covered in some talks and publications on tuning.  But I learned about this by considering multiple approaches, even those I didn’t think would work at first. It pays to try things out, and you can look a lot by looking carefully at execution plans (including the properties) and your Statistics IO output.

What this means to me: it’s good to keep an open mind.

Comments { 4 }

Date Rounding Tactics and the Tiny Devil of SMALLDATETIME

Tiny Devils

With every new year I think a little bit about time and dates. This posts looks a little more at that in TSQL.

Rounding Dates: Which way is best?

Sometimes in TSQL you need to round a datetime value to the precision of either a day, hour, minute, or second.

I realized recently that I have a few ways I know how to do this, but I wasn’t sure which was the most efficient.

I did a little searching and didn’t find anything super conclusive. I had a little chat with Jeremiah Peschka (blog | twitter) and he told me which way he thought was fastest and why.

And so I decided to run some tests. Jeremiah has a way of being right about these things, but I had to see for myself.

I’ll go ahead and tell you: He was totally right, and I’ll show you why. But I learned a couple things along the way.

Reference: Types and Storage Sizes

To get started, let’s review some of our friends, the SQL Server datatypes. Hi friends!

Rounding to the Day

The most frequent case in which I need to round dates is to the day level. So instead of ’1/4/2011 6:15:03.393921′, I want just ’1/4/2011′.

SQL 2008′s date type made this a lot easier for everyone– now we can just cast a datetime or datetime2 value as a date, and we’ve got what we need. PLUS, our new value is nice and small, weighing in at 3 bytes.

I think most everyone agrees, we like this!

SELECT CAST('1/1/2010 23:59:59.000' AS DATE) AS [I'm a date!]

Rounding to Hours, Minutes, or Seconds:
Beware the tiny devil of SMALLDATETIME

This is still a bit more complicated. When you start thinking about these and different datatypes, you need to make sure you understand what you mean by rounding.

In SQL Server, our datatypes actually have some different opinions about what rounding means. Check this out:

SELECT
CAST('1/1/2010 23:59:59.000' AS DATETIME) AS [I'm a DATETIME!],
CAST('1/1/2010 23:59:59.000' AS DATETIME2(0))  AS [I'm a DATETIME2(0)!'],
CAST('1/1/2010 23:59:59.000' AS SMALLDATETIME) AS [I'm a SMALLDATETIME, and I'm very confused.],
CAST('1/1/2010 23:59:59.000' AS DATE) AS [I'm a DATE!]

This returns:

The SMALLDATETIME value rounds this up to January 2nd, instead of January 1. The Date datatype does not.

In considering whether or not to use SMALLDATETIME, you need to learn and establish whether or not to round up for minutes and date values. With a different example, if something occurred at 12:30:31 AM, would that be represented as having  happened in the 12:30 minute, or at 12:31?

Most of us actually want to round down. We want the largest minute number which is less than or equal to the datetime value. This is similar to what FLOOR does for integers. You could also call this truncating the portion of the datetime value you don’t want.  This is not, however, what SMALLDATETIME gives you, so use it with care.

So this is what I’m saying:

Like, seriously, SMALLDATETIME: you are SO messed up.

Comparing Methods of Rounding Dates

So given that warning, let’s actually round some date values, and let’s compare the efficiency of each method.

To start out with, let’s create a table and toss in a bunch of date values. We’ll run queries against these dates and measure SQL Server’s abilities to work with it.

To make up a bunch of datetime data, I’m using my trusty recursive CTE from my prior post.

--Populate a table with some data
CREATE TABLE dbo.Donuts ( DonutTime DATETIME2(7) )

DECLARE
@startDate DATETIME2(7)= '2010-12-01 00:00:00' ,
@endDate DATETIME2(7)= '2010-12-11 01:30:00' ;

WITH    MyCTE
AS ( SELECT
@startDate AS [Makin' the Donuts]
UNION ALL
SELECT
DATEADD(ms, 1225, [Makin' the Donuts])
FROM
MyCTE
WHERE
[Makin' the Donuts] < @endDate )
INSERT  dbo.Donuts
SELECT
[Makin' the Donuts]
FROM
MyCTE
OPTION
( MAXRECURSION 0 ) ;

SELECT @@ROWCOUNT
--We now have 709716 rows of DonutTime

Now let’s look at different methods to manipulate datevalues. For our examples I’ll be rounding to the minute.

Contestant 1 -
DATEPART: isolate each part of the date, then concatenate

As we learn TSQL, this is the first method that occurs to us. We know DATEPART will return part of a date (great name!), so we can chop apart the bits. However, to get them back together properly we have to turn each part into a string to clue them back together. And then if we want to treat it like a date (which we pretty much always do), we have to cast it back.

Just look at this baby. It’s pretty ugly.

SELECT
CAST(CAST(DATEPART(YY, DonutTime) AS CHAR(4)) + '-' + CAST(DATEPART(MM, DonutTime) AS NVARCHAR(2)) + '-'
+ CAST(DATEPART(DD, DonutTime) AS NVARCHAR(2)) + '  ' + CAST(DATEPART(hh, DonutTime) AS NVARCHAR(2)) + ':'
+ CAST(DATEPART(mi, DonutTime) AS NVARCHAR(2)) + ':00.000' AS DATETIME2(0)) AS [Wow, that was a lot of typing.]
FROM
dbo.Donuts

Running this (after cleaning out buffers), I got these results:

Contestant 2 -
Subtracting what you don’t want

There’s a couple of variations on contestant #2. I’ll take the one I like best, which is casting to a smaller byte size by using DATETIME2(0), which is 6 bytes rather than 8 and effectively truncates to the second. Then I’ll subtract the seconds.

SELECT
DATEADD(ss, -DATEPART(ss, DonutTime), CAST (DonutTime AS DATETIME2(0)))
FROM
dbo.Donuts

Running this one (yes, I cleaned out the buffers), I got these results:

Well now, that’s much lower CPU time there.

NB: I did test, and in all my trials it was lower CPU time to cast into DATETIME2 rather than using a nested DATEADD function to subtract milliseconds.

Contestant 3-
Convert to a shorter character string, then back to date

This contestant is near and dear to my heart. I like it because it’s easy for me to remember. You take a short trip into CHAR() with the 121 date format and set the length to chop off the parts of the date you don’t want. Then you cast or convert back to a DATETIME2(0).

I think I like this one because it feels just a little bit violent. But not in a bad way. It’s like roller derby.

SELECT
CAST(CONVERT(CHAR(16), DonutTime, 121) AS DATETIME2(0))
FROM
dbo.Donuts

Oh, sad. This one didn’t do very well. It’s definitely better than Contestant #1, at least.

Contestant 4-
Use DATEADD to calculate the minutes since a given date, then add them back

Here’s the method Jeremiah suggested to me. The way he described it was “Just figure out the number of minutes since the beginning of time, and use that.”

Being a philosophy major, I of course asked “So, when was the beginning of time?”

Being a developer, he answered, “Just call it zero.”

SELECT
DATEADD(mi, DATEDIFF(mi, 0, CAST(DonutTime AS DATETIME2(0))), 0)
FROM
dbo.Donuts

Here are the results (clean buffers, as usual):

Ooo, check out the CPU time on that one.

Note: I ran a few trials and this is faster on the CPU when you cast as DATETIME2(0) before doing your maths. I did that to make all things equal with the other contestants, who had the same benefit.

Who Won, and Why

Here’s a recap of how everyone performed:

Why did contestants 2 and 4 do so well?

Jeremiah pointed out that datetime values are stored internally as two four byte integers. (BOL reference: see “Remarks”) Performing mathematic functions on an integer value is a nice fast activity on the CPU.

Performing conversions back and forth to character based datatypes, however, is not so natural, nor so fast.

What’s the internal storage format of DateTime2?  Well, I’m not sure about that one. BOL isn’t so up-front about these things anymore. If you happen to know, please tell me in the comments. I can tell, however, that it’s something that enjoys mathematics.

Comments { 7 }

The 9th Day of SQL: Things Aren't as Simple as They Seem

The 12 days of SQL

Brent Ozar (blogtwitter) had an idea: a group of people should blog about writing which they’ve loved this year by people in the SQL community. For each “day of SQL,” someone picks a blog which they thought was great and writes about it.

Yesterday was Day 8, when Karen Lopez talked about a post by Louis Davidson and asked “What is your over/under?”  Karen is a great speaker, an inspiring writer, and just an incredibly interesting person. Check out her post!

On the 9th Day of SQL the engine gave to me: Something a little different than I expected.

Day 9: The Day of Paul White

This day of SQL is not about nine ladies dancing. (Sorry Karen!) Instead, it’s devoted to one New Zealander writing: his name is Paul White (blogtwitter).

First off, let me say that Paul White’s blog, “Page Free Space,” is just plain awesome. When I see Paul’s written a new post I know to allocate some time for it and read it through slowly, and that I should expect to have to think about what I’m reading to understand it.

I swear I can sometimes feel things moving around in my head when I read Paul’s posts. Apply the warning about overhead bins during flight: be careful, contents may shift while you’re reading Paul White’s blog.

So What’s My Favorite Post of the Year?

I picked Paul’s post, The Case of the Missing Shared Locks.

There’s a lot to love about this post. It is a great demonstration that things aren’t as simple as they seem.

Paul starts the post with the question:

If I hold an exclusive lock on a row, can another transaction running at the default read committed isolation level read it?

The answer to that would seem fairly straightforward. But in fact, things are pretty complicated. However, if you go through it slowly and really look at the examples, it can help you understand a lot about locking.

This is good.

Why is it Good that Things Are So Complicated? It’s CONFUSING.

Have you ever said something along these lines? “I’d like to give a presentation sometime, but I don’t have anything to talk about.”

Or, “I’m not sure that I have anything that interesting to fill a whole hour.”

Well, take a look at Paul’s post. He took something small, and he looked very closely at it. He played with it a couple of different ways, and he worked on it to see how it behaved. He stepped through it in a series of short, straightforward steps.

You can do the same thing with many things you’re familiar with. You can take a topic, or a feature, or a method of doing something and distill it into an interesting question. You can then look closely at the question and work with it carefully. Use it as a chance to explore something. You’re probably familiar with it, but by taking the time to write about it or present it, you’ll have the opportunity to get to know it better than you ever thought you could.

Who’s Next?

I’m handing the dreidl off to Crys Manson (blogtwitter) for Day 10.

Crys is a seriously great DBA, a fantastic friend, and she sometimes makes me snort liquid through my nose laughing.

Tag, Crys, you’re it!

How’d We Get Here?

If you want to check out where we’ve been so far, we’ve had:

A Little Present

You don’t need to be Jewish for this to be your favorite holiday song this year. Rock  on with the Maccabeats, y’all. (You will need to click the “watch on YouTube” link.)

[youtube=http://www.youtube.com/watch?v=qSJCSR4MuhU]

Comments { 13 }
Doing many things at once

Review: A day of doing many things at once with @AdamMachanic

Parallel Puppy OperationsA day of doing many things

At SQLPass this year I was fortunate to attend “A day of doing many things at once: Multitasking, Parallelism, and Process distribution” by Adam Machanic (blog | twitter). This was a day long post-conference.

So, how was it?

This was a fantastic seminar. There was a really good flow to the talk, which started in CPU background and architecture, then moved through Windows Internals, SQL Server internals, and on to specifics of parallelism in queries. Then we finally moved on to administration topics, as well as different methods of process distribution. A full outline of the day is here.

I think the presentation worked very well because of the balance of theory and practice. Essentially, there was a very good ratio between ‘what, ‘why’, and ‘how’.

I’ll look back at the outline for this seminar when designing longer presentations myself.

Did I learn anything useful?

Yes! The information on plan shapes and tricks to manipulate them was incredibly interesting, and is something I know will be useful. I also learned some interesting specifics about how the DAC works, and have a much more holistic view of how SQL Server uses processors and parallelism. Check out my tweets below for a little more insight into what my day was like!

Free webcasts. Yep, free.

Adam has some webcasts on parallelism available for download which you can watch for free.

My tweetstream from the session…

Here’s what my day was like, according to Twitter.

  • Postcon fun with @AdamMachanic today! #sqlpass Processes do not run, *threads* do.
  • Quick discussion of fiber mode for SQL Server: very limiting (http://bit.ly/bn6RoK)
  • Thread starvation: pre-emption by high priority threads can prevent some threads from ever running.
  • Threads running on client OS get a smaller amount of quantum units than on a server os (more frequent interrupt frequency)
  • Three types of processor affinity: none, hard affinity, and ideal affinity
  • Lots of love for sysinternals (http://bit.ly/WPxha) and theCPU-Z tool (w/ props to @BrentO for recommending http://bit.ly/1iBcg6)
  • Interrupt counts include not just when a quantum expires, but also when a thread finishes.
  • Lots of cool WMI queries being run from inside SSMS
  • Mine is still getting even better :) RT @whimsql: Amen Tom! RT @SQLRockstar Best. Summit. Ever.
  • Meeting the SQLOS! It’s a “cooperative” scheduling system: everyone’s equal
  • SQLOS provides an abstraction layer so storage engine, qp, etc can all talk to it instead of directly to the OS
  • Proc Affinity at sql server level may be worth testing w/ multi instances. WIth virtualization taking predominance, is less common.
  • Differences between resource waits and signal waits being explained
  • 484 Wait types in SQL Server 2008– plug for #sqlhelp hash tag for those with limited documentation.
  • I totally just got called on in a “what feature uses a hidden scheduler” Pop Quiz. #FAIL
  • @PaulRandal yep, we were all “so THAAAAAAAT’S how that works.”
  • Don’t think of operators in QPs as being parallelized. Think more of each set of rows as being prallelized.
  • Very few iterators are actually parallel-aware. Most do not need to be, even if being used by parallel streams.
  • OH: “I trust myself, but I don’t know if you should.” <– always an appropriate comment when referring to production environment
  • And now we return to our discussion of the “Big O” and the Query Processor.
  • We just covered tempdb spills and @crysmanson ‘s old enemy, the resource_semaphore wait type.
  • Few outer rows demo showing repartitioning scheme and rows redistributed on threads– very cool
  • Verrrrrrry interesting stuff with CROSS APPLY and parallelism
  • Cost threshold for parallelism default is still what it was set originally in 7.5, for many contemporary systems it may be too low.
  • And that makes me happy to hear since we do raise the default cost threshold for parallelism on our prod servers :)
  • @AdamMachanic just actually turned it up to 11.
  • If you hit THREADPOOL waits, don’t just up the max worker threads permanently, find the root cause for the situation.
  • Finishing up with a monitoring parallelism section — really nice flow to the talk today!
  • Piles o’ DMV fun, including the reason sys.dm_exec_requests has some funkiness: it shows wait state only for the root task
  • @AdamMachanic is demoing how sp_whoisactive will display your wait types, find your tempdb contention, and wash your dishes.
  • Demo of manipulating memory grants to cause a query to spill to tempdb purposefully… we’re not in kansas anymore.
  • @TheSQLGuru I’ve enjoyed it a ton– great combo of really interesting demos and information.


Comments { 0 }