Diagnose blocking, configure alerts, and troubleshoot blocking live with sp_WhoIsActive (13 minutes)

Part of "Troubleshooting Blocking and Deadlocks for Beginners (2 hours 10 minutes)"

When blocking may be a problem, you need easy, built-in tools

This video shows you how to…

  1. Quickly diagnose if an instance may have a blocking problem
  2. Create a lightweight blocking alert
  3. Troubleshoot a live locking problem when the alert fires

We’ll put those skills to the test as we take a look at the ‘repeat blocker’ pattern in a demo environment.

To run the demos in this lesson…

Practice configuring the SQL Server Agent to use Database Mail (optional)

Microsoft has some pretty good documentation on this process:

If you already know how to do this (you’ve done it many times in production), you don’t have to do this on your test instance. But without it, your alert won’t be able to send you an email when it fires.

Transcript

All right, let’s get into it.

Now, I’ve got Management Studio open and I have run the setup script for this demo

A couple things you should know about the setup script

There’s a lot of notes in it that you can read. The setup script runs a lot of stuff and wreaks havoc on the instance that it’s on.

So, this should only ever be run in a dedicated test environment. It doesn’t have to be anything fancy. It can just be your laptop, but you don’t want it to be anything that anyone else uses, because it’s gonna do things like restore databases, and change around security, and start up some jobs that it created to create blocking because, hey, we’re doing blocking demos. We need to wreak some havoc.

So, dedicated, private little test environment only.

Luckily, developer edition for SQL Server 2016 is totally free, thank you Microsoft.

I have run the setup script and we have created some problems!

We are now gonna go into the first file for problem one.

Our SQL Server is periodically freezing up

Let’s say that I wasn’t sure if my instance had a blocking problem.

I might wanna just do a quick and dirty check and say is blocking an issue on this instance, and SQL Server makes it really easy to do that, using a dynamic management view called sys.dm_os_wait_stats.

sys.dm_os_wait_stats

This dynamic management view will show you what has the SQL Server been waiting on since it started up and we can ask it with the script that we have here has SQL Server been waiting on lock waits? Do we have waits that begin with LCK?

If we haven’t had any of those, then it’s really unlikely that we have a blocking problem on our instance.

So let’s check that out ‘cause like I said I have created a little havoc in our SQL Server, and sure enough, whoa, we do have lock waits on this SQL Server.

Now you’ll notice that a lot of these have wait minutes of zero. Haven’t filtered those out. It’s important to look at that and say have I actually done any noticeable amount of waiting? Just because I’ve had one wait if it didn’t take any time at all that’s not something I should worry about.

Clearly though, I have had some waits that took quite a lot of time. My instance has only been online for 150 minutes but 139 of those minutes have been waiting for a shared lock and the average wait time there is six minutes. So, I do have a significant amount of waiting time, especially considering how long it’s been up. You know, it’s just a test instance, right?

But I need to learn more.

So, when I’m looking at an instance and I just wanna know hey is this an issue, sys.dm_os_wait_stats is great for saying hey, when you wait, what do you wait on, and how long do you wait, and that’s critical to know before devoting too much time to something.

But it’s clear we need some work here, right?

I want to know when that blocking happens: the easiest way to do this with SQL Server is to use a built in performance alert

We can create an alert in the SQL Server Agent that looks at a performance counter. We’re gonna look at the counter for process is blocked and say, hey, SQL Server Agent, can you periodically sample this counter for me, and when you see that there’s a problem, you know, one or more processes blocked, fire the alert.

I don’t want it to fire constantly, or I don’t want it to respond constantly. The SQL Server Agent actually won’t look at this every microsecond. It looks at this every 20 to 30 seconds. It doesn’t guarantee that it’s gonna, you know, look at it immediately, but after it fires and notifies me, I don’t want it to notify me every 20 seconds and sort of stress test my email server or anything. So I wanna set a delay in responses and say okay after you notify me, wait before you notify me again, and I picked 300 seconds for that.

Let’s create our performance alert. You can do it with the GUI as well. I just think this code is nice and easy, and this just created the alert.

I did some setup in the background to make notifications work

We didn’t create a response yet. I need to add a notification to the alert for it to actually know who to talk to, and for the notification to work I have to have done a few things on my instance already, and I have to have set up Database Mail, and configured the SQL Server Agent to use Database Mail, then I have to have configured operators in the SQL Server Agent. That’s a list of who to notify and how to do it. I have already set up those things and there’s instructions and books online which are really good that you can follow if you haven’t set those up before.

Adding a notification

I’m just gonna add a notification to my super simple blocking alert. It’s gonna notify the SW demo operator and the setup script sets that up.

All right, let’s take a look at our alert and see if it fires. Here under the SQL Server Agent I’ve got super simple blocking alert. I’m gonna click on the properties, bring up the GUI for that, and I can see that just like I asked for it is a performance condition type alert here at the top, and it looks at the general statistics, object counter, processes blocked, and alerts me if it’s one or higher, anything above zero.

On the response tab it’s gonna notify that operator that was my second T SQL command and it’s gonna send an email.

If you’d like to get a text message on your phone…

Most cellular providers have an email address that if it sends to that email address it will text your phone. So just check with your cell provider if you want to get an SMS text alert.

Is it working?

In the options tab I can see that delay between responses. I set it in seconds in the T SQL. It’s listed in minutes in the GUI, and then let’s go into the history. We can see that yes, my alert is firing. It has fired and it has responded.

So, hey, that means that I have blocking on my instance right now. We should take a look!

I’m gonna cancel out of here, ‘cause I just opened the GUI to look at the properties of the alert, and let’s see if we can troubleshoot that blocking that’s going on right now.

My favorite tool to troubleshoot live blocking in SQL Server, and frankly to troubleshoot a lot of things in SQL Server, is a free stored procedure called sp_WhoIsActive

This is written by Adam Machanic who’s created this stored procedure to say what’s running in my SQL Server? How long has it been running? What is waiting on? Is it blocking or being blocked by someone else? He wrote this procedure years ago and has been updating it faithfully ever since. It’s always been free and it’s a fantastic resource.

You can download it at whoisactive.com.

You can install it into the master database or any tools database that you have on the SQL Server.

I’m gonna run sp_WhoIsActive and say hey, who is running on my SQL Server? Is there blocking happening right now?

There are two sessions running on my SQL Server

The column on the very left gives me super valuable information right out of the gate. How long have they been running? Whoa, they’ve both been running for 10 minutes. I can see the SPIDs or session IDs.

I can see the SQL text that they’re running. I can click on it to get that in a new window. I can see the login names they’re running under, and are they waiting on anything.

This session here, the select statement, says hey, I have that shared wait that I’m waiting on, the same one that we saw when we queried sys.dm_os_wait_stats and I’ve been waiting on this for quite a while.

Let’s go ahead and click on this query who’s waiting, session 62. It is running a select query from the Sales.Customers table with a group by and wow huh it’s waiting to get a lock on Sales.Customers, all right.

Let’s keep going to the right and take a look at all the information sp_WhoIsActive gives us. We can see under blocking session ID that it is blocked by session 57 who that is the top statement there.

Let’s click on the T SQL and see what the top statement is doing. The top statement explicitly opened a transaction and then started running an update query against the Sales.Customers table. So, it is changing bill to customer ID for rows where bill to customer ID is set to a specific value and it is blocking our other query.

Scrolling on over we can see that that session who is blocking, it is not blocked by someone else, ‘cause we could have a chain of, you know, A blocked B blocks C, D, E, and F, but in this case, the blocker isn’t blocked by anyone.

We can see at this point in time how many reads, how many writes it’s done, what its status is, how many transactions it has, and all sorts of information about it.

So, okay, we have figured out that we do have blocking going on but what is going on with it?

Your challenge: why do you think that this blocking is taking so long?

Why is that second session just stuck?

Because let’s refresh sp_WhoIsActive. It is not going anywhere, and notice as I refresh this that the wait info time keeps going up.

Sys.dm_os_wait_stats vs. sys.dm_os_waiting_tasks

One thing to know about why sp_WhoIsActive can see this wait time going up is that it’s looking at a dynamic management view called sys.dm os waiting tasks. This is a view that says who’s waiting on stuff right this instant. That first DM view we looked at, sys.dm_os_wait_stats, it only shows us waits that are complete right now, waits that happened in the past.

So if we go back into our first query up there and say sys.dm_os_wait_stats, can I see that there’s blocking happening right now in my SQL Server, and I refresh this right now. Notice that wait minutes is stuck at 139.39. It is NOT going to update.

Sys.dm_os_wait_stats is not gonna update with the information from our session who’s currently blocked right now until it becomes unblocked, one way or another, right?

Sys.dm_os_wait_stats, this guy, it shows waits that have completed.

If we want to see active waits, waits that are happening right now, we need to look at sys.dm_os_waiting_tasks. A really convenient way to do that is with sp_WhoIsActive.

All right, back to your challenge

Your challenge before the next video is to:

  1. Quickly map out and say: do you have a guess as to why this first query is stuck, based on what you saw in the video?
  2. What is your big picture strategy for tackling this and trying to keep this from happening again?

Now this isn’t a super easy problem to solve. When I first did it as a DBA it wasn’t something I solved right away. I did need to have a good strategy to solve it, and actually I had to adjust my strategy, ‘cause I didn’t pick the right one.

So go big picture on your strategy and just note down: here’s why I think it’s stuck, and here’s the strategy I would take to find the root cause.

I’ll see you in the next video for the solution.