Archive | August, 2009

Who's Using All that Space in TempDb, and What's their Plan?

Whatcha Doing in My TempDb???

This post contains a script that I adapted from the fantastic SQL Server Storage Engine Blog.

It comes in handy in my job all the time! Sometimes tempdb is filling up, but sometimes I just want to monitor the amount of tempdb and check out execution plans of heavy tempdb users while watching performance on a server. It just really comes in handy more frequently than I would have thought before I started using it.

Note: This script returns space used in tempdb only, regardless of the db context it’s run in, and it only works for tempdb.

Continue Reading →

Comments { 0 }

Troubleshooting 1.00.002: Never forget the Windows Event Log!

This evening during some maintenance I was reminded of one very important rule: when looking into any issue on a windows server, never forget to check the Windows Event Log.

In the SQL Server upgrade log this was just expressed as “Service ‘MSSQL$OURSERVICENAME’ could not be started. Verify that you have sufficient privileges to start system services. The error code is (1814) The specified resource name cannot be found in the image file.”

I was running a SQL Server upgrade and sifting through the sql server log files, and it took me a few minutes to remember to look in the Windows log, where the error was expressed much more clearly: SQL Server was attempting to create tempdb on the D drive (not its usual location) in a directory that didn’t exist. The problem ended up being that the default database locations on the sql instance were still set to be on a disk that was not part of the cluster (and therefore not set properly as a dependency). CU4 apparently needed to create some files temporarily in this location when starting up SQL Services in the background as part of its installation.

So no matter how humble or weird the issue, I always have to remind myself: check the Windows Event log! Always! It’s one of the basics for a reason.

Update: the specific SP3 CU4 upgrade error was blogged by the CSS team here.

Comments { 0 }

Using Last Backup Date to Make Sure You get a Full when You need one, and a Differential when you Don't

Today I was glancing at once of my servers and noticed the backup job was running later than normal. I haven’t been working with this server for long, so I glanced to check where the backup was writing to and checked the output directory. I found that a differential backup was being written, and that the differential backup from the day before was much larger than normal.

Having quick backup times on weekdays on this instance and also saving space on the backup server where possible are both priorities, so the backup job for this server was set to do a full backup on Saturday, and a differential backup every other day of the week.

However, there was a production incident on this instance on Saturday, and the backup job was canceled by the person on call. I didn’t receive a monitoring ticket (since it was a cancellation it didn’t throw an alert), and the person on call must not have checked the backup schedule or thought to mention it.

So today I thought about it a bit, and then recalled that there’s a way to check the last backup time. I wanted to use this so I could have the job do a check as to when the last successful full backup time was, and if it was long enough, to do a full backup no matter what. On this particular database, after more than a week the differentials get a bit out of hand and we need to get in a full– ideally the job would be smart enough to realize this on Sunday if the run on Saturday has an issue.

I’ve even written a script that checked last backup time before! But I thought, “Oh, this must be easy, I’ll just check the usual places for the last full backup date.”

And so I did. Since it shows on the database properties in management studio, it seemed like it must be someplace obvious. Here are some places that last backup time is not listed:

  • sys.databases
  • the databasepropertyfunction
  • the databasepropertyex function

Where is it? Well I finally dug it out of my existing script, and my memory. It’s in msdb, in the backupset table. And once I looked at the existing script, I was glad I did. I think I may brush it up and use it pretty soon– it does a check for last successfull backup time across multiple servers. That can be pretty handy to audit, particularly when you’re taking on responsibility for new servers and want to do a quick high level inventory. (Of course, for a quick check you could also write a query to loop through databases on an instance and use the multi-server execution feature in SSMS 10.)

So here’s the query I’m using to find the last full backup, complete with some pseudo-code explaining a bit about how it could be used…

Continue Reading →

Comments { 0 }

What Articles are being Replicated, and How Big are They?

I needed to quickly report on what articles are being replicated from a database yesterday, and it was helpful to include the size of the articles, number of rows, and the subscriber information.

The system tables make this pretty quick work– just run the following against your publisher and you’ll be able to document this easily.

select
	publicationName= sp.name
	, tableType=case p.index_id when 1 then 'Clustered Index' else 'Heap' end
	, p.index_id
	, articleName=ar.name
	, sub.srvname
	, ar.dest_table
	, totalSizeGB = cast(sum(au.total_pages)*8./1024./1024. as numeric(10,1))
	, numRows = (select top 1 [rows] from sys.partitions p2 where p2.object_id=o.object_id and p2.index_id in (0,1))
	, createDate= o.create_date
	, modifyDate= o.modify_date
	, droppedGB = cast(sum(case au.[type] when 0 then au.total_pages else 0 end)*8./1024./1024. as numeric(10,1))
	, inRowGB = cast(sum(case au.[type] when 1 then au.total_pages else 0 end)*8./1024./1024. as numeric(10,1))
	, LOBGB = cast(sum(case au.[type] when 2 then au.total_pages else 0 end)*8./1024./1024. as numeric (10,1))
	, rowOverflowGB = cast(sum(case au.[type] when 3 then au.total_pages else 0 end)*8./1024./1024. as numeric(10,1))
from sysarticles ar
left join syspublications sp on sp.pubid=ar.pubid
left join syssubscriptions sub on
	ar.artid=sub.artid
	and dest_db !='virtual'
left join sys.objects o on ar.objid= o.object_id
left join sys.partitions AS p (NOLOCK) on
	p.object_id = o.object_id
	and p.index_id in (1,0)
left join sys.allocation_units AS au (NOLOCK) on
		au.container_id = p.partition_id
		and p.index_id in (1,0)
group by
	sp.name
	, ar.name
	, sub.srvname
	, ar.dest_table
	, o.name
	, o.object_id
	, o.create_date
	, o.modify_date
	, p.index_id
order by
	1
Comments { 0 }

Troubleshooting 1.00.001: sp_who2 and the SQL Server Log

I thought I would do a series of posts thinking through general troubleshooting steps I use for a wide variety of issues– everything from slow running performance, system crashes, or application failures where the source hasn’t been identified.

Typically, the first thing I do in almost any situation I can think of is verify if I can connect to the SQL Server instance and verify if I can run sp_who2 and view the contents of the SQL Server Log.

This is actually three separate tests, but you get quite a lot of information about the situation very quickly:

  1. Can I connect to the instance over TCP/IP?  And does it work with the default timeout, or do I have to extend the value to be able to connect? Do I need to specify the port?
  2. If I can connect, does sp_who2 return results? If so, I do a quick glance through it just to get a quick overview of what’s active at the moment.
  3. Can I open the SQL Server Log, and if I can, are there any obvious recent events that are unusual, such as deadlocks, timeouts related to accessing disk, or other errors?

So my first pass is to hit the sql server interface in these three ways.

Following this, I tend to either look more into SQL Server, or into OS level information, depending on the situation. More on that to come.

Comments { 0 }