Archive | Troubleshooting RSS feed for this section

My site got hacked. And it was kinda fun.

Programmpower.ru said: FU

I recently got a direct message on twitter from the fantastic Karen Lopez (b|t), who let me know my blog was redirecting to another site — a site called ‘programmpower.ru’.

I don’t know about you, but there’s something strange and just icky about that URL.

Karen is super savvy, so she let me know right off that she was only seeing this when using Firefox when she didn’t specify the www. So while ‘http://littlekendra.com’ was up to no good in Firefox, ‘http://www.littlekendra.com’ was standing strong, and other browsers weren’t having the issue.

Where do you start?

It’s been a while since I’ve worked with stuff like this, but it was a fun challenge.

The first thing I wanted to verify was whether or not people were getting to my site at all when the issue occurred. Were they actually getting to http://www.littlekendra.com and then being redirected afterward? Or were they not even getting there?

This was probably my first question because I know from experience that it’s very easy to answer.

I decided to crack open Fiddler, my favorite old http debugging proxy. “Http debugging proxy” may sound pretty fancy, but this is really just a simple, friendly little tool you can open up which will show you everything your internet browser is hitting.

Then I realized that Fiddler only runs on Windows. I was sad, but this is clearly the very definition of a first world problem (nerd edition). A little searching and I found a Firefox add-in called TamperData that’s quick, free, easy, and did the trick on my Mac.

TamperData said: you’re getting to www.littlekendra.com before you get redirected off to the land of evil.

Oh my. I’ve been hacked.

Maybe I’m getting older and wiser. Maybe once you break something enough times you just stop getting upset. (My description of someone who’s an MCM is “someone who’s gotten into trouble, and then had to fix it. A lot.”)  But I didn’t really mind.

These things happen— if you have a website long enough, it’ll probably get hacked. And these things are usually pretty easy to fix with a little searching.

Is it JavaScript?

If the issue was happening after reaching my site and only with one browser, I thought perhaps there was some JavaScript being executed that was only impacting Firefox. Well, that’s easy to check too– just disable JavaScript in Firefox and see if you can reproduce the issue. Thanks, Rob Farley!  (b|t)

I did that, and the issue still happened.

I guess it’s time to ask the internet…

Sure enough, I did a little more searching and I found that I was very likely the victim of an .htaccess hack.

This is a very powerful little file that you can do all sorts of tricks with. And people can play tricks on you with it. The most common hack is to add a bunch of spaces after what looks like the end of the file, and then to put in a bunch of code that redirects traffic— down where you’ll never think to look. Oh, how devious.

But how did it get there?

Before I fixed the issue fully, I wanted to try to make it difficult for it to happen again. It wasn’t a super-emergency (there aren’t THAT MANY of you reading this live– I know you love your RSS feeds), so I spent a little time looking at fixing the root cause before the issue itself.  Normally I’d do the opposite order in a production environment, but it’s my own website so I do what I want!

I checked with my theme provider, and sure enough there’s a recent fix for a vulnerability that can allow this kind of hack to happen– the Woo Themes Timthumb bugfix. In no time at all I got my site updated by following the guide to update the Woo Framework and my theme.

How do you confirm if it’s a .htaccess hack?

You need to take a look at your .htaccess file. You can’t look at this through the WordPress UI in a web browser, you need to either log onto the host (if you’re running your own), or just connect via the magic of FTP.

I fired up my trusty FireFTP add-on, and got to the difficult task of remembering my user name and password for FTP access. This was easily the hardest part of the fix.

After you connect with FTP, save yourself a little frustration by enabling your FTP program to show hidden files. in FireFTP this is under Tools/Options on the General tab.

Then head on down to the root of your website. This can be configured in different ways, but it may be at web/content. Just cruise on around until you find a .htaccess file, which will probably appear to be grayed out because it’s a hidden file.

FTP that file down to a place on your local system and open it to give it a look. Don’t forget that hacks of this file normally put in a bunch of white space, making it look like the file is normal— scroll down to the end of the file. If this is your issue, you should see a bunch of redirects down there.

How’d I fix it?

First, I saved a copy of the file under a different name. Any time you make a change like this, you want to keep a copy if you can just in case things go from bad to worse.

Then I cleared out all the evil redirecty code and saved the file on my local machine.

I then overwrote the file in my root directory with the modified copy.

Presto-chango— after clearing the cache in Firefox, I could no longer reproduce the issue. The whole thing took less time than it took to write this blog post. And strangely, it really was kinda fun!

Don’t forget to back that up

Now’s a good time to export all your WordPress content (Tools -> Export), dontcha think? I think I’ll go change some passwords for fun, too.

Update

It looks like I missed a step! Here’s a handy WordPress.org FAQ: My Site Was Hacked. There’s a sequence you should follow in your cleanup:

  • Change  your passwords;
  • Regenerate Wordpress keys and update them in your wp-config.php file (details on how to do this in the FAQ linked above);
  • Then change your passwords again.
I’m not sure that this has resolved my problem completely. I’m considering planning a time to reinstall cleanly. Since I have everything exported, I have the luxury of waiting a bit to see.
Comments { 1 }

Dynamic Management Quickie: Exploring SQL Server’s system views and functions as you work

Note: If you like this post, here’s one you’re going to like even better! Check out DMV/DMF Info Just A Couple Clicks Away by Brad Schultz.

There’s a lot of dynamic management and system objects to keep track of in SQL Server.

We all sometimes have the moment when we can’t remember exactly which DMV, DMF or other system view/function returns a particular column, or if something even IS accessible from the system objects.

When this happens, remember that it’s easy to query system object and column names. Sys.system_columns and sys.system_objects are here to help.

Exploring the system views and functions yourself will also help you find new things.

This example shows all the system views and functions which are likely to have to do with CPU:

SELECT
SCHEMA_NAME(SCHEMA_ID)+ '.' + o.Name AS SysObjName,
o.type_desc,
c.name AS ColumnName
FROM sys.system_columns c
INNER JOIN sys.system_objects o ON o.object_id=c.object_id
WHERE c.name LIKE '%cpu%'

I like to use this version of the query, which includes the URL to look up more about the DMV. I like to use the browser in SSMS itself to look these up, so I include the shortcut for that in the header.

SELECT
	SCHEMA_NAME(SCHEMA_ID)+ '.' + o.Name AS SysObjName,
	o.type_desc,
	c.name AS ColumnName,
	'http://social.msdn.microsoft.com/Search/en-US/?Refinement=117&Query=' + SCHEMA_NAME(SCHEMA_ID)+ '.' + o.Name AS [Help! Ctrl + ALT + R to open web browser in SSMS]
FROM sys.system_columns c
INNER JOIN sys.system_objects o ON o.object_id=c.object_id
WHERE c.name LIKE '%page%'

The output looks like this (click for a larger view):

I’ve started using this lately since it helps me explore as well as remember.

Just when you think you know everything about the system objects, you’ll find something new.

Fun example: look at all the columns like ‘%page%’.

Comments { 8 }

Be Prepared: Collecting Data from sp_WhoIsActive in a Table

I am a fangirl of Adam Machanic‘s sp_whoisactive stored procedure.

Be prepared. When things go wrong, you want to focus on the problem itself. Make a plan for how you

One of the great things about sp_whoisactive is that it’s very quick to generate a table to collect the data you need, then run the procedure in a loop with a wait statement to regularly collect the data you want over an interval. This usage is documented (see Adam’s post here), but I’ve chatted with several people who either didn’t know you could do this, or didn’t have a script saved.

What’s sp_whoisactive?

Sp_whoisactive is a procedure written by Adam Machanic. It can be found here: sp_whoisactive downloads

What to Do Today

Do your future self a favor: Today, set yourself up the following:

  • Create the sp_whoisactive stored procedure in a user database you use only for administration
  • Play with the script I provide below and look at the  parameters of the stored procedure
  • Save a script which creates a table for your preferred version of sp_whoisactive, then runs in a loop to store the data.

Why Do This?

Even if you’re already running monitoring tools, using sp_whoisactive to sample data from the DMVs (including query plans) can be invaluable.

When things go wrong, terribly wrong, you want to have this script in your back pocket to collect data about activity on your SQL instance in the background while you’re focusing on the problem. This will help you resolve the issue as quickly as possible, by referring periodically to the data it’s collecting. Since the data is stored, you can also use it afterward to explain the situation and prevent it from happening again.

Step 1: Create Your @destination_table

Let me just note again: this is not my original work. I don’t want to take credit for it, but I do want to advocate for it!

First, create your destination table. Sp_whoIsActive lets you generate a destination table easily. Here is a basic version which includes the current date in the table name, as well as three flags to get transaction info (tlog write info and duration), query plans, and block leaders:

DECLARE @destination_table VARCHAR(4000) ;
SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;

DECLARE @schema VARCHAR(4000) ;
EXEC sp_WhoIsActive
@get_transaction_info = 1,
@get_plans = 1,
@find_block_leaders = 1,
@RETURN_SCHEMA = 1,
@SCHEMA = @schema OUTPUT ;

SET @schema = REPLACE(@schema, '<table_name>', @destination_table) ;

PRINT @schema
EXEC(@schema) ;

Step 2: Create Your Loop to Periodically Log Data

You need to need to make sure to specify the same parameters for sp_whoisactive in this loop, so the output matches the schema for the table you’ve created.

This does 10 runs with a 5 second wait between runs. Depending on what’s going on, I change those numbers accordingly.

DECLARE
    @destination_table VARCHAR(4000) ,
    @msg NVARCHAR(1000) ;

SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;

DECLARE @numberOfRuns INT ;
SET @numberOfRuns = 10 ;

WHILE @numberOfRuns > 0
    BEGIN;
        EXEC dbo.sp_WhoIsActive @get_transaction_info = 1, @get_plans = 1,
            @find_block_leaders = 1, @DESTINATION_TABLE = @destination_table ;

        SET @numberOfRuns = @numberOfRuns - 1 ;

        IF @numberOfRuns > 0
            BEGIN
                SET @msg = CONVERT(CHAR(19), GETDATE(), 121) + ': ' +
                 'Logged info. Waiting...'
                RAISERROR(@msg,0,0) WITH nowait ;

                WAITFOR DELAY '00:00:05'
            END
        ELSE
            BEGIN
                SET @msg = CONVERT(CHAR(19), GETDATE(), 121) + ': ' + 'Done.'
                RAISERROR(@msg,0,0) WITH nowait ;
            END

    END ;
GO

Step 3: Set Up Your Query to Look at the Results

This is optional, but since the table is named by datestamp, I like to keep a query on hand to look at the results without having to copy and paste the table name. I also like to put the collection_time column first and sort by it.

DECLARE @destination_table NVARCHAR(2000), @dSQL NVARCHAR(4000) ;
SET @destination_table = 'WhoIsActive_' + CONVERT(VARCHAR, GETDATE(), 112) ;
SET @dSQL = N'SELECT collection_time, * FROM dbo.' +
 QUOTENAME(@destination_table) + N' order by 1 desc' ;
print @dSQL
EXEC sp_executesql @dSQL
Comments { 16 }

Corrupting Databases for Dummies- Hex Editor Edition

Corruption is so ugly it gets a lolworm instead of a lolcat.

Let’s make one thing clear from the start:

This Post Tells You How To Corrupt a SQL Server Database with a Hex Editor in Gruesome Detail

And that’s all this post tells you. Not how to fix anything, just how to break it.

If you aren’t familiar with corruption, corruption is bad. It is no fun at all on any data, or any server, that you care about.

Where You (Possibly) Want To Do This

You only want to do this on a test database, in a land far far away from your customers, for the purpose of practicing dealing with corruption.

When things go badly, you want to  be  prepared. This post gives you the tools in a simple, step by step fashion, to create different types of corruption so that you can practice resolving them.

Big Disclaimer: Do not run this in production. Or anywhere near production, or anything important. Ever. Only use this at home, in a dark room, alone, when not connected to your workplace, or anything you’ve ever cared about. If you corrupt the wrong pages in a user database, you may not be able to bring it back online. If you corrupt a system database, you may be reinstalling SQL Server.

References, and Thanks to Paul Randal

Everything I’m doing here I learned from Paul Randal’s blog posts. It just took me a little bit to understand how to use the hex editor and make sure I was doing it properly, so I thought I’d put down the steps I used here in detail. If you’d like to go straight to the source:

First, Get Your Hex Editor

Download XVI32 by Christian Maas. No installer is necessary: download the zip file, then unzip all files to a directory and run XVI32.exe

Create a Database to Corrupt

For our adventure, our database is named CorruptMe. We’ll create a single table, insert some data, and create a clustered index and nonclustered index on it.

(Note: Data generation technique found on Stack Overflow, attributed to Itzik Ben-Gan.)

USE master;
IF db_id('CorruptMe') IS NOT NULL
BEGIN
	ALTER DATABASE CorruptMe SET SINGLE_USER WITH ROLLBACK IMMEDIATE
	DROP DATABASE CorruptMe
END	

CREATE DATABASE CorruptMe;
GO

--Make sure we're using CHECKSUM as our page verify option
--I'll talk about other settings in a later post.
ALTER DATABASE CorruptMe SET PAGE_VERIFY CHECKSUM;

USE CorruptMe;

--Insert some dead birdies
CREATE TABLE dbo.DeadBirdies (
    birdId INT NOT NULL ,
    birdName NVARCHAR(256) NOT NULL,
    rowCreatedDate DATETIME2(0) NOT NULL )

;WITH
  Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1),
  Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),
  Pass2 AS (SELECT 1 AS C FROM Pass1 AS A, Pass1 AS B),
  Pass3 AS (SELECT 1 AS C FROM Pass2 AS A, Pass2 AS B),
  Pass4 AS (SELECT 1 AS C FROM Pass3 AS A, Pass3 AS B),
  Pass5 AS (SELECT 1 AS C FROM Pass4 AS A, Pass4 AS B),
  Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS NUMBER FROM Pass5)
INSERT dbo.DeadBirdies (birdId, birdName, rowCreatedDate)
SELECT NUMBER AS birdId ,
    'Tweetie' AS birdName ,
    DATEADD(mi, NUMBER, '2000-01-01')
FROM Tally
WHERE NUMBER <= 500000

--Cluster on BirdId.
CREATE UNIQUE CLUSTERED INDEX cxBirdsBirdId ON dbo.DeadBirdies(BirdId)
--Create a nonclustered index on BirdName
CREATE NONCLUSTERED INDEX ncBirds ON dbo.DeadBirdies(BirdName)
GO

Now we can take a look at the pages our table and nonclustered index got created on. I wanted to specifically corrupt a page in the nonclustered index on the DeadBirdies table. Of course if you wanted the clustered index, you could use index Id 1.

DBCC IND ('CorruptMe', 'DeadBirdies', 2)

I want to pick a data page for this nonclustered index, so I pick a PagePID where PageType=2. (The reference I use for DBCC IND is here.)

I pick PagePID 2784.

Note: If you’re following along, you may get a different PagePID if you use a different default fill factor.

Optional: Check out the page with DBCC PAGE

If you’d like to take a look at the page you’re about to corrupt, you can do so with the following command.

--Turn on a trace flag to have the output of DBCC PAGE return in management studio
--Otherwise it goes to the error log
DBCC TRACEON (3604);
GO
DBCC PAGE('CorruptMe', 1,2784,3);

Set the database offline

You must take your victim database offline to render it fully helpless accessible to your hex editor.

USE master;
ALTER DATABASE CorruptMe SET OFFLINE;

Also, get the name of your physical data file which you’ll open in your hex editor. Copy this to your clipboard.

SELECT physical_name FROM sys.master_files WHERE name='CorruptMe';

Figure out the starting offset of the page you want to corrupt. You do this simply by multiplying the page ID (PagePid) by 8192 (the number of bytes on a page).

SELECT 2784*8192 AS [My Offset]

It’s the Moment We’ve Been Waiting For: Trash That Page

Fire up your hex editor: run XVI32.exe.

Depending on your operating system, you may want to run this with elevated privileges / right click and “run as administrator”.

Open the database file by using File ? Open, and then the data file name you copied to the clipboard. (If you didn’t set the database offline, you’ll get an error that it’s in use. If you got an error that you don’t have permissions to view the file, make sure you do have permissions and that you ran XVI32.exe with elevated privileges.)

Go to the page you want to corrupt by using Address ? GoTo (or Ctrl + G), then paste in your Offset Value. You want to search for this as a decimal.

XVI43.exe will take to right to the beginning of that page.

You can see the ASCII representation of the data in the right pane. For our example, you should be able to see the word ‘Tweetie’ represented.

I like to put the cursor  in the right pane at the beginning of the word ‘Tweetie’. XVI32.exe will automatically move the cursor in the left pane, to the appropriate location.

You can corrupt the data  by editing in the right pane or left pane.

For my example, I am replacing the ASCII ‘T’ in the first occurrence of the word ‘Tweetie’ with an ‘S’. You can edit more, but a little tiny corruption goes a long way.

Save the file, and you’re done!

Admire Your Own Corruption

First, bring your database back online. If you correctly edited pages in the data, this should work just fine.

Note: If you corrupted critical system tables early in the database, this may not work! If so, go back to the steps above to identify a good page offset.

ALTER DATABASE CorruptMe SET ONLINE;

You can see the corruption in a couple of different ways. If you have checksums enabled on the database, you can see the corruption by reading the page with the data on it.

Since I corrupted a page in a nonclustered index in my example, I need to make sure I use that index. So I can see it with this query:

Use CorruptMe;
SELECT birdName FROM dbo.deadBirdies;

That returns this big scary error, which confirms I did indeed corrupt page 2784:

Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xb633a8e1; actual: 0xaeb39361). It occurred during a read of page (1:2784) in database ID 18 at offset 0x000000015c0000 in file ‘D:\BlahBlahBlah\CorruptMe.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

You can also see this by running a CHECKDB or CHECKTABLE command.

DBCC CHECKDB('CorruptMe')

An excerpt from its output:

Msg 8928, Level 16, State 1, Line 1
Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data): Page (1:2784) could not be processed.  See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data), page (1:2784). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.

Now Sit Back and Laugh Maniacally. And Then Fix It.

So, the whole point of this was probably to test something.

So take a moment to enjoy the fact that FOR ONCE you don’t have to panic when you see these errors, because it’s all part of your master plan.

Then go out and fix the corruption, and run your tests.

Comments { 17 }

SQLPASS Day 2- Optimization Timeouts and All about TLogs

SQLPass unfortunately can’t last forever, but happily it’s still going strong. Here’s some highlights from my Day #2.

Paul Randal Knows Exactly What’s Going on in Your Transaction Log…

A definite highlight of day 2 was attending Paul Randal‘s session on Logging and Recovery in SQL Server. I’ve read Pauls’ blog posts on this topic and attended his classes before, but even being familiar with the material I find I always learn something from his talks. You just can’t beat being strong on the basics!

I took a lot of notes in the session, this is my favorite excerpt from my notes:

  • SQL Server must reserve space in the TLOG so that it can roll back the active transactions, if needed.
  • Once a VLF no longer contains log records that are required, it can be cleared
  • This is done by a log backup in full or bulk_logged recovery models, or by checkpiont in simple
  • All that happens when a VLF is “cleared” is that it is marked as inactive
    • Nothing is cleared at that time
    • Nothing is truncated
    • Nothing is overwritten
    • The log file size does not change
    • The only thing that happens is that whole VLFs are marked inactive if possible (no active transactions)

Ben Nevarz asks, “How You Doing, Optimizer?”

One of my favorite pieces of information on day 2 was in Ben Nevarez‘s talk on how the query optimizer works. He mentioned this DMV, which I hadn’t used before yesterday:

Sys.dm_exec_query_optimizer_info Check me out!

The other useful bit of info is that the timeout flag is recorded in the xml for the sql plans, so plans which the optimizer finds so complicated that it times out on compilation can be queried from the cache!

SQLPASS homework assignment: Write and test this query, determine how to automate running it and collecting the information.

Sample Queries

This sample from BOL  to find excessive compiles/recompiles:

select *
from sys.dm_exec_query_optimizer_info
where counter = 'optimizations'
or counter = 'elapsed time'

See Also…

Ben Nevarez on the Phases of Query Optimization

Conor Cunningham’s Blog on sys.dm_exec_query_optimizer_info– excerpt:

“The other use of the DMV is to get a good statistical picture of a running system.  Say that I’m a DBA and I want to know how many queries in my application have hints or _need_ hints to work well.  Well, this will tell you.  Granted, it doesn’t separate recompiles from compiles, and if you have a system where plans are getting kicked out of the cache things may be a bit skewed, but I can tell you that this is far better than simply guessing.  Often the DB application developer doesn’t realize that they’ve built an application that requires a lot of hinting or a lot of compilations, and you can see this in more detail than you get with the performance counters”

Comments { 0 }

The Case of the Undroppable Database

This was funny….

Once Upon A Time there was an Orphan Database

I needed to drop a formerly-logshipped database on our warm standby server. When attempting to drop it, I found that it failed because it was a logshipped database from a replication publisher. Hmm.

The database was still in standby/read only mode, so I couldn’t run any system sprocs in the database itselt to clean up replication. Uh oh.

My colleague Gina suggested I just run a restore command with recovery to bring it online so I could run those sproc and then strip out replication. Sounds great!

But that didn’t work, because the file it needed to bring it online had been deleted from the file system at some point. Someone must have figured it was OK since this database hadn’t been actively logshipped in a while. Oh boy.

At this point, Think Very Carefully About What You Don’t Want To Do

Now, remember that this is a warm standby server, and is not customer facing. So I considered stopping sql, renaming the files, starting sql, and then seeing if I could drop the database. However, it seemed that this was not likely to work anyway, because there are those pesky records in the master database with metadata, showing that this is a replicated database. And it also just seems like a messy, klugy way to go about the issue.

So Why Not Just Edit the Metadata in Master?

I thought that some updates to that metadata were still possible in the post-sql 9 world, and looked into DAC connections and single user mode. I am very happy to say that situations almost NEVER come up when I have to even consider how to udpate metadata in master. Even on a non-customer facing database, it’s a bad idea.

But even if you want to do it for this, it’s not going to work, since most of the tables were moved into the resource database with SQL 2005 and can’t be updated. Sorry!

A Happy Ending

My colleague Robin very quickly came up with the right solution, which works perfectly. It’s fast, easy, harmless, and requires no updating of system tables or even DAC connections.

• Create a new empty db named x
• Back up db x
• Restore over your read only/standby database with the backup of X (using the with replace option)

VOILA, your database is online and empty, and no longer marked as replicated in the metadata. It’s now totally droppable!

Overall, this was a really fun, if weird little problem to work through.

Comments { 0 }

Troubleshooting 1.00.002: Never forget the Windows Event Log!

This evening during some maintenance I was reminded of one very important rule: when looking into any issue on a windows server, never forget to check the Windows Event Log.

In the SQL Server upgrade log this was just expressed as “Service ‘MSSQL$OURSERVICENAME’ could not be started. Verify that you have sufficient privileges to start system services. The error code is (1814) The specified resource name cannot be found in the image file.”

I was running a SQL Server upgrade and sifting through the sql server log files, and it took me a few minutes to remember to look in the Windows log, where the error was expressed much more clearly: SQL Server was attempting to create tempdb on the D drive (not its usual location) in a directory that didn’t exist. The problem ended up being that the default database locations on the sql instance were still set to be on a disk that was not part of the cluster (and therefore not set properly as a dependency). CU4 apparently needed to create some files temporarily in this location when starting up SQL Services in the background as part of its installation.

So no matter how humble or weird the issue, I always have to remind myself: check the Windows Event log! Always! It’s one of the basics for a reason.

Update: the specific SP3 CU4 upgrade error was blogged by the CSS team here.

Comments { 0 }

Troubleshooting 1.00.001: sp_who2 and the SQL Server Log

I thought I would do a series of posts thinking through general troubleshooting steps I use for a wide variety of issues– everything from slow running performance, system crashes, or application failures where the source hasn’t been identified.

Typically, the first thing I do in almost any situation I can think of is verify if I can connect to the SQL Server instance and verify if I can run sp_who2 and view the contents of the SQL Server Log.

This is actually three separate tests, but you get quite a lot of information about the situation very quickly:

  1. Can I connect to the instance over TCP/IP?  And does it work with the default timeout, or do I have to extend the value to be able to connect? Do I need to specify the port?
  2. If I can connect, does sp_who2 return results? If so, I do a quick glance through it just to get a quick overview of what’s active at the moment.
  3. Can I open the SQL Server Log, and if I can, are there any obvious recent events that are unusual, such as deadlocks, timeouts related to accessing disk, or other errors?

So my first pass is to hit the sql server interface in these three ways.

Following this, I tend to either look more into SQL Server, or into OS level information, depending on the situation. More on that to come.

Comments { 0 }