
More Running with Scissors: Corrupting your Database with 823 Errors

Corrupting databases is a lot like eating paste. Delicious, delicious paste.

This weekend, a question came up on Twitter asking if there was an easy way to simulate an 823 error. It seemed like a fun task to figure out.

In a previous post, I showed how to corrupt your database with a hex editor to cause 824 errors.

What’s an 823 Error?

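An 823 error in SQL Server is a severe (level 24) error. SQL Server raises it when the operating system returns an error while SQL Server is reading from or writing to a database file, meaning the I/O request itself failed. (An 824, by contrast, means the read succeeded but the page failed a logical consistency check, such as a bad checksum.) It’s described in detail in KB2015755.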
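If you’d like to compare the two errors side by side, their message templates live in sys.messages. A quick sketch (1033 is the language ID for English):

```sql
-- Message templates for 823 (OS-level I/O failure) and 824 (logical consistency failure)
SELECT message_id, severity, is_event_logged, text
FROM sys.messages
WHERE message_id IN (823, 824)
  AND language_id = 1033; -- English
```

Both come back at severity 24, which is why a session that hits either one gets disconnected.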

This is Useful!

This is useful for learning about corruption, and practicing responses to corruption events.
You can also use this to test configuration scripts you have for database mail, operators, and alerts, to make sure the alerts are working properly.
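For the alert-testing case, here is a minimal sketch of an operator plus an alert on error 823. It assumes Database Mail and SQL Server Agent are already set up; the operator name 'DBA Team', the alert name, and the email address are placeholders.

```sql
USE msdb;
GO
-- Placeholder operator: swap in your own name and address
EXEC dbo.sp_add_operator
    @name = N'DBA Team',
    @email_address = N'dba-team@example.com';

-- Alert on error 823 in any database (@severity must be 0 when @message_id is used)
EXEC dbo.sp_add_alert
    @name = N'Error 823 - OS I/O error',
    @message_id = 823,
    @severity = 0,
    @include_event_description_in = 1; -- 1 = include the error text in the email

-- Email the operator when the alert fires (1 = email)
EXEC dbo.sp_add_notification
    @alert_name = N'Error 823 - OS I/O error',
    @operator_name = N'DBA Team',
    @notification_method = 1;
GO
```

With this in place, the 823 you trigger later in this post should land in the operator’s inbox, which is exactly what you want to confirm.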

Disclaimer: these scripts are for test environments only. As always, use them at your own risk, and be careful not to eat paste.

Step 1: Create a share in Windows on your test machine

First, you need to create a shared folder in Windows.
In order for later steps to work, grant your SQL Server instance’s service account full control over the share.
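Not sure which account that is? On SQL Server 2008 R2 SP1 and later, sys.dm_server_services will tell you. A small sketch:

```sql
-- Service accounts for the Database Engine and SQL Server Agent services
SELECT servicename, service_account, status_desc
FROM sys.dm_server_services;
```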

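Step 2: Map the share to a network drive from the SQL Instance

Drive mappings made with the net use command are specific to the profile that creates them, so mapping the drive from your own login won’t make it visible to the SQL Server service. The easiest way to handle this is to enable xp_cmdshell and map the drive from SQL Server itself.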

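--Oh no, security nightmare!
--xp_cmdshell is an advanced option, so show advanced options first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'xp_cmdshell', 1;
RECONFIGURE;

--Map the share (replace MYMACHINE and the share name with your own)
EXEC xp_cmdshell 'net use "Y:" "\\MYMACHINE\z_testnetworkDrive" /PERSISTENT:NO';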
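To confirm the mapping worked from SQL Server’s point of view, you can list the service account’s connections and peek at the drive. A sketch, assuming the Y: drive letter used above:

```sql
-- Mapped drives visible to the SQL Server service account
EXEC xp_cmdshell 'net use';

-- Make sure the share is reachable through the drive letter
EXEC xp_cmdshell 'dir Y:\';
```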

Step 3: Create a test database with a filegroup, file, and table on the network drive

Now we just need to create a database, add a filegroup whose file lives on the network drive, and create a table on that filegroup.

--Create test database and add a filegroup and table on the network share
USE master;
GO
IF db_id('TestMe') IS NOT NULL
BEGIN
	ALTER DATABASE TestMe SET SINGLE_USER WITH ROLLBACK IMMEDIATE
	DROP DATABASE TestMe
END

CREATE DATABASE TestMe
go

ALTER DATABASE TestMe ADD FILEGROUP FG1
ALTER DATABASE TestMe ADD FILE (NAME=f1, FILENAME='Y:\f1.ndf', SIZE=128MB) TO FILEGROUP FG1

USE TestMe
go
CREATE TABLE t1 (
	i INT IDENTITY,
	j CHAR(200) DEFAULT 'x'
) ON FG1
go
INSERT t1 DEFAULT VALUES
GO 20

--Flush everything to disk
CHECKPOINT
GO
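Before moving on, it’s worth checking that t1 really landed on the network drive. A sketch using the catalog views:

```sql
USE TestMe;
GO
-- Every file with its filegroup (the log file has no filegroup, hence the LEFT JOIN)
SELECT f.name AS logical_name, f.physical_name, fg.name AS filegroup_name
FROM sys.database_files AS f
LEFT JOIN sys.filegroups AS fg
    ON f.data_space_id = fg.data_space_id;

-- Which filegroup holds t1 (it's a heap, so expect index_id = 0)
SELECT OBJECT_NAME(i.object_id) AS table_name, i.index_id, ds.name AS data_space_name
FROM sys.indexes AS i
JOIN sys.data_spaces AS ds
    ON i.data_space_id = ds.data_space_id
WHERE i.object_id = OBJECT_ID(N'dbo.t1');
```

You’d expect f1 to show Y:\f1.ndf in FG1, and t1 to report FG1 as its data space.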

Step 4: Start a Loop of Reads in Another Connection

Now we want to simulate reads. Open a new connection to your instance, and run the following commands to repeatedly read data from the t1 table. DBCC DROPCLEANBUFFERS removes clean pages from the buffer pool, so each SELECT has to read from disk. (We already ran a CHECKPOINT, so there are no dirty pages left to flush.)

--Run this in another connection
USE TestMe;
GO
SET NOCOUNT ON;
GO
BEGIN
	DBCC DROPCLEANBUFFERS

	SELECT * FROM t1
END
GO 50000
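GO 50000 is a batch-repeat feature of Management Studio and sqlcmd, not T-SQL. If your client doesn’t support it, a WHILE loop does roughly the same job. A sketch; it assigns the count to a variable so the loop doesn’t return 50,000 result sets:

```sql
USE TestMe;
GO
SET NOCOUNT ON;

DECLARE @i INT = 0,
        @rows INT;

WHILE @i < 50000
BEGIN
    -- Drop clean pages so each pass has to read t1 from the data file on Y:
    DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

    -- Scan the heap; assigning to a variable avoids returning a result set
    SELECT @rows = COUNT(*) FROM dbo.t1;

    SET @i += 1;
END;
```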

Step 5: Disconnect the network drive, and voila! 823 Error.

Now, back in your first connection, disconnect the network drive with the following command:

EXEC xp_cmdshell 'net use "Y:" /DELETE /Y'

The connection that is running reads should fail with an 823 error.

Because 823 is a severity 24 error, SQL Server automatically terminates the connection when it occurs.
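Errors at this severity are written to the SQL Server error log, so you can also find the 823 there afterward. sp_readerrorlog is undocumented, so treat its parameters as subject to change; the sketch below assumes they are the log number (0 = current), the log type (1 = SQL Server rather than Agent), and a search string:

```sql
-- Search the current SQL Server error log for the 823
EXEC sp_readerrorlog 0, 1, N'823';
```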

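Don’t forget to disable xp_cmdshell

Like so (and turn show advanced options back off, too):

EXEC sp_configure 'xp_cmdshell', 0;
RECONFIGURE;
EXEC sp_configure 'show advanced options', 0;
RECONFIGURE;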
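To double-check, look at sys.configurations: value_in_use should be 0 for xp_cmdshell, and for show advanced options too if you turned it back off.

```sql
SELECT name, value, value_in_use
FROM sys.configurations
WHERE name IN (N'xp_cmdshell', N'show advanced options');
```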

That’s better.

Activity: Recover from the corruption, without bringing the network drive back online

To get the full benefit of the activity, add some database backups before the “corruption” event of disconnecting the network drive. You may want to combine full, differential, and/or log backups, and change data in the table at various points between (and after) backups.
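Here is one way to set that up, as a sketch. C:\Backups\ is a placeholder folder (a local path, so the backups survive losing Y:), and the log backup assumes the FULL recovery model, which the script sets explicitly.

```sql
USE master;
GO
-- Log backups need the FULL recovery model
ALTER DATABASE TestMe SET RECOVERY FULL;

-- Full backup (placeholder local path); this also starts the log chain
BACKUP DATABASE TestMe TO DISK = N'C:\Backups\TestMe_full.bak' WITH INIT;

-- Change some data, then take a differential
INSERT TestMe.dbo.t1 DEFAULT VALUES;
BACKUP DATABASE TestMe TO DISK = N'C:\Backups\TestMe_diff.bak' WITH DIFFERENTIAL, INIT;

-- Change more data, then take a log backup
INSERT TestMe.dbo.t1 DEFAULT VALUES;
BACKUP LOG TestMe TO DISK = N'C:\Backups\TestMe_log1.trn' WITH INIT;

-- This change lives only in the active log until the next log backup
INSERT TestMe.dbo.t1 DEFAULT VALUES;
```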

Then, practice bringing things back online. How much data will be lost in each scenario? How quickly can you bring the database online?
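One possible recovery path, sketched under assumptions: you took a full, a differential, and one log backup (the file names below are placeholders for yours), and the transaction log is still on a local drive. A tail-log backup WITH NO_TRUNCATE tries to capture the changes made after the last log backup even though a data file is unavailable, and MOVE relocates f1 off the missing Y: drive (C:\Data\ is a placeholder too). This is one approach to practice, not the only one.

```sql
USE master;
GO
-- Back up the tail of the log even though f1 is unreachable
BACKUP LOG TestMe TO DISK = N'C:\Backups\TestMe_tail.trn' WITH NO_TRUNCATE, INIT;

-- Restore the chain, relocating f1 to a local path
RESTORE DATABASE TestMe FROM DISK = N'C:\Backups\TestMe_full.bak'
    WITH NORECOVERY, MOVE N'f1' TO N'C:\Data\f1.ndf';
RESTORE DATABASE TestMe FROM DISK = N'C:\Backups\TestMe_diff.bak'
    WITH NORECOVERY;
RESTORE LOG TestMe FROM DISK = N'C:\Backups\TestMe_log1.trn'
    WITH NORECOVERY;
RESTORE LOG TestMe FROM DISK = N'C:\Backups\TestMe_tail.trn'
    WITH RECOVERY;

-- How many rows made it back?
SELECT COUNT(*) AS row_count FROM TestMe.dbo.t1;
```

Timing each step is a simple way to answer the “how quickly” question; comparing row_count with what you inserted answers the “how much data” one.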

Another Solution- The USB Drive

You can also do this by creating the filegroup, file, and table on a USB stick, and removing the USB stick instead of unmapping the network drive.

However, I preferred this example since it’s easy to re-run from Management Studio itself, and no additional physical devices are required.


Corrupting Databases for Dummies- Hex Editor Edition

Corruption is so ugly it gets a lolworm instead of a lolcat.

Let’s make one thing clear from the start:

This Post Tells You How To Corrupt a SQL Server Database with a Hex Editor in Gruesome Detail

And that’s all this post tells you. Not how to fix anything, just how to break it.

If you aren’t familiar with corruption, corruption is bad. It is no fun at all on any data, or any server, that you care about.

Where You (Possibly) Want To Do This

You only want to do this on a test database, in a land far far away from your customers, for the purpose of practicing dealing with corruption.

When things go badly, you want to be prepared. This post gives you the tools, in a simple step-by-step fashion, to create different types of corruption so that you can practice resolving them.

Big Disclaimer: Do not run this in production. Or anywhere near production, or anything important. Ever. Only use this at home, in a dark room, alone, when not connected to your workplace, or anything you’ve ever cared about. If you corrupt the wrong pages in a user database, you may not be able to bring it back online. If you corrupt a system database, you may be reinstalling SQL Server.

References, and Thanks to Paul Randal

Everything I’m doing here I learned from Paul Randal’s blog posts. It just took me a little while to understand how to use the hex editor and make sure I was doing it properly, so I thought I’d write down the steps I used in detail. If you’d like to go straight to the source, read Paul Randal’s posts on corruption.

First, Get Your Hex Editor

Download XVI32 by Christian Maas. No installer is necessary: download the zip file, unzip all files to a directory, and run XVI32.exe.

Create a Database to Corrupt

For our adventure, our database is named CorruptMe. We’ll create a single table, insert some data, and create a clustered index and nonclustered index on it.

(Note: Data generation technique found on Stack Overflow, attributed to Itzik Ben-Gan.)

USE master;
IF db_id('CorruptMe') IS NOT NULL
BEGIN
	ALTER DATABASE CorruptMe SET SINGLE_USER WITH ROLLBACK IMMEDIATE
	DROP DATABASE CorruptMe
END	

CREATE DATABASE CorruptMe;
GO

--Make sure we're using CHECKSUM as our page verify option
--I'll talk about other settings in a later post.
ALTER DATABASE CorruptMe SET PAGE_VERIFY CHECKSUM;

USE CorruptMe;

--Insert some dead birdies
CREATE TABLE dbo.DeadBirdies (
    birdId INT NOT NULL ,
    birdName NVARCHAR(256) NOT NULL,
    rowCreatedDate DATETIME2(0) NOT NULL )

;WITH
  Pass0 AS (SELECT 1 AS C UNION ALL SELECT 1),
  Pass1 AS (SELECT 1 AS C FROM Pass0 AS A, Pass0 AS B),
  Pass2 AS (SELECT 1 AS C FROM Pass1 AS A, Pass1 AS B),
  Pass3 AS (SELECT 1 AS C FROM Pass2 AS A, Pass2 AS B),
  Pass4 AS (SELECT 1 AS C FROM Pass3 AS A, Pass3 AS B),
  Pass5 AS (SELECT 1 AS C FROM Pass4 AS A, Pass4 AS B),
  Tally AS (SELECT ROW_NUMBER() OVER(ORDER BY C) AS NUMBER FROM Pass5)
INSERT dbo.DeadBirdies (birdId, birdName, rowCreatedDate)
SELECT NUMBER AS birdId ,
    'Tweetie' AS birdName ,
    DATEADD(mi, NUMBER, '2000-01-01')
FROM Tally
WHERE NUMBER <= 500000

--Cluster on BirdId.
CREATE UNIQUE CLUSTERED INDEX cxBirdsBirdId ON dbo.DeadBirdies(BirdId)
--Create a nonclustered index on BirdName
CREATE NONCLUSTERED INDEX ncBirds ON dbo.DeadBirdies(BirdName)
GO

Now we can take a look at the pages our table and nonclustered index were created on. I specifically wanted to corrupt a page in the nonclustered index (index ID 2) on the DeadBirdies table. If you wanted the clustered index instead, you could use index ID 1.

DBCC IND ('CorruptMe', 'DeadBirdies', 2)

I want a page that belongs to this nonclustered index, so I pick a PagePID where PageType = 2 (an index page; PageType 10 is the IAM page). (The reference I use for DBCC IND is here.)

I pick PagePID 2784.

Note: If you’re following along, you may get a different PagePID if you use a different default fill factor.
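On SQL Server 2012 and later there’s also sys.dm_db_database_page_allocations. It is just as undocumented as DBCC IND, but you can filter it with a WHERE clause. A sketch that lists the allocated leaf-level pages of index ID 2:

```sql
USE CorruptMe;
GO
-- Leaf-level pages of the nonclustered index (undocumented DMF, 2012+)
SELECT allocated_page_file_id,
       allocated_page_page_id,
       page_type_desc,
       page_level
FROM sys.dm_db_database_page_allocations(
         DB_ID(N'CorruptMe'), OBJECT_ID(N'dbo.DeadBirdies'), 2, NULL, 'DETAILED')
WHERE is_allocated = 1
  AND page_type_desc = 'INDEX_PAGE'
  AND page_level = 0
ORDER BY allocated_page_page_id;
```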

Optional: Check out the page with DBCC PAGE

If you’d like to take a look at the page you’re about to corrupt, you can do so with the following command.

--Turn on a trace flag to have the output of DBCC PAGE return in management studio
--Otherwise it goes to the error log
DBCC TRACEON (3604);
GO
DBCC PAGE('CorruptMe', 1,2784,3);

Set the database offline

You must take your victim database offline to render it fully helpless (er, accessible) to your hex editor. While the database is online, SQL Server holds the data file open.

USE master;
ALTER DATABASE CorruptMe SET OFFLINE;

Also, get the name of your physical data file which you’ll open in your hex editor. Copy this to your clipboard.

SELECT physical_name FROM sys.master_files WHERE name='CorruptMe';

Figure out the starting offset of the page you want to corrupt. You do this simply by multiplying the page ID (PagePid) by 8192 (the number of bytes on a page).

SELECT 2784*8192 AS [My Offset]
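That multiplication is done in INT, which overflows for page IDs of 262,144 and up (262,144 × 8192 = 2^31). A sketch that does the math in BIGINT and also shows the hex form, which is how SQL Server prints offsets in its error messages. The offset is relative to the start of the file holding the page (file 1 here).

```sql
DECLARE @PageId BIGINT = 2784; -- the PagePID picked from DBCC IND

SELECT @PageId * 8192 AS [Offset (decimal)],
       CONVERT(VARCHAR(20), CAST(@PageId * 8192 AS VARBINARY(8)), 1) AS [Offset (hex)];
-- Returns 22806528 and 0x00000000015C0000
```

For page 2784, that hex value is the same offset (0x000000015c0000) that shows up in the 824 error later in this post.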

It’s the Moment We’ve Been Waiting For: Trash That Page

Fire up your hex editor: run XVI32.exe.

Depending on your operating system, you may want to run this with elevated privileges / right click and “run as administrator”.

Open the database file by using File → Open, and then the data file name you copied to the clipboard. (If you didn’t set the database offline, you’ll get an error that it’s in use. If you got an error that you don’t have permissions to view the file, make sure you do have permissions and that you ran XVI32.exe with elevated privileges.)

Go to the page you want to corrupt by using Address → GoTo (or Ctrl + G), then paste in your offset value. Search for it as a decimal value, not hex.

XVI32.exe will take you right to the beginning of that page.

You can see the ASCII representation of the data in the right pane. For our example, you should be able to see the word ‘Tweetie’ represented.

I like to put the cursor in the right pane at the beginning of the word ‘Tweetie’. XVI32.exe will automatically move the cursor in the left pane to the matching location.

You can corrupt the data by editing in either the right pane or the left pane.

For my example, I am replacing the ASCII ‘T’ in the first occurrence of the word ‘Tweetie’ with an ‘S’. You can edit more, but a little tiny corruption goes a long way.

Save the file, and you’re done!

Admire Your Own Corruption

First, bring your database back online. If you correctly edited pages in the data, this should work just fine.

Note: If you corrupted critical system tables early in the database, this may not work! If so, go back to the steps above to identify a good page offset.

ALTER DATABASE CorruptMe SET ONLINE;

You can see the corruption in a couple of different ways. If you have checksums enabled on the database, you can see the corruption by reading the page with the data on it.

Since I corrupted a page in a nonclustered index in my example, I need to make sure I use that index. So I can see it with this query:

USE CorruptMe;
--The hint makes sure the query reads the nonclustered index we corrupted
SELECT birdName FROM dbo.DeadBirdies WITH (INDEX(ncBirds));

That returns this big scary error, which confirms I did indeed corrupt page 2784:

Msg 824, Level 24, State 2, Line 1
SQL Server detected a logical consistency-based I/O error: incorrect checksum (expected: 0xb633a8e1; actual: 0xaeb39361). It occurred during a read of page (1:2784) in database ID 18 at offset 0x000000015c0000 in file ‘D:\BlahBlahBlah\CorruptMe.mdf’. Additional messages in the SQL Server error log or system event log may provide more detail. This is a severe error condition that threatens database integrity and must be corrected immediately. Complete a full database consistency check (DBCC CHECKDB). This error can be caused by many factors; for more information, see SQL Server Books Online.

You can also see this by running a CHECKDB or CHECKTABLE command.

DBCC CHECKDB('CorruptMe')
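If you only care about the damaged table, DBCC CHECKTABLE is a narrower (and faster) check. A sketch; the optional second argument limits the check to one index, here index ID 2, the nonclustered index:

```sql
USE CorruptMe;
GO
-- Check the whole table, showing every error without informational noise
DBCC CHECKTABLE ('dbo.DeadBirdies') WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Or check only the nonclustered index (index ID 2)
DBCC CHECKTABLE ('dbo.DeadBirdies', 2) WITH NO_INFOMSGS, ALL_ERRORMSGS;
```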

An excerpt from its output:

Msg 8928, Level 16, State 1, Line 1
Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data): Page (1:2784) could not be processed.  See other errors for details.
Msg 8939, Level 16, State 98, Line 1
Table error: Object ID 2105058535, index ID 2, partition ID 72057594038910976, alloc unit ID 72057594039828480 (type In-row data), page (1:2784). Test (IS_OFF (BUF_IOERR, pBUF->bstat)) failed. Values are 12716041 and -4.
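SQL Server also records pages that hit 823 and 824 errors in msdb.dbo.suspect_pages. A quick look; in that table, event_type 2 means a bad checksum, and event_type 1 covers 823 errors and 824s other than bad checksums or torn pages:

```sql
SELECT database_id, file_id, page_id, event_type, error_count, last_update_date
FROM msdb.dbo.suspect_pages
WHERE database_id = DB_ID(N'CorruptMe');
```

After the query above, I’d expect a row for page 2784 with event_type 2.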

Now Sit Back and Laugh Maniacally. And Then Fix It.

So, the whole point of this was probably to test something.

So take a moment to enjoy the fact that FOR ONCE you don’t have to panic when you see these errors, because it’s all part of your master plan.

Then go out and fix the corruption, and run your tests.
