I am creating a simple empty database named ModTest to try out this problem query.
You don’t even have to download a sample database from anywhere to run this code and explore and try your hand at tuning this problem query yourself.
We’re going to set up some test procedures
The first procedure uses a user database table. It’s named dbo.UserDatabaseTableTest
I create the procedure, and now let’s explore the code a little bit and see what it does.
The first thing it does is create a table named dbo.UserDatabaseTable that has RowID, CharColumn and a clustered primary key. It then puts about, well it puts EXACTLY a million rows into the user table, it just generates those rows. You don’t have to restore a database, the rows are generated by this query. Then it gives a message that says, “I’m going to update rows in the table,” and it captures the current system time before it starts the update. Then it runs the update against the user database table. It’s going to update a number of rows that we specify in a parameter named @RowID that we pass into the stored procedure. Then after the update’s done, it checks what the time is again and prints a message that says, “Here’s the duration of just the update.”
We won’t be looking at the duration of the whole procedure because for the purposes of this problem we don’t care how long it takes to put the million rows in. What we care about is the duration of how long it takes to update the table.
That is our procedure for the user database table, and let’s go ahead and give it a run. What we’re going to do, we have a million rows in the table and we’re going to update 800,000 of those rows, the bulk of the rows in the table.
We run our procedure and it’s putting a million rows into that table and updating them, and sure enough we can see that the duration of the update for the user database table is about 1.5 seconds. Just for the update portion of that.
Let’s now create a procedure that uses a temp table but otherwise has the same code
This is named dbo.TempTableTest, and we’ll create it and now explore the code. It’s very, very similar. It creates the table, it adds a million rows to it, and then we record the time. We perform the update based on the row ID that’s passed in, we’re going to update 800,000 rows out of that million, and then it tells us, “Okay, here’s how long it took.”
Very, very similar code, just with semantics adapted for the temporary table. Let’s go ahead and run this code for the 800,000 rows against the temporary table and see how long does the update take against it here.
Well, it didn’t take 1.5 seconds this time. The duration of the update of the temp table was five seconds.
Well maybe that’s just a fluke? Let’s run it again. After all, I’m running this on a VM on my laptop where I am running other software. This isn’t the most scientific performance environment. This is repeatable, and it is normal for the duration against that temp table to be over five seconds, and for the duration against the user database table to be around 1.5 seconds on my laptop.
You might wonder, “Well that was a temporary table,” right? What we had there was a temporary table with a clustered primary key. I’m giving it indexing similar to the normal table.
What if we did a table variable? Next up we’re creating a procedure named TableVariableTest
This is very similar, except we’re declaring a table variable, again with the clustered primary key on RowID pumping a million rows into it, checking our time and then updating the number of rows we pass in. Very similar code just using a table variable here.
Now when I run this using the table variable, let’s see is it more like the user table or more like the temp table? Well, interestingly enough not everything in tempdb works the same. The duration here was under a second of duration using the table variable.
Sometimes we’re tempted to say things like, “Table variables are bad, temp tables are good,” or the opposite, “Temp tables are good, table variables are bad.” Whenever you find yourself making a blanket statement about that, know that the SQL server will always find ways to surprise you for different cases.
In this case our table variable is very, very fast.
What if I use a table variable and a recompile hint?
I’ve just created a stored procedure named TableVariableTestRECOMPILE, and this is the same code for the table variable. We have a table variable with a clustered primary key, but now the difference here is that when I do the update I have added in an option recompile query hint.
Will it still be fast and take under a second or will it be different? Let’s see how long that update takes now. Running TableVariableTestRECOMPILE. Huh, it’s not done yet, it’s still taking longer.
\We are back.
Now that I added that option recompile hint, which in theory helps SQL server be smarter about how much stuff is in that table variable, now we’re back to five and a half seconds, that same or similar duration that we got when we were using the temp table.
The plot thickens, the plot thickens.
We’ve just been looking at duration so far. Remember that when I got this question, there was also a note that not only did the temp table take longer, but also it used more IO.
Let’s measure the IO used by each of these procedures
The way I’m going to do this first is by using a classic. I’m going to say SET STATISTICS IO ON.
This will print to my messages tab information about how many logical reads, physical reads, et cetera, are done by the procedure.
I’m going to first run UserDatabaseTableTest
We’re still using that 800,000 rows, and we’ll see how many logical reads are done by this. The first result set here at the top is inserting the million rows into the table. What I care about is this update of rows in the user table.
We can see that again it did take under one and a half seconds this time, but close to one and a half seconds, it was 1.3 approximately seconds this time.
I can see that this update of user rows in the table did about 53,500 logical reads when using a table in a user database.
What I would expect to see based on the information I got from the folks asking me the question is that we’re going to see more IO for the slower version of this using the temp table.
Let’s measure the IO when we use the temp table
The output looks a little different, and that’s normal because statistics IO output when we’re using a temp table it is normal to get this super long line in there over to the left. Once again we have a slow duration.
We’re over five seconds, so it’s repeatably slow, and looking at the amount of IO that it did, I’m just going to throw a return in here so we can get it on this line. It did more that 53,000 logical reads.
This is, let’s add in some commas there, this is 4.8 and a half million logical reads. That’s not just a little bit more IO, that is a LOT more IO, I mean they weren’t kidding, it does a lot more IO.
What about our table variable?
Our table variable was nice and fast as long as we don’t have that recompile hint on the update statement. Sure enough our table variables IO looks quite similar to the user table where at one second, just over one second of execution time, and we did 53,000 logical reads.
How about when that recompile hint is added?
Adding in the recompile hint, there’s a query hint on the update statement that updates the table variable. Once that’s there, we have more than five seconds, similar run time to the temporary table, and we are back to 4.8 million and a half logical reads.
We’ve got a weird performance difference here!
We have these procedures and some of them are fast, UserDatabaseTableTest is fast, TableVariableTest is fast, and some of them are slow. TempTableTest is slow and TableVariableTestRECOMPILE is slow.