AVG() in TSQL: Watch Your Data Types

on June 21, 2018

This week’s Quizletter featured a quiz on using the AVG() function in SQL Server.

I was inspired to write this quiz because I’ve been teaching “TSQL School” each week. When we covered aggregate functions, I remembered how tricksy AVG() can be.

Question 1: AVG of 1 and 1

In question 1, we average a value on two rows: one and one.

How did folks do on this question?

  • Correct: 91% - These folks guessed the answer would be: 1
  • Incorrect: 9% - These folks almost all guessed the answer would be: 1.0

This seems like a subtle difference, but the key to the answer is understanding what data type SQL Server will implicitly convert these numbers to. And data types matter a lot when it comes to the results you can get out of AVG. In this case, 1 will be implicitly converted to an INT data type.

Question 2: AVG of 1 and 2

Here’s where the wacky fun begins: if I average two ints, one and two, I get back… one!

  • 46% - YAY TEAM! Great work! This is bizarro, so good on you folks.
  • 54% - I feel your pain. Most of these folks guessed 1.5, a few guessed 1.500000

What does this mean? Well, let’s look at an example with negative numbers to help get a bigger picture:

When we round integers -1 and -2, we get -1.

How to describe this behavior

There are two different ways you can think about this behavior. Pick the one that is easier for you to remember.

Description 1: SQL Server rounds integers “toward zero”. (When the average was positive, we got the ‘floor’. When the average was negative, we got the ‘ceiling’.)

Description 2: It’s truncating the decimal bits.

Extra credit: Guess which one of these descriptions came from me, and which one came from my computer scientist partner.

Question 3: Mixing in a CAST()

Aha! Now we’re getting fancy. We have a CAST function outside of the AVG function.

The keys to this answer are that CAST will be run after the AVG function completes, and the AVG function is still returning 1, because inside the AVG function, it sees the INT data type.

  • Correct: 36% - These cunning quizzers worked out that AVG would truncate the decimal bits and return 1, and that this would then be cast to 1.0
  • Incorrect: 64% - Most folks guessed 1.5

Question 4: Move that CAST() inside the AVG()

Here we finally have an example where I am changing the data type inside the AVG, and we are casting to a numeric type with precision of three, scale of one. (One number past the decimally bit!)

And, yeah, look at that, we get back a result with a scale of SIX.

In fact, if you to the trouble to see what type we end up with here (like by selecting the results into a temp table and inspecting the data types), this is numeric(38,6). 38 is the highest precision you can get with numeric.

  • Correct: 37% - Clever rabbits!
  • Incorrect: 63% - Most of these folks guessed 1.5, and I do not blame you at all

What’s up with this? Well, there’s a bit of info inside the documentation on AVG, but it’s about the decimal data type. Numeric and decimal are twins (but you want to use one or the other consistently, because of things like foreign keys). In the section on return types, it explains that the return type for decimal category (p, s) will be:

decimal(38, s) divided by decimal(10, 0)

Plugging that formula into the numbers we’re working with in this question:

Yep, it checks out.

Takeaways

Weird things like this are why some folks say not to do calculations in the SQL language.

For times when you do need to do calculations in TSQL, remember to think about your data types– both inside your functions, and outside.