Required Testing for Installing SQL Server Cumulative Updates and Service Packs


Microsoft recently updated its policies and recommendations for installing cumulative updates. The header for each cumulative update now reads:

Microsoft recommends ongoing, proactive installation of CUs as they become available:

  • SQL Server CUs are certified to the same levels as Service Packs, and should be installed at the same level of confidence.
  • Historical data shows that a significant number of support cases involve an issue that has already been addressed in a released CU.
  • CUs may contain added value over and above hotfixes. This includes supportability, manageability, and reliability updates.

But Important Things Can Go Wrong in Cumulative Updates and Service Packs

The header of each cumulative update says something else, too:

  • Just as for SQL Server service packs, we recommend that you test CUs before you deploy them to production environments.

That’s because nobody’s perfect. Something may break in your environment after you install a Service Pack or cumulative update. Microsoft tests the releases, but you need to test them, too.

You Should Still Be Installing Cumulative Updates

Some folks are in the habit of waiting for a while after a Service Pack is released to see if issues arise or if the Service Pack gets patched. That isn’t a great plan either. You end up way behind on the release cycle. Whenever you have a performance issue, it’s extra work to figure out if your problem is fixed in the releases you haven’t installed yet.

Don’t be cynical about Service Packs and Cumulative Updates. Be better at testing.

So what should you test?

I’ve got a list!

1. Test Query Performance. You Need a Benchmark.

Install the cumulative update into a non-production environment (prod test or staging) and run application activity against it.

Ideally, you want this to be representative activity for your real, production applications, run from multiple application servers. You also want the hardware and storage to be as similar as possible.

If that’s not possible for you to set up, you need some sort of tool to generate concurrent activity. There are a lot of options; Michael Swart gives a great rundown here.

Whatever you choose, implement it in a way that mimics your production environment as much as possible. Think about things like these:

  • If production uses in-memory tables, your benchmark should use them
  • If a database on the production instance uses Transparent Data Encryption, a database on your benchmark server should, too (it impacts tempdb for all databases)
  • Your security setup and authentication should mimic what you use in production. Use Kerberos in prod? Use it in staging.
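
However you generate the load, capture numbers you can compare between the old and new build. As a minimal sketch (the table name and run labels below are made up, not any standard), you can snapshot wait statistics into a table before and after each benchmark run, then diff the two patch levels:

-- Snapshot cumulative wait stats so runs on the old and new build
-- can be compared. Run once before and once after each benchmark,
-- changing @run_label each time.
IF OBJECT_ID('dbo.BenchmarkWaitStats') IS NULL
    CREATE TABLE dbo.BenchmarkWaitStats (
        capture_time   DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
        run_label      NVARCHAR(100) NOT NULL,
        wait_type      NVARCHAR(60)  NOT NULL,
        waiting_tasks  BIGINT        NOT NULL,
        wait_time_ms   BIGINT        NOT NULL,
        signal_wait_ms BIGINT        NOT NULL
    );

DECLARE @run_label NVARCHAR(100) = N'SP3-CU7-before';  -- change per run

INSERT dbo.BenchmarkWaitStats
    (run_label, wait_type, waiting_tasks, wait_time_ms, signal_wait_ms)
SELECT @run_label, wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0;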

2. Test Backups, While the Benchmark Is Running

This one is super easy to forget, because most people don’t run backups outside of production.

But don’t forget to test full and log backups while activity is running! Yep, Service Packs and cumulative updates can break them, or significantly change their performance. Don’t let this be a nasty surprise you only learn about in production.

Backups in your staging environment may normally be slower than backups in production. That’s OK; just make sure you track what’s normal for the environment, so you can tell if they got significantly slower.
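
If you want a starting point, here’s a hedged sketch of timing those backups in staging. The database name and file paths are placeholders, and the log backup assumes the database is in the FULL (or BULK_LOGGED) recovery model:

-- Time a full and a log backup while the benchmark workload is running.
-- CHECKSUM makes the backup verify page checksums as it reads.
DECLARE @start DATETIME2 = SYSDATETIME();

BACKUP DATABASE StageDB
    TO DISK = N'S:\Backups\StageDB_full.bak'
    WITH COMPRESSION, CHECKSUM, INIT;

PRINT N'Full backup seconds: '
    + CAST(DATEDIFF(SECOND, @start, SYSDATETIME()) AS NVARCHAR(20));

SET @start = SYSDATETIME();

BACKUP LOG StageDB
    TO DISK = N'S:\Backups\StageDB_log.trn'
    WITH COMPRESSION, CHECKSUM, INIT;

PRINT N'Log backup seconds: '
    + CAST(DATEDIFF(SECOND, @start, SYSDATETIME()) AS NVARCHAR(20));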

3. Test Index Maintenance and CHECKDB (Yep, While the Benchmark Is Running)

Similar to backups, you want to put all your maintenance through its paces. Don’t run it all at the same time; run it serially.

Most benchmark workloads can run at different levels. You may choose to normally run your benchmark at a lower level while running index maintenance, and that’s totally fine — a benchmark that hammers the staging server to its limits hopefully doesn’t mimic your production environment. Just be consistent and know what normal times are.
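
As a sketch of what “serially, with timings” might look like (the database and table names are placeholders, and ONLINE index rebuilds are Enterprise Edition only):

-- Run CHECKDB, then index maintenance, one after the other,
-- recording how long each takes at the current patch level.
DECLARE @start DATETIME2 = SYSDATETIME();

DBCC CHECKDB (StageDB) WITH NO_INFOMSGS;
PRINT N'CHECKDB seconds: '
    + CAST(DATEDIFF(SECOND, @start, SYSDATETIME()) AS NVARCHAR(20));

SET @start = SYSDATETIME();

ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (ONLINE = ON);
PRINT N'Rebuild seconds: '
    + CAST(DATEDIFF(SECOND, @start, SYSDATETIME()) AS NVARCHAR(20));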

4. Test Failovers (if You’re Using High Availability)

This is so critical. I learned that back when I applied a Service Pack to a production cluster, and then found it wouldn’t come online when I tried to fail it over. We had a configuration issue that we only found in production because we didn’t have any clusters outside of production.

At least, we didn’t before that incident. We sure had them afterward.

If you’re running Availability Groups with three replicas and a log shipping secondary in production, you need the same topology in pre-production/staging. And you need to plan and test exactly how the upgrade will roll out, when you’ll fail over what to where, and don’t forget to make sure that it fails back, too.
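
For Availability Groups, here’s a hedged sketch of the failover test itself. The AG name is a placeholder, and for a no-data-loss failover this should run on a synchronous-commit secondary that’s fully synchronized:

-- Check replica health before failing over.
SELECT ag.name,
       rs.role_desc,
       rs.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS rs
JOIN sys.availability_groups AS ag
    ON ag.group_id = rs.group_id;

-- On the synchronized secondary that should become primary:
ALTER AVAILABILITY GROUP [StageAG] FAILOVER;

-- Then repeat in the other direction to confirm it fails back.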

5. Test Application Jobs and Scheduled Packages

Make sure that you’ve got the same scheduled SQL Agent jobs and regularly scheduled processes set up against your staging server, and that they all run normally and in good time.

Never assume that staging matches production exactly before you start. Yep, it should, but it never hurts to make sure that the same jobs are set up the same way.
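
One quick way to compare is to list the jobs and schedules on each instance and diff the output. A minimal sketch against msdb:

-- List Agent jobs and their schedules; run on staging and on
-- production, then compare the results.
SELECT j.name    AS job_name,
       j.enabled AS is_enabled,
       s.name    AS schedule_name
FROM msdb.dbo.sysjobs AS j
LEFT JOIN msdb.dbo.sysjobschedules AS js
    ON js.job_id = j.job_id
LEFT JOIN msdb.dbo.sysschedules AS s
    ON s.schedule_id = js.schedule_id
ORDER BY j.name;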

Do You Have Time to Do This Every Cumulative Update?

Cumulative Updates are released every two months. Your job probably includes more than testing Cumulative Updates and Service Packs. You may not have time to do this properly, and if performance and availability are important, you shouldn’t cut corners on this.

In a perfect world, you’d test and evaluate every CU. The more imperfect your world is, the harder that is.

Compromise: Don’t Install Every CU, But Read the Release Notes and Update Regularly

When I first became a DBA, our team had the following policy:

  • Pick a Service Pack + Cumulative Update combo that’s recent, and establish it as stable across production
  • Read the release notes for all hotfixes, Cumulative Updates, and Service Packs, looking for fixes that might be relevant to the environment
  • Update to a new SP + CU level a few times a year, either when a compelling fix is released or when a full quarter has gone by with no change
  • New SP + CUs always burn in on development instances first
  • They then proceed to staging / pre-production
  • In staging and pre-production, DBAs kick the tires, fail the instances over, and look for deal breakers (the testing plan above)
  • They then go to less critical production instances first, and gradually work up to more critical instances
  • Exceptions are available for critical hotfixes / breakfix, and those issues always involve escalation to Microsoft Premier Support
  • Change Management is used all the time

It worked really well then, and this plan still works well with the new Cumulative Update policy.

How to Stay Current on New Releases

Cumulative Updates not only contain important fixes, they also sometimes contain interesting little features, like trace flag 7471. To stay fresh, keep an eye on Microsoft’s release announcements each time a new build ships.
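
A quick way to tell how far behind you are is to check the build an instance is actually running and compare it against the latest released CU for your branch. A minimal sketch:

-- What build is this instance on? Compare the result against the
-- most recent Cumulative Update for your version.
SELECT SERVERPROPERTY('ProductVersion') AS build_number,
       SERVERPROPERTY('ProductLevel')   AS sp_level,
       SERVERPROPERTY('Edition')        AS edition;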


14 Responses to Required Testing for Installing SQL Server Cumulative Updates and Service Packs

  1. Phil Ekins April 29, 2016 at 11:44 am #

    “Most people don’t run backups outside production”? Beside the point of your post, but REALLY?!? Developers get real crabby when you tell them they have lost 2 months of sp changes… I very rarely come across this, and flag it as an issue if I do… Are you seeing it that often?

    • Kendra Little April 29, 2016 at 11:46 am #

      Sure, all the time. Most developers check their changes into a branch of source control, and would cringe at relying on a dev environment database or backup for their code changes.

      Do your developers really not check code into source for months at a time? Do they not know how to branch? (I’m actually a little concerned that this means they aren’t checking in database code.)

  2. Morden Kain May 2, 2016 at 11:16 am #

    As the DBA in charge of updating and backing up our SQL Servers, I can attest that we have backups for our Dev and Test environments for the past 6 months, with the ability to roll back to a specific day for 30 days. Yes, our developers are a bit spoiled in that regard. I also pull prod data back to Dev and Test as well. Or I can do a restore onto Dev or Test of any database from Prod.

    As for testing, something we found out by accident, and are still testing, is that Trace Flag 4199 is breaking one of our DacPac builds. We ended up turning off that TF even though it was turned on to implement a fix for ITF/UDFs with SQL Server 2014 SP1 CU6 (yes, we are currently testing CU6, which came out two weeks ago).

    I agree though with the statement not to install every CU. I track the CUs within a spreadsheet and when there are enough changes/fixes to warrant the update, I set it up for the following month. In the case of CU6, we got it fast tracked as we are currently fixing a lot of ITF/UDFs that were working perfectly until we switched compatibility from SQL Server 2008 to SQL Server 2014.

    • Kendra Little May 2, 2016 at 12:31 pm #

      Hey, ain’t nothing wrong with spoiling developers. I’m in favor of backing up those environments for the sake of RTO — it can be a real nightmare to reset them when things go wrong, and if developers can’t work then the company is losing money. Most folks just don’t want to invest in the storage, so they end up taking far fewer backups and keeping far fewer of those environments. I do think your way is better (as long as people are properly using source control, but that’s really the developers’ responsibility).

      I would still test log backups in at least one environment for systems where you use them in production.

      That’s really interesting about TF 4199! Thanks for sharing the info.

      For the compat mode issue, any reason you stuck with the higher compat mode and didn’t just lower it back down to use the old cardinality estimator? Curious if you had some things speed up from the new CE that made up for the regressions?

      • Morden Kain May 2, 2016 at 1:19 pm #

        Yep, TFS is the reason. We switched all our projects over to SQL Server 2014. Once you make the switch, you cannot go back.

    • Morden Kain May 2, 2016 at 4:24 pm #

      Well, the developers just finished testing out the DacPacs, and found that there are a couple of registry keys that default to a rather low value of 5 seconds for SqlPackage.exe. They bumped it up initially to 1000 seconds, and are currently bringing it back down to see where the break point is for the environment. The TFS builds are now succeeding with Trace Flag 4199 on. Here is where the registry key is located:

      HKCU\Software\Microsoft\VisualStudio\\SQLDB\Database\QueryTimeoutSeconds

      You can find the same information here:

      https://social.msdn.microsoft.com/Forums/sqlserver/en-US/b00f58d4-cb6e-42ba-b48e-594517288555/ssdt-timeout-when-deploying-a-database?forum=ssdt

      • Morden Kain May 6, 2016 at 4:34 pm #

        Update: Well, we turned back off 4199 after finding that the fix we thought was working (see above), was not. However, we did finally find the culprit. SQLPackage.exe was executing a query for spatial indexes (which we have none listed). The query was taking 51m 15s to return 0 rows. My co-worker found an article stating to update stats. We tried that, no dice. What I did find is that there are four tables that the update stats did not touch that belong to the SYS schema. Once I updated those, TFS was no longer throwing errors on builds. It actually decreased the build time as well. I posted the fix for this on StackOverflow under the original posters answer (as his fix did not work for me):

        http://stackoverflow.com/questions/36494523/ssdt-publish-timeout/37081880#37081880

  3. Golden Gal May 2, 2016 at 11:16 am #

    So, Kendra, what if…(and this is not such an unusual situation) the MSP has to deal with customers who will NOT pay for a separate dev/test environment? Suggestions on how to circumvent the “testing a service pack on production” situation?

    • Kendra Little May 2, 2016 at 12:43 pm #

      Some IT service providers just explain the risks to the client clearly, and will do the patches untested to production. If they bill by the hour, it’s not necessarily a bad deal for the service provider in the short term.

      However, I don’t like that approach, because it ends up yielding more breakfix hours and an unhappy client. Even if the risks are explained, clients don’t tend to like unexpected surprises. If the unexpected surprises cause data loss, it can be disastrous.

      I have known some IT service providers to simply require a testing environment for upgrades, and also to require a certain number of work hours for the testing and production deployments. It doesn’t have to be a permanent testing environment; it can be a temporary VM spun up just for the purpose of running the testing. Virtualization has made the discussion of testing upgrades much easier by giving more options.

      Essentially, part of being a service provider is setting standards and advising the client about the right way to do work. My experience has been that when the standards are set too low, there’s too much breakfix and it can damage the client relationship, sometimes ending it. So it’s well worth figuring out what your minimum requirements are for patching support (for your own revenue concerns), and sticking to them.

  4. Wendy February 9, 2017 at 10:33 am #

    Kendra, thank you for the article. I have a few questions if you can kindly answer.

    1) If your AG servers are on SQL 2012 SP1 CU3, can I upgrade them directly to SQL 2012 SP3, skipping SP2, then apply CU7 (the latest CU for SP3)?
    2) If I have a scheduled downtime for these three servers in the AG group that are on manual failover, can I just upgrade the primary server, followed by replica 1, then replica 2 in sequence? You don’t need to shut down all the SQL Servers or take them out of the AG to perform the SP upgrades, correct?
    3) Do you have any link/resource, or are you aware whether AG/replication will work fine with some servers on SP1 and some on SP3 CU7? Say the subscribing server is on SP1, but some publishing servers are on SP3 and some are on SP1?
    4) If I have 10 sets of those 3-node AG servers that replicate to a central server, can those publishing servers be on different SPs? I don’t have the capacity to upgrade all the servers at the same time by myself.

    Thank you so much!!

    • Kendra Little February 9, 2017 at 12:39 pm #

      Hi Wendy,

      Generally speaking, you can “skip” service packs. You can also create “slipstream” installation media to apply a service pack and a cumulative update in one go.

      The biggest risks I see in your question are that it sounds like you don’t have a staging environment, and you’re running a v1 version of Availability Groups.

      Even if you found an online reference about a replication topology with the (very) specific versions you’re describing, how would you know they were using replication in the same configuration and topology you are? The only thing that can give you even minimum reassurance is a staging environment where you can run tests on your topology before you ever push to production.

      If I were you, I would take this patching problem as an opportunity to lobby for that staging environment so that I could answer these questions, especially given that folks testing and blogging about AGs are doing so with SQL Server 2016. (It may take more than one conversation to get that completed, but it’s worth it.)

      Amit Banerjee has blogged recently about upgrading replication topologies, and it includes information about different versions that are supported on publisher / distributor / subscribers and allowed versions. He’s talking about major versions, but the information can help inform your upgrade order even on SPs and CUs: https://blogs.msdn.microsoft.com/sql_server_team/upgrading-a-replication-topology-to-sql-server-2016/

      Hope this helps,
      KL

Trackbacks/Pingbacks

  1. Testing Cumulative Updates – Curated SQL - April 29, 2016

    […] Kendra Little discusses how to test cumulative updates: […]

  2. Install Cumulative Updates – Voice of the DBA - May 1, 2016

    […] good news, though as Kendra Little notes, you still need to test. Bugs will still exist in patches, and really all software, so it’s important that you test […]

  3. Index Usage Stats Bug Fixed in SQL Server 2012 SP2 CU12 / SP3 CU3 - by Kendra Little - May 18, 2016

    […] Friends don’t let friends release Cumulative Updates without testing. […]
