SQL Agent Jobs: Checking for failed steps at the end of a job

I use the SQL agent a lot, and it is handy for a lot of things, but  it can be frustrating to not be able to pass state information between steps.

For example, I have a job where I want to execute data verification steps against multiple tables. It makes sense to have the check for each table in its own step with a clear label to simplify troubleshooting– so when the job fails, you can see which step had an error and know from the name exactly what’s wrong.  But I want all steps in the job to run, regardless of whether a step fails— I want to check for failure at the end.

The most basic way to do this is to have each job step log to a table. This isn’t really bad, but I’d rather not maintain a table for every job of this type. It leaves room for failure, it’s more to maintain, and it just feels redundant for  the most part: all of the job history  is tracked in MSDB anyway, shouldn’t I be able to use that?The requirements I have for the job are:

  • Each step must be relatively short and do a discrete unit of work.
  • Each step will raiseerror with log if it has a problem, but still continue on to the next step.
  • At the end of the job the final step checks see if any previous step in the current run has failed. If so, it fails the job itself, otherwise the job will succeed.
  • The last step should list all the steps that have failed, just for ease of use.

So far this is working in my tests as the code for the final step, and it meets my needs for this type of job. Note: This uses a token to determine the job_id of the running job, so this will only work within the execution context of a SQL Agent job. If you want to test in a regular query, you’ll have to assign a fake job_id for your test.  SQL Agent steps also don’t parse tokens successfully (at least in the version I’m using).

set quoted_identifier on;
declare @errorMsg nvarchar(max);

    @errorMsg= 'The following steps failed, please investigate:' + LEFT(o.list, LEN(o.list)-1)
from (
        [text()] = step_name + ','
    FROM msdb.dbo.sysjobhistory jh
    join msdb.dbo.sysjobactivity ja on
        run_status=0 --step failed
        and step_id != 0
        and jh.job_id = CONVERT(uniqueidentifier, $(ESCAPE_NONE(JOBID)))
        and --this block ensures that we just pull information from the most recent job run
            -- The start time of the step, converted to datetime
            CONVERT (DATETIME, RTRIM(run_date))
            + ( run_time * 9
                + run_time % 10000 * 6
                + run_time % 100 * 10
            ) / 216e4
            >= ja.start_execution_date -- the time the job last started
    order by instance_id
    FOR XML PATH ('')
) o (list)

if @errorMsg is null
    print ' Everything looks good...'
    raiserror (@errorMsg, 16, 1)