Monthly Archives: May 2009

Monitoring SQL Agent Jobs

In my last post I showed how to reliably get your SQL Agent jobs to only bother you if they fail. But how can you be confident that they are in fact running? Well, what I did is I set up a report in Reporting Services that reads the job history tables on a list of servers and shows me what jobs have run lately and whether they succeeded. I have the report sent to me every morning. Successful jobs are green and failed jobs are red. I get a clear consolidated picture of what’s been going on.

Here’s the query the report uses:

if exists (	select	1 
		from	dbo.sysobjects 
		where	id = object_id(N'dbo.rpt_ITSQLAgentJobs') 
			and objectproperty(id, N'IsProcedure') = 1  )
	drop procedure dbo.rpt_ITSQLAgentJobs
go
 
set quoted_identifier on
go
set ansi_nulls on
go
 
create proc dbo.rpt_ITSQLAgentJobs
(
	@From   datetime
	, @To   datetime
)
as
 
/*
----------------------------------------------------------------------
Version:    1.0
Date:       13/05/2009
----------------------------------------------------------------------
*/
 
set nocount on
 
declare @FromInt int
	, @ToInt int
 
-- In sysjobhistory Run Date is an integer in the form YYYYMMDD.
set @FromInt = cast((convert(varchar(8),@From,112)) as int)
set @ToInt = cast((convert(varchar(8),@To,112)) as int)
 
-- I tried a union but the servers aren't all on the same collation so that doesn't work.
-- The temp table approach gets round that.
 
----------------------------------------------------------------------
-- server1
----------------------------------------------------------------------
  
-- Create the temp table from the first server's results.
select	sjh.run_date
	, sjh.run_time
	, sjh.server
	, sj.name as job_name
	, sjh.step_name
	, sjh.message
	, sjh.run_status
into	#t
from	server1.msdb.dbo.sysjobhistory sjh with (nolock)
	inner join server1.msdb.dbo.sysjobs_view sj with (nolock) on sj.job_id = sjh.job_id
where	sjh.step_id = 0 --(Job outcome)
	and sjh.run_date >= @FromInt
	and sjh.run_date <= @ToInt

---------------------------------------------------------------------- 
-- server2 
---------------------------------------------------------------------- 

begin try 

	insert	#t 
	select	sjh.run_date 
		, sjh.run_time 
		, sjh.server collate Latin1_General_CI_AS 
		, sj.name as job_name 
		, sjh.step_name 
		, sjh.message 
		, sjh.run_status 
	from	server2.dbo.sysjobhistory sjh with (nolock) 
		inner join server2.msdb.dbo.sysjobs_view sj with (nolock) on sj.job_id = sjh.job_id 
	where	sjh.step_id = 0 --(Job outcome) 
		and sjh.run_date >= @FromInt
		and sjh.run_date <= @ToInt
 
end try
 
begin catch
 
	insert	#t
		(run_date
		, run_time
		, [server]
		, job_name
		, step_name
		, [message]
		, run_status)
	select	99999999
		, 0
		, 'server2'
		, ''
		, ''
		, error_message()
		, 0
 
end catch
 
-- Repeat the server2 bit for every other server you want to monitor
 
--Return results
select	*
	, case run_status
		when 0 then 'Failed'
		when 1 then 'Succeeded'
		when 2 then 'Retry'
		when 3 then 'Canceled'
		when 4 then 'In progress'
		else 'Unknown (' + cast(run_status as varchar(3)) + ')'
	  end as status_desc
from	#t
order by run_date desc
	, run_time desc
	, [server]
 
go
Advertisements

Better Notification from SQL Agent Jobs

I manage about 10 SQL servers and I rely heavily on SQL Agent jobs to do this. My aim is to get everything going smoothly with minimal intervention from me, but I want to know straight away when something goes wrong. SQL Agent has an excellent scheduler and what you can do in a job step is flexible and powerful but it’s not great at coping if the job goes wrong. Here’s my simple solution.

Let’s look at an example: a job to backup a number of databases, one in each step. I want it to backup as many as it can, so if there’s an error I want it to keep going and then tell me at the end that the job wasn’t successful.

You can’t pass data directly between job steps. MSDN says you can do it via a permanent table or global temporary table – the permanent table worked for me but not the global temporary table – but this doesn’t feel right either. Depending on how you configure what a step does if it fails, you can control things to a degree:

  • If each step quits the job I don’t get all my backups done.
  • If each step goes to the next step the job won’t notify me there was a problem because it got successfully to the last step and that succeeded too. (I raised a request on Connect about this. If the job history viewer can show an asterisk when one of the steps failed, the notification system should be able to make this distinction too.)
  • I suppose I could use the ‘Go to step X’ to create a structure with an optional ‘Step X-1 failed’ step after each step which is skipped when there’s no error. But that’s just ugly and fragile.

None of these work for me. So what I do is have each step just go to the next step and the last step is a stored proc that checks the job history for this run and raises an error if anything failed. That last step quits the job reporting failure or success. Then I set the job to notify me if it fails. If any backup fails the job just moves on to the next one and, at the end, the stored proc sees the error and raises a new one, which triggers the notification email to me.

Here’s the stored proc:

if exists (	select	1 
		from	dbo.sysobjects 	
		where	id = object_id(N'dbo.dba_CheckJobSuccess') 
			and objectproperty(id, N'IsProcedure') = 1	)
	drop proc dbo.dba_CheckJobSuccess;
go

set ansi_nulls on
go
set quoted_identifier on
go
 
 
create proc [dbo].[dba_CheckJobSuccess]
(
	@JobID		uniqueidentifier 
)
as
 
/*
------------------------------------------------------------------
Version : 1.0
Date    : 07/05/2009
------------------------------------------------------------------
*/
 
set nocount on
 
 
--Check for job success
 
declare @PrevInstance	int
	, @ErrCount	int
 
--Find the end of the history for the previous run
select	top 1 @PrevInstance = sjh.instance_id
from	msdb.dbo.sysjobhistory sjh with (nolock)
where	sjh.job_id = @JobID
	and sjh.step_id = 0 --0=(Job outcome)
order by sjh.instance_id desc
 
select	@PrevInstance = isnull(@PrevInstance,0)
 
--Look for failed steps since then
select	@ErrCount = count(*)
from	msdb.dbo.sysjobhistory sjh with (nolock)
where	sjh.job_id = @JobID
	and sjh.instance_id > @PrevInstance
	and sjh.step_id <> 0 --0=(Job outcome)
	and sjh.run_status = 0
 
select @ErrCount = isnull(@ErrCount,0)
 
--Raise an error if any step(s) failed
if @ErrCount > 0
begin
	raiserror('One or more job steps failed.', 16, 1)
end
 
go

And here’s how the last step calls it:

--Check for job success
exec master.dbo.dba_CheckJobSuccess @JobID = $(ESCAPE_SQUOTE(JOBID))

The value for @JobID uses token replacement to get the ID of the job in which the code is executing so I can paste this into any job and it will just work.