Category Archives: Administration

Handling service failures

There are times when a SQL-related service has a problem and just stops. Windows services are like that sometimes. Unless you do some additional configuration, you’re not going to know about it until users start shouting at you. Here’s what you can do to make that less likely.

Open up Services and double click on the SQL Server service. By default the Recovery tab looks like this:

20160304_default_recovery_tab

If this services stops unexpectedly you won’t know. Not until your phone starts ringing off the hook!

Here’s how I suggest you improve things:

20160304_configured_recovery_tab

The first time the service fails the system will wait a minute and try to restart it. A second failure will get the same treatment. If it fails a third time you need to know so the third failure runs a PowerShell script to send an alert email.

This won’t kick in if the service is stopped by hand or by something like a SQL Server service pack. It’s only for when things go wrong.

The full contents of the Command line parameters box goes like this:

-file "\\server_name\share_name\email_service_crash.ps1" -ServerName "server_name" -ServiceName "SQL Server"

Clearly you’ll need to edit the file path, server name and service name to suit your environment.

The powershell script, email_service_crash.ps1, is this:

Param (
    [Parameter(Mandatory=$True)]
    [String]$ServerName,

    [Parameter(Mandatory=$True)]
    [String]$ServiceName
)

$from = "from_address"
$to = "to_address"
$subject = $ServiceName + " service is down on " + $ServerName
$body = "Find out why and get it started again."

$creds = new-object System.Net.NetworkCredential("smtp_user_name", "smtp_user_password")
$smtp = new-object System.Net.Mail.SmtpClient("smtp_server_name")
$smtp.UseDefaultCredentials = "False"
$smtp.Credentials = $creds
$smtp.Send($from, $to, $subject, $body)

Again, you’ll need to edit some of the variables and parameters: $from, $to, and the SMTP server, user and password.

Set the script up and test it in a cmd window like this:

powershell.exe -file "\\server_name\share_name\email_service_crash.ps1" -ServerName "server_name" -ServiceName "SQL Server"

Once you’ve got it working, you can take the string from ‘-file’ onwards and use it in the service Recovery tab.

You might be asking what the checkbox Enable actions for stops with errors means. Good question. The documentation is no help but it turns out there are some situations where this stuff won’t kick in unless that box is ticked. For more details have a look at this question on Server Fault: http://serverfault.com/questions/72318/set-up-recovery-actions-to-take-place-when-a-service-fails

You can re-use this in the Recovery tab for each service you want to cover. Just remember to change the -ServiceName parameter you pass to the PowerShell script.