Category Archives: Log Shipping

Upgrading log shipping from 2005 to 2008 or 2008R2

If you’re using log shipping you need to be aware of some small print. The general idea is to upgrade the secondary server first and then the primary server because you can continue to log ship from 2005 to 2008R2. But this won’t work if you’re keeping your secondary databases in STANDBY mode rather than IN RECOVERY. If you’re using native log shipping you’ll have some work to do. If you’ve rolled your own log shipping (ahem) you can convert a STANDBY database to IN RECOVERY like this:

restore database [dw] with norecovery;

and then change your restore code to use WITH NORECOVERY instead of WITH STANDBY. (Finally all that aggravation pays off!)

You can either upgrade the secondary server in place or rebuild it. A secondary database doesn’t actually get upgraded until you recover it so the log sequence chain is not broken and you can continue shipping from the primary. Just remember that it can take quite some time to upgrade a database so you need to factor that into the expectations you give people about how long it will take to fail over.

For more details, check this out: http://msdn.microsoft.com/en-us/library/cc645954(SQL.105).aspx.

Log shipping and shrinking transaction logs

I just solved a problem that had me worried for a bit. I’m log shipping from three primary servers to a single secondary server, and the transaction log disk on the secondary server was getting very full. I established that several primary databases had unused space that resulted from big, one-off updates so I could shrink their logs. But would this action be log shipped and applied to the secondary database too? I thought probably not. And, more importantly, would it break log shipping? My secondary databases are in a Standby / Read Only state so I didn’t think I could shrink their logs. I RTFMd, Googled, and asked on a Q&A site (not the evil one) but was none the wiser.

So I was facing a monumental round of shrink, full backup, full secondary restore and re-start log shipping (which would leave us without a disaster recovery facility for the duration). Then I thought it might be worthwhile to take a non-essential database and just make absolutely sure a log shrink on the primary wouldn’t ship over and occur on the secondary as well. So I did a DBCC SHRINKFILE and kept an eye on the secondary. Bingo! Log shipping didn’t blink and the log on the secondary shrank too.

I just love it when something turns out even better than I dared to hope. (And I guess this highlights something I need to learn about what activities are logged.)

Roll your own log shipping

I tried to use SQL Server’s built-in log shipping to set up a disaster recovery facility but I found there were some serious limitations. So, after some research, I decided to do it myself. Here’s what happened.

Requirements
A disaster recovery plan is useless unless you can test it, preferably at least once a year. To be really sure, I think you need to prove you can fail over, use, and fail back the production systems. It’s all very well testing on a test setup – you have to make sure it works – but there’s no substitute for confirming that it actually works in the environment it’s meant to preserve. There are so many things that have to fit together and, in a disaster situation, you really don’t want find yourself saying “Crap, we didn’t think about that!” and explaining to your boss that the business is going to be down for a day or two instead of the hour or two you’d confidently promised.

So you have to be able to smoothly fail over to the standby servers, have all the data consuming systems gracefully adapt to the new data sources, confirm they all work, and then reverse the process. Without losing any data. At least once a year. Easy!

Some specific technical requirements:

  • Setup and configuration must be scriptable. I don’t want to screw around with multi-page wizards and write reams of documentation about it. If I have to rebuild the server I want to get it done quickly and easily, and scripts are the way.
  • I need it to tell me if it goes wrong, and then I need some tools to help me understand where I am and how I got there so I can fix it.
  • When failed over, I want to be able to log ship back the other way. In a real disaster this is probably not an issue but it’s vital for testing the solution. Once live transactions have been applied to the standby server you have to log ship them back to the normal live server or lose them.
  • I decided to have a time lag of 3 hours before transactions are applied to the standby server. We thought we could use this to give us a chance to rescue data that was deleted by mistake.

Trying Out-of-the-Box Log Shipping
I filled in about 11 pages of wizard, setting up the source, standby and monitor. Then I captured the script, thinking I could just copy it for each database and find and replace the database name. But the script doesn’t run! I spent quite a while searching for how to fix and couldn’t find an answer. At this point I decided I could live without the scriptability so I went through the wizard again, building a detailed installation guide for our domain.

It all looked great at first. There’s a report on the monitor server to show the progress of backups, file copies and restores. I failed over, putting the source database into standby mode ready to receive logs. But when I tried to restore the standby database right up to date there was no way to override the 3 hour delay I’d specified. I went into the wizard, took out the delay, restored all logs and brought the database online. My test application adapted to the new data source. Not ideal in a real disaster with everyone breathing down my neck but I can live with it.

Now to set up log shipping back the other way. I go into the wizard and it won’t let me set up the monitor. Apparently you can only monitor shipping between two servers in one direction. If you remove the initial shipping you can set it up going the other way. So that’s about 11 pages of wizard per database to failover, and the same to fail back. That’s enough of that!

My Way
I looked around hoping to find that somebody else had already done a full solution but I couldn’t find one. I felt Pop Rivetts Does Log Shipping came the closest so I cannibalised it into what I wanted.

I have the following jobs on both servers:

LogShipBackup
Does a log backup on selected databases, compresses the file with WinRAR, copies it to a share on the other server, and does housekeeping on the files and the backup/respore history tables in msdb. The copy stage copies over any files that are not on the other side so it can catch up if connection has been lost for a bit.

LogShipBackupAlert
Sends me an email if the LogShipBackup job has not run recently and successfully. LogShipBackup will send me an email if it fails but that won’t alert me if it doesn’t run.

LogShipRestore
Decompresses and restores logs in sequence on selected databases, and does housekeeping on the files and the backup/respore history tables in msdb. You can specify a delay so that logs aren’t applied immediately. It checks the backup history so it doesn’t trip up if a file is copied over again. It actively kills connections to a database to ensure the restore can go ahead.

LogShipRestoreAlert
Sends me an email if the LogShipRestore job has not run recently and successfully. LogShipRestore will send me an email if it fails but that won’t alert me if it doesn’t run.

LogShipFailOver
Disables the LogShipBackup and LogShipBackupAlert jobs, backs up the tail of the log on selected databases, puts them in read-only standby mode, and copies the backup files over to the other server. The copy stage copies over any files that are not on the other side so it’s quite resilient.

LogShipBringOnline
Disables the LogShipRestore and LogShipRestoreAlert jobs, restores all logs in sequence on selected databases and brings those databases online.

The jobs LogShipBackup and LogShipBackupAlert are enabled on the primary server, and LogShipRestore and LogShipRestoreAlert are enabled on the standby server. I run them every 15 minutes. To fail over, run LogShipFailOver on the primary and then LogShipBringOnline on the standby server.

Apart from the alert jobs, each job has one or more steps that perform the appropriate actions on a database that is shipping or receiving logs. The jobs rely on a set of stored procedures installed in both servers’ master databases.

The code is available for download here. If you’re interested, please feel free to check it out and use it if you want to. If it doesn’t quite do what you want you can always fix that. I’d be interested to know what you think of it. I’d also be very keen to hear from you if you’ve found ways round the problems I found with the built-in log shipping.

Recommended Reading
To find out more about log shipping I recommend the following: