Category Archives: Troubleshooting

SQL Server service startup failure

I just had a SQL Server service that didn’t start after a reboot of the machine. Error message: “The service did not start due to a logon failure. [0x8007042d]”. Hmm.

In the Local Security Policy editor the service account was no longer listed under the user rights assignment for Log on as a service. I discovered I had no permission to change that because it’s managed by a domain policy.

A search found this thread, which helped me sort it out. Resetting the service account from SQL Server Configuration Manager didn’t fix the Log on as a service right, so I still had the same error. But resetting the service account from Services in the Control Panel instead re-applied the Log on as a service right and the service started.
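If you hit this, a quick first check is which account the service runs under and who currently holds the right. A sketch from an elevated command prompt (MSSQLSERVER is the default instance’s service name; adjust for yours):

```bat
:: Show which account the SQL Server service is configured to run under
sc qc MSSQLSERVER

:: Export the effective user rights policy and see who holds
:: "Log on as a service" (SeServiceLogonRight)
secedit /export /cfg %TEMP%\secpol.inf /areas USER_RIGHTS
findstr /i "SeServiceLogonRight" %TEMP%\secpol.inf
```

If the service account’s SID isn’t in that list, that matches the 0x8007042d logon failure.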

I still have no idea why that permission dropped off but at least now I know how to get it back again. It’s progress of a sort.


Missing SQL PerfMon counters

A server has just lost all the SQL Server PerfMon counters. They’ve all disappeared into thin air. It plays havoc with our monitoring tool! We’ve had a couple of instances of this lately. I haven’t found the cause and it’s starting to get on my nerves. Luckily the answer is fairly simple once you know. NB: This involves restarting the SQL Server service so you might want to do it out of hours.

The steps are:

  1. Run cmd.exe as Administrator.
  2. Go to the Binn directory. On my SQL 2014 instance this was cd C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\Binn.
  3. unlodctr MSSQLSERVER (or unlodctr MSSQL$INSTANCENAME for a named instance).
  4. lodctr perf-MSSQLSERVERsqlctr.ini (or lodctr perf-MSSQL$INSTANCENAMEsqlctr.ini for a named instance).
  5. Restart the Remote Registry service if running.
  6. Restart the Performance Logs & Alerts service if running.
  7. Restart the SQL Server service.

Other strange things can happen too: the Remote Registry service can end up stopped, or all the PerfMon counters – or just the SQL counters – can be disabled in the Registry.


An unexpected error

I recently had an overnight job fail with this error message:

Msg 8630, Level 17, State 70, Line 1
Internal Query Processor Error: The query processor encountered an unexpected error during execution (HRESULT = 0x80004005).

Not much to work with there! Luckily the script has lots of PRINTs in it so it was easy to find the statement in question. It was a fairly simple UPDATE from a SELECT, nothing too complicated. I ran it and it worked. There was no sign of blocking or a deadlock at the time it failed, so I figured it was just a one-off. You don’t have time to explore every rabbit hole.

But the next night the error happened again. I managed to narrow it down to the SELECT statement, but still had no idea what was wrong. It took a few tries to get it to work this time. Very strange. Time to get to the bottom of this. Books Online had nothing. I searched and found various things about memory pressure and corrupt dates, but none of that panned out. I ran DBCC CHECKDB. No errors.
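For reference, the corruption check is a one-liner (the database name here is a placeholder):

```sql
-- Check the whole database for corruption; NO_INFOMSGS suppresses
-- the informational chatter so only errors are reported.
DBCC CHECKDB (N'MyDatabase') WITH NO_INFOMSGS;
```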

Sometimes you just need to call for help so I contacted Microsoft support. It was very interesting: the questions they asked, the process of collecting lots of data, and the information they got from that data. They confirmed what I thought which was that the machine had plenty of disk space, memory and processor to spare so it looked like a data issue. They suggested rebuilding the indexes on the tables involved in the SELECT.

The tables weren’t that big so I:

  1. scripted out the non-clustered indexes
  2. dropped the non-clustered indexes
  3. rebuilt the clustered indexes
  4. re-created the non-clustered indexes

This did the trick so it’s worth trying if you get this error and can’t find any other cause.
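The rebuild sequence above looks something like this in T-SQL (a sketch only – dbo.MyTable and the index names are placeholders; in practice you script the real non-clustered indexes out of Management Studio first):

```sql
-- 1 & 2: having scripted out the non-clustered indexes, drop them
DROP INDEX IX_MyTable_Col1 ON dbo.MyTable;

-- 3: rebuild the clustered index
ALTER INDEX PK_MyTable ON dbo.MyTable REBUILD;

-- 4: re-create the non-clustered indexes from the saved scripts
CREATE NONCLUSTERED INDEX IX_MyTable_Col1 ON dbo.MyTable (Col1);
```

Dropping the non-clustered indexes first avoids them being rebuilt twice (once implicitly with the clustered index, once on their own).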

Fun with upgrading and BCP

I just had trouble using BCP out via xp_cmdshell. Probably serves me right but that’s a different issue. I got a strange error message, ‘Unable to resolve column level collations’, which turned out to be a bit misleading. I wasted some time comparing the collations of the server, the database and all the columns in the query. I got so desperate that I even read the Books Online article. Still no joy, but then I tried the interweb. It turns out that calling bcp without qualifying it with a path causes Windows to search the folders listed in the Path environment variable – in that order – and execute the first version of BCP it finds. But when you do an in-place version upgrade, the new paths are added to the end of the Path variable, so you don’t get the latest version of BCP by default.
To check which version you’re getting execute bcp -v at the command line. The version number will correspond to SQL Server version numbering (eg. 10.50.n = 2008 R2). To examine and/or edit the Path variable, right-click on My Computer, select Properties, go to the Advanced tab and click on the Environment Variables button. If you change the variable you’ll have to restart the SQL Server service before it takes effect.
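Two quick commands cover both checks:

```bat
:: List every bcp.exe on the Path, in search order - the first one wins
where bcp

:: Confirm which version actually runs (10.50.n = 2008 R2, etc.)
bcp -v
```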

Failing report subscriptions

We had an interesting problem while I was on holiday. (Why doesn’t this stuff ever happen when I’m there?) The sysadmin upgraded our Exchange server to Exchange 2010 and everyone’s subscriptions stopped. My Subscriptions showed an error message saying that the email address of one of the recipients is invalid. When you create a subscription, Reporting puts your Windows user name into the To field, and most users have no permission to edit it. By default, Reporting leaves it up to Exchange to resolve that into an email address. This only works if Exchange is set up to translate aliases or ‘short names’ into email addresses. It turns out this leaves Exchange open to being used as a relay, so it is disabled out of the box. You now have three options:
  1. Open up Exchange. That would be bad.
  2. Give all Reporting users the ability to edit the To field in a subscription. a) They shouldn’t have to, it should just work. b) They don’t really have any business subscribing anyone but themselves.
  3. Fix the report server to add the domain. This looks like the right choice and it works for us. See below for details. It assumes:
  • A single email domain name.
  • A clear relationship between the Windows user name and the email address. e.g. If the user name is joebloggs, then joebloggs@domainname needs to be the email address or an alias of it.
Warning: Saving changes to the rsreportserver.config file will restart the Report Server service which effectively takes Reporting down for around 30 seconds. Time your action accordingly.
Edit the file rsreportserver.config (most probably in the folder ..\Program Files[ (x86)]\Microsoft SQL Server\MSRS10_50[.instancename]\Reporting Services\ReportServer). There’s a setting called DefaultHostName which is empty by default. Enter your email domain name without the leading ‘@’. Save the file. This domain name will be appended to any destination addresses that don’t have a domain name of their own.
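The relevant fragment of rsreportserver.config ends up looking like this (yourdomain.com is a placeholder for your own email domain):

```xml
<!-- Appended to any recipient address that has no domain of its own.
     Note: no leading '@'. -->
<DefaultHostName>yourdomain.com</DefaultHostName>
```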

Runaway version store in tempdb

Today really was a new one. I got back from a week off and found our main production server’s tempdb data file had gone from its usual 200MB to 36GB. Ironically, I spent Friday at the most excellent SQLBits VI, and one of the sessions I attended was Christian Bolton talking about tempdb issues – including runaway tempdb databases. How just-in-time was that?!
I looked into the file growth history and it looks like the problem started when my index maintenance job was chosen as the deadlock victim. (Funny how they almost make it sound like you’ve won something.) That left tempdb pretty big, but for some reason it grew several more times. And since I’d left the file growth at the default 10% (aaargh!), the worse it got the worse it got. The last regrowth event was 2.6GB. Good job I’ve got Instant File Initialization on. Since the Disk Usage report showed it was 99% unallocated, I went into the Shrink Files dialogue, which helpfully informed me the data file was 250MB.
I’m afraid I’ve got a life (allegedly), so I restarted the SQL Server service and then immediately ran a script to make the initial size bigger and change the file growth to a fixed number of MB. The script complained that the new size was smaller than the current size. Within seconds! WTF? Now I had to find out what was using so much of it. The DMV sys.dm_db_file_space_usage showed the problem was in the version store, and using the DMV sys.dm_db_task_space_usage and the Top Transactions by Age report I found the culprit: a third-party database where I had turned on read_committed_snapshot and then not bothered to monitor things properly.
Just because something has always worked before doesn’t mean it will work in every future case. This application had an implicit transaction that had been running for over 2 hours.
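The checks that found it boil down to a couple of DMV queries – a sketch of the kind of thing I ran:

```sql
-- How much of tempdb is the version store using right now?
-- (page counts are 8KB pages, hence * 8 / 1024 for MB)
SELECT SUM(version_store_reserved_page_count) * 8 / 1024 AS version_store_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- Which long-running snapshot transactions are pinning the version store?
SELECT session_id, transaction_id, elapsed_time_seconds
FROM sys.dm_tran_active_snapshot_database_transactions
ORDER BY elapsed_time_seconds DESC;
```

A two-hour implicit transaction at the top of that second result set is exactly the smoking gun described above.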

Have you really fixed that problem?

The day before yesterday I saw our main live server’s CPU go up to a constant 100%, with just the occasional short drop to a lower level. The exact opposite of what you’d want to see.

We’re log shipping every 15 minutes and part of that involves calling WinRAR to compress the log backups before copying them over. (We’re on SQL2005 so there’s no native compression and we have bandwidth issues with the connection to our remote site.) I realised the log shipping jobs were taking about 10 minutes and that most of that was spent shipping a ‘live’ reporting database that is completely rebuilt every 20 minutes. (I’m just trying to keep this stuff alive until I can improve it.) We can rebuild this database in minutes if we have to fail over so I disabled log shipping of that database. The log shipping went down to less than 2 minutes and I went off to the SQL Social evening in London feeling quite pleased with myself. It was a great evening – fun, educational and thought-provoking. Thanks to Simon Sabin & co for laying that on, and thanks too to the guests for making the effort when they must have been pretty worn out after doing DevWeek all day first.

The next morning I came down to earth with a bump: CPU still at 100%. WTF? I looked in the Activity Monitor, but it was confusing because some sessions had been running for a long time, so it’s not a good guide to what’s using the CPU right now. I tried the standard reports showing queries by CPU (average and total), but they only show the top 10, so they just showed my big overnight archiving and data cleaning jobs. But Profiler showed it was four queries used by our new website usage tracking system. Four simple indexes later, the CPU was back where it should be: about 20% with occasional short spikes.
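If you’d rather not fire up Profiler, a plan-cache query gets past the top-10 limit of the standard reports. A sketch (works on SQL 2005 and later; numbers are cumulative since each plan was cached, so recently compiled hot queries can still rank low):

```sql
-- Top statements by total CPU since the plan was cached
SELECT TOP (20)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, qs.statement_start_offset / 2 + 1,
        (CASE qs.statement_end_offset
             WHEN -1 THEN DATALENGTH(st.text)
             ELSE qs.statement_end_offset
         END - qs.statement_start_offset) / 2 + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;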

So the moral is: even when you’re convinced you’ve found the cause and fixed the problem, you HAVE to go back and confirm that the problem has gone.

And, yes, I have checked the CPU again today and it’s still looking sweet.