Category Archives: Performance

In praise of

Here’s a great free addition to your query tuning toolbox that you may well not have heard about: It hasn’t beenĀ mentioned nearly enough in my opinion.

Here’s how the author, Richie Rump, puts it:

Sometimes reading SQL Server ouput from Statistics IO and Statistics Time can be a total drag. This page will help with that. Just paste in the output of Statistics IO and/or Statistics Time and press Parse. Your output will be formatted and totaled. Enjoy.

Here’s a quick look at what it does.

If you want to see how long each part of a query takes or what logical reads are involved you can set a couple of options on: statistics io and statistics time.

set statistics io on;
set statistics time on;

-- List all indexes in the current DB
select as TableName
	, as IndexName
	, i.*
from	sys.indexes i
	inner join sys.objects t on t.object_id = i.object_id
where	i.type > 0
	and t.type = 'U'
	and t.is_ms_shipped = 0
	and not like 'sys%'
order by asc
	, i.type desc
	, asc;

This adds extra output to the Messages tab when you run your query:

SQL Server parse and compile time:
   CPU time = 0 ms, elapsed time = 14 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

(1161 row(s) affected)
Table 'syspalvalues'. Scan count 0, logical reads 2322, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'syssingleobjrefs'. Scan count 0, logical reads 2403, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysschobjs'. Scan count 1, logical reads 2566, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'sysidxstats'. Scan count 634, logical reads 1472, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 47 ms,  elapsed time = 190 ms.

It’s not that readable, even for a simple example like this. Here’s where StatisticsParser comes in. Paste in this output and hit the Parse button to see the information laid out nice and clearly:


I think this is brilliant.

Thank you, Richie.


Slow in the app, fast in SSMS

Users complain about a timeout but when you run the exact same query in SSMS it runs in a flash. Sounds familiar? I’ve been baffled by this before. I worked out that I was getting a different query plan in SSMS because of different SETĀ options but, having dealt with that, I was then stuck with parameter sniffing as the cause of the timeout. I’ve read about that but still didn’t really understand how to fix it. Erland Sommarskorg has published an excellent article ( in which he clearly explains what’s going on and provides tools and techniques to fix it. Highly recommended reading. Thanks, Erland.

Be careful with table variables

I just looked into a poorly performing stored procedure and brought it down from over 4 minutes to under 4 seconds by making one simple change: converting a table variable into a temporary table. What’s interesting is that there were only about 20,000 rows of about 1,100 bytes each in the table variable, which would be nothing in a permanent table. So what happened? I got stuck in and this is what I found.
The stored proc is for a sales report where the user can choose from a list of things to group the results by. The relevant sales are selected into a table variable with an empty column Group1, the approriate column is copied into Group1, and then there’s a SELECT with a whole lot of aggregations GROUPed BY Group1. The query plan showed 99% of the cost was in the main insert into the table variable. Further reading revealed that no statistics are created on a table variable and so to the Query Optimizer it always has a cardinality of 1, whereas statistics are created on temporary tables. Was this the problem?
I changed it to use a temporary table instead, ran that, and compared the query plans and the output from the Profiler, including the events Auto Stats (under Performance) and SP:StmtStarting & SP:StmtCompleted (under Stored Procedures). The query plans for populating the temporary table and table variable were identical but the temporary table approach showed statistics being created on the temporary table before the final SELECT, and the SELECT itself was parallelized. The SP:StmtStarting and SP:StmtCompleted events in the Profiler confirmed that it was the final SELECT that took nearly all the time with a table variable.
Interestingly the query plan showed the final SELECT was 1% of the total cost with a table variable and 23% with a temporary table. It just shows, you have to be careful about looking at the steps in the plan and zooming in on the step that has the largest share of the total cost. I could have gone blind looking at the first step without ever improving it.

Have you really fixed that problem?

The day before yesterday I saw our main live server’s CPU go up to constantly 100% with just the occasional short drop to a lower level. The exact opposite of what you’d want to see.

We’re log shipping every 15 minutes and part of that involves calling WinRAR to compress the log backups before copying them over. (We’re on SQL2005 so there’s no native compression and we have bandwidth issues with the connection to our remote site.) I realised the log shipping jobs were taking about 10 minutes and that most of that was spent shipping a ‘live’ reporting database that is completely rebuilt every 20 minutes. (I’m just trying to keep this stuff alive until I can improve it.) We can rebuild this database in minutes if we have to fail over so I disabled log shipping of that database. The log shipping went down to less than 2 minutes and I went off to the SQL Social evening in London feeling quite pleased with myself. It was a great evening – fun, educational and thought-provoking. Thanks to Simon Sabin & co for laying that on, and thanks too to the guests for making the effort when they must have been pretty worn out after doing DevWeek all day first.

The next morning I came down to earth with a bump: CPU still at 100%. WTF? I looked in the activity monitor but it was confusing because some sessions have been running for a long time so it’s not a good guide what’s using the CPU now. I tried the standard reports showing queries by CPU (average and total) but they only show the top 10 so they just show my big overnight archiving and data cleaning stuff. But the Profiler showed it was four queries used by our new website usage tracking system. Four simple indexes later the CPU was back where it should be: about 20% with occasional short spikes.

So the moral is: even when you’re convinced you’ve found the cause and fixed the problem, you HAVE to go back and confirm that the problem has gone.

And, yes, I have checked the CPU again today and it’s still looking sweet.

Cursor bad, set-based good

I just had another reminder, as if one was needed, that a cursor should be your last resort to get something done. I was looking to shave some time off my data warehouse build job and noticed a step that looked like it was taking a lot longer than I would expect. I found it was doing 12 cursors in a row on the same table. For each row in the source table it would try and insert into the target table and the insert was wrapped in a TRY…Catch to suppress duplicate row insert errors and keep going. Why? Because a few rogue cases had two rows instead of the one row they should have had. (For the record, I didn’t write this, I just inherited it.)

I changed each cursor to a straight INSERT target-table SELECT … FROM source-table using a row_number function to pick out just the first row for each case. I thought it would go a fair bit faster. In fact, it went from 7 minutes to 15 seconds. I had to check there was still the right number of rows in the target table!

I wish I had time to see how much of the overhead was the cursor and how much was the TRY…CATCH. I suspect each TRY…CATCH only makes a tiny difference but thousands of them will add up.