Category Archives: SQL Coding

Strange gotcha with column names

Here’s a funny thing I discovered while working with a table I created by importing data from Excel. (I know, I only have myself to blame.) A number of the column names ended up with a trailing space on the end. They were Y/N type columns where the value was either Y or null. I decided to update all the nulls to Ns so the nulls wouldn’t trip me up later. So I typed a query like this: update

set = isnull(, ‘N’). I noticed Intellisense had underlined the column name but I knew what I was doing and assumed it was still refreshing its cache. The update ran fine but when I tried to select that column I got a ‘no such column’ error. WTF?

I can only assume this is a side-effect of the way you can update things like rowset functions and not just tables. I’ve done a bit of playing around and found that (surprise surprise) some variants will work and some will fail. Try it for yourself:

create table dbo.TestTable
(
 [TestColumn ] varchar(20) null
);
go

--This update will give the result: (0 row(s) affected)
update dbo.TestTable 
set TestColumn = 'hello';
go

--But this update will fail:
--Msg 207, Level 16, State 1, Line 2
--Invalid column name 'TestColumn2'.
update dbo.TestTable 
set TestColumn2 = 'hello';
go

drop table dbo.TestTable;
go

I suppose the moral of the story is: concentrate and check that updates affect the number of rows you expect. (Damn, that’s not going to work!)

An annoying problem with Nulls

I just lost half an hour of my life which I’ll never get back on this. Don’t make the same mistake. I extracted some customer details for a mailing (it’s OK, they ticked the box that they want to hear about our special offers) but Marketing complained that some of the address data contained nulls. I checked the spreadsheet I’d given them and they were right. So I checked my query and followed the trail back to the source record for an example. The value in AddressLine2 was Null. But my select statement said isnull(AddressLine2, ”) as AddressLine2 and it was returning Null. WTF?! The sharp-eyed reader has probably already noticed that the value was the string ‘Null’, not the empty value NULL. I got there in the end but it took a while. Good job it wasn’t upper case or I might have had to ask one of the developers for help!

If you get this problem and it’s upper case you could try using the lower() function which will change the case of ‘NULL’ but not NULL:

declare @a varchar(10) = null;
declare @b varchar(10) = 'NULL';
select lower(@a) as a, lower(@b) as b;

I love it when a plan comes together

I’m currently working on an application so that our Marketing department can produce most of their own mailing lists without my having to get involved. It was all going well until I got stuck on the bit where the actual SQL query is generated but a rummage in Books Online revealed a very clean solution using some constructs that I had previously dismissed as pointless.
Typically we want to email every customer who is in any of the following n groups. Experience shows that a group has the following definition:
<people who have done A>
[(AND <people who have done B>) | (OR <people who have done C>)]
[APART FROM <people who have done D>]
When doing these by hand I’ve been using INNER JOIN for the AND, UNION for the OR, and LEFT JOIN + WHERE D IS NULL for the APART FROM. This would produce two quite different queries:
-- Old OR
select  A.PersonID
from  (
  -- A
  select  PersonID
  from  ...
  union  -- OR
  -- C
  select  PersonID
  from  ...
  ) AorC
  left join  -- APART FROM
  (
  select  PersonID
  from  ...
  ) D on D.PersonID = AorC.PersonID
where  D.PersonID is null 

-- Old AND
select  distinct main.PersonID
from  (
  -- A
  select  PersonID
  from  ...
  ) A
  inner join  -- AND
  (
  -- B
  select  PersonID
  from  ...
  ) B on B.PersonID = A.PersonID
  left join  -- APART FROM
  (
  select  PersonID
  from  ...
  ) D on D.PersonID = A.PersonID
where  D.PersonID is null

 

But when I tried to write the code that can generate the SQL for any combination of those (along with all the variables controlling what each SELECT did and what was in all the optional bits of each WHERE clause) my brain started to hurt. Then I remembered reading about the (then new to me) keywords INTERSECT and EXCEPT. At the time I couldn’t see what they added but I thought I would have a play and see if they might help. They were perfect for this. Here’s the new query structure:

-- The way forward
select  PersonID
from  (
    (
      (
      -- A
      select  PersonID
      from  ...
      )
      union      -- OR
      intersect  -- AND
      (
      -- B/C
      select  PersonID
      from  ...
      )
    )
    except
    (
    -- D
    select  PersonID
    from  ...
    )
  ) x

I can easily swap between between UNION and INTERSECT, and omit B, C, or D as necessary. Elegant, clean and readable – pick any 3! Sometimes it really pays to read the manual.

Finding rows that intersect with a date period

This one is mainly a personal reminder but I hope it helps somebody else too. Let’s say you have a table that covers something like currency exchange rates with columns for the start and end dates of the period each rate was applicable. Now you need to list the rates that applied during the year 2009. For some reason this always fazes me and I have to work it out with a diagram. So here’s the recipe so I never have to do that again:

select  *
from    ExchangeRate
where   StartDate &lt;= '31-DEC-2009'
        and EndDate &gt;= '01-JAN-2009'

I Love row_number()

I think the row_number() function is one of the best things that’s been added to SQL for some time. If you’re not familiar with it you really should check it out. It’s very useful where you want to join a set of data to another set detailing the latest/biggest/etc something related to the first set. Imagine top (n) and group by combined. The 5 most recent orders from each customer. The person who has used each repoort the most. You get the picture.
It adds a sequential number column to a resultset. The syntax is

row_number() over ([partition by ...] order by ...)

Order by (required) specifies what order you want the rows numbered in. Partition by (optional) controls when the numbering should start back at 1 again.

Suppose you want to list your customers and the value of each one’s latest order. You could do it like this:

select	c.Name
	, (select top 1 o.OrderValue from Orders o where o.CustomerID = c.CustomerID order by o.OrderDate desc) as LatestOrderValue
from	Customers c

Not too bad in a simple example but it can get really convoluted really quickly in the real world and then you pass that magic cut-off point where the query optimiser suddenly gives you a bad plan.

This is the same thing with row_number():

select	*
from	(
	select	c.Name
		, o.OrderValue as LatestOrderValue
		, row_number() over (partition by o.CustomerID order by o.CustomerID asc, o.OrderDate desc) as RowNum
	from	Customers c
		inner join Orders o on o.CustomerID = c.CustomerID
	) x
where	RowNum = 1

It looks a bit heavy but it clearly expresses the intention and stays readable when the query gets more complex.