Monthly Archives: March 2011

Dates & times: it’s worse than I thought

The BBC has a rather sobering article about time zones here: fascinating stuff that explains how time zones work and how dates and times came to be so complicated.
Next time somebody gives you an earful about how your query gave them the wrong data because of some subtlety about the time portion of a datetime, a time zone difference, or daylight saving time, you could always point them to this. It might make you look less bad. (I’m just saying.)

Transaction log growth in the model database

I had an odd one yesterday. I was sorting out internal fragmentation in our transaction logs (see Kim Tripp’s blog) and I noticed that on our core production server the model database transaction log was over 200MB and had 163 virtual log files (VLFs). Clearly something was logging transactions in there and causing repeated log file growth. But what? Stumped, I asked on Ask SQL Server Central and found out, thanks to ‘meltondba’. (I suspect that’s not his real name.) The model database defaults to the full recovery model and we’re backing up the system databases every hour on this server. As explained by CSS here, a full backup is recorded in the transaction log. And of course if you stay in full recovery and never clear out the transaction log, it will keep on growing.
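If you want to check your own server, something like this is a quick sketch (DBCC LOGINFO is undocumented but widely used on SQL Server builds of this era; one row comes back per VLF):

```sql
-- Count the virtual log files (VLFs) in model's transaction log
USE model;
DBCC LOGINFO;  -- row count = number of VLFs

-- Confirm the recovery model new databases will inherit from model
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'model';
```

A high VLF count (hundreds) is the tell-tale sign of repeated small log growths.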
What’s the problem with this? Well, there are several. The model database is the template from which new databases are created. I created a new database on this server (create database [dw]) and it had a 3MB data file and a 200MB log file with 3 VLFs. That wastes disk space for the log file and its backups, and adds time to the backups themselves. OK, it’s not major, but it all adds up. If enough ‘small stuff’ like this accumulates it can really dent the performance of your system.
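You can see the inheritance for yourself with a sketch like this (sys.database_files reports sizes in 8KB pages, hence the arithmetic):

```sql
-- New database with no size options: files are copied from model
CREATE DATABASE [dw];

-- Inspect the files the new database inherited
SELECT name, type_desc, size * 8 / 1024 AS size_mb
FROM [dw].sys.database_files;
```

On my server the log file came back at 200MB, matching model’s bloated log exactly.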
What should you do? I’d suggest changing the model database to the simple recovery model. You don’t want new databases to be in full recovery unless you’re going to actively manage the logs, and you don’t want to bother with that unless you need point-in-time recovery. Back up the log, shrink it and set it to a small size, say 3MB. Kim Tripp’s blog post shows all the code for this. When you create a new database you’ll want to set the file sizes and growth characteristics anyway.
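As a rough sketch of the fix (this assumes model still has its default logical log file name, modellog — check sys.database_files first, and see Kim Tripp’s post for the full treatment):

```sql
-- Switch model to simple recovery so new databases inherit it
-- (this also lets the log be truncated at checkpoint)
ALTER DATABASE model SET RECOVERY SIMPLE;

USE model;
-- Shrink the log right down, then fix it at a sensible small size
DBCC SHRINKFILE (modellog, 1);
ALTER DATABASE model MODIFY FILE (NAME = modellog, SIZE = 3MB);
```

Growing the log back to its target in one step keeps the VLF count low.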

Microsoft Business Intelligence Seminar 2011

I was lucky enough to attend the maiden presentation of this at Microsoft Reading yesterday. It was pretty gripping stuff not only because of what was said but also because of what could only be hinted at. Here’s what I took away from the day. (Disclaimer: I’m not a BI guru, just a reasonably experienced BI developer, so I may have misunderstood or misinterpreted a few things. Particularly when so much of the talk was about the vision and subtle hints of what is coming. Please comment if you think I’ve got anything wrong. I’m also not going to even try to cover Master Data Services as I struggled to imagine how you would actually use it.)
I was a bit worried when I learned that the whole day was going to be presented by one guy but Rafal Lukawiecki is a very engaging speaker. He’s going to be presenting this about 20 times around the world over the coming months. If you get a chance to hear him speak, I say go for it. No doubt some of the hints will become clearer as Denali gets closer to RTM.
Firstly, things are definitely happening in the SQL Server Reporting and BI world. Traditionally IT would build a data warehouse, then cubes on top of that, and then publish them in a structured and controlled way. But, just as with many IT projects in general, by the time it’s finished the business has moved on and the system no longer meets their requirements. This is not sustainable, and something more agile is needed, but there has to be some control. Apparently we’re going to be hearing the catchphrase ‘Balancing agility with control’ a lot.
More users want more access to more data. Can they define what they want? Of course not, but they’ll recognise it when they see it. It’s estimated that only 28% of potential BI users have meaningful access to the data they need, so there is a real pent-up demand. The answer looks like: give them some self-service tools so they can experiment and see what works, and then IT can help to support the results. It’s estimated that 32% of Excel users are comfortable with its analysis tools such as pivot tables. It’s the power user’s preferred tool. Why fight it? That’s why PowerPivot is an Excel add-in and that’s why they released a Data Mining add-in for it as well.
It does appear that the strategy is going to be to use Reporting Services (in SharePoint mode), PowerPivot, and possibly something new (smiles and hints but no details) to create reports and explore data. Everything will be published and managed in SharePoint, which gives users the ability to mash up, share and socialise what they’ve found out. SharePoint also gives IT tools to understand what people are looking at and where to concentrate effort. If PowerPivot report X becomes widely used, it’s time to check that it shows what they think it does and perhaps get it a bit more under central control. There was more SharePoint detail that went slightly over my head regarding where Excel Services and Excel Web Application fit in, the differences between them, and the suggestion that it is likely they will one day become one (but not in the immediate future).
That basic pattern is set to be expanded upon by further exploiting Vertipaq (the columnar indexing engine that enables PowerPivot to store and process a lot of data fast and in a small memory footprint) to provide scalability ‘from the desktop to the data centre’, and some yet to be detailed advances in ‘frictionless deployment’ (part of which is about making the difference between local and the cloud pretty much irrelevant).
Excel looks like becoming Microsoft’s primary BI client. It already has:
  • the ability to consume cubes
  • strong visualisation tools
  • slicers (which are part of Excel not PowerPivot)
  • a data mining add-in
  • PowerPivot
A major hurdle for self-service BI is presenting the data in a consumable format. You can’t just give users PowerPivot and a server with a copy of the OLTP database(s). Building cubes is labour intensive and doesn’t always give the user what they need. This is where the BI Semantic Model (BISM) comes in. I gather it’s a layer of metadata you define that can combine multiple data sources (and types of data source) into a clear ‘interface’ that users can work with. It comes with a new query language called DAX. SSAS cubes are unlikely to go away overnight because, with their pre-calculated results, they are still the most efficient way to work with really big data sets.
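For a flavour of what DAX looks like (a sketch only — the Sales table and its columns here are hypothetical, not anything shown at the seminar):

```
-- A measure summing a column of a hypothetical Sales table
Total Sales := SUM(Sales[Amount])

-- CALCULATE re-evaluates a measure under a modified filter context
Sales 2011 := CALCULATE([Total Sales], Sales[Year] = 2011)
```

The Excel-like syntax is the point: it’s aimed at the same power users who are already comfortable with formulas and pivot tables.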
A few other random titbits that came up: