How do you know if one string value is the same as another? The obvious way is an equality check, such as:
select case when 'lowercase'='lowercase' then 1 else 0 end as IsIdentical
However, there are a few limitations there, especially if you use the default SQL Server collation, which is case-insensitive. For example:
select case when 'lowercase'='LoWeRcAsE' then 1 else 0 end as IsIdentical
That will return 1, saying the two strings are identical. According to the collation rules, they are. Practically, they aren't.
The workaround is to convert the values to varbinary() before comparing them, because the binary representation of these values is different, regardless of collation. This works with XML as well.
Here's the same example, with a varbinary conversion:
select case when convert(varbinary,'lowercase') =convert(varbinary,'LoWeRcAsE') then 1 else 0 end as IsIdentical
This now returns 0, saying the values are in fact different. Hoorah!
The one caveat is that you pay a CPU performance penalty for doing convert() calls. In the example above, a comparison using 50K rows took 16 ms of CPU time normally, and 63ms of CPU time with the varbinary conversion.Permalink
A while back, I was working with Kendra Little (@KendraLittle, blog), and we were using log shipping to make warm standby copies of reporting databases. The warm standby databases were marked as READ_ONLY. However, this meant the query optimization process couldn’t create new statistics (MSDN) when users ran ad-hoc SELECT statements.
Using a one-table unit test, we are able to see that a read-only database cannot create stats. The estimating query plan shows a warning symbol, symbolizing that the query optimizer cannot create statistics on necessary columns.
Click on the image to see the warning in detail and actual query plan
Let’s suppose that there are stats on every single column of every single table. Then, the query optimizer can do a better job of producing good plans. Using the query in our one-table unit test now produces a plan without warnings.
Click on the image to see the actual query plan
My solution is to manually create statistics on every column where they don’t already exist. Below is an example script and a unit test for it.
The performance results are pretty clear. Using statistics in a simple, one-table unit test creates a slight performance boost. I have seen gains of 10-30% with many reporting queries; the effect of missing statistics as you join more tables together, use larger tables, and use columns for filtering.Permalink