Comparing strings regardless of collation

10 February 2011

How do you know if one string value is the same as another? The obvious way is an equality check, such as:

select case when 'lowercase'='lowercase'
        then 1 else 0 end as IsIdentical

However, there are a few limitations there, especially if you use the default SQL Server collation, which is case-insensitive. For example:

select case when 'lowercase'='LoWeRcAsE'
        then 1 else 0 end as IsIdentical

That will return 1, saying the two strings are identical. According to the collation rules, they are. Practically, they aren't.

The workaround: Use varbinary

The workaround is to convert the values to varbinary() before comparing them, because the binary representation of these values is different, regardless of collation. This works with XML as well.

Here's the same example, with a varbinary conversion:

select case when convert(varbinary,'lowercase')
                 =convert(varbinary,'LoWeRcAsE')
            then 1 else 0 end as IsIdentical

This now returns 0, saying the values are in fact different. Hoorah!

The Catch: CPU Time

The one caveat is that you pay a CPU performance penalty for doing convert() calls. In the example above, a comparison using 50K rows took 16 ms of CPU time normally, and 63ms of CPU time with the varbinary conversion.

Permalink

No Perf for You! Read-only Databases stops Auto Create Stats

13 January 2011

Read-only databases cannot create new statistics

A while back, I was working with Kendra Little (@KendraLittleblog), and we were using log shipping to make warm standby copies of reporting databases. The warm standby databases were marked as READ_ONLY. However, this meant the query optimization process couldn’t create new statistics (MSDN) when users ran ad-hoc SELECT statements.

Query Plans With Missing Statistics

Using a one-table unit test, we are able to see that a read-only database cannot create stats. The estimating query plan shows a warning symbol, symbolizing that the query optimizer cannot create statistics on necessary columns.

Click on the image to see the warning in detail and actual query plan

Query Plans With Manually-Created Statistics

Let’s suppose that there are stats on every single column of every single table. Then, the query optimizer can do a better job of producing good plans. Using the query in our one-table unit test now produces a plan without warnings.

Click on the image to see the actual query plan

MANUALLY CREATED STATISTICS SCRIPT AND TEST

My solution is to manually create statistics on every column where they don’t already exist. Below is an example script and a unit test for it.

Script: Manually-Create-Stats.sql Unit Test:  Manually-Create-Stats-Test.sql

Performance results

The performance results are pretty clear. Using statistics in a simple, one-table unit test creates a slight performance boost. I have seen gains of 10-30% with many reporting queries; the effect of missing statistics as you join more tables together, use larger tables, and use columns for filtering.

Permalink