Regular Expressions in SQL Server

29 December 2016


Databases store text, and the best way to manipulate text is to use a regular expression (‘regex’). Using regular expressions in SQL queries has been possible in many database engines for decades.

Now you can use regular expressions in SQL Server queries, too. I’ve created an open-source project, sql-server-regex, that lets you run regular expressions in T-SQL queries using scalar and table-valued functions.


The most common regular expression use cases are supported, including Match, Split, Group Match, and Replace.
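If the four operations are unfamiliar, here is a rough sketch of what each one means. It uses Python's standard `re` module, not the sql-server-regex functions themselves, and the sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, invented for this illustration.
log_line = "2016-12-29 ERROR disk=/dev/sda1 usage=91%"

# Match: find the first piece of text that fits the pattern.
match = re.search(r"\d{4}-\d{2}-\d{2}", log_line)
matched_text = match.group(0) if match else None

# Group Match: capture named parts of the matched text.
group_match = re.search(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", log_line
)
date_parts = group_match.groupdict() if group_match else {}

# Split: break the text apart wherever the pattern matches.
fields = re.split(r"\s+", log_line)

# Replace: substitute every match with new text.
masked = re.sub(r"\d", "#", log_line)

print(matched_text)
print(date_parts)
print(fields)
print(masked)
```

In a database, Match and Replace map naturally onto scalar functions (one value in, one value out), while Split and Group Match return several values per input, which is where table-valued functions come in.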

You can use it with any version of SQL Server that supports SQL CLR integration. That’s every version since SQL Server 2005, except for SQL Azure.

Next Steps

The sql-server-regex code is being tuned for performance and tested for edge-case bugs. If you’d like to help, fork the code on GitHub and get going!


The Drunkard's Search

28 December 2016

There are many ways to use data to answer questions. I love these questions; they reveal the breadth of interest and depth of curiosity that people have.

However, the most common scenario is cringe-inducing.

Friend: So, our [Insert VIP Title] announced that we're doing a Big Data project.

Me: Oh, interesting. What big questions/headaches are you thinking about?

Friend: We're not there yet. First we want to know what we can do with all this data (web logs / network traffic / payroll / customer surveys).

Me: ...

Cart before horse

The purpose of a data science project is to change something. The most important things you can change are usually not in the biggest data set. Looking at the most convenient data set (or the largest) is a drunkard’s search.

Drunkard's search

Let’s look at an example. Several universities are collecting and analyzing network traffic to identify students who will fail their classes. The idea is to identify when a student’s phone/laptop is still at their dorm and not in class.

Sure, you can do that. A consultant will cheerfully sell you a giant ‘Big Data’ stack, even though a small data set you already have (student transcripts) can achieve similar results faster, and at 1/1000th the price.

Your Brain Lies to You

Where you look for information (which streetlight to use) dramatically influences the decisions you can make.

Other common biases include focusing on what you think of first, over-weighting recent details, avoiding emotionally negative information, reacting differently depending on the ‘frame’ of the conversation, and following the paths most familiar to you.

It turns out cognitive bias is everywhere. It is the work of a lifetime to ‘see’ these blind spots, and to compensate for them.

Good Luck!
