Database Performance Is Like Cake: The Layer Model

23 March 2011

Do you have a performance problem with databases? I bet you do. Do you ever wonder what to do next to fix it? I bet you have.

The best way to improve your database's performance depends on your resources, your skills, and what you've done already. Also, the bigger the problem, the riskier the solution.

The Layer Model

I've come up with The Layer Model for performance tuning. Systems are built in layers, or tiers, and different problems and solutions fit into each layer. The risks and rewards differ from layer to layer. Most importantly, once you have optimized one layer, move up a layer to keep improving performance.

If you think about your system this way, you'll have a reliable process to improve the performance of your applications.


T-SQL Tuesday #16: Statistics using CLR Aggregates

08 March 2011

It is T-SQL Tuesday again, brought to you by Jes Schultz Borland (@Grrl_Geek). The theme for this T-SQL Tuesday is Aggregations.

For my T-SQL Tuesday post, I've written up some CLR aggregation functions that can do statistical analysis. I've built three of the most common: median(), percentile(), and covariance().

Here's my code in a ZIP file: MathCLR - By Dev Nambi.
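To get the aggregates into a database, you load the assembly and register each function. Here's a minimal deployment sketch; the DLL path is a placeholder, and the class names are assumptions about how the assembly is organized, so adjust them to match what's actually in the ZIP:

```sql
-- Enable CLR integration on the instance (off by default).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- Load the assembly. The path is a placeholder for wherever you unzip the DLL.
CREATE ASSEMBLY MathCLR
FROM 'C:\CLR\MathCLR.dll'
WITH PERMISSION_SET = SAFE;
GO

-- Register the aggregates. The class names after EXTERNAL NAME are assumed;
-- match them to the class names inside the assembly.
CREATE AGGREGATE dbo.Median (@value FLOAT)
RETURNS FLOAT
EXTERNAL NAME MathCLR.[Median];
GO

CREATE AGGREGATE dbo.Percentile (@value FLOAT, @percentile FLOAT)
RETURNS FLOAT
EXTERNAL NAME MathCLR.[Percentile];
GO

CREATE AGGREGATE dbo.Covariance (@x FLOAT, @y FLOAT)
RETURNS FLOAT
EXTERNAL NAME MathCLR.[Covariance];
GO
```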

Median

The median is the middle number in a set of data: sort the values and pick the one exactly in the middle.

I'm using the same sample data set from my last post, which tracks the runtime of a backup job over the past week. For that data, the median runtime is 33 minutes.
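Assuming the sample data lives in a table, calling the aggregate might look like this; dbo.BackupHistory and its columns are hypothetical names for illustration:

```sql
-- Median backup runtime over the past week (table and column names are placeholders).
SELECT dbo.Median(RuntimeMinutes) AS MedianRuntimeMinutes
FROM dbo.BackupHistory
WHERE BackupDate >= DATEADD(DAY, -7, GETDATE());
```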

Percentile

Percentile finds the number under which X percent of the values fall. Using our sample data set, we can find that 80% of the backup runtimes are under 480 minutes.

To use the function, pass in the data set as the first parameter and the percentile you're looking for as the second.
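For example, asking for the 80th percentile of the sample data might look like the sketch below; whether the second parameter expects a fraction (0.80) or a whole number (80) depends on the implementation, and the table and column names are again hypothetical:

```sql
-- 80th percentile of backup runtimes over the past week.
SELECT dbo.Percentile(RuntimeMinutes, 0.80) AS RuntimeMinutesP80
FROM dbo.BackupHistory
WHERE BackupDate >= DATEADD(DAY, -7, GETDATE());
```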

Covariance

Covariance is less well known. It measures how two sets of numbers vary together: a positive covariance means the values tend to rise and fall together, while a negative covariance means one tends to rise as the other falls.

Using our data set, let's see if there's a relationship between backup time and the amount of data backed up. For our data, the covariance is negative. That means that either the backups get faster as they get larger (unlikely), or something else is at work.

How about a relationship between disk queue length on the backup file server and the runtime? Aha! That shows a large positive covariance, suggesting that the disks on the backup file server are the bottleneck. Either the backup server is too slow, or some other process is slowing it down.
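Both checks boil down to calling the aggregate with two columns. Here's a sketch of what that might look like, with all table and column names hypothetical:

```sql
-- Covariance of backup runtime against data size, and against disk queue length.
SELECT
    dbo.Covariance(RuntimeMinutes, BackupSizeGB)       AS RuntimeVsBackupSize,
    dbo.Covariance(RuntimeMinutes, AvgDiskQueueLength) AS RuntimeVsDiskQueue
FROM dbo.BackupHistory
WHERE BackupDate >= DATEADD(DAY, -7, GETDATE());
```

In practice the disk-queue numbers would come from the file server's performance counters, joined to the backup history by time window; a single table is a simplification for the example.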

Good luck with your analysis!
