Data Scientist - Part 0

18 June 2012

I have just been accepted to the Data Science certificate program at the University of Washington. This is prep for my ideal job. I love to analyze data and solve hard problems. Lucky for me, that's in demand.

Data scientists have 4 key skills:

  • Statistics
  • Domain Expertise
  • Programming
  • Storytelling ability

These skills complement each other. I am sure you've been in situations where the key person was someone who knew math, the business, and the data. They are the linchpins of software and IT teams.

Classes start this fall. I will be blogging about the experience here, at DevNambi.com. I know many DBAs and developers who would be good data scientists. I know even more good statisticians and domain experts. What they all have in common is that they are ready for more, but don't feel ready.

I don't feel ready either. I am not willing to wait until I am...so here I go!


T-SQL Tuesday 31: Logging

12 June 2012

It is T-SQL Tuesday once again, hosted by Aaron Nelson (blog / @SqlVariant). The theme for this T-SQL Tuesday is logging.

Why Logs?

Databases encounter performance problems. All software does. Logs are a good way to isolate why servers, applications, and queries have issues. The life of a DBA (and developer) is easier with logs.

However, getting the logs can be a pain. Worse, we never have enough data when we need it. It can be hard to justify an expensive monitoring app like SCOM or Foglight. So I built a utility in PowerShell that does most of the work, for free.

Save-SqlHistory

Save-SqlHistory.ps1 is a single PowerShell script with some examples and documentation. It is open-source and free on GitHub. It works very simply:


PS > Save-SqlHistory
 -ServerListCsv <a csv file of servers/databases to query>
 -QueryListCsv <a csv file of script files to run, and the list of .csv files to save the results to>

If you give Save-SqlHistory a list of 1,000 databases to query and 40 monitoring queries, it runs all 40 queries against all 1,000 databases and logs the results to 40 CSV files, one per query.
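As a rough sketch, the two input files might look something like this. The column names and values here are my assumptions based on the parameter descriptions above, not the script's actual schema — check the examples that ship with the code for the real layout:

```
# Servers.csv — one row per server/database pair to query (hypothetical columns)
ServerName,DatabaseName
sql-prod-01,AdventureWorks
sql-prod-02,master

# Queries.csv — maps each query script to the CSV file that collects its results (hypothetical columns)
QueryFile,OutputCsv
WaitStats.sql,WaitStats.csv
TopQueriesByCpu.sql,TopQueriesByCpu.csv
```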

Run Save-SqlHistory in a scheduled task or SQL Agent job, and you have an instant monitoring tool.
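For the scheduled-task route, a daily run could be registered with the built-in schtasks tool; the script path below is hypothetical, so substitute wherever you downloaded the code:

```
schtasks /create /tn "Save-SqlHistory" /sc daily /st 02:00 ^
  /tr "powershell.exe -ExecutionPolicy Bypass -File C:\Scripts\Save-SqlHistory.ps1"
```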

Getting Started

The best way to start is with queries that return useful information. I like to use the fabulous DMV queries by Glenn Berry (blog / @GlennAlanBerry). The example queries in Save-SqlHistory are based largely on his work, and they work with SQL 2012.

To get started, download the code. The two CSV files to change are Queries.csv and Servers.csv, and both are intuitive. Run the script against a test database somewhere and look at the results. Next, think about what data to collect.

What information do you wish you had? Stats about index fragmentation? Top queries by CPU usage? Page life expectancy? The runtime of each backup? All you need to do is find or write queries that return this information and add their locations to the Queries.csv file. Along with Glenn's work, I would highly recommend Brent Ozar's (b / t) sp_Blitz script and Adam Machanic's (b / t) sp_WhoIsActive script.
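Some of these metrics take only a few lines of T-SQL. For instance, page life expectancy can be pulled straight from the performance-counter DMV; a sketch of such a query, which you could save as a .sql file and point Queries.csv at, might look like this:

```sql
-- Page life expectancy, in seconds, from the SQL Server performance counters
SELECT [object_name], [counter_name], [cntr_value]
FROM sys.dm_os_performance_counters
WHERE [counter_name] = N'Page life expectancy';
```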

Run these for a week or three, analyze the CSV files in a handy tool like Excel or Tableau, and you will have a far better understanding of what your databases are doing.

What's Next?

I will be refining and adding features to Save-SqlHistory. First on the list is multithreading, so querying 1,000 databases doesn't take too long. Next on the list is whatever you suggest; just contact me if you have an idea.

Happy Logging!
