Data Scientist, Part .1

24 July 2012

I want to be a data scientist. I want to learn in the most efficient way. I want to learn from the best.

One of the foremost data scientists is Hilary Mason (blog, @hmason). She has a tremendous ability to make difficult concepts easy to understand. See: An Introduction to Machine Learning in 30 Minutes.

What did I learn from that video? This can be fun!

In addition to learning the necessary math, I should use the most appropriate tools. A little sleuthing found a survey of the data scientists competing at

The winner? R, the open-source tool for statistical analysis.

The other tool to learn? Python, due to its ease of use and large number of libraries.

Combined, those two tools make it easy to find, consume, and analyze data from many places. Next up: math.


Job Satisfaction Analysis

27 June 2012

I started my new job at the University of Washington in May 2012, two weeks after leaving Microsoft. Since then I have felt increasingly productive, content, and lighter. That makes me ask: Why? What is so different? Let’s look, using my weapon of choice: data!

Measuring personal satisfaction accurately is impossibly hard. So let’s focus on how time is spent, and consider the enjoyment factor for different tasks. The assumption is that overall satisfaction is the sum of satisfaction over various hours. The more enjoyable each hour, the happier we will be.

We each enjoy different activities. Some people enjoy meetings and PowerPoint, and some people prefer solitude and writing code. In an ideal situation our job is full of tasks we enjoy doing, and mostly avoids things we don’t like.

As a developer, I have a job composed of many different activities: commuting, meetings, code reviews, writing code, trainings, research. I gave each of my common tasks an ‘enjoyment’ rating, from 0 to 13, indicating how much fun I found in the task. Meetings aren’t much fun. Writing code is enjoyable. Time spent with family and friends is fantastic. The result was an Excel file.

I find data visualization is a great aid, so I put together a Tableau dashboard. First, let’s look at how much time I spend doing various tasks at my new job, compared to the old one.

The color-coding indicates how much I enjoy the task. As you can see, two big shifts happened. The first is a drop in tasks I don’t enjoy: meetings and email. The second is a spike in tasks I do enjoy. A third, more subtle change is a reduction in tasks I am indifferent to, such as commuting.

However, this analysis is about comparison, not composition. To see the before-and-after effect, let’s try a different chart.

The height of the line is the cumulative ‘score’, which consists of [Time Spent] * [Enjoyment]. The thickness of the line is the time spent. The effect is quite dramatic. Now I have a better answer to why I feel better in my new job. I would encourage my fellow developers and data professionals to do their own analysis in similar situations. The results can be illuminating.
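
For anyone who wants to try this without Tableau, here is a minimal Python sketch of the score calculation. It assumes the score is simply [Time Spent] * [Enjoyment] summed over tasks. The hours, ratings, and function names below are made up for illustration; they are not the figures from the Excel file.

```python
# Hypothetical enjoyment ratings on the 0-to-13 scale (illustrative only).
ENJOYMENT = {"meetings": 2, "email": 3, "commuting": 6, "writing code": 11}

# Hypothetical hours per week spent on each task (illustrative only).
OLD_JOB_HOURS = {"meetings": 15, "email": 10, "commuting": 8, "writing code": 12}
NEW_JOB_HOURS = {"meetings": 4, "email": 5, "commuting": 4, "writing code": 30}


def satisfaction_score(hours, enjoyment):
    """Sum of [Time Spent] * [Enjoyment] over all tasks."""
    return sum(h * enjoyment[task] for task, h in hours.items())


def mean_enjoyment_per_hour(hours, enjoyment):
    """Score divided by total hours, so weeks of different length compare fairly."""
    return satisfaction_score(hours, enjoyment) / sum(hours.values())


old_score = satisfaction_score(OLD_JOB_HOURS, ENJOYMENT)
new_score = satisfaction_score(NEW_JOB_HOURS, ENJOYMENT)

print(f"Old job: score {old_score}, "
      f"{mean_enjoyment_per_hour(OLD_JOB_HOURS, ENJOYMENT):.2f} per hour")
print(f"New job: score {new_score}, "
      f"{mean_enjoyment_per_hour(NEW_JOB_HOURS, ENJOYMENT):.2f} per hour")
```

Dividing by total hours is an extra step that is not part of the chart above. It is included because a raw sum rewards longer weeks, while the per-hour figure shows whether the typical hour got better.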