28 December 2016
There are many ways to use data to answer questions, and I love hearing those questions; they reveal the breadth of interest and depth of curiosity that people have.
However, the most common scenario is cringe-inducing.
Friend: So, our [Insert VIP Title] announced that we're doing a Big Data project.
Me: Oh, interesting. What big questions/headaches are you thinking about?
Friend: We're not there yet. First we want to know what we can do with all this data (web logs / network traffic / payroll / customer surveys).
Me: ...
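The purpose of a data science project is to change something. The most important things you can change are usually not in the biggest data set. Looking at the most convenient data set (or the largest) is a drunkard’s search: hunting for lost keys under the streetlight because that is where the light is, not where the keys were dropped.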
Let’s look at an example. Several universities are collecting and analyzing network traffic to identify students who will fail their classes. The idea is to identify when a student’s phone/laptop is still at their dorm and not in class.
Sure, you can do that, and a consultant will cheerfully sell you a giant ‘Big Data’ stack to do it. But you could use a small data set you already have (student transcripts) to achieve similar results faster, and at 1/1000th the price.
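To make the "small data" option concrete, here is a minimal sketch of flagging at-risk students from transcript records alone. Everything in it is an assumption for illustration: the record layout, the sample rows, the function name `at_risk_students`, and both cutoffs are hypothetical, and a simple threshold rule like this is a starting point, not a validated model.

```python
from collections import defaultdict

# Hypothetical transcript records: (student_id, course, grade_points).
# Sample rows are made up for illustration only.
transcripts = [
    ("s1", "MATH101", 3.7),
    ("s1", "ENG101", 3.3),
    ("s2", "MATH101", 1.0),
    ("s2", "ENG101", 0.0),
    ("s3", "MATH101", 2.0),
    ("s3", "ENG101", 1.7),
]


def at_risk_students(records, gpa_cutoff=2.0, fail_cutoff=1.0):
    """Flag students whose average grade is below gpa_cutoff
    or who have at least one grade below fail_cutoff.

    Both cutoffs are illustrative assumptions, not recommendations.
    """
    # Group grade points by student.
    grades = defaultdict(list)
    for student_id, _course, points in records:
        grades[student_id].append(points)

    flagged = []
    for student_id, points in sorted(grades.items()):
        gpa = sum(points) / len(points)
        if gpa < gpa_cutoff or min(points) < fail_cutoff:
            flagged.append(student_id)
    return flagged


print(at_risk_students(transcripts))  # prints ['s2', 's3']
```

The point is not the rule itself but the scale: a few dozen lines over data the registrar already holds, with no new collection infrastructure.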
Where you look for information (which streetlight to use) dramatically influences the decisions you can make.
Some other common biases are to focus on what we think of first, over-focus on recent details, avoid emotionally negative information, react differently depending on the ‘frame’ of a conversation, and follow paths that are familiar to us.
It turns out cognitive bias is everywhere. It is the work of a lifetime to ‘see’ these blind spots, and to compensate for them.
Good luck!