The Road Through Strata - Day 1

11 February 2014

This was my first day at Strata. Here’s what I found.

The Good

The Bad

I made the very mistake I warned against yesterday: I went to sessions based on the topic rather than the quality of the speaker.

I missed out on amazing sessions by John Foreman, Jeff Heer, and Carlos Guestrin.

I’ll be more selective about my sessions for the next couple of days.

The Ugly

I asked a dozen people, from a variety of industries, what they did for a living. I also asked how they ensured their work wasn’t being used to make profit in unethical ways.

Nobody had an answer to the latter question. I’m fervently hoping this is due to my low sample size and not broadly representative of the data analytics community.

Meeting People

In addition to my ethics survey, I had the chance to talk to people from a D.C. startup, the Lawrence Berkeley Lab, Microsoft Research, Netflix, Etsy, Vertafore, the Department of Defense, and Sage Bionetworks. Everyone was ridiculously smart, and most of them were data scientists.

I came prepared with a list of questions:

Questions

I found some common elements:

Data-Intensive Everything

The range of subject areas covered was immense.

Data-Intensive Physics

Data-Intensive Medicine

Data-Intensive Cybersecurity

Data-Intensive IT

Data-Intensive Cruft

There were some boring problems discussed…

Luckily, I was saved by the amount of discussion on data-intensive genomics…

Data-Intensive Genomics

On Monday night I attended a Big Data Science meetup, where the best presenter was Frank Nothaft, a grad student at UC Berkeley working on large-scale genomics.

Why?

The societal benefit from this work could be immense. I understand why he was so cheerful when he talked about it.

How

I was impressed by the quality of thought put into the project:

There’s a lot more detail available on the project’s website, in the in-depth research paper, and in the entirely public codebase.

Deep Neural Networks

Deep neural networks have gotten a lot of press lately, mostly because they can work well on problems most ML algorithms struggle with (image recognition, speech recognition, machine translation).

Ilya Sutskever gave a good, useful intro to deep neural networks. ‘Deep’ in this case refers to 8-10 layers of neurons ‘hidden’ between the input and output layers; a traditional neural net has 1-2 hidden layers.
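
To make ‘deep’ concrete, here’s a rough numpy sketch (mine, not Ilya’s) of a forward pass through stacks of fully connected layers. The layer widths are made up and nothing is trained; it only shows the structural difference between a traditional one-hidden-layer net and a nine-hidden-layer one.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_net(layer_sizes):
    """Random weight matrices for the given layer widths (untrained, no biases)."""
    return [rng.standard_normal((n_out, n_in)) * 0.1
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(x, weights):
    """Push a column vector through the layers, ReLU on every hidden layer."""
    for W in weights[:-1]:
        x = np.maximum(0.0, W @ x)
    return weights[-1] @ x  # linear output layer

# A 'traditional' net: one hidden layer between input and output.
shallow = make_net([784, 128, 10])

# A 'deep' net in the sense of the talk: nine hidden layers.
deep = make_net([784] + [128] * 9 + [10])

x = rng.standard_normal((784, 1))
print(forward(x, shallow).shape, forward(x, deep).shape)  # both (10, 1)
```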

The reasoning behind aiming for about 10 layers is great. Humans can do a variety of things in 0.1 seconds. However, neurons are pretty slow; they fire only about 100 times per second, so each firing takes roughly 10 milliseconds. That means a task the brain completes in under 0.1 seconds can pass through at most about 10 sequential layers of neurons.
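
The back-of-envelope arithmetic, using the rough figures from the talk:

```python
# Rough figures from the talk, not measurements of mine.
reaction_time_s = 0.1  # humans recognize a face or a word in ~0.1 s
firings_per_s = 100    # a biological neuron fires at most ~100 times per second

# At most this many sequential rounds of firing fit into one such task --
# hence the interest in networks roughly 10 layers deep.
max_sequential_firings = reaction_time_s * firings_per_s
print(max_sequential_firings)  # 10.0
```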

One of the big problems with deep neural networks is that they require a lot of data to train at this depth. They are also not intuitive to tune; Ilya didn’t go over that at all in his session. Still, it was a good 101-level talk.

“Give me explainability or give me depth”

For more, I’d recommend the Neural Networks Blog.

Open Reception

The reception afterwards was mostly dull. The food was good, and free. The vendors, however, were spreading their own particular flavors of FUD.

I asked 11 different vendors for the data to back up the claims behind their value propositions. The responses were a comic mix of dumbfounded expressions, misdirection, and spin. It’s hilarious that companies selling to data and analysis professionals don’t use data to back up their marketing claims.

Tomorrow…

I find myself excited about the potential to meet awesome people and learn amazing things.

I’m looking forward to tomorrow.