Exercise Using Data

18 April 2014

There are many choices when looking at nutrition, exercise and medicine. Most people in the US and Europe suffer from, pay for, and die from chronic ‘lifestyle diseases.’

One of the best ways to be healthy is to be moderately active. That’s not news, and it raises a confusing question:

Why do many people suffer from preventable diseases when there are many ways to avoid them?

I suspect having so many choices is a challenge on its own. It’s easy to pick between 3-5 options; it’s impossibly hard to pick from thousands. Being healthy is a data problem.

Data can make it easy to decide what to do right now to be active.


People will often start a new sport and then stop. Why?

Researchers asked this question, and found several reasons:

  • It’s expensive
  • It takes a lot of time
  • The weather is bad
  • It doesn’t feel like fun
  • It feels embarrassing.

Three of these reasons are external (price, time, weather), and the other two are mental (fun, embarrassment).

Finding exercises that work for each person isn’t a marathon, it’s a maze.


People avoid expensive exercise. After all, the median US income is ~$51,000 per household; it’s only logical to worry about money.

However, “expensive exercise” is a matter of perspective. Exercise is far cheaper than $6,000/year for heart disease, or $11,700 for diabetes.
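To put those numbers side by side, here is a rough back-of-the-envelope sketch. It only uses the figures quoted above, and it assumes (my assumption, not a sourced fact) that the disease costs are yearly amounts that can be compared directly against the median household income.

```python
# Figures quoted in the post; the comparison itself is an illustrative assumption.
MEDIAN_HOUSEHOLD_INCOME = 51_000  # USD per year
ANNUAL_COST = {"heart disease": 6_000, "diabetes": 11_700}  # USD per year


def share_of_income(cost, income=MEDIAN_HOUSEHOLD_INCOME):
    """Return the cost as a percentage of income, rounded to one decimal."""
    return round(100 * cost / income, 1)


shares = {disease: share_of_income(cost) for disease, cost in ANNUAL_COST.items()}

for disease, pct in shares.items():
    print(f"{disease}: about {pct}% of the median household income")
```

Under that assumption, heart disease comes to roughly 12% of the median household income and diabetes to roughly 23%.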


Exercise takes time, and there are no shortcuts. You can’t fit 30 minutes of exercise into 5 minutes.

There are ways to integrate exercise into a busy schedule. People often exercise immediately before or after their work day. Some sports require no prep or travel time. Others can be done during small breaks.

“Time wasted on exercise” is again a matter of perspective. The time isn’t wasted, it’s traded. What do we trade this time for?

Quite a bit, it turns out. Active people live longer, sleep better, have a better quality of life, better brain function, and suffer less from depression.

That’s a great trade for less than 4 hours/week. Even better, many quick exercises are also inexpensive.


The best sports are the ones suited for each time and place. Nobody likes to run when it’s snowing outside. There are also many, many ways to be active indoors.
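Here is a minimal sketch of what "exercise as a data problem" could look like: narrowing a long list down to the few options that fit the price, time, and weather right now. The catalog, the `Activity` fields, and the `options_right_now` helper are all made up for illustration; real numbers would differ for each person.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    name: str
    cost_usd: float  # hypothetical cost per session
    minutes: int     # hypothetical time needed, including prep and travel
    indoor: bool     # can it be done when the weather is bad?


# Made-up catalog for illustration only.
CATALOG = [
    Activity("running", 0.0, 30, False),
    Activity("cycling", 0.0, 60, False),
    Activity("bodyweight circuit", 0.0, 15, True),
    Activity("climbing gym", 20.0, 120, True),
    Activity("laser tag", 10.0, 90, True),
]


def options_right_now(catalog, budget_usd, free_minutes, bad_weather):
    """Keep activities that fit the budget, the time slot, and the weather."""
    return [
        a.name
        for a in catalog
        if a.cost_usd <= budget_usd
        and a.minutes <= free_minutes
        and (a.indoor or not bad_weather)
    ]


snowy_lunch_break = options_right_now(CATALOG, budget_usd=5.0, free_minutes=30, bad_weather=True)
sunny_evening = options_right_now(CATALOG, budget_usd=15.0, free_minutes=90, bad_weather=False)
```

With this toy catalog, a snowy lunch break leaves only the bodyweight circuit, while a sunny evening leaves four options. The mental factors (fun, embarrassment) are deliberately left out, since they are personal and harder to score.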

Know Thyself

Humans avoid behavior that isn’t rewarding or seems disappointing. Our largest hurdles are often between our ears. A common mental barrier is social. We each react differently to social situations:

  • Do you push yourself more when your results are visible to your family & friends? Or does that make you feel isolated?
  • Do you enjoy trying new activities? Or do you enjoy having a few familiar exercises?
  • Does a competitive sport make you try harder, or give up faster?

It’s important to know this, because exercise only sticks once it becomes a habit. After all, we each have a finite amount of willpower each day.

The Power of Language

We think and feel in narratives, in stories. I use this to manipulate myself into exercising.

I don’t use sports terms when exercising because I was the slow kid in gym class, and don’t enjoy remembering embarrassing moments. I think of exercise as leveling up.

I love to cycle because I see it as ground-effect flight. I enjoy snorkeling by imagining it is neutral-buoyancy meditation. I enjoy laser tag as a way to be active and invent creative tactics.

After all, what kind of exercise we do is far less important than just being active.

Go Forth, and Be Healthy

The Road Through Strata - Thursday

13 February 2014

This was my last day at the Strata conference.


Buzzwords are the new stopwords.

The vast majority of the keynotes were nothing but buzzwords, again. The audience reacted logically; they ignored the presenters and checked email, Facebook, and Twitter. One guy was trading stocks.

A notable exception was Matei Zaharia’s presentation on Spark. Spark is one of the most popular big data projects around, and Matei presented real stories and details.

James Burke Keynote

The best keynote was James Burke’s. He was illuminating, funny, and persuasive. His argument was that history and discovery are messy and full of unexpected change.

Some memorable quotes:

  • “Information causes change. If it doesn’t, it isn’t information” - Claude Shannon
  • “Anybody could have done that. I just got there first”
  • “The number of potential connections in a brain is greater than the number of atoms in the known universe. You have one of these. What are you doing with it?”
  • “We are constrained to predict the future from the past. That’s all we have”

Thinking in Systems

Discovery and progress happen between disciplines, not through specialization. I see this all the time in software teams. The most creative work comes from groups of disparate people working together.

Our society rewards specialists more than generalists. The result is a larger number of narrower niches; we know more and more about less and less. Broad thinkers are desperately needed but not valued.

I enjoy thinking in systems. I was taught and raised this way, fortunately. Being able to see the trees and the forest comes in very handy. I encourage everyone to try this.

You Never Expect What Really Happens

Technical change usually doesn’t cause problems directly. Its biggest headaches are predominantly due to side effects.

Facebook is a great example. Posting your personal info to friends isn’t controversial. What’s controversial is when none of it is private anymore and it becomes visible to employers, parents, random strangers, and stalkers.

Society reacts to scientific, technical and industrial change rather than anticipating it. It’s important to be mindful of that.

Expressing Yourself in R

One great session was Hadley Wickham’s session on R. R is one of the most popular languages for data analysis, and one I use daily.

One of Hadley’s points is that it is good to code when doing analysis.

  • It’s reproducible
  • It helps with automation
  • Code, as text, is a precise form of communication.

The two projects Hadley is working on are dplyr and ggvis.

dplyr is pretty amazing; it’s a way to create query-like operations in R and have them work against data frames, data cubes, or even backends like an RDBMS or BigQuery. I’m reminded of LINQ and lambda expressions.

One of the beautiful parts of dplyr is that it’s declarative. You code what you want done, but not exactly how. Anyone familiar with SQL will feel right at home.
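To make "what, not how" concrete, here is a small sketch of the same group-and-sum done two ways. This is not dplyr code; it is plain Python with the standard-library sqlite3 module standing in for the declarative side, and the ride data and function names are invented for the example.

```python
import sqlite3

# Hypothetical data: (rider, distance in km) for a handful of bike rides.
rides = [
    ("ana", 12.0),
    ("ben", 30.5),
    ("ana", 8.0),
    ("ben", 4.5),
    ("cat", 21.0),
]


def total_km_imperative(rows):
    """Imperative style: spell out how to loop, group, and accumulate."""
    totals = {}
    for rider, km in rows:
        totals[rider] = totals.get(rider, 0.0) + km
    return sorted(totals.items())


def total_km_declarative(rows):
    """Declarative style: describe the result and let the engine decide how."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rides (rider TEXT, km REAL)")
    conn.executemany("INSERT INTO rides VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT rider, SUM(km) FROM rides GROUP BY rider ORDER BY rider"
    ).fetchall()
    conn.close()
    return result


imperative = total_km_imperative(rides)
declarative = total_km_declarative(rides)
```

Both functions return the same totals per rider. The difference is that the second one never says how to iterate or accumulate, which is the property that lets a declarative layer hand the same request to different backends.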

ggvis is the other package Hadley is working on. It’s the update to ggplot2, and produces interactive visualizations using HTML, JavaScript, and SVG. It is built using Vega and Shiny.

IPython Notebooks

IPython notebooks are the de facto way to share data analysis, for several reasons:

  • They can self-contain data, code, and output graphics.
  • They are inherently reproducible.
  • They support many languages.

Brian Granger gave a great series of demos about the upcoming IPython 2, which is going to be even more user-friendly. I’m looking forward to it.

Data for Good

One of my favorite sessions, this was a panel discussion between Drew Conway, Jake Porway, Rayid Ghani, and Elena Eneva. They were discussing how data science can be used for social good.

The key takeaways:

  • The most important activity is listening. You’re only valuable when you solve real problems.
  • Not all problems are data problems.
  • You don’t need to build a complicated model. Simple models often go far.
  • More than 1/3 of the audience had volunteered with a nonprofit at some point. It was a very civic-minded audience.
  • A lot of the best presenters were in the audience. That was oddly heartening.
  • The turnover for data scientists is high. Consider social impact the next time you’re looking for a gig.

Closing Thoughts

Talking to dozens of people and attending many sessions led me to some unexpected conclusions…

Breakthroughs happen in 3 ways:

  1. Designing a new algorithm in statistics or machine learning
  2. Applying an existing algorithm in stats/ML to a new kind of system (bigger scale or a new language)
  3. Applying stats/ML to a problem/industry that hasn’t seen it before.

Those are in descending order of difficulty.

Data Integration is not a solved problem

Chris Ré mentioned a study done for various CTOs. The result was stark: if you’re a CTO faced with a big integration challenge, your best course of action is to quit.

People are messy

It seems like data professionals have a bit of OCD. We like things to be clean and orderly.

However, people are messy. They come in all shapes and sizes, with biases, irrational behavior and communication headaches. We have to accept people as they are or face a constant impedance mismatch with the very people we are supposed to serve.

Work on big problems

I met some amazing data scientists over the past few days. Most of them will never be famous, even if they’re exceptionally smart.

They work on boring projects. Nobody cares if a brilliant data scientist works on online advertising, or a new kind of social media platform, or becomes yet another high-finance quant.

However, people do notice the data scientist who changes how a city does building inspections. What matters is relative impact.

This isn’t a new idea. Michael Lewis’ Moneyball was about more than stats coming into baseball; it was a beautiful example of how quantitative skill can have a dramatic impact in areas where it doesn’t currently exist. For example:

  • Construction
  • Agriculture
  • Fashion
  • Music
  • Art
  • Consumer advice
  • Education
  • Government
  • Campaign finance
  • 99% of the nonprofits in the world

Want to change the world? Find out where all the money goes in education (it’s not to teachers). Build a platform to crowdsource finance for farmers and remove all the middlemen. Figure out how music affects the brain.

Build big things.