Popular Groceries using Data

09 January 2017

Everyone I know has trouble cooking. In my last blog post, I looked at the most common ingredients in 24K recipes. I have since realized that the post had a flaw: it treated all recipes equally.

Recipes are not created equal. Some are more popular than others, and for good reason. My recipe data set includes recipe ratings, which are a good proxy for popularity. I used the ratings to compute a popularity ‘score’ for each recipe.

Data to the rescue, once again!

Common Ingredients

Of the ~103 million recipe-ingredient-rating combinations, half involve just 25 common ingredients. That’s fewer than the 50 ingredients needed to reach the halfway mark in the previous post, probably because popular recipes share more ingredients with each other than the average recipe does.

Let’s stick with the 50 most common ingredients for now, which cover 61% of the ~103 million recipe-ingredient-rating combinations.
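For the curious, here is a minimal sketch of how a coverage number like this could be computed. It assumes the popularity score is simply the number of ratings a recipe received, so each ingredient is counted once per rating; the post does not spell out the exact formula. The toy recipes and the names `weighted_counts` and `top_k_coverage` are made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: (ingredients, number of ratings) for each recipe.
recipes = [
    (["salt", "butter", "flour"], 120),
    (["salt", "chili powder", "lean beef"], 300),
    (["nutmeg", "flour", "butter"], 15),
]

def weighted_counts(recipes):
    """Count each ingredient once per rating its recipe received (assumed score)."""
    counts = Counter()
    for ingredients, n_ratings in recipes:
        for ingredient in ingredients:
            counts[ingredient] += n_ratings
    return counts

def top_k_coverage(counts, k):
    """Share of all weighted combinations covered by the k most common ingredients."""
    total = sum(counts.values())
    top = sum(n for _, n in counts.most_common(k))
    return top / total

counts = weighted_counts(recipes)
print(counts["salt"])                        # 420 (120 + 300 ratings)
print(f"{top_k_coverage(counts, 3):.0%}")    # share covered by the top 3 ingredients
```

With the real data, `top_k_coverage(counts, 50)` would be the figure to compare against the 61% above.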

This is great news for grocery shopping. The same number of ingredients goes further when we focus on popular recipes.

Several common ingredients in the last post aren’t as popular this time around: nutmeg, pecans, potatoes, red bell peppers, thyme, and vinegar.

Conversely, several less common ingredients are more popular now: chili powder, lean beef, margarine, mozzarella cheese, paprika, and chocolate chips.
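The comparison between the two posts boils down to a set difference between two top-50 lists. Here is a rough sketch with shortened, hypothetical lists; `rank_changes` is an illustrative name, not something from my actual analysis code.

```python
def rank_changes(previous_top, current_top):
    """Return (dropped, entered) between two top-N ingredient lists."""
    previous, current = set(previous_top), set(current_top)
    dropped = sorted(previous - current)   # in the old list only
    entered = sorted(current - previous)   # in the new list only
    return dropped, entered

# Hypothetical, shortened lists; the real ones would hold 50 ingredients each.
previous_top = ["salt", "butter", "nutmeg", "thyme"]
current_top = ["salt", "butter", "paprika", "chili powder"]

dropped, entered = rank_changes(previous_top, current_top)
print("dropped:", dropped)   # ['nutmeg', 'thyme']
print("entered:", entered)   # ['chili powder', 'paprika']
```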

Pareto’s Pantry

I love the Pareto Principle, the “law of the vital few”. In this case, just 79 of the roughly 11K ingredients cover 70% of the recipe-ingredient-rating combinations in our data set. That’s only 0.68% of the ingredients in the list. A ‘vital few’, indeed.
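Finding the ‘vital few’ means sorting ingredients by their weighted count and walking down the list until the running total reaches the target share. A minimal sketch, using made-up counts for a ten-ingredient pantry and a hypothetical helper called `ingredients_needed`:

```python
from itertools import accumulate

def ingredients_needed(counts, target_share):
    """Smallest number of top ingredients whose counts reach target_share of the total."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    for k, running in enumerate(accumulate(ordered), start=1):
        if running >= target_share * total:
            return k
    return len(ordered)

# Hypothetical weighted counts for a tiny pantry of ten ingredients.
counts = [500, 250, 100, 50, 40, 30, 15, 10, 4, 1]

k = ingredients_needed(counts, 0.70)
print(k, f"{k / len(counts):.0%} of the ingredient list")   # 2 20% of the ingredient list
```

On the real data the same walk would stop at 79 ingredients for a 70% target, and 79 divided by the size of the ingredient list gives the 0.68% figure.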

Note: red means perishable, blue means nonperishable

This is the second of several posts on food and data, and there is more to come. Stay tuned!