An important concept in both statistics and research methods is variability, and this analysis by Wells Fargo analysts illustrates it nicely. The analysts had been seeing social media complaints that a popular restaurant chain wasn't always providing a full burrito bowl, and they decided to look into it.
Chipotle recently got a wave of social media attention from users claiming the fast-casual chain was holding back. Analysts at Wells Fargo sought to put the debate to rest and placed 75 identical burrito bowl orders across eight locations in New York City.
You need to see the results of the burrito bowl analysis, so click here for a figure from Sherwood.com (a business news site).
a) Orient yourself to this figure. What do you see on the y-axis? What's on the x-axis? What does one bar represent?
b) Find the bar on the graph representing the largest/fullest burrito bowl. Find the bar on the graph representing the lightest one.
c) If you were this fast food chain, what would be the shape of the graph that you'd most like to see? What actions would you take in your stores and franchises to achieve this shape?
d) The phenomenon depicted in the graph originally reminded me of inter-rater reliability, but it's really not the same. Why not?
e) How do you think they sampled the restaurants for this analysis? Would it be important to use a random sample, or not?
Selected answers
a) Here, one bar represents a single burrito bowl; its height corresponds to how heavy it was.
b) The lightest bowl is all the way on the right.
c) The restaurant chain probably wants to see a rectangular distribution, in which all of the bars are the same height--this would indicate that every burrito bowl weighed the same.
To get closer to this shape, you'd probably need to do some training of your chefs, setting clearer standards for how much of each ingredient to use in every burrito bowl.
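To make the idea concrete, here's a minimal Python sketch (using made-up weights, not the actual Wells Fargo data) showing that a perfectly consistent batch of bowls has a standard deviation of zero, while an inconsistent batch does not:

```python
import statistics

# Hypothetical burrito bowl weights in ounces -- illustrative numbers only.
variable_bowls = [21.5, 18.0, 25.3, 16.4, 22.1, 19.8]
uniform_bowls = [20.0] * 6  # the chain's ideal: every bowl identical

# The standard deviation quantifies the variability in each batch.
print(f"Variable batch SD: {statistics.stdev(variable_bowls):.2f} oz")
print(f"Uniform batch SD:  {statistics.stdev(uniform_bowls):.2f} oz")  # 0.00
```

A bar graph of the uniform batch would be the rectangular distribution described above; a bar graph of the variable batch would look much more like the Sherwood figure.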
d) In inter-rater reliability, you're looking to see if two or more raters make consistent observations of a set of people or objects. In other words, you want to make sure that the behavior of a set of targets (say, schoolchildren) is rated the same no matter who rates them. (In this example, Wells Fargo might have wanted to make sure that two analysts who weighed a set of burrito bowls would come up with exactly the same weights.) In contrast, if you are a restaurant owner, you'd be looking for something we might call "inter-burrito bowl reliability," in which burrito bowls weigh the same no matter who makes them.
That said, if people have poor inter-rater reliability, you'd see too much variability between them. And if burrito chefs have poor inter-bowl reliability, you'd see a graph that looks a lot like the one we're talking about.
e) Sure, random sampling of restaurants would support external validity--it would mean that we could generalize from this data--we'd know that the median and the standard deviation in this sample could be generalized to the population of restaurants. However, we probably don't need a random sample to support a proof of concept--that is, it's probably a problem that in this sample of 75 burrito bowls, there's a great deal of variability, even if those 75 bowls and restaurants weren't sampled randomly.
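If you did want a random sample of locations, Python's standard library makes the draw simple. The store IDs below are hypothetical, just to show the mechanics:

```python
import random

# Hypothetical list of the chain's NYC locations (made-up IDs).
nyc_locations = [f"store_{i:03d}" for i in range(1, 101)]

random.seed(42)  # seed for a reproducible draw
sampled = random.sample(nyc_locations, k=8)  # 8 locations, without replacement
print(sampled)
```

Because `random.sample` draws without replacement, each of the eight locations is distinct, and every location in the list had an equal chance of selection--the defining feature of a simple random sample.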
Thanks to Jess Hartnett and Stephen Chew, who shared this one on Twitter.