
Mean Statistics: How to Avoid Misleading Data Results

Whether you are trying to gain insights about your employees, your customers, the operations within your business, or some other data-driven question, there is a pretty good chance that at some point in your career you have been presented with a bar chart similar to the one above.

Maybe the bars represented levels of employee satisfaction across business units, product preferences, or business performance across geographic locations.

These types of graphs are widely used across industries and tend to be helpful for delivering quick insights. Yet they can be some of the most misleading graphs out there… if they are not complemented by other key statistics.

Why? Let’s start by thinking about the meaning of each bar. Suppose the above bar chart represents product-testing ratings of four new smoothie flavors. A market research company asks a sample of 100 individuals to rate the four flavors on a scale of 1 to 10. On average, the purple flavor gets a 9, orange a 5, yellow a 3, and green an 8. The market research results are presented, and Juice Co. concludes that purple is the top favorite, followed by green, that participants had no particularly strong preference about orange, and that they disliked yellow.

Now let’s think about orange for a second. For simplicity’s sake, let us assume that the market research company sampled only 10 individuals. One scenario that produces an average of 5 could be: 5, 5, 4, 5, 6, 5, 3, 5, 7, 5. A second scenario that also gives orange an average of 5 could be: 10, 10, 10, 10, 1, 1, 2, 2, 2, 2.
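To check the arithmetic, here is a minimal Python sketch using the standard-library `statistics` module. The variable names `scenario_1` and `scenario_2` are illustrative only; they simply hold the ten ratings listed for each scenario.

```python
from statistics import mean

# The two hypothetical sets of ten orange-flavor ratings (1-10 scale) from the example
scenario_1 = [5, 5, 4, 5, 6, 5, 3, 5, 7, 5]
scenario_2 = [10, 10, 10, 10, 1, 1, 2, 2, 2, 2]

for name, ratings in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    # The mean is the sum of the ratings divided by the number of raters
    print(f"{name}: sum = {sum(ratings)}, n = {len(ratings)}, mean = {mean(ratings)}")
```

Both scenarios add up to 50 across 10 raters, so both report a mean of 5.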

Notice that while both scenarios average 5, their distributions lead to very different conclusions. In the first scenario, most people don’t have a strong opinion about the orange flavor at all, while in the second scenario orange is divisive: people either love it or hate it.
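Looking at the raw counts makes the difference visible even though the averages match. The sketch below prints a simple text histogram with Python’s `collections.Counter`; the helper `text_histogram` is hypothetical, written only for this illustration.

```python
from collections import Counter

# The same two hypothetical sets of orange-flavor ratings
scenario_1 = [5, 5, 4, 5, 6, 5, 3, 5, 7, 5]
scenario_2 = [10, 10, 10, 10, 1, 1, 2, 2, 2, 2]

def text_histogram(ratings, low=1, high=10):
    """Print one row per possible rating with one '#' per respondent, and return the counts."""
    counts = Counter(ratings)
    for score in range(low, high + 1):
        # Counter returns 0 for ratings nobody gave, so empty rows still print
        print(f"{score:>2} | {'#' * counts[score]}")
    return counts

print("Scenario 1")
counts_1 = text_histogram(scenario_1)
print("Scenario 2")
counts_2 = text_histogram(scenario_2)
```

Scenario 1 piles up around the middle of the scale, while Scenario 2 leaves the middle rows empty and stacks its responses at both ends.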

The solution to avoiding misleading results is actually pretty simple, yet not widely practiced: always ask for all three measures of central tendency. You’ve heard of them before, but here they are again in all of their glory: the mean, the mode, and the median. Each one summarizes an entire distribution of scores: the mode describes the most common score, the median the score of the middle case, and the mean the average score. Here are some basic characteristics of each:

  1. The mode is useful when you are interested in the most common score and when you are working with a variable that has a limited number of possible values. The mode becomes less meaningful when the distribution has many modes.
  2. The median is always at the center of a distribution of scores: half of the cases fall at or above it, and half fall at or below it.
  3. The mean, as you well know, is the average score of a distribution, and we’re pretty sure no further explanation is needed.
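Here is a minimal sketch that computes all three measures for the two orange-flavor scenarios, using Python’s standard-library `statistics` module (`multimode` requires Python 3.8 or later). The function name `central_tendency` is hypothetical.

```python
from statistics import mean, median, multimode

scenario_1 = [5, 5, 4, 5, 6, 5, 3, 5, 7, 5]
scenario_2 = [10, 10, 10, 10, 1, 1, 2, 2, 2, 2]

def central_tendency(ratings):
    """Return the three measures of central tendency for a list of scores."""
    return {
        "mean": mean(ratings),        # average score
        "median": median(ratings),    # middle score; with an even count, the average of the two middle scores
        "modes": multimode(ratings),  # every most-common score, so a tie is not hidden
    }

for name, ratings in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(name, central_tendency(ratings))
```

For Scenario 1, all three measures land on 5. For Scenario 2, the mean is still 5, but the median is 2 and there are two modes, 10 and 2. That disagreement between the measures is the signal that a single average is hiding a split audience.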

The mean, which is the measure reported most often, can be misleading if the distribution is skewed. It is also affected by every score in the distribution, whereas the mode and the median are much less sensitive to a few extreme scores.
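To see that sensitivity, the sketch below compares three small sets of ratings. The data are invented purely for illustration and do not come from the smoothie study.

```python
from statistics import mean, median, multimode

# Hypothetical ratings, invented for illustration only
datasets = {
    "balanced": [4, 4, 5, 5, 5, 5, 5, 5, 6, 6],       # symmetric around 5
    "one outlier": [4, 4, 5, 5, 5, 5, 5, 5, 6, 10],   # one 6 changed to a 10
    "right-skewed": [2, 2, 2, 2, 3, 3, 3, 4, 9, 10],  # mostly low, a few very high
}

for name, ratings in datasets.items():
    # Convert to float so every row prints with one decimal place
    print(f"{name:>12}: mean = {float(mean(ratings)):.1f}, "
          f"median = {float(median(ratings)):.1f}, modes = {multimode(ratings)}")
```

Changing a single score moves the mean from 5.0 to 5.4 while the median and mode stay at 5. In the right-skewed set, two high ratings pull the mean up to 4.0, above both the median (3.0) and the mode (2).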

So, the next time you are presented with a bar chart, make sure that you know more about the central tendency of the distribution (mean, mode, and median). You should also know something about the dispersion of the distribution, such as the standard deviation, as well as the sample size. We will discuss these fun concepts and their applications to your business in a later post. Statistics shouldn’t be scary. They can and should be used in everyday business decisions.
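As a preview of that dispersion check, here is a minimal sketch that reports the sample size, mean, and standard deviation for the two orange-flavor scenarios. It uses `statistics.stdev`, the sample standard deviation (dividing by n − 1); choosing the sample rather than the population version (`statistics.pstdev`) is an assumption, since the post does not specify. The function name `dispersion_summary` is hypothetical.

```python
from statistics import mean, stdev

scenario_1 = [5, 5, 4, 5, 6, 5, 3, 5, 7, 5]
scenario_2 = [10, 10, 10, 10, 1, 1, 2, 2, 2, 2]

def dispersion_summary(ratings):
    """Report sample size, mean, and sample standard deviation (n - 1 in the denominator)."""
    return {
        "n": len(ratings),
        "mean": mean(ratings),
        "stdev": round(stdev(ratings), 2),
    }

for name, ratings in [("Scenario 1", scenario_1), ("Scenario 2", scenario_2)]:
    print(name, dispersion_summary(ratings))
```

Both scenarios have n = 10 and a mean of 5, but the standard deviation is about 1.05 for Scenario 1 and about 4.32 for Scenario 2, which flags the second set as far more spread out.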
