Understanding and Describing Groups with Averages

Mean, Median & Mode

Mar 02, 2023

Most people learn about averages in math class when they are young. We learn that there are three: mean, median and mode. These are all supposedly ways to understand and summarize a group of… anything (but mostly just a list of numbers in the class setting). Colloquially, we all think of the mean when we hear "average." Oftentimes, though, it is actually the median or mode that we are intuitively experiencing. This disconnect between what we experience and the calculations we do in our heads can lead to confusion and results that are far from what we expect.

The mean (average) of a data set is found by adding all numbers in the data set and then dividing by the number of values in the set. The median is the middle value when a data set is ordered from least to greatest. The mode is the number that occurs most often in a data set. Source.
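To make the three definitions concrete, here is a minimal sketch using Python's standard `statistics` module. The list of numbers is made up for illustration, chosen so that the three averages do not all agree.

```python
from statistics import mean, median, multimode

# Hypothetical data set (not from the post), chosen so the averages disagree.
values = [1, 2, 2, 3, 12]

avg = mean(values)         # add everything up, divide by the count
mid = median(values)       # middle value once the list is sorted
modes = multimode(values)  # every most-frequent value, returned as a list

print("mean:", avg)
print("median:", mid)
print("mode(s):", modes)
```

A single large value (12) pulls the mean up to 4, while the median and the mode both stay at 2. Which of those best "summarizes" the list depends on what we want to know about it.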

It’s important to remember that the reason we are doing these average analyses is to better understand a set of anything by summarizing it.

I was sitting with a friend yesterday (shoutout Andrew C) and working through some examples of where we may say one thing when we really mean another (no pun intended). A few that came up:

The Average Hamburger

What we likely want to say here is median: this hamburger we are currently eating is better than half of the hamburgers we have eaten and worse than the other half. Because this is not objectively quantifiable, we are instead relating hamburgers to each other, ranking them and placing the current one in that set. If we attempted to quantify the experience and rate each of our hamburger experiences from one to ten, we could then, in theory, take a mean. But again, when we do the calculations in our head, we are relating to a set of experiences that we are comparing and ranking, then placing the current experience in the set - a median.

The Average Car Driving By

Sitting in a café, we watched hundreds of cars drive by on the adjacent street. When we say “the average car,” there are many different things we could be referring to, but in this case, we are most likely relating to the mode: the most commonly occurring vehicle that drove by. In this specific situation, it was a tie between the Kia Picanto and the Toyota C-HR. Using a mode here allows us to identify the representative car(s) of the set and thereby summarize it. The mode works here because the set is limited, the cars are hard to quantify and we want to point to a specific car as the average. If instead we had broken the cars down into various characteristics (color, number of doors, continent of origin, etc.), we could have said that the average car was a white, 4-door hatchback, made in Asia - an amalgamation of modes, one per characteristic, since the most common value of a category like color is a mode rather than a median. This is also useful, but harder to conceptualize and experience, hence the single-car mode is very helpful here to understand and summarize the broader set with a specific car. Taking the same approach with a mean would give a car with 3.2 doors, a color that doesn’t exist and geographic coordinates for origin - maybe interesting, but not really useful or applicable.
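The café example can be sketched in a few lines of Python. The tally below is invented for illustration (we did not record the real counts), and the tuple layout `(model, color, doors, continent)` is just an assumption for the sketch. `statistics.multimode` returns every value tied for most frequent, which is what we need when two cars tie.

```python
from statistics import mean, multimode

# Hypothetical observations: (model, color, doors, continent of origin).
cars = [
    ("Kia Picanto", "white", 4, "Asia"),
    ("Toyota C-HR", "white", 4, "Asia"),
    ("Kia Picanto", "red", 4, "Asia"),
    ("Toyota C-HR", "white", 4, "Asia"),
    ("Fiat 500", "blue", 2, "Europe"),
]

# Mode over whole cars: the most common model(s), ties included.
model_modes = multimode(model for model, _, _, _ in cars)

# Most common value per characteristic, taken independently.
color_modes = multimode(color for _, color, _, _ in cars)
door_modes = multimode(doors for _, _, doors, _ in cars)
continent_modes = multimode(cont for _, _, _, cont in cars)

# A mean only works for the numeric column, and gives a car nobody builds.
doors_mean = mean(doors for _, _, doors, _ in cars)

print("most common model(s):", model_modes)
print("composite car:", color_modes, door_modes, continent_modes)
print("mean doors:", doors_mean)
```

With this made-up tally, the model mode is a two-way tie, the composite car is white, 4-door and from Asia, and the mean door count is a fractional number that describes no car on the street.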

The Average Investment in an Early Stage Venture Portfolio

In this case, we really do want to understand the mean, but we can only calculate it after the fact - up to 10 years down the line. In the meantime, portfolio investments tend to be evaluated as a median. Given that many (good) early-stage portfolios have a median and mode of 0, it’s critical to remember that the mean is really what matters. It’s all about the total return, and history shows that the most performant portfolios are ones that have a single massive standout winner (50-200x), a handful (~20% of the fund) of decent to solid outcomes (2-5x), and a remaining ~75% that is effectively a write-off. Easy math to do, much harder math to live.

This is where medians come back into the picture. Along the way, success is tracking the entire portfolio in a positive direction - in order to get to that elusive 100x, a company needs to first survive, then get through various other steps. As a result, we need to optimize for what we can understand in the moment, which is the median progress of each individual company. (This is also why fund vintages matter so much when benchmarking.) Put another way, it would not be reasonable to expect great returns if, after 1 year, 90% of a portfolio has already gone to 0, but that doesn’t mean it’s impossible. Having watched a lot of soccer lately, it’s pretty clear that good shots don’t count on the scoreboard, but they correlate very highly with goals scored and games won. That being said, there is always room for outlier situations.

Being able to understand the current median and how it maps to the future mean is the critical exercise. For example, if every company in a seed portfolio raises a Series A from a global multistage fund, that means the median is currently high (relative performance is good), which we hope translates to a high mean outcome down the road, but there is also a very real chance that it may not.
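The "easy math" can be sketched as follows. Everything here is an illustrative assumption rather than real fund data: a 20-company portfolio with equal check sizes, one winner at 100x (inside the 50-200x range above), four companies at 3x (20% of the fund, inside the 2-5x range) and fifteen write-offs (75%).

```python
from statistics import mean, median, mode

# Hypothetical return multiples for a 20-company fund, equal check sizes.
multiples = [100] + [3] * 4 + [0] * 15

portfolio_mean = mean(multiples)      # overall multiple on invested capital
portfolio_median = median(multiples)  # the "typical" company
portfolio_mode = mode(multiples)      # the most common outcome

# The same fund if the single standout winner had never been in it.
without_winner = mean(multiples[1:])

print("mean multiple:", portfolio_mean)
print("median multiple:", portfolio_median)
print("mode multiple:", portfolio_mode)
print("mean without the winner:", round(without_winner, 2))
```

Under these assumptions the median and mode are both 0 while the mean is 5.6x, and removing the one winner drops the mean below 1x. The typical company tells us almost nothing about the fund's return, which is why the mean is what matters at the end even though it cannot be observed along the way.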

This is all to say that the math matters. Our intuitive understanding of statistics may need to be reevaluated as we try to understand and summarize the world around us. My view is that simply calling out these small differences and going through the thought exercises sharpens the intuition and makes us better critical thinkers.

