One of my favorite lines in the movie Moneyball comes when a scout named Grady tries to convince general manager Billy Beane that league veteran David Justice is not worth acquiring:

“He's going to really help our season tickets at the beginning of the year…But by June he's not going to be hitting his weight.”

For those less familiar with baseball, the joke here is that batting averages (the fraction of at-bats that result in a hit) are quoted as three-digit numbers, much like weights in pounds. If your batting average is .250 (i.e., one out of four at-bats leads to a hit) and you weigh 260 lbs, then you don’t quite hit your weight (250 < 260). Since batting averages typically sit in the mid .200s while weights are in the high 100s, failing to hit one’s weight usually implies significant underperformance.
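The rule itself is simple enough to state as code. The helper below is a minimal Python sketch; `hits_weight` is a hypothetical name rather than part of the original analysis, and it assumes that matching your weight exactly counts as hitting it.

```python
def hits_weight(batting_average: float, weight_lbs: float) -> bool:
    """Return True if a player 'hits his weight'.

    The batting average (a decimal such as 0.250) is converted to the
    three-digit number it is quoted as (250) and compared with the weight
    in pounds. Ties count as hitting one's weight (an assumption).
    """
    quoted_average = round(batting_average * 1000)
    return quoted_average >= weight_lbs


print(hits_weight(0.250, 260))  # → False (250 < 260)
print(hits_weight(0.300, 190))  # → True
```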

But just how much of an underperformance is it to not hit one’s weight? Has it become harder or easier over time? And do we have examples of “good” players who have failed to hit their weight? In this post, I conduct an in-depth analysis of the Moneyball insult with a view toward answering these questions.

Stylized Facts

To begin, it is useful to look at the history of batting averages and player weights in MLB. I construct both measures from the excellent Lahman baseball database and average them across the league, weighting each player by plate appearances so that players who come to the plate more often count for more. I focus on the modern era (post-1960), when data on player weights are both more complete and more trustworthy.
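For concreteness, here is a minimal Python sketch of the plate-appearance weighting, run on made-up rows rather than the real database. It assumes each row is one player-season that joins the Lahman `Batting` counts (`AB`, `H`, `BB`, `HBP`, `SH`, `SF`) with the `weight` column from the `People` table, with a traded player’s stints already summed. The helper names are hypothetical, and plate appearances are approximated as AB + BB + HBP + SH + SF.

```python
from collections import defaultdict

# Made-up player-season rows (NOT real data). Column names follow the
# Lahman Batting/People tables; stints are assumed to be pre-summed.
ROWS = [
    {"yearID": 2001, "AB": 500, "H": 140, "BB": 60, "HBP": 5, "SH": 0, "SF": 5, "weight": 200},
    {"yearID": 2001, "AB": 100, "H": 20, "BB": 5, "HBP": 0, "SH": 3, "SF": 2, "weight": 180},
    {"yearID": 2002, "AB": 400, "H": 100, "BB": 40, "HBP": 2, "SH": 1, "SF": 3, "weight": 210},
]


def plate_appearances(row):
    """Approximate PA as AB + BB + HBP + SH + SF (catcher's interference ignored)."""
    return row["AB"] + row["BB"] + row["HBP"] + row["SH"] + row["SF"]


def pa_weighted_by_season(rows):
    """Return {year: (PA-weighted batting average, PA-weighted weight in lbs)}."""
    totals = defaultdict(lambda: [0, 0.0, 0.0])  # [sum PA, sum PA*BA, sum PA*weight]
    for row in rows:
        # Skip rows where BA is undefined or the weight is missing.
        if row["AB"] == 0 or row.get("weight") is None:
            continue
        pa = plate_appearances(row)
        t = totals[row["yearID"]]
        t[0] += pa
        t[1] += pa * row["H"] / row["AB"]
        t[2] += pa * row["weight"]
    return {
        year: (s_ba / s_pa, s_wt / s_pa)
        for year, (s_pa, s_ba, s_wt) in sorted(totals.items())
    }


for year, (ba, wt) in pa_weighted_by_season(ROWS).items():
    print(year, round(ba * 1000), round(wt))  # BA quoted x1000, as in the figure
```

Weighting by plate appearances keeps a September call-up with a handful of at-bats from moving the league average as much as an everyday starter.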

Figure 1: League-average batting average (×1000) and player weight (lbs) by season, weighted by plate appearances.

Fig. 1 plots the time series of two league averages: batting average (multiplied by 1000) and player weight (in pounds). There are three striking patterns in the data. The first is that the league-average BA (red) lies well above the average weight (blue). Indeed, this is the basic gist of Grady’s insult: in 2001 (the year the Oakland A’s traded for David Justice), the league-average BA was 0.264, while the league-average weight was 194 lbs. The “average” player could have gained 70 pounds, or seen his batting average fall by 70 points (7 fewer hits per 100 at-bats), and still managed to “hit his weight”.

The second pattern is that player weights started increasing dramatically in the early 1990s. Averaged over the 1960 to 1980 seasons, the league-average weight was 183 lbs; since 2010, it has been 211 lbs. In other words, players have become heavier (and taller) over time.

Finally, though it is harder to see, batting averages have decreased in recent years. In the early 2000s, the league-wide batting average was close to 0.280; in the most recent season, it was 0.242. This is a well-studied trend that reflects a confluence of factors, including an increased emphasis on power hitting, the use of defensive shifts, and improved pitching techniques and specialization.

From Averages to Probabilities

On average, MLB batting averages significantly exceed player weights. But we are less interested in comparing the average batting average with the average weight than in the fraction of players who fail to hit their weight. From Fig. 1 alone, for example, it is difficult to tell precisely how common it is to fall short. And while the recent trends of increasing weights and decreasing batting averages suggest that this has become more likely over time, that conclusion does not follow from the averages alone without some additional assumptions.

To see why, consider the case where batting averages and player weights are jointly normally distributed:

$$ \begin{bmatrix} BA \\ Weight \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_B \\ \mu_W \end{bmatrix}, \begin{bmatrix} \sigma_B^2 & \sigma_{BW} \\ \sigma_{BW} & \sigma_W^2 \end{bmatrix} \right) $$

One can then show that the fraction of players who fail to hit their weight, $P(BA<Weight)$, is given by:

$$ \begin{equation} P(BA<Weight) = \Phi\left( \frac{\mu_W - \mu_B}{ \sqrt{\sigma^2_B + \sigma^2_W - 2\sigma_{BW}}} \right) \end{equation} $$

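where $\Phi$ is the standard normal CDF; $\mu_B$ and $\mu_W$ are the means of batting average (scaled by 1000, so that it shares units with weight) and player weight, respectively; $\sigma_B^2$ and $\sigma_W^2$ are their variances; and $\sigma_{BW}$ is their covariance. The result follows because the difference $BA - Weight$ is itself normal, with mean $\mu_B - \mu_W$ and variance $\sigma_B^2 + \sigma_W^2 - 2\sigma_{BW}$, and a player fails to hit his weight exactly when that difference is negative.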
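A minimal Python sketch of the formula above, assuming batting average is expressed in thousandths. `prob_fail_to_hit_weight` and `simulate_fail_rate` are hypothetical helper names; the means in the usage lines echo the 2001 league averages, but the variances are placeholders, not estimates from the Lahman data. The simulation draws correlated normals directly, which gives an independent check on the closed form, and the usage lines illustrate why the averages alone are not decisive: holding the means fixed, the probability changes with the dispersion.

```python
import math
import random
from statistics import NormalDist


def prob_fail_to_hit_weight(mu_b, mu_w, var_b, var_w, cov_bw):
    """Closed-form P(BA < Weight) under the bivariate normal model.

    BA is assumed to be scaled by 1000 so that it shares units with weight.
    """
    var_diff = var_b + var_w - 2.0 * cov_bw  # Var(BA - Weight)
    if var_diff <= 0:
        raise ValueError("BA - Weight must have positive variance")
    z = (mu_w - mu_b) / math.sqrt(var_diff)
    return NormalDist().cdf(z)  # standard normal CDF, i.e. Phi(z)


def simulate_fail_rate(mu_b, mu_w, var_b, var_w, cov_bw, n=100_000, seed=0):
    """Monte Carlo estimate of the same probability from correlated normal draws."""
    rng = random.Random(seed)
    sd_b, sd_w = math.sqrt(var_b), math.sqrt(var_w)
    rho = cov_bw / (sd_b * sd_w)  # implied correlation; must lie in [-1, 1]
    fails = 0
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        ba = mu_b + sd_b * z1
        weight = mu_w + sd_w * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        fails += ba < weight  # True counts as 1
    return fails / n


# Same means (264 vs. 194), hypothetical dispersions: wider spreads make
# failing to hit one's weight more common even though the averages are unchanged.
narrow = prob_fail_to_hit_weight(264, 194, 30 ** 2, 20 ** 2, 0.0)
wide = prob_fail_to_hit_weight(264, 194, 60 ** 2, 30 ** 2, 0.0)
print(f"narrow: {narrow:.3f}, wide: {wide:.3f}")
print(f"simulated narrow: {simulate_fail_rate(264, 194, 30 ** 2, 20 ** 2, 0.0):.3f}")
```

In this model a positive covariance $\sigma_{BW}$ shrinks the denominator, pushing the probability further from one half in whichever direction the gap between the means points.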