Mere Mortals: Retract this article

It seems I can’t stop writing about Bill Barnwell (here, here, and here) and his article, Mere Mortals, which presents “evidence” that baseball players who played during the years 1959 through 1988 have a higher mortality rate than football players. It seemed immediately obvious to me when I read the article that the two groups he was comparing were not directly comparable, and that the difference in mortality rates was probably due to the difference in ages between the cohorts rather than to the sport itself. Up to this point, however, I was just making well-educated guesses as to how to explain the results.

So I collected the data myself and ran a quick analysis to check. The findings? When age is added to a model predicting death, the effect of the sport on mortality rate completely disappears. This means that if two players are exactly the same age, one played professional football and the other played professional baseball, each for at least five years with at least one of those years between 1959 and 1988, there is no evidence that either player is more likely to be deceased than the other.

Data collection

Football

Using R, I scraped http://www.football-almanac.com to get a list of player names. I then used this list to scrape http://www.pro-football-reference.com for each player’s date of birth, age at death (if they have died), the start and end years of their career, height, and weight. (A note about a shortcoming of my data collection for football: if a player had the same name as another player, I only collected one of them. I believe this is a small issue that will not affect the overall results, but it is worth noting.) In total, the football data set had 14,396 players.

Baseball

Using R, I scraped http://www.baseball-almanac.com to get a list of player names. I then used this list to scrape http://www.baseball-reference.com for each player’s date of birth, age at death (if they have died), the start and end years of their career, height, and weight. For baseball, I was able to collect all players, including those who had the same name as another player. In total, the baseball data set had 5,587 players.

Time Frame

Both the baseball and football data sets were whittled down to players who played at least five seasons, with at least one of those seasons falling between 1959 and 1988. (These are slightly different standards than in the Barnwell article, but, again, the larger point should remain the same.) This left 2,436 football players and 967 baseball players. The mean age of baseball players in my sample was 64.19, while the mean age of football players was 60.91. (Barnwell tweeted that the difference in ages between his two groups, which were defined slightly differently, was about 24 months.) The mean ages of my two groups are significantly different, with a p-value of <0.00000000000001. That’s a big deal.
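To make the inclusion rule concrete, here is a minimal sketch in Python. It is not the code behind this post (that was written in R). The function name, argument names, and the assumption that a career runs over consecutive calendar years with no gaps are all hypothetical, chosen only for illustration.

```python
def qualifies(first_year, last_year, window=(1959, 1988), min_seasons=5):
    """Return True if a career meets the inclusion rule described above.

    Assumption (not from the post): a career spans every calendar year
    from first_year to last_year inclusive, so the season count is
    last_year - first_year + 1.
    """
    seasons = last_year - first_year + 1
    # The career overlaps the window if it starts on or before the
    # window's end and finishes on or after the window's start.
    overlaps = first_year <= window[1] and last_year >= window[0]
    return seasons >= min_seasons and overlaps


# Hypothetical careers, given as (first season, last season).
examples = [(1955, 1959), (1985, 1987), (1989, 1995), (1988, 1999)]
for first, last in examples:
    print(first, last, qualifies(first, last))
```

A real career with missed seasons would need actual per-season records rather than just start and end years, so treat the season count here as an upper bound.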

The distributions of the ages of the football and baseball players are displayed below using a density estimator in R. You’ll notice that there are many more young players in the football group than in the baseball group. This indicates that the two mortality rates cannot be compared directly to one another, as is done in the Barnwell article.

Think for a minute about the graph below.  Without knowing anything about which color represents which sport, which of these two groups should have a higher mortality rate?  (Hint: The blue one)

Analysis

Fisher Exact Test

According to http://www.pro-football-reference.com, 259 of the 2,436 qualifying football players were deceased, for a mortality rate of 10.63%. Among baseball players, 137 of 967 were deceased, for a mortality rate of 14.17%. Both of these rates are lower than Barnwell’s, but their relative magnitudes are similar. Using a Fisher exact test, the null hypothesis of no association between sport and death is rejected with a p-value of 0.004407, which is essentially identical to Barnwell’s p-value of 0.004. So there is a statistically significant difference between these groups. That’s a fact. But….
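For readers who want to check the arithmetic, the sketch below recomputes the two mortality rates and a two-sided Fisher exact p-value from the four counts quoted above. It is written in Python with only the standard library and is not the R code used for this post. The helper names are my own, and the two-sided rule (sum the probabilities of all tables no more likely than the observed one) is an assumption about how the reported p-value was defined.

```python
from math import exp, lgamma


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are (dead, alive). The p-value is the sum of
    hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that row 1 holds x of the col1 deaths.
        return exp(log_choose(row1, x) + log_choose(row2, col1 - x)
                   - log_choose(total, col1))

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in map(prob, range(lowest, highest + 1))
                  if p <= p_observed * (1 + 1e-7))
    return min(1.0, p_value)


# Counts quoted in the text: deceased and living players per sport.
football_dead, football_total = 259, 2436
baseball_dead, baseball_total = 137, 967

football_rate = football_dead / football_total
baseball_rate = baseball_dead / baseball_total
p_value = fisher_exact_two_sided(
    football_dead, football_total - football_dead,
    baseball_dead, baseball_total - baseball_dead,
)

print(f"football mortality: {football_rate:.2%}")
print(f"baseball mortality: {baseball_rate:.2%}")
print(f"Fisher exact p-value: {p_value:.6f}")
```

Working in log space with lgamma avoids overflow in the very large binomial coefficients that a table of 3,403 players produces.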

Logistic Regression

This type of analysis estimates the probability of a certain event, in this case death, while taking into account multiple factors that could be related to the event. Running a logistic regression model with death as the outcome and sport as the only predictor (a dummy variable) yields a p-value of 0.00384 for the association between sport and death. This is largely the same result as the Fisher exact test, since neither controls for any variable besides sport.

When age is added to the model (technically it’s years since birth, since some people are deceased), the effect of sport disappears entirely. The p-value for age is < 2 × 10^{-16} (R prints this as < 2e-16) and the p-value for sport is 0.441, which is not significant.
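The mechanism is easy to see with a toy calculation. The sketch below is in Python and uses made-up numbers throughout: the coefficients and both age mixes are hypothetical and are NOT estimates from my data. It assumes a logistic model of the form logit P(dead) = intercept + sport_effect × baseball + age_slope × age, with the sport effect set to exactly zero, and shows that the crude mortality rates of the two groups still differ simply because one group is older.

```python
from math import exp


def death_prob(age, is_baseball, sport_effect=0.0,
               intercept=-9.0, age_slope=0.11):
    """P(dead) under a logistic model; all coefficients are hypothetical."""
    z = intercept + sport_effect * is_baseball + age_slope * age
    return 1.0 / (1.0 + exp(-z))


# Hypothetical age mixes (age -> share of the group), NOT the post's data.
# The baseball group is deliberately older than the football group.
football_ages = {45: 0.30, 55: 0.30, 65: 0.25, 75: 0.15}
baseball_ages = {45: 0.15, 55: 0.25, 65: 0.30, 75: 0.30}


def crude_rate(age_mix, is_baseball):
    """Overall mortality: age-specific probabilities weighted by age share."""
    return sum(share * death_prob(age, is_baseball)
               for age, share in age_mix.items())


football_crude = crude_rate(football_ages, is_baseball=0)
baseball_crude = crude_rate(baseball_ages, is_baseball=1)

print(f"crude football mortality: {football_crude:.3f}")
print(f"crude baseball mortality: {baseball_crude:.3f}")
for age in sorted(football_ages):
    # At any fixed age the two sports have identical risk by construction.
    print(age, death_prob(age, 0), death_prob(age, 1))
```

A comparison that ignores age sees only the crude rates and attributes the gap to sport. A model that includes age compares players at the same age, where in this toy setup there is no gap at all.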

Conclusions and Future work

To reiterate, what we can conclude from this is that if two players are exactly the same age, one played professional football and the other played professional baseball, each for at least five years with at least one of those years between 1959 and 1988, there is no evidence that either the football player or the baseball player is more likely to be deceased.

The purpose of this work is to demonstrate that the conclusions reached in Barnwell’s article Mere Mortals are at the very least misleading. The author makes the case that baseball players are dying more often than football players. While it is true that baseball players from this time period are more likely to be deceased than their football counterparts, I have demonstrated that it is not BECAUSE they played baseball. Rather, their age, a pretty serious risk factor for death, is the far more significant predictor of being deceased.

I think a more interesting analysis than the one I have presented here would be to look at survival times after retirement from each sport, with risk factors including age, BMI, and years in the respective league.

A Final Request

Is it possible that baseball players die at a younger age than football players? I suppose it is possible, but I think it’s unlikely. What is for sure is that Bill Barnwell’s article, due to the flawed application of statistical methods, does not in any way demonstrate that baseball players are dying more often than football players. I believe it is irresponsible to present work that falsely understates the potential dangers of playing football, especially given the recent concussion and CTE studies involving NFL players. Therefore, I am requesting that Bill Barnwell openly retract his article, Mere Mortals, in writing on Grantland.com due to the major statistical flaws of the study.

Posted on August 23, 2012, in Sports. 9 Comments.

  1. There appears to be a typo here: “The p-value for age is < 2^{-16} and the p-value for age is 0.441, which is not significant." I think the 0.441 refers to sport and not age.

  2. An even more interesting “apples to apples” comparison would be to compare mortality rates across positions. Did you collect data on the primary position of a player in the NFL? One might expect that linebackers, running backs, full backs and safeties are more prone to CTE which seems to be the case from the high profile suicides that have been reported.

    • I agree, it would be more interesting to look at players by position. After controlling for other risk factors such as age. And maybe BMI. Not controlling for age is a major, major mistake. That’s why Bill Barnwell should retract his article. In writing. On Grantland.com.

  3. Yes, the Bill Barnwell study is flawed. In order to conduct a mortality study you need to calculate life years of exposure on a well defined group and have a basis for expected deaths (e.g., general population). You may want to read my article on mortality differences in NFL/MLB players published in the July/August 2012 issue of Contingencies Magazine (cut and paste below):
    http://www.contingenciesonline.com/contingenciesonline/20120708#pg41

  4. Hi there,

    Interesting article and conclusion. I’m in charge of replicating a bunch of R analyses for a project and I’d appreciate being able to take a look at your code behind this. Seems like tons of data required to do this.

    Thanks,

    Ryan

    • This was a few years ago, so I’ll have to go back and find my code. Can you DM me your email address on Twitter? I’ll email you what I find.

      Sorry for the delay in responding.

      Cheers,
      Greg
