NBA MVP Comparisons - Part 2

Kivan Polimis, Wed 27 February 2019, Sports

In the previous post, Part 1 of the NBA MVP Comparison series, we gathered the relevant NBA MVP finalist data from basketball-reference.com and processed it.

Before going further, note that all analysis in Part 2 covers MVP finalists from 1983 to 2018, the most recent season at the time of writing. The cutoff separates the pre-3-point era from the post-3-point era, whose players differ both analytically and stylistically.

Our goal in this post is to identify which MVP finalist who did not win the award was most deserving of it. To do this, we will predict the continuous dependent variable, vote share (ranging from 0 to 1), of an MVP finalist from a vector (list) of the following predictors (independent variables) in the processed MVP finalist data: age, games_played, avg_minutes, avg_points, avg_rebounds, avg_assists, avg_steals, avg_blocks, field_goal_pct, three_pt_pct, free_throw_pct, win_shares, win_shares_per_48.
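
To make the setup concrete, here is a minimal sketch of regressing vote share on a few of the predictors listed above. It is an illustration only: the data are simulated, the linear relationship is invented, and base R's `lm` stands in for the models compared later in the post. The data frame name `finalists` is hypothetical.

```r
# Simulated stand-in data: column names follow the post, values are made up.
set.seed(1)
n <- 60
finalists <- data.frame(
  avg_points  = runif(n, 15, 35),
  avg_assists = runif(n, 2, 11),
  win_shares  = runif(n, 8, 20)
)

# Invented relationship, used only to give the model something to fit.
raw_share <- 0.02 * finalists$avg_points + 0.03 * finalists$win_shares - 0.6 +
  rnorm(n, sd = 0.05)
finalists$vote_share <- pmin(pmax(raw_share, 0), 1)

# Continuous outcome (vote share) regressed on a subset of the predictors.
fit <- lm(vote_share ~ avg_points + avg_assists + win_shares, data = finalists)

# Linear predictions are unbounded, so clip them to the valid range [0, 1].
finalists$predicted <- pmin(pmax(predict(fit), 0), 1)
head(finalists[, c("vote_share", "predicted")])
```

Clipping is one simple way to respect the 0 to 1 range of vote share; whether the post's models need it depends on how they were fit.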

The specific questions that will allow us to assess which finalist was most deserving of an MVP award are:

  1. Which MVP winner(s) outperformed their predicted vote share?
  2. Which MVP winner(s) should have finished second to another finalist in predicted vote share?

In this post, we will:

  1. Compare machine learning (ML) models for selecting the MVP award
  2. Identify the finalists deserving of an MVP award

Statistical models speak of over- and under-performance; I will use these terms interchangeably with more colloquial language such as robbed/robbery.

Outline

  1. Compare Machine Learning Regression Models
  2. Determine Controversial MVPs
  3. Choose New MVPs

Models Overview

ML Regression Models

Compare ML Regression Models

$$ RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Actual_{i} - Predicted_{i})^2} $$
# Root mean squared error: square each error, average the squares, take the root
RMSE = function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}
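
The formula above can be checked on a toy example. The vote shares below are made up for illustration, and `rmse` is a local helper defined only so the snippet is self-contained. The point is that each error is squared before averaging, so positive and negative errors cannot cancel.

```r
# Local helper mirroring the RMSE formula: sqrt of the mean squared error.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Hypothetical vote shares (not taken from the MVP data).
actual    <- c(0.90, 0.50, 0.10)
predicted <- c(0.80, 0.60, 0.10)

# The errors are 0.1, -0.1 and 0: they average to roughly zero...
mean(actual - predicted)

# ...but the squared errors do not, giving sqrt(0.02 / 3), about 0.0816.
rmse(actual, predicted)
```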

RMSE Table

| Machine Learning Model | RMSE |
| --- | --- |
| Latent Discriminant Analysis | 0.002 |
| XGBoost - Linear | 0.004 |
| Random Forest Classifier | 0.005 |
| Gradient Boost | 0.011 |
| XGBoost - Tree | 0.012 |

Latent Discriminant Analysis (LDA)

Because our predicted variable, vote share, is continuous, we need a rule to turn it into the binary (winner/non-winner) MVP award. The rule we will use: the player with the maximum predicted vote share (LDA Prediction) in a given year is awarded the MVP. This rule would need additional complexity to handle ties. The order of tiebreakers for playoff seeding is near the bottom of this article. Perhaps we could adopt similar rules for vote share ties; fortunately, there were no ties in this analysis.

library(dplyr)

# Flag the finalist with the highest predicted vote share in each year
mvp_finalist_data = mvp_finalist_data %>% 
  group_by(Year) %>% 
  mutate(`LDA MVP` = ifelse(`LDA Prediction` == max(`LDA Prediction`), 1, 0))
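
The maximum rule awards the MVP to every player who shares the top prediction, so a tie would produce two winners in one year. As a sketch of the extra complexity mentioned above, the snippet below awards the MVP only when the maximum is unique and flags tied years for a separate tiebreaker. The data frame `toy`, the helper columns `top` and `n_top`, and the `tied` flag are hypothetical, and the predictions are invented.

```r
library(dplyr)

# Toy predictions (made up): 2001 has a clear winner, 2002 has a tie at the top.
toy <- data.frame(
  Year = c(2001, 2001, 2001, 2002, 2002),
  Player = c("A", "B", "C", "D", "E"),
  `LDA Prediction` = c(0.90, 0.40, 0.10, 0.70, 0.70),
  check.names = FALSE
)

flagged <- toy %>%
  group_by(Year) %>%
  mutate(
    top = `LDA Prediction` == max(`LDA Prediction`),  # at the yearly maximum?
    n_top = sum(top),                                 # how many share it
    `LDA MVP` = as.integer(top & n_top == 1),         # award only if unique
    tied = top & n_top > 1                            # needs a tiebreaker
  ) %>%
  ungroup()

flagged
```

Rows with `tied == TRUE` could then be resolved by whatever tiebreaker is chosen, for example a secondary statistic such as win shares.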
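
LDA Correct MVP Predictions

| Player | Age | Year | Team | Rank | Vote Share | MVP | LDA MVP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Moses Malone | 27 | 1983 | PHI | 1 | 0.96 | 1 | 1 |
| Larry Bird | 28 | 1985 | BOS | 1 | 0.978 | 1 | 1 |
| Larry Bird | 29 | 1986 | BOS | 1 | 0.981 | 1 | 1 |
| Magic Johnson | 27 | 1987 | LAL | 1 | 0.94 | 1 | 1 |
| Michael Jordan | 24 | 1988 | CHI | 1 | 0.831 | 1 | 1 |
| Magic Johnson | 29 | 1989 | LAL | 1 | 0.782 | 1 | 1 |
| Michael Jordan | 27 | 1991 | CHI | 1 | 0.928 | 1 | 1 |
| Michael Jordan | 28 | 1992 | CHI | 1 | 0.938 | 1 | 1 |
| Charles Barkley | 29 | 1993 | PHO | 1 | 0.852 | 1 | 1 |
| Hakeem Olajuwon | 31 | 1994 | HOU | 1 | 0.88 | 1 | 1 |
| David Robinson | 29 | 1995 | SAS | 1 | 0.858 | 1 | 1 |
| Michael Jordan | 32 | 1996 | CHI | 1 | 0.986 | 1 | 1 |
| Michael Jordan | 34 | 1998 | CHI | 1 | 0.934 | 1 | 1 |
| Shaquille O'Neal | 27 | 2000 | LAL | 1 | 0.998 | 1 | 1 |
| Allen Iverson | 25 | 2001 | PHI | 1 | 0.904 | 1 | 1 |
| Tim Duncan | 25 | 2002 | SAS | 1 | 0.757 | 1 | 1 |
| Tim Duncan | 26 | 2003 | SAS | 1 | 0.808 | 1 | 1 |
| Kevin Garnett | 27 | 2004 | MIN | 1 | 0.991 | 1 | 1 |
| Dirk Nowitzki | 28 | 2007 | DAL | 1 | 0.882 | 1 | 1 |
| Kobe Bryant | 29 | 2008 | LAL | 1 | 0.873 | 1 | 1 |
| LeBron James | 24 | 2009 | CLE | 1 | 0.969 | 1 | 1 |
| LeBron James | 25 | 2010 | CLE | 1 | 0.98 | 1 | 1 |
| Derrick Rose | 22 | 2011 | CHI | 1 | 0.977 | 1 | 1 |
| LeBron James | 27 | 2012 | MIA | 1 | 0.888 | 1 | 1 |
| LeBron James | 28 | 2013 | MIA | 1 | 0.998 | 1 | 1 |
| Kevin Durant | 25 | 2014 | OKC | 1 | 0.986 | 1 | 1 |
| Stephen Curry | 26 | 2015 | GSW | 1 | 0.922 | 1 | 1 |
| Stephen Curry | 27 | 2016 | GSW | 1 | 1 | 1 | 1 |
| Russell Westbrook | 28 | 2017 | OKC | 1 | 0.879 | 1 | 1 |
| James Harden | 28 | 2018 | HOU | 1 | 0.955 | 1 | 1 |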



Controversial MVPs

  1. Which MVP winner(s) outperformed their predicted vote share?
  2. Which MVP winner(s) should have finished second to another finalist in predicted vote share?

Identify controversial MVPs

mvp_finalist_data = mvp_finalist_data %>% 
  mutate(Residual = abs(`Vote Share`-`LDA Prediction`))
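
Controversial MVPs

| Player | Age | Year | Team | Vote Share | LDA Prediction | Residual | LDA MVP |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Larry Bird | 27 | 1984 | BOS | 0.858 | 0.347 | 0.511 | 0 |
| Magic Johnson | 30 | 1990 | LAL | 0.691 | 0.518 | 0.173 | 0 |
| Karl Malone | 33 | 1997 | UTA | 0.857 | 0.857 | 0 | 0 |
| Karl Malone | 35 | 1999 | UTA | 0.701 | 0.701 | 0 | 0 |
| Steve Nash | 30 | 2005 | PHO | 0.839 | 0.739 | 0.1 | 0 |
| Steve Nash | 31 | 2006 | PHO | 0.739 | 0.739 | 0 | 0 |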
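
LDA Preferred MVPs

| Player | Age | Year | Team | Rank | Vote Share | LDA Prediction | Residual | LDA MVP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Bernard King | 27 | 1984 | NYK | 2 | 0.491 | 0.491 | 0 | 1 |
| Charles Barkley | 26 | 1990 | PHI | 2 | 0.667 | 0.667 | 0 | 1 |
| Michael Jordan | 33 | 1997 | CHI | 2 | 0.832 | 0.934 | 0.102 | 1 |
| Shaquille O'Neal | 26 | 1999 | LAL | 6 | 0.075 | 0.888 | 0.813 | 1 |
| Shaquille O'Neal | 32 | 2005 | MIA | 2 | 0.813 | 0.813 | 0 | 1 |
| Chauncey Billups | 29 | 2006 | DET | 5 | 0.344 | 0.977 | 0.633 | 1 |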
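
The two tables above split the disagreements into actual winners the model did not pick and finalists the model picked instead. A sketch of that split, using the two 1984 rows from the tables, might look as follows. The `MVP` flag is derived here under the assumption that the actual winner is the finalist with the highest vote share in a year, and the names `votes`, `scored`, `controversial` and `preferred` are hypothetical.

```r
library(dplyr)

# The two 1984 rows as they appear in the tables above.
votes <- data.frame(
  Player = c("Larry Bird", "Bernard King"),
  Year = c(1984, 1984),
  `Vote Share` = c(0.858, 0.491),
  `LDA Prediction` = c(0.347, 0.491),
  check.names = FALSE
)

scored <- votes %>%
  group_by(Year) %>%
  mutate(
    # Assumption: the actual MVP is the finalist with the top vote share.
    MVP = as.integer(`Vote Share` == max(`Vote Share`)),
    `LDA MVP` = as.integer(`LDA Prediction` == max(`LDA Prediction`)),
    Residual = abs(`Vote Share` - `LDA Prediction`)
  ) %>%
  ungroup()

# Actual winners the model did not pick.
controversial <- filter(scored, MVP == 1, `LDA MVP` == 0)

# Finalists the model picked over the actual winner.
preferred <- filter(scored, MVP == 0, `LDA MVP` == 1)
```

With these two rows, `controversial` holds Larry Bird (residual 0.511) and `preferred` holds Bernard King (residual 0), matching the 1984 entries in the tables.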

The Robbers

The Robbed

Review
