NBA MVP Comparisons - Part 3

sports-analytics
python
Author

Kivan Polimis

Published

March 6, 2019

In the last two blog posts we:

  1. Gathered the relevant data for all NBA MVPs
  2. Evaluated several machine learning models to select the MVP and chose the Linear Discriminant Analysis model as the best predictor

The goal of this blog post is to gather the data necessary to predict the 2018-2019 MVP.

We will select from the MVP finalists in 2017-2018:

  • The MVP will likely be a finalist from the previous year
  • Is this a valid assumption?

Update

On May 17th, the NBA announced the 2018-2019 MVP finalists:

  • Giannis Antetokounmpo, Milwaukee Bucks
  • Paul George, Oklahoma City Thunder
  • James Harden, Houston Rockets
  • source: https://www.nba.com/article/2019/05/17/finalists-2019-nba-awards

Outline

  1. Import modules
  2. Examine html structure of webpage
  3. Use a function with Beautiful Soup to parse webpages into .csv
  4. Analyze .csv of webpage as a Pandas DataFrame
  5. Process the data
  • get the html

  • create the BeautifulSoup object

  • Extract the necessary values for the column headers from the table and store them as a list

raw column names in finalist table: ['Per Game', 'Shooting', 'Advanced', 'Rank', 'Player', 'Age', 'Tm', 'First', 'Pts Won', 'Pts Max', 'Share', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48', '1', '2', '3', '4']
formatted column names in finalist table: ['Rank', 'Player', 'Age', 'Tm', 'First', 'Pts Won', 'Pts Max', 'Share', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48']
20 columns in finalist table
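
The header step above can be sketched on a small stand-in table. This is a minimal sketch, not the code from the original notebook: SAMPLE_HTML is a made-up three-column table, and keeping only the last row of the thead element is an assumption about how the group labels ('Per Game', 'Shooting', 'Advanced') and the row ranks could be filtered out of the raw list.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the awards table; the real page has 20 columns.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th colspan="2"></th><th>Per Game</th></tr>
    <tr><th>Rank</th><th>Player</th><th>PTS</th></tr>
  </thead>
  <tbody>
    <tr><th>1</th><td>James Harden</td><td>30.4</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every <th> on the page: group labels, real headers, and row ranks mixed together.
raw_headers = [th.get_text() for th in soup.find_all("th")]

# Assumption: the real column names are the <th> cells in the last <thead> row.
header_row = soup.find("thead").find_all("tr")[-1]
column_headers = [th.get_text() for th in header_row.find_all("th")]

print("raw column names:", raw_headers)
print("formatted column names:", column_headers)
print(len(column_headers), "columns in sample table")
```

Selecting by position in the thead element avoids hard-coding how many labels to drop from each end of the raw list, which would break if the page layout changed.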
  • The data is found within the tr elements of the first tbody element
  • We want the elements from the 3rd row and on
  • take a look at the last row to examine data quality
the subset soup object is of type: <class 'bs4.element.ResultSet'>
<tr><th class="right " data-stat="rank" scope="row">13</th><td class="left " csk="Oladipo,Victor" data-append-csv="oladivi01" data-stat="player"><a href="/players/o/oladivi01.html">Victor Oladipo</a></td><td class="right " data-stat="age">25</td><td class="left " data-stat="team_id"><a href="/teams/IND/2018.html">IND</a></td><td class="right " data-stat="votes_first">0.0</td><td class="right " data-stat="points_won">2.0</td><td class="right " data-stat="points_max">1010</td><td class="right " data-stat="award_share">0.002</td><td class="right " data-stat="g">75</td><td class="right " data-stat="mp_per_g">34.0</td><td class="right " data-stat="pts_per_g">23.1</td><td class="right " data-stat="trb_per_g">5.2</td><td class="right " data-stat="ast_per_g">4.3</td><td class="right " data-stat="stl_per_g">2.4</td><td class="right " data-stat="blk_per_g">0.8</td><td class="right " data-stat="fg_pct">.477</td><td class="right " data-stat="fg3_pct">.371</td><td class="right " data-stat="ft_pct">.799</td><td class="right " data-stat="ws">8.2</td><td class="right " data-stat="ws_per_48">.155</td></tr>
  • to get a player’s 2018-2019 statistical information, we can extract the link to the player’s web page from the BeautifulSoup object
'/players/o/oladivi01.html'
  • however, the player link is just a stub URL and needs the base URL for basketball-reference.com prepended to it to access the player’s web page
'https://www.basketball-reference.com/players/o/oladivi01.html'
True
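
As an illustration of the two link steps, the sketch below pulls the stub out of an abridged copy of the Oladipo row and joins it to the site root. The names ROW_HTML, BASE_URL, stub, and player_url are chosen here for illustration, and urljoin from the standard library is one possible way to combine the pieces; the notebook may simply have concatenated strings.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Abridged copy of the last table row shown above (most cells omitted).
ROW_HTML = (
    '<table><tbody><tr><th data-stat="rank">13</th>'
    '<td data-stat="player"><a href="/players/o/oladivi01.html">'
    'Victor Oladipo</a></td></tr></tbody></table>'
)
BASE_URL = "https://www.basketball-reference.com"

row = BeautifulSoup(ROW_HTML, "html.parser").find("tr")

# The player cell carries data-stat="player"; its <a> tag holds the stub URL.
stub = row.find("td", attrs={"data-stat": "player"}).find("a")["href"]

# urljoin resolves the root-relative stub against the base URL.
player_url = urljoin(BASE_URL, stub)

print(stub)
print(player_url)
```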
  • create a function to extract MVP finalist data

  • extract the data we want

  • and then store it in a DataFrame

the MVP finalist dataframe has 13 rows (player-year observations) and 21 columns
    Rank             Player  Age   Tm  First  Pts Won  Pts Max  Share   G    MP  ...   TRB  AST  STL  BLK   FG%   3P%   FT%    WS  WS/48  player_link
 7     8      DeMar DeRozan   28  TOR    0.0     32.0     1010  0.032  80  33.9  ...   3.9  5.2  1.1  0.3  .456  .310  .825   9.6   .170  https://www.basketball-reference.com/players/d...
 8     9  LaMarcus Aldridge   32  SAS    0.0      6.0     1010  0.006  75  33.5  ...   8.5  2.0  0.6  1.2  .510  .293  .837  10.9   .209  https://www.basketball-reference.com/players/a...
 9   10T       Jimmy Butler   28  MIN    0.0      5.0     1010  0.005  59  36.7  ...   5.3  4.9  2.0  0.4  .474  .350  .854   8.9   .198  https://www.basketball-reference.com/players/b...
10   10T      Stephen Curry   29  GSW    0.0      5.0     1010  0.005  51  32.0  ...   5.1  6.1  1.6  0.2  .495  .423  .921   9.1   .267  https://www.basketball-reference.com/players/c...
11    12        Joel Embiid   23  PHI    0.0      4.0     1010  0.004  63  30.3  ...  11.0  3.2  0.6  1.8  .483  .308  .769   6.2   .155  https://www.basketball-reference.com/players/e...
12    13     Victor Oladipo   25  IND    0.0      2.0     1010  0.002  75  34.0  ...   5.2  4.3  2.4  0.8  .477  .371  .799   8.2   .155  https://www.basketball-reference.com/players/o...

6 rows × 21 columns
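
The extract-and-store steps above might be combined into a single helper along the following lines. This is a sketch under stated assumptions rather than the notebook's own function: extract_finalists and SAMPLE_TABLE are hypothetical, the table is cut down to three columns, the second row is a made-up placeholder, and the helper assumes the first link in each row points at the player's page.

```python
import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = "https://www.basketball-reference.com"

# Stand-in table: the Oladipo row is abridged from the output above,
# the second row is a made-up placeholder.
SAMPLE_TABLE = """
<table>
  <thead>
    <tr><th colspan="2"></th><th>Per Game</th></tr>
    <tr><th>Rank</th><th>Player</th><th>PTS</th></tr>
  </thead>
  <tbody>
    <tr><th>13</th><td><a href="/players/o/oladivi01.html">Victor Oladipo</a></td><td>23.1</td></tr>
    <tr><th>14</th><td><a href="/players/x/examppl01.html">Example Player</a></td><td>0.0</td></tr>
  </tbody>
</table>
"""


def extract_finalists(table_html, base_url=BASE_URL):
    """Hypothetical helper: parse one stats table into a DataFrame."""
    soup = BeautifulSoup(table_html, "html.parser")
    # Column names come from the last header row; earlier rows are group labels.
    header_cells = soup.find("thead").find_all("tr")[-1].find_all("th")
    headers = [cell.get_text() for cell in header_cells]
    records = []
    for tr in soup.find("tbody").find_all("tr"):
        values = [cell.get_text() for cell in tr.find_all(["th", "td"])]
        record = dict(zip(headers, values))
        # Assumption: the first link in a row is the player's profile page.
        link = tr.find("a")
        record["player_link"] = base_url + link["href"] if link else None
        records.append(record)
    return pd.DataFrame(records)


finalists = extract_finalists(SAMPLE_TABLE)
print(f"the sample dataframe has {finalists.shape[0]} rows and {finalists.shape[1]} columns")
```

Building a list of dictionaries first keeps each value tied to its header, so a row with a missing cell shows up as a visible gap instead of silently shifting later columns.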

  • create the BeautifulSoup object for a player’s page
  • extract the column headers from the career table and store them as a list
the columns in the career data tables are: 
 ['Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS'] 

there are 29 total columns in the career table
  • let’s examine the data to make sure it conforms with our expectations
player name: James Harden
player years active: ['2009-10', '2010-11', '2011-12', '2012-13', '2013-14', '2014-15', '2015-16', '2016-17', '2017-18', '2018-19']
player career data: ['20', 'OKC', 'NBA', 'SG', '76', '0', '22.9', '3.1', '7.6', '.403', '1.2', '3.3', '.375', '1.8', '4.3', '.424', '.484', '2.6', '3.2', '.808', '0.6', '2.6', '3.2', '1.8', '1.1', '0.3', '1.4', '2.6', '9.9', '21', 'OKC', 'NBA', 'SG', '82', '5', '26.7', '3.6', '8.3', '.436', '1.4', '4.0', '.349', '2.3', '4.4', '.514', '.518', '3.5', '4.2', '.843', '0.5', '2.6', '3.1', '2.1', '1.1', '0.3', '1.3', '2.5', '12.2', '22', 'OKC', 'NBA', 'SG', '62', '2', '31.4', '5.0', '10.1', '.491', '1.8', '4.7', '.390', '3.1', '5.4', '.579', '.582', '5.0', '6.0', '.846', '0.5', '3.6', '4.1', '3.7', '1.0', '0.2', '2.2', '2.4', '16.8', '23', 'HOU', 'NBA', 'SG', '78', '78', '38.3', '7.5', '17.1', '.438', '2.3', '6.2', '.368', '5.2', '10.9', '.477', '.504', '8.6', '10.2', '.851', '0.8', '4.1', '4.9', '5.8', '1.8', '0.5', '3.8', '2.3', '25.9', '24', 'HOU', 'NBA', 'SG', '73', '73', '38.0', '7.5', '16.5', '.456', '2.4', '6.6', '.366', '5.1', '9.9', '.515', '.529', '7.9', '9.1', '.866', '0.8', '3.9', '4.7', '6.1', '1.6', '0.4', '3.6', '2.4', '25.4', '25', 'HOU', 'NBA', 'SG', '81', '81', '36.8', '8.0', '18.1', '.440', '2.6', '6.9', '.375', '5.4', '11.3', '.480', '.511', '8.8', '10.2', '.868', '0.9', '4.7', '5.7', '7.0', '1.9', '0.7', '4.0', '2.6', '27.4', '26', 'HOU', 'NBA', 'SG', '82', '82', '38.1', '8.7', '19.7', '.439', '2.9', '8.0', '.359', '5.8', '11.7', '.494', '.512', '8.8', '10.2', '.860', '0.8', '5.3', '6.1', '7.5', '1.7', '0.6', '4.6', '2.8', '29.0', '27', 'HOU', 'NBA', 'PG', '81', '81', '36.4', '8.3', '18.9', '.440', '3.2', '9.3', '.347', '5.1', '9.6', '.530', '.525', '9.2', '10.9', '.847', '1.2', '7.0', '8.1', '11.2', '1.5', '0.5', '5.7', '2.7', '29.1', '28', 'HOU', 'NBA', 'SG', '72', '72', '35.4', '9.0', '20.1', '.449', '3.7', '10.0', '.367', '5.4', '10.1', '.531', '.541', '8.7', '10.1', '.858', '0.6', '4.8', '5.4', '8.8', '1.8', '0.7', '4.4', '2.3', '30.4', '29', 'HOU', 'NBA', 'PG', '78', '78', '36.8', '10.8', '24.5', '.442', '4.8', '13.2', '.368', '6.0', '11.3', 
'.528', '.541', '9.7', '11.0', '.879', '0.8', '5.8', '6.6', '7.5', '2.0', '0.7', '5.0', '3.1', '36.1', '', '', 'NBA', '', '765', '552', '34.1', '7.2', '16.2', '.443', '2.6', '7.3', '.365', '4.5', '9.0', '.506', '.525', '7.3', '8.5', '.857', '0.8', '4.5', '5.2', '6.2', '1.6', '0.5', '3.6', '2.6', '24.3', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'HOU', 'NBA', '', '545', '545', '37.1', '8.5', '19.3', '.443', '3.1', '8.6', '.364', '5.4', '10.7', '.506', '.524', '8.8', '10.2', '.861', '0.8', '5.1', '6.0', '7.7', '1.8', '0.6', '4.4', '2.6', '29.0', '', 'OKC', 'NBA', '', '220', '7', '26.7', '3.8', '8.6', '.444', '1.5', '3.9', '.370', '2.4', '4.7', '.506', '.529', '3.6', '4.3', '.835', '0.5', '2.9', '3.4', '2.5', '1.1', '0.3', '1.6', '2.5', '12.7']
  • the player career data is a list that includes every year the player was active

    • this list is not broken up by year
  • to create yearly player data from the player career data list, we need a function to separate the list

  • we can use the slice_per function from StackOverflow

  • every 29 items in the list represent a year of player data

    • we will split the player_career_data object every 29 steps (using the language of the slice_per function) to create a year of data
  • then we add the column headers to the dataframe as a sanity check that the information was extracted correctly

<class 'list'>
    Age   Tm   Lg  Pos    G   GS    MP    FG   FGA   FG%  ...   FT%  ORB  DRB  TRB   AST  STL  BLK  TOV   PF   PTS
 0   20  OKC  NBA   SG   76    0  22.9   3.1   7.6  .403  ...  .808  0.6  2.6  3.2   1.8  1.1  0.3  1.4  2.6   9.9
 1   21  OKC  NBA   SG   82    5  26.7   3.6   8.3  .436  ...  .843  0.5  2.6  3.1   2.1  1.1  0.3  1.3  2.5  12.2
 2   22  OKC  NBA   SG   62    2  31.4   5.0  10.1  .491  ...  .846  0.5  3.6  4.1   3.7  1.0  0.2  2.2  2.4  16.8
 3   23  HOU  NBA   SG   78   78  38.3   7.5  17.1  .438  ...  .851  0.8  4.1  4.9   5.8  1.8  0.5  3.8  2.3  25.9
 4   24  HOU  NBA   SG   73   73  38.0   7.5  16.5  .456  ...  .866  0.8  3.9  4.7   6.1  1.6  0.4  3.6  2.4  25.4
 5   25  HOU  NBA   SG   81   81  36.8   8.0  18.1  .440  ...  .868  0.9  4.7  5.7   7.0  1.9  0.7  4.0  2.6  27.4
 6   26  HOU  NBA   SG   82   82  38.1   8.7  19.7  .439  ...  .860  0.8  5.3  6.1   7.5  1.7  0.6  4.6  2.8  29.0
 7   27  HOU  NBA   PG   81   81  36.4   8.3  18.9  .440  ...  .847  1.2  7.0  8.1  11.2  1.5  0.5  5.7  2.7  29.1
 8   28  HOU  NBA   SG   72   72  35.4   9.0  20.1  .449  ...  .858  0.6  4.8  5.4   8.8  1.8  0.7  4.4  2.3  30.4
 9   29  HOU  NBA   PG   78   78  36.8  10.8  24.5  .442  ...  .879  0.8  5.8  6.6   7.5  2.0  0.7  5.0  3.1  36.1
10            NBA       765  552  34.1   7.2  16.2  .443  ...  .857  0.8  4.5  5.2   6.2  1.6  0.5  3.6  2.6  24.3
11                                                        ...
12       HOU  NBA       545  545  37.1   8.5  19.3  .443  ...  .861  0.8  5.1  6.0   7.7  1.8  0.6  4.4  2.6  29.0
13       OKC  NBA       220    7  26.7   3.8   8.6  .444  ...  .835  0.5  2.9  3.4   2.5  1.1  0.3  1.6  2.5  12.7

14 rows × 29 columns
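
The splitting step described above can be illustrated with a small helper that cuts a flat list into consecutive fixed-length rows. The sketch below is not the StackOverflow slice_per function the post refers to; chunk_rows is a hypothetical stand-in, and the example uses three columns instead of 29, with values abridged from the Harden output.

```python
def chunk_rows(flat_values, row_length):
    """Split a flat list into consecutive rows of row_length items each."""
    if row_length <= 0 or len(flat_values) % row_length != 0:
        raise ValueError("flat_values must divide evenly into rows")
    return [
        flat_values[start:start + row_length]
        for start in range(0, len(flat_values), row_length)
    ]


# Three columns instead of 29; values abridged from the Harden output.
columns = ["Age", "Tm", "PTS"]
flat = ["20", "OKC", "9.9", "21", "OKC", "12.2", "22", "OKC", "16.8"]

rows = chunk_rows(flat, len(columns))

# Pairing each row with the headers is the same sanity check as
# labelling the DataFrame columns: every value should sit under the right name.
labelled = [dict(zip(columns, row)) for row in rows]
print(labelled[0])
```

Because the career list also contains the career totals, a blank separator row, and per-team totals, those come out as extra rows after the last season (rows 10 to 13 in the output above) and would need to be dropped before analysis.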

  • now we need a function to extract the player career data using the link to the player’s profile as the input

  • use a list comprehension to extract career data for each player link in mvp_finalist_2018_data

  • use the Pandas concat function to store all finalist data in one DataFrame

Player Year 0 1 2 3 4 5 6 7 ... 19 20 21 22 23 24 25 26 27 28
0 James Harden 2009-10 20 OKC NBA SG 76 0 22.9 3.1 ... .808 0.6 2.6 3.2 1.8 1.1 0.3 1.4 2.6 9.9
1 James Harden 2010-11 21 OKC NBA SG 82 5 26.7 3.6 ... .843 0.5 2.6 3.1 2.1 1.1 0.3 1.3 2.5 12.2
2 James Harden 2011-12 22 OKC NBA SG 62 2 31.4 5.0 ... .846 0.5 3.6 4.1 3.7 1.0 0.2 2.2 2.4 16.8
3 James Harden 2012-13 23 HOU NBA SG 78 78 38.3 7.5 ... .851 0.8 4.1 4.9 5.8 1.8 0.5 3.8 2.3 25.9
4 James Harden 2013-14 24 HOU NBA SG 73 73 38.0 7.5 ... .866 0.8 3.9 4.7 6.1 1.6 0.4 3.6 2.4 25.4

5 rows × 31 columns
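
The list comprehension and concat steps can be sketched without any web requests. In the sketch below, build_career_frame is a hypothetical stand-in for the post's extract_career_data function (whose code is not shown here), and each player has only three stat columns, with values abridged from the outputs in this post.

```python
import pandas as pd


def build_career_frame(player, seasons, rows):
    """Hypothetical stand-in for extract_career_data: one row per season."""
    frame = pd.DataFrame(rows)
    # Add the two identifying columns in front of the unnamed stat columns.
    frame.insert(0, "Year", seasons)
    frame.insert(0, "Player", player)
    return frame


# Player name -> (seasons, per-season stat rows); three stats instead of 29.
players = {
    "James Harden": (["2009-10", "2010-11"], [["20", "OKC", "9.9"], ["21", "OKC", "12.2"]]),
    "Victor Oladipo": (["2014-15"], [["22", "ORL", "17.9"]]),
}

# One DataFrame per player, then a single stacked DataFrame.
frames = [
    build_career_frame(name, seasons, rows)
    for name, (seasons, rows) in players.items()
]
career_df = pd.concat(frames, ignore_index=True)

print(career_df)
```

Passing ignore_index=True gives the combined frame a fresh 0..n-1 index; without it, each player's rows would keep their own index starting at 0.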

  • rename columns by concatenating column_headers_player with the two new columns we added in our extract_career_data function
Index(['Player', 'Year', 'Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 'MP', 'FG',
       'FGA', 'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT',
       'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF',
       'PTS'],
      dtype='object')
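
A minimal sketch of that renaming step, assuming the two added columns are 'Player' and 'Year' as in the output above. The three-item column_headers_player list and the single-row frame are cut-down examples, not the full 29-column data.

```python
import pandas as pd

# Cut-down version of the 29 career-table headers.
column_headers_player = ["Age", "Tm", "PTS"]

# Frame as it looks right after concatenation: two named columns, then integers.
career_df = pd.DataFrame(
    [["James Harden", "2009-10", "20", "OKC", "9.9"]],
    columns=["Player", "Year", 0, 1, 2],
)

new_columns = ["Player", "Year"] + column_headers_player

# Guard against a silent mismatch between header count and column count.
if len(new_columns) != career_df.shape[1]:
    raise ValueError("header count does not match the number of columns")

career_df.columns = new_columns
print(career_df.columns)
```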
  • examine the mvp_finalist_2019_career_data data
Player Year Age Tm Lg Pos G GS MP FG ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
1 Victor Oladipo 2014-15 22 ORL NBA SG 72 71 35.7 6.6 ... .819 0.7 3.5 4.2 4.1 1.7 0.3 2.8 2.6 17.9
2 Victor Oladipo 2015-16 23 ORL NBA SG 72 52 33.0 5.9 ... .830 0.7 4.1 4.8 3.9 1.6 0.8 2.1 2.4 16.0
3 Victor Oladipo 2016-17 24 OKC NBA SG 67 67 33.2 6.1 ... .753 0.6 3.8 4.3 2.6 1.2 0.3 1.8 2.3 15.9
4 Victor Oladipo 2017-18 25 IND NBA SG 75 75 34.0 8.5 ... .799 0.6 4.6 5.2 4.3 2.4 0.8 2.9 2.3 23.1
5 Victor Oladipo 2018-19 26 IND NBA SG 36 36 31.9 6.9 ... .730 0.6 5.0 5.6 5.2 1.7 0.3 2.3 2.0 18.8

5 rows × 31 columns

Now that we fixed up the necessary columns, let’s write out the raw data to a CSV file.

  • Write out the career data for 2018 MVP finalists to the raw_data folder in the data folder
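
Writing the file could look roughly like this. The helper name write_raw_csv, the file name, and the use of a temporary directory in place of the project's data folder are all assumptions made for the example; only the raw_data folder name comes from the text above.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_raw_csv(frame, data_dir, filename):
    """Hypothetical helper: save frame as <data_dir>/raw_data/<filename>."""
    raw_dir = Path(data_dir) / "raw_data"
    # Create the folder (and any parents) if it does not exist yet.
    raw_dir.mkdir(parents=True, exist_ok=True)
    out_path = raw_dir / filename
    # index=False keeps the row index out of the file.
    frame.to_csv(out_path, index=False)
    return out_path


sample = pd.DataFrame(
    {"Player": ["James Harden", "James Harden"],
     "Year": ["2009-10", "2010-11"],
     "PTS": [9.9, 12.2]}
)

# A temporary directory stands in for the project's data folder.
with tempfile.TemporaryDirectory() as tmp:
    path = write_raw_csv(sample, Path(tmp) / "data", "mvp_finalist_career_data.csv")
    round_trip = pd.read_csv(path)

print(path.parent.name, path.name)
print(round_trip.shape)
```

Reading the file straight back is a cheap check that the column names and row count survived the round trip.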

Cleaning the Data

  • Now that we have the raw MVP data, we need to clean it up a bit for data exploration
Player Year Age Tm Lg Pos G GS MP FG ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
0 James Harden 2009-10 20 OKC NBA SG 76 0 22.9 3.1 ... 0.808 0.6 2.6 3.2 1.8 1.1 0.3 1.4 2.6 9.9
1 James Harden 2010-11 21 OKC NBA SG 82 5 26.7 3.6 ... 0.843 0.5 2.6 3.1 2.1 1.1 0.3 1.3 2.5 12.2
2 James Harden 2011-12 22 OKC NBA SG 62 2 31.4 5.0 ... 0.846 0.5 3.6 4.1 3.7 1.0 0.2 2.2 2.4 16.8
3 James Harden 2012-13 23 HOU NBA SG 78 78 38.3 7.5 ... 0.851 0.8 4.1 4.9 5.8 1.8 0.5 3.8 2.3 25.9
4 James Harden 2013-14 24 HOU NBA SG 73 73 38.0 7.5 ... 0.866 0.8 3.9 4.7 6.1 1.6 0.4 3.6 2.4 25.4

5 rows × 31 columns

Index(['Player', 'Year', 'Age', 'Tm', 'Lg', 'Pos', 'G', 'GS', 'MP', 'FG',
       'FGA', 'FG%', '3P', '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT',
       'FTA', 'FT%', 'ORB', 'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF',
       'PTS'],
      dtype='object')
  • create dictionaries for renaming columns
  • rename all columns with dictionaries
player year age team league Pos games_played games_started avg_minutes field_goals_made_per_game ... free_throw_pct offensive_rebounds_per_game defensive_rebounds_per_game total_rebounds_per_game assists_per_game steals_per_game blocks_per_game turnovers_per_game fouls_committed_per_game points_per_game
0 James Harden 2009-10 20 OKC NBA SG 76 0 22.9 3.1 ... 0.808 0.6 2.6 3.2 1.8 1.1 0.3 1.4 2.6 9.9
1 James Harden 2010-11 21 OKC NBA SG 82 5 26.7 3.6 ... 0.843 0.5 2.6 3.1 2.1 1.1 0.3 1.3 2.5 12.2
2 James Harden 2011-12 22 OKC NBA SG 62 2 31.4 5.0 ... 0.846 0.5 3.6 4.1 3.7 1.0 0.2 2.2 2.4 16.8
3 James Harden 2012-13 23 HOU NBA SG 78 78 38.3 7.5 ... 0.851 0.8 4.1 4.9 5.8 1.8 0.5 3.8 2.3 25.9
4 James Harden 2013-14 24 HOU NBA SG 73 73 38.0 7.5 ... 0.866 0.8 3.9 4.7 6.1 1.6 0.4 3.6 2.4 25.4

5 rows × 31 columns

Index(['player', 'year', 'age', 'team', 'league', 'Pos', 'games_played',
       'games_started', 'avg_minutes', 'field_goals_made_per_game',
       'field_goals_attempted_per_game', 'field_goal_pct',
       'three_pt_fg_made_per_game', 'three_pt_fg_attempted_per_game',
       'three_pt_pct', 'two_pt_fg_made_per_game',
       'two_pt_fg_attempted_per_game', 'two_pt_fg_pct', 'effective_fg_pct',
       'free_throws_made_per_game', 'free_throws_attempted_per_game',
       'free_throw_pct', 'offensive_rebounds_per_game',
       'defensive_rebounds_per_game', 'total_rebounds_per_game',
       'assists_per_game', 'steals_per_game', 'blocks_per_game',
       'turnovers_per_game', 'fouls_committed_per_game', 'points_per_game'],
      dtype='object')
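
A short sketch of renaming with a dictionary, using a handful of the old and new names that appear in the outputs above. The dictionary here is a partial example, not the notebook's full mapping.

```python
import pandas as pd

# Partial mapping of old names to new names, taken from the outputs above.
column_names = {
    "Player": "player",
    "Year": "year",
    "Tm": "team",
    "G": "games_played",
    "PTS": "points_per_game",
}

df = pd.DataFrame(
    [["James Harden", "2009-10", "OKC", "SG", 76, 9.9]],
    columns=["Player", "Year", "Tm", "Pos", "G", "PTS"],
)

# rename returns a new DataFrame; columns missing from the mapping are untouched.
renamed = df.rename(columns=column_names)
print(renamed.columns)
```

Any column without an entry in the dictionary keeps its old name, which would explain why 'Pos' is the only capitalized name left in the renamed output above.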

Cleaning Up the Rest of the Data

  • convert the data to proper numeric types
<class 'pandas.core.frame.DataFrame'>
Index: 121 entries, 0 to 120
Data columns (total 31 columns):
player                            121 non-null object
year                              121 non-null object
age                               121 non-null int64
team                              121 non-null object
league                            121 non-null object
Pos                               121 non-null object
games_played                      121 non-null int64
games_started                     121 non-null int64
avg_minutes                       121 non-null float64
field_goals_made_per_game         121 non-null float64
field_goals_attempted_per_game    121 non-null float64
field_goal_pct                    121 non-null float64
three_pt_fg_made_per_game         121 non-null float64
three_pt_fg_attempted_per_game    121 non-null float64
three_pt_pct                      121 non-null float64
two_pt_fg_made_per_game           121 non-null float64
two_pt_fg_attempted_per_game      121 non-null float64
two_pt_fg_pct                     121 non-null float64
effective_fg_pct                  121 non-null float64
free_throws_made_per_game         121 non-null float64
free_throws_attempted_per_game    121 non-null float64
free_throw_pct                    121 non-null float64
offensive_rebounds_per_game       121 non-null float64
defensive_rebounds_per_game       121 non-null float64
total_rebounds_per_game           121 non-null float64
assists_per_game                  121 non-null float64
steals_per_game                   121 non-null float64
blocks_per_game                   121 non-null float64
turnovers_per_game                121 non-null float64
fouls_committed_per_game          121 non-null float64
points_per_game                   121 non-null float64
dtypes: float64(23), int64(3), object(5)
memory usage: 30.2+ KB
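
One way to do the numeric conversion is pandas.to_numeric applied column by column, sketched below on three columns with values abridged from the Harden rows. The column selection and the choice of errors="coerce" (which turns unparseable strings into NaN) are assumptions for the example, not necessarily what the notebook did.

```python
import pandas as pd

# Scraped values arrive as strings, including percentages like ".808".
raw = pd.DataFrame(
    {
        "player": ["James Harden", "James Harden"],
        "age": ["20", "21"],
        "free_throw_pct": [".808", ".843"],
        "points_per_game": ["9.9", "12.2"],
    }
)

numeric_cols = ["age", "free_throw_pct", "points_per_game"]

converted = raw.copy()
# to_numeric picks int or float per column; errors="coerce" maps bad values to NaN.
converted[numeric_cols] = converted[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```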
  • Get the column names for the numeric columns
  • Replace all NaNs with 0
<class 'pandas.core.frame.DataFrame'>
Index: 121 entries, 0 to 120
Data columns (total 31 columns):
player                            121 non-null object
year                              121 non-null object
age                               121 non-null int64
team                              121 non-null object
league                            121 non-null object
Pos                               121 non-null object
games_played                      121 non-null int64
games_started                     121 non-null int64
avg_minutes                       121 non-null float64
field_goals_made_per_game         121 non-null float64
field_goals_attempted_per_game    121 non-null float64
field_goal_pct                    121 non-null float64
three_pt_fg_made_per_game         121 non-null float64
three_pt_fg_attempted_per_game    121 non-null float64
three_pt_pct                      121 non-null float64
two_pt_fg_made_per_game           121 non-null float64
two_pt_fg_attempted_per_game      121 non-null float64
two_pt_fg_pct                     121 non-null float64
effective_fg_pct                  121 non-null float64
free_throws_made_per_game         121 non-null float64
free_throws_attempted_per_game    121 non-null float64
free_throw_pct                    121 non-null float64
offensive_rebounds_per_game       121 non-null float64
defensive_rebounds_per_game       121 non-null float64
total_rebounds_per_game           121 non-null float64
assists_per_game                  121 non-null float64
steals_per_game                   121 non-null float64
blocks_per_game                   121 non-null float64
turnovers_per_game                121 non-null float64
fouls_committed_per_game          121 non-null float64
points_per_game                   121 non-null float64
dtypes: float64(23), int64(3), object(5)
memory usage: 30.2+ KB
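
The two steps above (collect the numeric column names, then fill missing values) can be sketched as follows. The second row is a made-up placeholder with a missing three-point percentage, included only so there is a NaN to replace.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "player": ["James Harden", "Example Player"],  # second row is hypothetical
        "three_pt_pct": [0.375, None],
        "points_per_game": [9.9, 1.0],
    }
)

# Names of every column with a numeric dtype (ints and floats).
numeric_cols = df.select_dtypes(include="number").columns

# Fill NaN only in the numeric columns so text columns are left alone.
df[numeric_cols] = df[numeric_cols].fillna(0)

print(list(numeric_cols))
print(df)
```

Replacing NaN with 0 treats a missing percentage as a real zero, which is worth keeping in mind when a player simply had no attempts in a category.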
  • We are finally done cleaning the data and now we can save it to a CSV file.
 the dimensions for the final data are: (121, 31) (rows, columns)
player year age team league Pos games_played games_started avg_minutes field_goals_made_per_game ... free_throw_pct offensive_rebounds_per_game defensive_rebounds_per_game total_rebounds_per_game assists_per_game steals_per_game blocks_per_game turnovers_per_game fouls_committed_per_game points_per_game
120 Victor Oladipo 2018-19 26 IND NBA SG 36 36 31.9 6.9 ... 0.730 0.6 5.0 5.6 5.2 1.7 0.3 2.3 2.0 18.8
25 LeBron James 2018-19 34 LAL NBA SF 55 55 35.2 10.1 ... 0.665 1.0 7.4 8.5 8.3 1.3 0.6 3.6 1.7 27.4
91 LaMarcus Aldridge 2018-19 33 SAS NBA C 81 81 33.2 8.4 ... 0.847 3.1 6.1 9.2 2.4 0.5 1.3 1.8 2.2 21.3
39 Damian Lillard 2018-19 28 POR NBA PG 80 80 35.5 8.5 ... 0.912 0.9 3.8 4.6 6.9 1.1 0.4 2.7 1.9 25.8
32 Anthony Davis 2018-19 25 NOP NBA C 56 56 33.0 9.5 ... 0.794 3.1 8.9 12.0 3.9 1.6 2.4 2.0 2.4 25.9

5 rows × 31 columns

Review

  • In this tutorial, we learned how to:
    • examine the HTML structure of a webpage
    • use functions based on the Beautiful Soup module to parse tables on multiple webpages into .csv
    • analyze a .csv file using the Pandas module


last updated: 2019-07-01 09:13 

System and module version information: 

Python version: sys.version_info(major=3, minor=7, micro=1, releaselevel='final', serial=0)
urllib.request version: 3.7
pandas version: 0.23.4
Beautiful Soup version: 4.6.3
Source: NBA MVP Comparisons