True
NBA MVP Comparisons - Part 1
Kivan Polimis
December 25, 2018
NBA MVP Comparisons
Part 1
- 25 th December 2018
It’s Christmas and that means a full slate of NBA games. This time of year also provokes some great NBA discussions and the best NBA discussions are comparative. Arguments like: would Jordan’s ’96 Bulls would beat the ’17 Warriors?
Another comparison that is sure to create great debate is who had the best (or worst) Most Valuable Player (MVP) season in history? This question has a strong empirical dimension, in that we can observe quantifiable aspects across seasons, and can leverage programming to both gather and and analyze the data.
The MVP data will gather comes from basketball-reference.com. Basketball-reference is part of the Sports-Reference sites, “a group of sites providing both basic and sabermetric statistics and resources for sports fans everywhere. [Sports-Reference aims] to be the easiest-to-use, fastest, most complete sources for sports statistics anywhere” (sports-reference.com).
In this post, we will gather and pre-process all the data for the a multi-part series to determine: 1. the MVP finalist with best case for winning their year 2. predict the 2018-2019 MVP
Let the shade begin
Outline
- Import modules
- Examine html structure of webpage
- Use a function with
Beautiful Soupto parse webpages into .csv - Analyze .csv of webpage as a
PandasDataFrame - Process the data
- import relevant modules
- standard library modules:
- open source modules:
- Let’s examine the webpage with all the MVP data from the 1966-1956 season to the 2017-2018 season
Scraping the Column Headers
The column headers we need for our DataFrame are found in the th element
['Shooting', 'Advanced', 'Season', 'Lg', 'Player', 'Voting', 'Age', 'Tm', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48', '2017-18', '2016-17', '2015-16', '2014-15', '2013-14', '2012-13', '2011-12']
['Season', 'Lg', 'Player', 'Voting', 'Age', 'Tm', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48']
18
Scraping the Data
- Note that
table_rowsis a list of tag elements.
<class 'list'>
<tr><th class="left " data-stat="season" scope="row"><a href="/leagues/NBA_2018.html">2017-18</a></th><td class="left " data-stat="lg_id"><a href="/leagues/NBA_2018.html">NBA</a></td><td class="left " csk="Harden,James" data-append-csv="hardeja01" data-stat="player"><a href="/players/h/hardeja01.html">James Harden</a></td><td class="center " data-stat="voting"> (<a href="/awards/awards_2018.html#mvp">V</a>)</td><td class="right " data-stat="age">28</td><td class="left " data-stat="team_id"><a href="/teams/HOU/2018.html">HOU</a></td><td class="right " data-stat="g">72</td><td class="right " data-stat="mp_per_g">35.4</td><td class="right " data-stat="pts_per_g">30.4</td><td class="right " data-stat="trb_per_g">5.4</td><td class="right " data-stat="ast_per_g">8.8</td><td class="right " data-stat="stl_per_g">1.8</td><td class="right " data-stat="blk_per_g">0.7</td><td class="right " data-stat="fg_pct">.449</td><td class="right " data-stat="fg3_pct">.367</td><td class="right " data-stat="ft_pct">.858</td><td class="right " data-stat="ws">15.4</td><td class="right " data-stat="ws_per_48">.289</td></tr>
The data we want for each player is found within the the
td(or table data) elements.
Below I’ve created a function that extracts the data we want from
table_rows. The comments should walk you through what each part of the function does.now we can create a
DataFramewith the MVP data
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-18 | NBA | James Harden | (V) | 28 | HOU | 72 | 35.4 | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | .449 | .367 | .858 | 15.4 | .289 |
- rename the columns
- view the data
the MVP dataframe has 113 rows (player-year observations) and 18 columns
| Season | Lg | Player | Voting | Age | Tm | G | MP | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-18 | NBA | James Harden | (V) | 28 | HOU | 72 | 35.4 | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | .449 | .367 | .858 | 15.4 | .289 |
| 1 | 2016-17 | NBA | Russell Westbrook | (V) | 28 | OKC | 81 | 34.6 | 31.6 | 10.7 | 10.4 | 1.6 | 0.4 | .425 | .343 | .845 | 13.1 | .224 |
| 2 | 2015-16 | NBA | Stephen Curry | (V) | 27 | GSW | 79 | 34.2 | 30.1 | 5.4 | 6.7 | 2.1 | 0.2 | .504 | .454 | .908 | 17.9 | .318 |
| 3 | 2014-15 | NBA | Stephen Curry | (V) | 26 | GSW | 80 | 32.7 | 23.8 | 4.3 | 7.7 | 2.0 | 0.2 | .487 | .443 | .914 | 15.7 | .288 |
| 4 | 2013-14 | NBA | Kevin Durant | (V) | 25 | OKC | 81 | 38.5 | 32.0 | 7.4 | 5.5 | 1.3 | 0.7 | .503 | .391 | .873 | 19.2 | .295 |
- now we need the data on all the finalist for the question:
- which finalist had the best argument for winning that year?
raw column names in finalist table: ['Per Game', 'Shooting', 'Advanced', 'Rank', 'Player', 'Age', 'Tm', 'First', 'Pts Won', 'Pts Max', 'Share', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48', '1', '2', '3', '4']
formatted column names in finalist table: ['Rank', 'Player', 'Age', 'Tm', 'First', 'Pts Won', 'Pts Max', 'Share', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48']
20 columns in finalist table
<class 'bs4.element.ResultSet'>
<tr><th class="right " data-stat="rank" scope="row">13</th><td class="left " csk="Oladipo,Victor" data-append-csv="oladivi01" data-stat="player"><a href="/players/o/oladivi01.html">Victor Oladipo</a></td><td class="right " data-stat="age">25</td><td class="left " data-stat="team_id"><a href="/teams/IND/2018.html">IND</a></td><td class="right " data-stat="votes_first">0.0</td><td class="right " data-stat="points_won">2.0</td><td class="right " data-stat="points_max">1010</td><td class="right " data-stat="award_share">0.002</td><td class="right " data-stat="g">75</td><td class="right " data-stat="mp_per_g">34.0</td><td class="right " data-stat="pts_per_g">23.1</td><td class="right " data-stat="trb_per_g">5.2</td><td class="right " data-stat="ast_per_g">4.3</td><td class="right " data-stat="stl_per_g">2.4</td><td class="right " data-stat="blk_per_g">0.8</td><td class="right " data-stat="fg_pct">.477</td><td class="right " data-stat="fg3_pct">.371</td><td class="right " data-stat="ft_pct">.799</td><td class="right " data-stat="ws">8.2</td><td class="right " data-stat="ws_per_48">.155</td></tr>
- create a function to extract MVP finalist data
the MVP finalist dataframe has 13 rows (player-year observations) and 20 columns
| Rank | Player | Age | Tm | First | Pts Won | Pts Max | Share | G | MP | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7 | 8 | DeMar DeRozan | 28 | TOR | 0.0 | 32.0 | 1010 | 0.032 | 80 | 33.9 | 23.0 | 3.9 | 5.2 | 1.1 | 0.3 | .456 | .310 | .825 | 9.6 | .170 |
| 8 | 9 | LaMarcus Aldridge | 32 | SAS | 0.0 | 6.0 | 1010 | 0.006 | 75 | 33.5 | 23.1 | 8.5 | 2.0 | 0.6 | 1.2 | .510 | .293 | .837 | 10.9 | .209 |
| 9 | 10T | Jimmy Butler | 28 | MIN | 0.0 | 5.0 | 1010 | 0.005 | 59 | 36.7 | 22.2 | 5.3 | 4.9 | 2.0 | 0.4 | .474 | .350 | .854 | 8.9 | .198 |
| 10 | 10T | Stephen Curry | 29 | GSW | 0.0 | 5.0 | 1010 | 0.005 | 51 | 32.0 | 26.4 | 5.1 | 6.1 | 1.6 | 0.2 | .495 | .423 | .921 | 9.1 | .267 |
| 11 | 12 | Joel Embiid | 23 | PHI | 0.0 | 4.0 | 1010 | 0.004 | 63 | 30.3 | 22.9 | 11.0 | 3.2 | 0.6 | 1.8 | .483 | .308 | .769 | 6.2 | .155 |
| 12 | 13 | Victor Oladipo | 25 | IND | 0.0 | 2.0 | 1010 | 0.002 | 75 | 34.0 | 23.1 | 5.2 | 4.3 | 2.4 | 0.8 | .477 | .371 | .799 | 8.2 | .155 |
Scraping the Data for All MVP Finalists Since 1956
Scraping the for finalist data since 1956 follows is essentially the same process as above, just repeated for each year, using a for loop.
As we loop over the years, we will create a
DataFramefor each year of MVP finalist data, and append it to a large list ofDataFrames that contains all the MVP finalists data. We will also have a separate list that will contain any errors and the url associated with that error. This will let us know if there are any issues with our scraper, and which url is causing the error.let’s time how long this loop takes
2019-05-25 15:10:57.567450
2019-05-25 15:12:03.441050
the loop took 0:01:05.873600
- the loop took ~ 1 minute
0
[]
- We don’t get any errors, so that’s good.
- Now we can concatenate all the
DataFrameswe scraped and create one largeDataFramecontaining all the finalist data
<class 'list'>
63
[ Year Rank Player Age Tm First Pts Won Pts Max Share G ... \
0 1956 1 Bob Pettit 23 STL 33.0 33.0 80 0.413 72 ...
1 1956 2 Paul Arizin 27 PHW 21.0 21.0 80 0.263 72 ...
2 1956 3 Bob Cousy 27 BOS 11.0 11.0 80 0.138 72 ...
3 1956 4 Mel Hutchins 27 FTW 9.0 9.0 80 0.113 66 ...
4 1956 5T Dolph Schayes 27 SYR 2.0 2.0 80 0.025 72 ...
5 1956 5T Bill Sharman 29 BOS 2.0 2.0 80 0.025 72 ...
6 1956 7T Tom Gola 23 PHW 1.0 1.0 80 0.013 68 ...
7 1956 7T Maurice Stokes 22 ROC 1.0 1.0 80 0.013 67 ...
PTS TRB AST STL BLK FG% 3P% FT% WS WS/48
0 25.7 16.2 2.6 .429 .736 13.8 .236
1 24.2 7.5 2.6 .448 .810 12.2 .214
2 18.8 6.8 8.9 .360 .844 6.8 .119
3 12.0 7.5 2.7 .425 .643 4.4 .095
4 20.4 12.4 2.8 .387 .858 11.8 .225
5 19.9 3.6 4.7 .438 .867 8.8 .157
6 10.8 9.1 5.9 .412 .733 6.5 .132
7 16.8 16.3 4.9 .354 .714 6.0 .125
[8 rows x 21 columns]]
['Year', 'Rank', 'Player', 'Age', 'Tm', 'First', 'Pts Won', 'Pts Max', 'Share', 'G', 'MP', 'PTS', 'TRB', 'AST', 'STL', 'BLK', 'FG%', '3P%', 'FT%', 'WS', 'WS/48']
21
(972, 21)
Year 1956
Rank 1
Player Bob Pettit
Age 23
Tm STL
First 33.0
Pts Won 33.0
Pts Max 80
Share 0.413
G 72
MP 38.8
PTS 25.7
TRB 16.2
AST 2.6
STL
BLK
FG% .429
3P%
FT% .736
WS 13.8
WS/48 .236
Name: 0, dtype: object
Now that we fixed up the necessary columns, let’s write out the raw data to a CSV file.
| Year | Rank | Player | Age | Tm | First | Pts Won | Pts Max | Share | G | ... | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1956 | 1 | Bob Pettit | 23 | STL | 33.0 | 33.0 | 80 | 0.413 | 72 | ... | 25.7 | 16.2 | 2.6 | .429 | .736 | 13.8 | .236 | |||
| 1 | 1956 | 2 | Paul Arizin | 27 | PHW | 21.0 | 21.0 | 80 | 0.263 | 72 | ... | 24.2 | 7.5 | 2.6 | .448 | .810 | 12.2 | .214 | |||
| 2 | 1956 | 3 | Bob Cousy | 27 | BOS | 11.0 | 11.0 | 80 | 0.138 | 72 | ... | 18.8 | 6.8 | 8.9 | .360 | .844 | 6.8 | .119 | |||
| 3 | 1956 | 4 | Mel Hutchins | 27 | FTW | 9.0 | 9.0 | 80 | 0.113 | 66 | ... | 12.0 | 7.5 | 2.7 | .425 | .643 | 4.4 | .095 | |||
| 4 | 1956 | 5T | Dolph Schayes | 27 | SYR | 2.0 | 2.0 | 80 | 0.025 | 72 | ... | 20.4 | 12.4 | 2.8 | .387 | .858 | 11.8 | .225 |
5 rows × 21 columns
Cleaning the Data
- Now that we have the raw MVP data, we need to clean it up a bit for data exploration
| Season | Lg | Player | Voting | Age | Tm | G | MP | PTS | TRB | AST | STL | BLK | FG% | 3P% | FT% | WS | WS/48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-18 | NBA | James Harden | (V) | 28.0 | HOU | 72.0 | 35.4 | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | 0.449 | 0.367 | 0.858 | 15.4 | 0.289 |
| 1 | 2016-17 | NBA | Russell Westbrook | (V) | 28.0 | OKC | 81.0 | 34.6 | 31.6 | 10.7 | 10.4 | 1.6 | 0.4 | 0.425 | 0.343 | 0.845 | 13.1 | 0.224 |
| 2 | 2015-16 | NBA | Stephen Curry | (V) | 27.0 | GSW | 79.0 | 34.2 | 30.1 | 5.4 | 6.7 | 2.1 | 0.2 | 0.504 | 0.454 | 0.908 | 17.9 | 0.318 |
| 3 | 2014-15 | NBA | Stephen Curry | (V) | 26.0 | GSW | 80.0 | 32.7 | 23.8 | 4.3 | 7.7 | 2.0 | 0.2 | 0.487 | 0.443 | 0.914 | 15.7 | 0.288 |
| 4 | 2013-14 | NBA | Kevin Durant | (V) | 25.0 | OKC | 81.0 | 38.5 | 32.0 | 7.4 | 5.5 | 1.3 | 0.7 | 0.503 | 0.391 | 0.873 | 19.2 | 0.295 |
- create dictionaries for renaming columns
- rename all columns with dictionaries
| season | league | player | voting | age | team | games_played | avg_minutes | avg_points | avg_rebounds | avg_assists | avg_steals | avg_blocks | field_goal_pct | three_pt_pct | free_throw_pct | win_shares | win_shares_per_48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-18 | NBA | James Harden | (V) | 28.0 | HOU | 72.0 | 35.4 | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | 0.449 | 0.367 | 0.858 | 15.4 | 0.289 |
| 1 | 2016-17 | NBA | Russell Westbrook | (V) | 28.0 | OKC | 81.0 | 34.6 | 31.6 | 10.7 | 10.4 | 1.6 | 0.4 | 0.425 | 0.343 | 0.845 | 13.1 | 0.224 |
| 2 | 2015-16 | NBA | Stephen Curry | (V) | 27.0 | GSW | 79.0 | 34.2 | 30.1 | 5.4 | 6.7 | 2.1 | 0.2 | 0.504 | 0.454 | 0.908 | 17.9 | 0.318 |
| 3 | 2014-15 | NBA | Stephen Curry | (V) | 26.0 | GSW | 80.0 | 32.7 | 23.8 | 4.3 | 7.7 | 2.0 | 0.2 | 0.487 | 0.443 | 0.914 | 15.7 | 0.288 |
| 4 | 2013-14 | NBA | Kevin Durant | (V) | 25.0 | OKC | 81.0 | 38.5 | 32.0 | 7.4 | 5.5 | 1.3 | 0.7 | 0.503 | 0.391 | 0.873 | 19.2 | 0.295 |
| year | rank | player | age | team | first_place_votes | points_won | points_max | vote_share | games_played | ... | avg_points | avg_rebounds | avg_assists | avg_steals | avg_blocks | field_goal_pct | three_pt_pct | free_throw_pct | win_shares | win_shares_per_48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1956 | 1 | Bob Pettit | 23 | STL | 33.0 | 33.0 | 80 | 0.413 | 72 | ... | 25.7 | 16.2 | 2.6 | NaN | NaN | 0.429 | NaN | 0.736 | 13.8 | 0.236 |
| 1 | 1956 | 2 | Paul Arizin | 27 | PHW | 21.0 | 21.0 | 80 | 0.263 | 72 | ... | 24.2 | 7.5 | 2.6 | NaN | NaN | 0.448 | NaN | 0.810 | 12.2 | 0.214 |
| 2 | 1956 | 3 | Bob Cousy | 27 | BOS | 11.0 | 11.0 | 80 | 0.138 | 72 | ... | 18.8 | 6.8 | 8.9 | NaN | NaN | 0.360 | NaN | 0.844 | 6.8 | 0.119 |
| 3 | 1956 | 4 | Mel Hutchins | 27 | FTW | 9.0 | 9.0 | 80 | 0.113 | 66 | ... | 12.0 | 7.5 | 2.7 | NaN | NaN | 0.425 | NaN | 0.643 | 4.4 | 0.095 |
| 4 | 1956 | 5T | Dolph Schayes | 27 | SYR | 2.0 | 2.0 | 80 | 0.025 | 72 | ... | 20.4 | 12.4 | 2.8 | NaN | NaN | 0.387 | NaN | 0.858 | 11.8 | 0.225 |
5 rows × 21 columns
Cleaning Up the Rest of the Data
<class 'pandas.core.frame.DataFrame'>
Index: 113 entries, 0 to 112
Data columns (total 18 columns):
season 113 non-null object
league 113 non-null object
player 113 non-null object
voting 73 non-null object
age 73 non-null float64
team 73 non-null object
games_played 73 non-null float64
avg_minutes 73 non-null float64
avg_points 73 non-null float64
avg_rebounds 73 non-null float64
avg_assists 73 non-null float64
avg_steals 50 non-null float64
avg_blocks 50 non-null float64
field_goal_pct 73 non-null float64
three_pt_pct 48 non-null float64
free_throw_pct 73 non-null float64
win_shares 73 non-null float64
win_shares_per_48 73 non-null float64
dtypes: float64(13), object(5)
memory usage: 16.8+ KB
<class 'pandas.core.frame.DataFrame'>
Index: 972 entries, 0 to 971
Data columns (total 21 columns):
year 972 non-null int64
rank 972 non-null object
player 972 non-null object
age 972 non-null int64
team 972 non-null object
first_place_votes 972 non-null float64
points_won 972 non-null float64
points_max 972 non-null int64
vote_share 972 non-null float64
games_played 972 non-null int64
avg_minutes 972 non-null float64
avg_points 972 non-null float64
avg_rebounds 972 non-null float64
avg_assists 972 non-null float64
avg_steals 756 non-null float64
avg_blocks 756 non-null float64
field_goal_pct 972 non-null float64
three_pt_pct 621 non-null float64
free_throw_pct 972 non-null float64
win_shares 972 non-null float64
win_shares_per_48 972 non-null float64
dtypes: float64(14), int64(4), object(3)
memory usage: 167.1+ KB
We are not done yet. A lot of out numeric columns are missing data because players didn’t accumulate any of those stats. For example, the 3 point line is introduced in 1982 and all players in preceding seasons don’t have this statistic. Additionally, we want to select the columns with numeric data and then replace the NaNs (the current value that represents the missing data) with 0s, as that is a more appropriate value.
<class 'pandas.core.frame.DataFrame'>
Index: 972 entries, 0 to 971
Data columns (total 21 columns):
year 972 non-null int64
rank 972 non-null object
player 972 non-null object
age 972 non-null int64
team 972 non-null object
first_place_votes 972 non-null float64
points_won 972 non-null float64
points_max 972 non-null int64
vote_share 972 non-null float64
games_played 972 non-null int64
avg_minutes 972 non-null float64
avg_points 972 non-null float64
avg_rebounds 972 non-null float64
avg_assists 972 non-null float64
avg_steals 972 non-null float64
avg_blocks 972 non-null float64
field_goal_pct 972 non-null float64
three_pt_pct 972 non-null float64
free_throw_pct 972 non-null float64
win_shares 972 non-null float64
win_shares_per_48 972 non-null float64
dtypes: float64(14), int64(4), object(3)
memory usage: 167.1+ KB
- remove ABA winners
- remove MVP summary table
We are finally done cleaning the data and now we can save it to a CSV file.
(63, 18)
| season | league | player | voting | age | team | games_played | avg_minutes | avg_points | avg_rebounds | avg_assists | avg_steals | avg_blocks | field_goal_pct | three_pt_pct | free_throw_pct | win_shares | win_shares_per_48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2017-18 | NBA | James Harden | (V) | 28.0 | HOU | 72.0 | 35.4 | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | 0.449 | 0.367 | 0.858 | 15.4 | 0.289 |
| 1 | 2016-17 | NBA | Russell Westbrook | (V) | 28.0 | OKC | 81.0 | 34.6 | 31.6 | 10.7 | 10.4 | 1.6 | 0.4 | 0.425 | 0.343 | 0.845 | 13.1 | 0.224 |
| 2 | 2015-16 | NBA | Stephen Curry | (V) | 27.0 | GSW | 79.0 | 34.2 | 30.1 | 5.4 | 6.7 | 2.1 | 0.2 | 0.504 | 0.454 | 0.908 | 17.9 | 0.318 |
| 3 | 2014-15 | NBA | Stephen Curry | (V) | 26.0 | GSW | 80.0 | 32.7 | 23.8 | 4.3 | 7.7 | 2.0 | 0.2 | 0.487 | 0.443 | 0.914 | 15.7 | 0.288 |
| 4 | 2013-14 | NBA | Kevin Durant | (V) | 25.0 | OKC | 81.0 | 38.5 | 32.0 | 7.4 | 5.5 | 1.3 | 0.7 | 0.503 | 0.391 | 0.873 | 19.2 | 0.295 |
(972, 21)
| year | rank | player | age | team | first_place_votes | points_won | points_max | vote_share | games_played | ... | avg_points | avg_rebounds | avg_assists | avg_steals | avg_blocks | field_goal_pct | three_pt_pct | free_throw_pct | win_shares | win_shares_per_48 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 959 | 2018 | 1 | James Harden | 28 | HOU | 86.0 | 965.0 | 1010 | 0.955 | 72 | ... | 30.4 | 5.4 | 8.8 | 1.8 | 0.7 | 0.449 | 0.367 | 0.858 | 15.4 | 0.289 |
| 960 | 2018 | 2 | LeBron James | 33 | CLE | 15.0 | 738.0 | 1010 | 0.731 | 82 | ... | 27.5 | 8.6 | 9.1 | 1.4 | 0.9 | 0.542 | 0.367 | 0.731 | 14.0 | 0.221 |
| 961 | 2018 | 3 | Anthony Davis | 24 | NOP | 0.0 | 445.0 | 1010 | 0.441 | 75 | ... | 28.1 | 11.1 | 2.3 | 1.5 | 2.6 | 0.534 | 0.340 | 0.828 | 13.7 | 0.241 |
| 962 | 2018 | 4 | Damian Lillard | 27 | POR | 0.0 | 207.0 | 1010 | 0.205 | 73 | ... | 26.9 | 4.5 | 6.6 | 1.1 | 0.4 | 0.439 | 0.361 | 0.916 | 12.6 | 0.227 |
| 963 | 2018 | 5 | Russell Westbrook | 29 | OKC | 0.0 | 76.0 | 1010 | 0.075 | 80 | ... | 25.4 | 10.1 | 10.3 | 1.8 | 0.3 | 0.449 | 0.298 | 0.737 | 10.1 | 0.166 |
5 rows × 21 columns
Review
- In this tutorial, we learned how to
- examine the html structure of webpage
- use functions based on the
Beautiful Soupmodule to parse tables on multiple webpage into .csv - analyzed a .csv file using the
Pandasmodule
Download this notebook or see a static view here
last updated: 2019-05-27 22:35
System and module version information:
Python version: sys.version_info(major=3, minor=7, micro=1, releaselevel='final', serial=0)
urllib.request version: 3.7
pandas version: 0.23.4
Beautiful Soup version: 4.6.3