Kivan Polimis, Thu 14 January 2021, Tutorials
In the previous two posts, we downloaded mortality data from the National Center for Health Statistics (NCHS) and population data from the Census. The goal of this post is to combine the two datasets and calculate mortality statistics.
library(here)
library(reshape2)
library(tidyverse)
library(data.table)
national_population_1999_2020 = read_csv(here("data/national_population_1999_2020.csv"))
yearly_deaths_by_state_1999_2020 = read_csv(here("data/yearly_deaths_by_state_1999_2020.csv"))
names(national_population_1999_2020)
names(yearly_deaths_by_state_1999_2020)
## [1] "state_name" "pop_estimate_1999" "pop_estimate_2000"
## [4] "pop_estimate_2001" "pop_estimate_2002" "pop_estimate_2003"
## [7] "pop_estimate_2004" "pop_estimate_2005" "pop_estimate_2006"
## [10] "pop_estimate_2007" "pop_estimate_2008" "pop_estimate_2009"
## [13] "pop_estimate_2010" "pop_estimate_2011" "pop_estimate_2012"
## [16] "pop_estimate_2013" "pop_estimate_2014" "pop_estimate_2015"
## [19] "pop_estimate_2016" "pop_estimate_2017" "pop_estimate_2018"
## [22] "pop_estimate_2019" "pop_estimate_2020"
## [1] "state_name" "year" "all_deaths"
yearly_deaths_by_state_1999_2020_long = reshape2::melt(yearly_deaths_by_state_1999_2020, id.vars = c("state_name", "year"))
national_population_1999_2020_long = reshape2::melt(national_population_1999_2020, id.vars = c("state_name"))
# Pull the four-digit year out of names such as "pop_estimate_2020" (sub() is vectorized, so no mapply is needed)
national_population_1999_2020_long$year = sub("^pop_estimate_", "", as.character(national_population_1999_2020_long$variable))
# Keep only the first 12 characters ("pop_estimate") so every population row shares one variable name
national_population_1999_2020_long$variable = substr(as.character(national_population_1999_2020_long$variable), 1, 12)
mortality_time_series_national = rbindlist(list(national_population_1999_2020_long,
                                                yearly_deaths_by_state_1999_2020_long),
                                           use.names=TRUE) %>% filter(state_name=="United States")
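
The wide-to-long step above can also be written with `tidyr::pivot_longer`, which strips the column prefix and converts the year in a single call. This is only a sketch of an alternative, not the approach used in this post: `toy_population` is a hypothetical two-year slice with values copied from the table later in this post, and the result has a `pop_estimate` column rather than the `variable`/`value` pair that `reshape2::melt` produces.

```r
library(tidyr)

# Hypothetical two-year slice of the wide population table
toy_population <- data.frame(
  state_name = "United States",
  pop_estimate_2019 = 328329953,
  pop_estimate_2020 = 329484123
)

# One row per state and year; the year is parsed from the column name
toy_population_long <- pivot_longer(
  toy_population,
  cols = starts_with("pop_estimate_"),
  names_to = "year",
  names_prefix = "pop_estimate_",
  names_transform = list(year = as.integer),
  values_to = "pop_estimate"
)
```

Because the year comes back as an integer here, it would sort and join numerically without relying on four-digit strings.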
Now that we have a dataset with both mortality and population data, we can reshape it into a wide dataset and calculate the mortality rate (deaths per 100,000 people) along with its year-over-year rate of change.
us_mortality_data_1999_2020 = reshape2::dcast(mortality_time_series_national, state_name + year ~ variable, value.var = "value") %>%
  arrange(year) %>%
  mutate(mortality_rate = round((all_deaths/pop_estimate)*100000),  # deaths per 100,000 people
         mortality_rate_lag = lag(mortality_rate, order_by = year),  # previous year's rate
         mortality_rate_roc = (mortality_rate - mortality_rate_lag)/mortality_rate_lag)  # relative change from the previous year
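
To make the two formulas concrete, here is a small worked example in base R using the 2019 and 2020 figures from the table later in this post. The variable names are chosen for illustration only; the pipeline above does the same arithmetic for every year.

```r
# Deaths and population for 2019 and 2020, copied from the table in this post
deaths <- c(`2019` = 2852609, `2020` = 3258883)
population <- c(`2019` = 328329953, `2020` = 329484123)

# Deaths per 100,000 people, rounded to a whole number
mortality_rate <- round(deaths / population * 100000)

# Relative change in the rate from 2019 to 2020
mortality_rate_roc <- (mortality_rate[["2020"]] - mortality_rate[["2019"]]) / mortality_rate[["2019"]]
```

The rates come out to 869 and 989 per 100,000, so the rate of change for 2020 is (989 - 869) / 869, or roughly 0.138.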
Let's use the government data to recreate the table from the social media post that spurred this blog series.
Year | Population | Deaths | Mortality Rate (per 100,000) |
---|---|---|---|
1999 | 280,466,621 | 2,391,399 | 853 |
2000 | 281,424,600 | 2,403,351 | 854 |
2001 | 284,968,955 | 2,416,425 | 848 |
2002 | 287,625,193 | 2,443,387 | 850 |
2003 | 290,107,933 | 2,448,288 | 844 |
2004 | 292,805,298 | 2,397,615 | 819 |
2005 | 295,516,599 | 2,448,017 | 828 |
2006 | 298,379,912 | 2,426,264 | 813 |
2007 | 301,231,207 | 2,423,712 | 805 |
2008 | 304,093,966 | 2,471,984 | 813 |
2009 | 306,771,529 | 2,437,163 | 794 |
2010 | 308,745,538 | 2,468,435 | 800 |
2011 | 311,583,481 | 2,515,458 | 807 |
2012 | 313,877,662 | 2,543,279 | 810 |
2013 | 316,059,947 | 2,596,993 | 822 |
2014 | 318,386,329 | 2,626,418 | 825 |
2015 | 320,738,994 | 2,712,630 | 846 |
2016 | 323,071,755 | 2,744,248 | 849 |
2017 | 325,122,128 | 2,813,503 | 865 |
2018 | 326,838,199 | 2,839,076 | 869 |
2019 | 328,329,953 | 2,852,609 | 869 |
2020 | 329,484,123 | 3,258,883 | 989 |
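
One way to get from the wide data to the comma-formatted columns shown above is base R's `format()` with `big.mark`. The sketch below is an assumption about presentation rather than the code that produced this table; `toy` is a hypothetical two-row stand-in for `us_mortality_data_1999_2020`.

```r
# Two rows copied from the table above; `toy` stands in for the full dataset
toy <- data.frame(
  year = c("2019", "2020"),
  pop_estimate = c(328329953, 329484123),
  all_deaths = c(2852609, 3258883),
  mortality_rate = c(869, 989),
  stringsAsFactors = FALSE
)

# Build a display copy with readable headers and thousands separators
display_table <- data.frame(
  Year = toy$year,
  Population = format(toy$pop_estimate, big.mark = ","),
  Deaths = format(toy$all_deaths, big.mark = ","),
  `Mortality Rate` = toy$mortality_rate,
  check.names = FALSE,
  stringsAsFactors = FALSE
)
print(display_table, row.names = FALSE)
```

Keeping the year as a character column means it is never given a thousands separator by mistake.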
We now have all the data we need to compare the mortality rates obtained from government data with the mortality rates shown in the social media post. Continue to Part 4 to view the results of this comparison.