Mortality Data and COVID-19 Disinformation - Part 3

In the previous two posts, we downloaded mortality data from the National Center for Health Statistics (NCHS) and population data from the Census. In this post, we combine the two datasets to calculate mortality statistics.

Load libraries

library(here)
library(reshape2)
library(tidyverse)
library(data.table)

Read in the Census population data and the NCHS mortality data

national_population_1999_2020 = read_csv(here("data/national_population_1999_2020.csv"))
yearly_deaths_by_state_1999_2020 = read_csv(here("data/yearly_deaths_by_state_1999_2020.csv"))

What are the variable names in each dataset?

names(national_population_1999_2020)
names(yearly_deaths_by_state_1999_2020)

##  [1] "state_name"        "pop_estimate_1999" "pop_estimate_2000"
##  [4] "pop_estimate_2001" "pop_estimate_2002" "pop_estimate_2003"
##  [7] "pop_estimate_2004" "pop_estimate_2005" "pop_estimate_2006"
## [10] "pop_estimate_2007" "pop_estimate_2008" "pop_estimate_2009"
## [13] "pop_estimate_2010" "pop_estimate_2011" "pop_estimate_2012"
## [16] "pop_estimate_2013" "pop_estimate_2014" "pop_estimate_2015"
## [19] "pop_estimate_2016" "pop_estimate_2017" "pop_estimate_2018"
## [22] "pop_estimate_2019" "pop_estimate_2020"

## [1] "state_name" "year"       "all_deaths"

Reshape the mortality and population datasets into long format

yearly_deaths_by_state_1999_2020_long = reshape2::melt(yearly_deaths_by_state_1999_2020, id.vars = c("state_name", "year"))
national_population_1999_2020_long = reshape2::melt(national_population_1999_2020, id.vars = c("state_name"))
# strip the "pop_estimate_" prefix to get the year as a number, so it matches the year column read from the mortality CSV (assumed numeric)
national_population_1999_2020_long$year = as.numeric(sub("pop_estimate_", "", as.character(national_population_1999_2020_long$variable)))
# every population row now shares the same variable name
national_population_1999_2020_long$variable = "pop_estimate"
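
As an aside, the same wide-to-long reshape could probably be done in one step with `tidyr::pivot_longer`, which is already available through the tidyverse. This is only a minimal sketch on a toy data frame, not the approach used in this series: `toy_population_wide` and `toy_population_long` are hypothetical names, and the two population figures are the 1999 and 2000 values from the table later in this post.

```r
library(tidyr)

# hypothetical toy data with the same column layout as the Census file
toy_population_wide = data.frame(
  state_name = "United States",
  pop_estimate_1999 = 280466621,
  pop_estimate_2000 = 281424600
)

# move the year out of the column names and into its own integer column
toy_population_long = pivot_longer(
  toy_population_wide,
  cols = starts_with("pop_estimate_"),
  names_to = "year",
  names_prefix = "pop_estimate_",
  names_transform = list(year = as.integer),
  values_to = "pop_estimate"
)

print(toy_population_long)
```

The result has one row per state and year, with the year stored as a number rather than embedded in a column name.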

Create the national mortality and population dataset

mortality_time_series_national = rbindlist(list(national_population_1999_2020_long,
                                                yearly_deaths_by_state_1999_2020_long),
                                           use.names=TRUE) %>% filter(state_name=="United States")

Now that we have a single dataset with both mortality and population data, we can reshape it into a wide dataset and calculate the mortality rate and its year-over-year rate of change.

Reshape the national mortality and population data into a wide dataset covering 1999 to 2020, then create a mortality rate for each year (y). We use rates per 100,000 people so they can be compared with the social media post, which used the same scale.
We also create a rate of change metric that compares each year's mortality rate with the preceding year's rate.

$$\text{Mortality Rate}_{y}= \frac{\text{All Deaths}_{y}}{\text{Population Estimate}_{y}} \times 100{,}000$$

$$\text{Rate of Change}_{y}= \frac{\text{Mortality Rate}_{y}-\text{Mortality Rate}_{y-1}}{\text{Mortality Rate}_{y-1}}$$
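
As a worked example, here are both formulas applied to 2020, using the population, death, and rounded mortality rate figures from the table later in this post (the 2019 rate in that table is 869):

$$\text{Mortality Rate}_{2020}= \frac{3{,}258{,}883}{329{,}484{,}123} \times 100{,}000 \approx 989$$

$$\text{Rate of Change}_{2020}= \frac{989-869}{869} \approx 0.138$$

In other words, the 2020 mortality rate is roughly 13.8% higher than the 2019 rate.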

us_mortality_data_1999_2020 = reshape2::dcast(mortality_time_series_national, state_name + year ~ variable) %>%
  arrange(year) %>%
  mutate(mortality_rate = round((all_deaths/pop_estimate)*100000),
         mortality_rate_lag = lag(mortality_rate, order_by = year),
         mortality_rate_roc = (mortality_rate - mortality_rate_lag)/mortality_rate_lag)
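
One detail worth noting is that the code above rounds the mortality rate before computing the rate of change, so very small year-to-year movements can disappear. The sketch below is an illustration only: `toy_mortality` and its `rate_*` and `roc_*` columns are hypothetical names, and the figures are the 2018 to 2020 rows of the table in this post. It compares the rate of change computed from rounded and unrounded rates.

```r
library(dplyr)

# hypothetical toy data: the 2018-2020 rows of the table in this post
toy_mortality = data.frame(
  year = c(2018, 2019, 2020),
  pop_estimate = c(326838199, 328329953, 329484123),
  all_deaths = c(2839076, 2852609, 3258883)
) %>%
  arrange(year) %>%
  mutate(rate_raw = (all_deaths / pop_estimate) * 100000,
         rate_rounded = round(rate_raw),
         # rate of change from the unrounded rates
         roc_raw = (rate_raw - dplyr::lag(rate_raw)) / dplyr::lag(rate_raw),
         # rate of change from the rounded rates, as in the code above
         roc_rounded = (rate_rounded - dplyr::lag(rate_rounded)) / dplyr::lag(rate_rounded))

print(toy_mortality)
```

With rounded rates, 2018 and 2019 both come out at 869, so the 2019 rate of change is exactly zero, while the unrounded version shows a small positive change. For a large jump such as 2020 the two versions are very close.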

Let's use the government data to recreate the table from the social media post that spurred this blog series.

US Mortality Rate 1999 to 2020
Year	Population	Deaths	Mortality Rate (per 100,000)
1999	280,466,621	2,391,399	853
2000	281,424,600	2,403,351	854
2001	284,968,955	2,416,425	848
2002	287,625,193	2,443,387	850
2003	290,107,933	2,448,288	844
2004	292,805,298	2,397,615	819
2005	295,516,599	2,448,017	828
2006	298,379,912	2,426,264	813
2007	301,231,207	2,423,712	805
2008	304,093,966	2,471,984	813
2009	306,771,529	2,437,163	794
2010	308,745,538	2,468,435	800
2011	311,583,481	2,515,458	807
2012	313,877,662	2,543,279	810
2013	316,059,947	2,596,993	822
2014	318,386,329	2,626,418	825
2015	320,738,994	2,712,630	846
2016	323,071,755	2,744,248	849
2017	325,122,128	2,813,503	865
2018	326,838,199	2,839,076	869
2019	328,329,953	2,852,609	869
2020	329,484,123	3,258,883	989
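
The code that formats this table is not shown in this post. A table like the one above could be produced along the following lines. This is a sketch under assumptions, not the code actually used: `toy_us_mortality` is a hypothetical two-row stand-in for `us_mortality_data_1999_2020`, and `mortality_table` is a hypothetical name.

```r
library(dplyr)

# hypothetical stand-in for us_mortality_data_1999_2020 (2019 and 2020 rows only)
toy_us_mortality = data.frame(
  state_name = "United States",
  year = c(2019, 2020),
  all_deaths = c(2852609, 3258883),
  pop_estimate = c(328329953, 329484123),
  mortality_rate = c(869, 989)
)

# keep the display columns and add thousands separators to the counts
mortality_table = toy_us_mortality %>%
  transmute(Year = year,
            Population = formatC(pop_estimate, format = "d", big.mark = ","),
            Deaths = formatC(all_deaths, format = "d", big.mark = ","),
            `Mortality Rate` = mortality_rate)

print(mortality_table)
```

Formatting the counts as text is a display choice only, so any further calculations should use the numeric columns in the original dataset.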

We now have all the data we need to compare the mortality rates obtained from government data with the mortality rates shown in the social media post. Continue to Part 4 to view the results of this comparison.