Mortality Data and COVID-19 Disinformation - Part 3

Kivan Polimis, Thu 14 January 2021, Tutorials

In the previous two posts, we downloaded mortality data from the National Center for Health Statistics (NCHS) and population data from the Census. The goal of this post is to combine the mortality data with the population data and calculate mortality statistics.

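library(here)        # project-relative file paths
library(reshape2)    # melt() and dcast() for reshaping
library(tidyverse)   # read_csv(), dplyr verbs, and %>%
library(data.table)  # rbindlist()

# Load the population and mortality data from the previous posts
national_population_1999_2020 = read_csv(here("data/national_population_1999_2020.csv"))
yearly_deaths_by_state_1999_2020 = read_csv(here("data/yearly_deaths_by_state_1999_2020.csv"))

# Inspect the column names of each dataset
names(national_population_1999_2020)
names(yearly_deaths_by_state_1999_2020)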
##  [1] "state_name"        "pop_estimate_1999" "pop_estimate_2000"
##  [4] "pop_estimate_2001" "pop_estimate_2002" "pop_estimate_2003"
##  [7] "pop_estimate_2004" "pop_estimate_2005" "pop_estimate_2006"
## [10] "pop_estimate_2007" "pop_estimate_2008" "pop_estimate_2009"
## [13] "pop_estimate_2010" "pop_estimate_2011" "pop_estimate_2012"
## [16] "pop_estimate_2013" "pop_estimate_2014" "pop_estimate_2015"
## [19] "pop_estimate_2016" "pop_estimate_2017" "pop_estimate_2018"
## [22] "pop_estimate_2019" "pop_estimate_2020"
## [1] "state_name" "year"       "all_deaths"
# Reshape both datasets to long format (one row per state, year, and measure)
yearly_deaths_by_state_1999_2020_long = reshape2::melt(yearly_deaths_by_state_1999_2020, id.vars = c("state_name", "year"))
national_population_1999_2020_long = reshape2::melt(national_population_1999_2020, id.vars = c("state_name"))

# Split column names such as "pop_estimate_1999" into a year and a shared measure name
national_population_1999_2020_long$year = sub("^pop_estimate_", "", as.character(national_population_1999_2020_long$variable))
national_population_1999_2020_long$variable = "pop_estimate"

# Stack the two long datasets and keep only the national totals
mortality_time_series_national = rbindlist(list(national_population_1999_2020_long,
                                                yearly_deaths_by_state_1999_2020_long),
                                           use.names=TRUE) %>% filter(state_name=="United States")
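
To make the reshaping step concrete, here is a minimal sketch of the same wide-to-long idea on a one-row table. It uses base R as a stand-in for `reshape2::melt()` so it runs without extra packages; the object names `wide`, `long`, and `year_cols` are made up for this illustration, and the two population figures are the US values from the table later in this post.

```r
# Toy wide table: one row per state, one column per year
wide = data.frame(
  state_name = "United States",
  pop_estimate_2019 = 328329953,
  pop_estimate_2020 = 329484123
)

# Base R stand-in for reshape2::melt(): stack the year columns into rows
year_cols = setdiff(names(wide), "state_name")
long = data.frame(
  state_name = rep(wide$state_name, times = length(year_cols)),
  variable   = rep(year_cols, each = nrow(wide)),
  value      = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Split "pop_estimate_2019" into the year and the shared measure name
long$year     = sub("^pop_estimate_", "", long$variable)
long$variable = substr(long$variable, 1, 12)

print(long)
```

Each year column becomes its own row, and every row carries the same `variable` label, which is what lets the population rows be stacked with the `all_deaths` rows and later spread back into one column per measure.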

Now that we have a single dataset with mortality and population data, we can reshape it to wide format and calculate the mortality rate (deaths per 100,000 people) and the year-over-year rate of change in the mortality rate.

$$\text{Mortality Rate}_{y} = \frac{\text{All Deaths}_{y}}{\text{Population Estimate}_{y}} \times 100{,}000$$
$$\text{Rate of Change}_{y}= \frac{\text{Mortality Rate}_{y}-\text{Mortality Rate}_{y-1}}{\text{Mortality Rate}_{y-1}}$$
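
As a worked example, plugging in the 2019 and 2020 US figures from the table later in this post gives approximately the following (values rounded for display):

$$\text{Mortality Rate}_{2019} = \frac{2{,}852{,}609}{328{,}329{,}953} \times 100{,}000 \approx 868.8$$
$$\text{Mortality Rate}_{2020} = \frac{3{,}258{,}883}{329{,}484{,}123} \times 100{,}000 \approx 989.1$$
$$\text{Rate of Change}_{2020} = \frac{989.1 - 868.8}{868.8} \approx 0.138$$
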
# Reshape to wide format (one row per year), then compute the mortality rate
# per 100,000 people and its year-over-year rate of change
us_mortality_data_1999_2020 = reshape2::dcast(mortality_time_series_national, state_name + year ~ variable, value.var = "value") %>%
  arrange(year) %>%
  mutate(mortality_rate = round((all_deaths/pop_estimate)*100000),
         mortality_rate_lag = lag(mortality_rate, order_by = year),
         mortality_rate_roc = (mortality_rate - mortality_rate_lag)/mortality_rate_lag)
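
One detail worth seeing on a small example: the pipeline above rounds the mortality rate before taking the lag, so the rate of change is computed from whole numbers. The sketch below uses the 2017 to 2020 US figures from the table in this post and compares that with a rate of change computed from unrounded rates. It is only an illustration in base R; `lag_one()` and `rate_of_change()` are hypothetical helpers standing in for `dplyr::lag()` and the `mutate()` call.

```r
# US figures for 2017-2020, copied from the table in this post
us = data.frame(
  year         = 2017:2020,
  pop_estimate = c(325122128, 326838199, 328329953, 329484123),
  all_deaths   = c(2813503, 2839076, 2852609, 3258883)
)

# Base R stand-in for dplyr::lag(): shift a vector down by one position
lag_one = function(x) c(NA, head(x, -1))

# Hypothetical helper: relative change from the previous element
rate_of_change = function(rate) (rate - lag_one(rate)) / lag_one(rate)

rate_exact   = us$all_deaths / us$pop_estimate * 100000
rate_rounded = round(rate_exact)  # what the pipeline above stores

us$roc_from_rounded = rate_of_change(rate_rounded)
us$roc_from_exact   = rate_of_change(rate_exact)

print(us[, c("year", "roc_from_rounded", "roc_from_exact")])
```

The first year has no predecessor, so its rate of change is `NA`. For 2019 the rounded rates are identical to the 2018 value (869), so the rounded version reports a change of exactly zero, while the unrounded version shows a small positive change. Whether that difference matters depends on how the rate of change is used later.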

Let's use the government data to recreate the table from the social media post that spurred this blog series.

US Mortality Rate 1999 to 2020
Year  Population   Deaths     Mortality Rate (per 100,000)
1999  280,466,621  2,391,399  853
2000  281,424,600  2,403,351  854
2001  284,968,955  2,416,425  848
2002  287,625,193  2,443,387  850
2003  290,107,933  2,448,288  844
2004  292,805,298  2,397,615  819
2005  295,516,599  2,448,017  828
2006  298,379,912  2,426,264  813
2007  301,231,207  2,423,712  805
2008  304,093,966  2,471,984  813
2009  306,771,529  2,437,163  794
2010  308,745,538  2,468,435  800
2011  311,583,481  2,515,458  807
2012  313,877,662  2,543,279  810
2013  316,059,947  2,596,993  822
2014  318,386,329  2,626,418  825
2015  320,738,994  2,712,630  846
2016  323,071,755  2,744,248  849
2017  325,122,128  2,813,503  865
2018  326,838,199  2,839,076  869
2019  328,329,953  2,852,609  869
2020  329,484,123  3,258,883  989

We now have all the data we need to compare the mortality rates obtained from government data with the mortality rates shown in the social media post. Continue to Part 4 to view the results of this comparison.