Mortality Data and COVID-19 Disinformation - Part 3
In the previous two posts, we downloaded mortality data from the National Center for Health Statistics (NCHS) and downloaded population data from the Census. The goal of this post is to create mortality statistics by combining the mortality data with population data.
- load libraries

library(here)
library(reshape2)
library(tidyverse)
library(data.table)

- read in Census population data
- read in NCHS mortality data

national_population_1999_2020 = read_csv(here("data/national_population_1999_2020.csv"))
yearly_deaths_by_state_1999_2020 = read_csv(here("data/yearly_deaths_by_state_1999_2020.csv"))

- what are the variable names in each dataset?

names(national_population_1999_2020)
names(yearly_deaths_by_state_1999_2020)
## [1] "state_name" "pop_estimate_1999" "pop_estimate_2000"
## [4] "pop_estimate_2001" "pop_estimate_2002" "pop_estimate_2003"
## [7] "pop_estimate_2004" "pop_estimate_2005" "pop_estimate_2006"
## [10] "pop_estimate_2007" "pop_estimate_2008" "pop_estimate_2009"
## [13] "pop_estimate_2010" "pop_estimate_2011" "pop_estimate_2012"
## [16] "pop_estimate_2013" "pop_estimate_2014" "pop_estimate_2015"
## [19] "pop_estimate_2016" "pop_estimate_2017" "pop_estimate_2018"
## [22] "pop_estimate_2019" "pop_estimate_2020"
## [1] "state_name" "year" "all_deaths"
- reshape mortality and population datasets into long datasets

yearly_deaths_by_state_1999_2020_long = reshape2::melt(yearly_deaths_by_state_1999_2020, id.vars = c("state_name", "year"))
national_population_1999_2020_long = reshape2::melt(national_population_1999_2020, id.vars = c("state_name"))
# pull the year out of names like "pop_estimate_1999", then trim the name to "pop_estimate"
national_population_1999_2020_long$year = mapply(FUN = function(variable) strsplit(as.character(variable), "estimate_")[[1]][2], national_population_1999_2020_long$variable)
national_population_1999_2020_long$variable = mapply(FUN = function(variable) substr(as.character(variable), 1, 12), national_population_1999_2020_long$variable)

- create national mortality and population data

mortality_time_series_national = rbindlist(list(national_population_1999_2020_long,
                                                yearly_deaths_by_state_1999_2020_long),
                                           use.names = TRUE) %>%
  filter(state_name == "United States")

Now that we have a dataset with mortality and population data, we can create a wide dataset to calculate mortality rates and a rate of change metric for the mortality rate.
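As an aside on the long-format reshape above: the melt-then-split steps can also be written as a single call. The following is only a sketch of an alternative, not the approach this series uses. It assumes the tidyr package is installed, the name `population_wide` is made up for illustration, and the two population figures are the 1999 and 2000 United States values from the table later in this post.

```r
library(tidyr)

# Toy wide table with the same column naming pattern as the Census file
population_wide = data.frame(
  state_name = "United States",
  pop_estimate_1999 = 280466621,
  pop_estimate_2000 = 281424600
)

# Move the year out of the column names and into its own integer column
population_long = pivot_longer(
  population_wide,
  cols = starts_with("pop_estimate_"),
  names_to = "year",
  names_prefix = "pop_estimate_",
  names_transform = list(year = as.integer),
  values_to = "pop_estimate"
)

print(population_long)
```

Storing `year` as an integer here also keeps it the same kind of value as a numeric `year` column in a deaths table, which can make later row-binding and sorting less surprising.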
- reshape national mortality and population data into a wide mortality/population dataset from 1999 to 2020
- create a mortality rate for each year (y). We will express mortality rates per 100,000 people to match the scale used in the social media post
- also create mortality rate of change metric to understand how each current year’s mortality rate compares to the preceding year’s mortality rate
\[\text{Mortality Rate}_{y} = \frac{\text{All Deaths}_{y}}{\text{Population Estimate}_{y}} \times 100{,}000\]
\[\text{Rate of Change}_{y}= \frac{\text{Mortality Rate}_{y}-\text{Mortality Rate}_{y-1}}{\text{Mortality Rate}_{y-1}}\]
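As a rough check of the two formulas, here they are applied to the 2019 and 2020 figures from the table in this post, with each rate rounded to a whole number as that table shows it:

\[\text{Mortality Rate}_{2019} = \frac{2{,}852{,}609}{328{,}329{,}953} \times 100{,}000 \approx 869\]
\[\text{Mortality Rate}_{2020} = \frac{3{,}258{,}883}{329{,}484{,}123} \times 100{,}000 \approx 989\]
\[\text{Rate of Change}_{2020} = \frac{989 - 869}{869} \approx 0.138\]

A rate of change of 0.138 means the 2020 mortality rate is about 13.8% higher than the 2019 rate.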
us_mortality_data_1999_2020 = reshape2::dcast(mortality_time_series_national, state_name + year ~ variable) %>%
  arrange(year) %>%
  mutate(mortality_rate = round((all_deaths/pop_estimate)*100000),  # deaths per 100,000 people
         mortality_rate_lag = lag(mortality_rate, order_by = year),  # previous year's rate
         mortality_rate_roc = (mortality_rate - mortality_rate_lag)/mortality_rate_lag)

Let’s use the government data to recreate the table from the social media post that spurred this blog series:
| Year | Population | Deaths | Mortality Rate (per 100,000) |
|---|---|---|---|
| 1999 | 280,466,621 | 2,391,399 | 853 |
| 2000 | 281,424,600 | 2,403,351 | 854 |
| 2001 | 284,968,955 | 2,416,425 | 848 |
| 2002 | 287,625,193 | 2,443,387 | 850 |
| 2003 | 290,107,933 | 2,448,288 | 844 |
| 2004 | 292,805,298 | 2,397,615 | 819 |
| 2005 | 295,516,599 | 2,448,017 | 828 |
| 2006 | 298,379,912 | 2,426,264 | 813 |
| 2007 | 301,231,207 | 2,423,712 | 805 |
| 2008 | 304,093,966 | 2,471,984 | 813 |
| 2009 | 306,771,529 | 2,437,163 | 794 |
| 2010 | 308,745,538 | 2,468,435 | 800 |
| 2011 | 311,583,481 | 2,515,458 | 807 |
| 2012 | 313,877,662 | 2,543,279 | 810 |
| 2013 | 316,059,947 | 2,596,993 | 822 |
| 2014 | 318,386,329 | 2,626,418 | 825 |
| 2015 | 320,738,994 | 2,712,630 | 846 |
| 2016 | 323,071,755 | 2,744,248 | 849 |
| 2017 | 325,122,128 | 2,813,503 | 865 |
| 2018 | 326,838,199 | 2,839,076 | 869 |
| 2019 | 328,329,953 | 2,852,609 | 869 |
| 2020 | 329,484,123 | 3,258,883 | 989 |
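The table above does not show the rate of change column. As a small illustration, the sketch below recomputes the mortality rate and rate of change for the last three rows only. It is written in base R so it runs without the tidyverse, which is a deliberate simplification rather than the pipeline used earlier in this post; the data frame name `recent_years` is made up for the example, and the figures are copied from the 2018 to 2020 rows of the table.

```r
# Base R sketch; the figures are the 2018-2020 rows of the table above
recent_years = data.frame(
  year         = c(2018, 2019, 2020),
  pop_estimate = c(326838199, 328329953, 329484123),
  all_deaths   = c(2839076, 2852609, 3258883)
)
recent_years = recent_years[order(recent_years$year), ]

# Deaths per 100,000 people, rounded to a whole number as in the table
recent_years$mortality_rate = round(recent_years$all_deaths / recent_years$pop_estimate * 100000)

# Previous year's rate: shift the column down by one; the first year has no predecessor
recent_years$mortality_rate_lag = c(NA, head(recent_years$mortality_rate, -1))

# Relative change from the previous year's rate
recent_years$mortality_rate_roc =
  (recent_years$mortality_rate - recent_years$mortality_rate_lag) / recent_years$mortality_rate_lag

print(recent_years[, c("year", "mortality_rate", "mortality_rate_roc")])
```

The first row has no preceding year in this small example, so its rate of change is `NA`. Because the rate of change is computed from the rounded rates, it can differ slightly from a value computed on unrounded rates.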
We now have all the data we need to compare the mortality rates obtained from government data with the mortality rates shown in the social media post. Continue to Part 4 to view the results of this comparison.