Photo: Syda Productions/Shutterstock
Here is an article that amplifies Chapter 8 concepts. Using U.S. states as the unit of analysis, it depicts the correlation between rates of mask-wearing and rates of knowing someone with COVID symptoms. Please scroll to the middle of this article from the Washington Post to see the scatterplot (we don't have permission to show it here).
The journalist's writing makes the concepts extremely accessible to students and laypeople. Here's an example:
For all 50 states plus D.C., this chart plots the percentage of state residents who say they wear a mask in public all or most of the time (on the horizontal axis) and the percentage who say they know someone in their community with virus symptoms (on the vertical axis).
The journalist's orienting remarks are really helpful for a person viewing a scatterplot for the first time:
Take Wyoming and South Dakota, for instance, in the upper left-hand corner of the chart. Roughly 60 to 70 percent of state residents report frequent mask use, as shown on the bottom axis, which puts them at the bottom for mask rates. They also have some of the highest levels of observed covid-19 symptoms, approaching 40 and 50 percent.
Questions for students:
a) Looking at the scatterplot, what is one of the states with the highest rates of mask-wearing?
What is a state with one of the highest rates of "knowing someone with covid-19 symptoms"?
b) Is this correlation positive or negative? Is the association strong or weak? (btw, the magnitude of r is about .85)
c) Can this association, on its own, support the claim that "wearing a mask prevents covid-19"? Apply the three causal criteria.
d) Here is a quote from the article that is relevant to the causation issue. Is Dr. Reinhart's comment referring to temporal precedence or internal validity? (and why?)
“There could be other explanations for the correlation,” [Reinhart] said. “For example, states that had worse outbreaks earlier in the pandemic both have higher mask usage now and more immunity.”
e) Let's talk about how they measured the two variables in this correlation--the construct validity. This is a special case in which construct validity overlaps with external validity. Can you explain why?
[to measure these two variables]...the CovidCast team partnered with Facebook, which is used by 70 percent of U.S. adults and has the ability to survey tens of thousands of them every day at relatively low cost. While the resulting state-level samples aren’t perfect representations of the general population, the researchers weight the responses using Census Bureau demographic data to ensure they’re a good approximation.
f) Challenge question: How do you think this scatterplot would have looked if the data came from June (when infection rates were mostly higher in the Northeast), instead of October?
Instructors:
The article's data come from a Carnegie Mellon website that depicts several pandemic-related variables for the U.S. For example, you can see a map of the frequency of mask-wearing here. The data for both the x and y axes appear to come from here. You can also look up how often people in different states search Google for the term "covid" and the percentage of physician appointments that are COVID-related. The site includes R code for its figures, so students might be able to download the raw data and run their own analyses, too.
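If you want students to compute a state-level correlation by hand before touching the real data, here is a minimal Python sketch of Pearson's r. The five pairs of numbers are made up for illustration and are not the CovidCast values; the function name `pearson_r` and the variable names are my own, not anything from the Carnegie Mellon site.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)

    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable has no variance")

    return sxy / sqrt(sxx * syy)


# Hypothetical values for five imaginary states (NOT the real survey data):
# percent who report frequent mask use, and percent who know someone with symptoms
mask_pct = [65, 72, 80, 88, 93]
symptom_pct = [45, 38, 30, 22, 15]

r = pearson_r(mask_pct, symptom_pct)
print(f"r = {r:.2f}")
```

Students can swap in values they read off the scatterplot, then check whether the sign and size of r match their answers to question b).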