Researchers have used Twitter posts to assess the risk of cardiovascular disease in different communities. How did they do it? Here's a description of some of the methods, as summarized by Wray Herbert in the Huffington Post. As you read it, ask yourself, "What is the unit of analysis of this study?" (Hint: the unit of analysis is not "the person," as it usually is in psychology. It's something different.)
The Penn scientists are pioneers in an emerging field called digital epidemiology, and their aim is to use social media as a cheap and flexible method to assess the psychological traits -- and thus health risks -- of entire communities. To test this method's potential, the scientists collected 148 million tweets from across the U.S., sorted into their 1,347 counties of origin. The scientists also gathered socioeconomic and demographic data on these counties, which are home to 88 percent of Americans.
So the unit of analysis is not the person, but the county, right? That means that if you were to sketch a scatterplot from this study, each dot would represent a county.
Why were they interested in the emotional content of the tweets from each county, anyway? Here's why:
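To make the idea of a county-level unit of analysis concrete, here is a minimal Python sketch. The county names and per-tweet "negative emotion" scores are invented for illustration, and averaging scores within a county is an assumption, not the researchers' actual procedure. The point is only that many tweets collapse into one row per county.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tweet-level records: (county, negative-emotion score for one tweet).
# These values are made up; they are not data from the study.
tweets = [
    ("County A", 0.2), ("County A", 0.4),
    ("County B", 0.7), ("County B", 0.9), ("County B", 0.8),
    ("County C", 0.1),
]

def aggregate_by_county(records):
    """Collapse tweet-level scores into one mean score per county."""
    scores = defaultdict(list)
    for county, score in records:
        scores[county].append(score)
    return {county: mean(values) for county, values in scores.items()}

county_scores = aggregate_by_county(tweets)

# Six tweets become three rows, so a scatterplot would show three dots.
for county, score in sorted(county_scores.items()):
    print(county, round(score, 2))
```

Each key in `county_scores` corresponds to one dot on a scatterplot, no matter how many tweets or people contributed to it.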
Scientists have identified many of the key risk factors for heart disease, such as smoking, inactivity, obesity and hypertension, and these insights have significantly diminished risk of the world's leading killer. Psychological traits such as chronic stress and depression are also important risk factors, while optimism and social support are known to be protective. These psychological characteristics often affect entire communities, putting large numbers of people at risk for disease.
Now, on to the results. You should be able to make a scatterplot of some of them--in fact, I'll ask you to do that in a minute.
They measured specific words and topics, both negative (hostility, cursing, aggression, boredom and fatigue) and positive (wonder, hope, triumph, opportunity), and used these linguistic patterns to characterize communities at risk for heart disease. They then compared these risk patterns to the actual mortality rates for each county, obtained from the Centers for Disease Control. [They found that]...negative emotions, disengagement and (especially) anger were all significantly correlated with heart disease. ... By contrast, positive emotions and engagement were associated with lower heart disease mortality. Engagement with life -- considered a key component of successful aging -- emerged as the most potent protective factor.
a) Sketch a well-labelled scatterplot of the correlation, described above, between negative emotions and heart disease.
b) Now sketch a well-labelled scatterplot of the correlation, described above, between anger and heart disease. (Compare it to the scatterplot you made in a.)
c) Now sketch a well-labelled scatterplot of the correlation, described above, between positive emotions and heart disease mortality.
Here's an additional comment about the study.
This held true even after controlling for income and education, suggesting that Twitter language captures important information not accounted for by socioeconomic status.
d) The above quote should alert you that they used multiple regression in their analyses. Why would that be important for their investigation?
e) What would have been the DV, or criterion variable, in their regression analysis? What would have been the predictor variables?
f) Challenge question! The following quote from the story might tell you something important about the sizes of the betas in their regression table. What do you think it is telling you?
What's more, Twitter language was a better predictor of heart disease mortality than 10 common demographic and behavioral risk factors, including such infamous ones as smoking and high blood pressure.
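If you want to check your reasoning for d) through f), here is a rough Python sketch of a multiple regression with standardized betas. Everything in it is simulated: the variable names, the effect sizes, and the choice of just two predictors are assumptions made for illustration, and only the number of counties (1,347) comes from the article. It uses the closed-form formula for standardized betas with two predictors, which is not necessarily how the researchers ran their analysis.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def standardized_betas(y, x1, x2):
    """Standardized betas for a regression of y on two predictors, x1 and x2."""
    r_y1 = pearson_r(y, x1)
    r_y2 = pearson_r(y, x2)
    r_12 = pearson_r(x1, x2)
    denom = 1 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2

random.seed(0)
n_counties = 1347  # county count from the article; all values below are simulated

# Simulated county-level predictors (assumed effect sizes, not the study's).
income = [random.gauss(0, 1) for _ in range(n_counties)]
negative_language = [-0.3 * inc + random.gauss(0, 1) for inc in income]

# Simulated criterion variable: heart disease mortality for each county.
mortality = [
    0.5 * neg - 0.2 * inc + random.gauss(0, 1)
    for neg, inc in zip(negative_language, income)
]

beta_language, beta_income = standardized_betas(mortality, negative_language, income)
print("beta for negative language, controlling for income:", round(beta_language, 2))
print("beta for income, controlling for negative language:", round(beta_income, 2))
```

Because both betas are standardized, you can compare their absolute sizes directly: the predictor with the larger absolute beta is the stronger predictor of the criterion variable once the other predictor is controlled for.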