According to several previous studies in psychological science, men with wider faces--a greater ratio of width to height (like in the photo on the left, compared to the right)--tend to show antisocial tendencies such as racial bias, exploitation, and even aggression. Researchers attributed this link to exposure to testosterone during development, which, they say, causes both wider facial structure and antisocial behavior.
However, a study led by Michal Kosinski has questioned this basic relationship. Here's how the APS website describes the inspiration for the work:
Kosinski found that previous studies often had methodological shortcomings such as small sample sizes. Half of the previous studies that he identified involved fewer than 25 participants and the average sample size was 40. And seven out of ten of the studies only just crossed the conventional threshold for significance of p=.05.
These factors led Kosinski to conduct a large-scale study of face measurements and behavioral tendencies. His research, published in Psychological Science, finds no relationship between facial width-to-height ratios (fWHR) and behavioral tendencies in a large sample of over 135,000 participants.
Questions
a) Review the material in Chapters 11 and 14, and explain why studies based on small samples can lead to results that are difficult to replicate. (You might also want to review the "kindergarten height" example in this recent blog post.)
b) Why is it a problem that, in 7 out of 10 studies, the results "only just crossed the conventional threshold for significance"?
Now read a bit more about the "big data" methods that Kosinski employed in his research:
Kosinski turned to a very large dataset collected via a Facebook app called MyPersonality.org. The app comprised a collection of psychometric tests and surveys that Facebook users could take and then see how they scored — they could also volunteer their scores and Facebook profile data to be used in research projects. Using this bank of over 800,000 users’ surveys and over 2 million profile pictures, Kosinski tested his research question: Do broad faces indicate antisocial tendencies? [...]
After a preliminary experiment with 1,692 users showed that a computer could measure width-to-height ratios with the same accuracy that humans could, Kosinski analyzed 173,241 photos from 137,163 male and female participants (some users had multiple profile pictures and their measurements were averaged before analysis).
The results showed that facial broadness didn’t substantially correlate with any of the 55 personality measures tested. [...] “For example, broader-faced people reported themselves to be more prosocial, sympathetic, trusting, and cooperative,” says Kosinski. “Also, broader-faced people reported less interest in drug use, weapons, piercing, and tattoos. Moreover, broader-faced people did not score significantly higher on any of the traits positively related to antisocial and aggressive behavioral tendencies, including the personality facets of excitement-seeking and anger, impulsiveness, and militarism (i.e., interest in paramilitary groups, the armed forces, bodybuilding, martial arts, and survivalism).”
c) According to this description, Kosinski is basically running a series of bivariate correlations. Each one was between a self-reported trait and _________?
d) Pick one of the personality variables tested in the study. Now sketch a scatterplot of the result, labelling your axes carefully.
e) Kosinski's sample included more than a hundred thousand users. Why might this lead to a more stable estimate of the true relationship between facial broadness and personality? (This is the complement to question a), above.)
f) Kosinski's study is an example of a "failure to replicate." Review the concepts in Table 14.1 and indicate which elements might apply in this case.
g) What questions might you ask about the construct validity of the personality measures used in Kosinski's study?
Suggested answers
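a) and e) Small samples are more likely to be affected by one or two extreme scores, whereas in very large samples, the extreme scores are much more likely to be balanced out by other scores. As a result, a small study can produce a sizable correlation by chance alone, and that correlation is unlikely to reappear when the study is repeated; a very large sample gives an estimate that stays close to the true relationship from one sample to the next. The gifs in this blog post show the principle dynamically.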
b) Some researchers have proposed that when a manuscript reports p-values very close to the conventional cutoff of .05 (p-values of .04 or .03), it's a sign that a researcher might have "p-hacked" the study. P-hacking is when a researcher goes through a series of options when analyzing the data, such as eliminating outliers, adding covariates, or testing multiple dependent measures, stopping analysis only when p just crosses under the .05 threshold. Therefore, when, in a body of literature, most of the p-values are just below .05, we might suspect that the underlying finding is a fluke, not a real result.
c) Facial broadness, as measured by width-to-height ratio.
d) One axis should be labelled "facial broadness" and the other might be labelled "interest in drug use." The cloud of points should be extremely spread out, showing no pattern or discernible slope.
e) See the answer to a), above.
f) The concepts in Table 14.1 that seem to apply best are the third (the original study's sample was very small) and perhaps the fourth (the original study may have tried multiple statistical analyses). (We cannot be sure without more investigation into the original studies, but these are the two issues raised in the APS summary of Kosinski's work.)
g) We don't know much about the personality measures used in the study, so we might ask how each trait was measured and whether there is evidence that the measures are reliable and valid. The full manuscript might report more about this evidence.
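The reasoning in answers a), b), and e) can also be illustrated with a short simulation. The sketch below is not Kosinski's analysis and uses no real data: it generates made-up "studies" in which the true correlation between facial broadness and a trait is exactly zero. The function names, the sample sizes (25 and 2,000), the number of simulated studies, and the use of the Fisher z approximation for p-values are all assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-tailed p-value for r, using the Fisher z approximation
    (an approximation, not the exact t-test)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def random_scores(n, rng):
    """n made-up standardized scores with no relationship to anything."""
    return [rng.gauss(0, 1) for _ in range(n)]


def range_of_r(n, studies, rng):
    """Smallest and largest r observed across many null studies of size n."""
    rs = [pearson_r(random_scores(n, rng), random_scores(n, rng))
          for _ in range(studies)]
    return min(rs), max(rs)


def false_positive_rate(n, measures, studies, rng):
    """Share of null studies in which at least one of several
    dependent measures reaches p < .05."""
    hits = 0
    for _ in range(studies):
        broadness = random_scores(n, rng)
        for _ in range(measures):
            trait = random_scores(n, rng)
            if approx_p_value(pearson_r(broadness, trait), n) < 0.05:
                hits += 1
                break  # stop at the first "significant" result
    return hits / studies


rng = random.Random(1)  # fixed seed so the run is repeatable

small_range = range_of_r(n=25, studies=500, rng=rng)
large_range = range_of_r(n=2000, studies=100, rng=rng)
print("n = 25:   r ranged from %.2f to %.2f" % small_range)
print("n = 2000: r ranged from %.2f to %.2f" % large_range)

one_measure = false_positive_rate(n=25, measures=1, studies=2000, rng=rng)
five_measures = false_positive_rate(n=25, measures=5, studies=2000, rng=rng)
print("Null studies with p < .05, one measure tested:   %.3f" % one_measure)
print("Null studies with p < .05, five measures tested: %.3f" % five_measures)
```

The first two lines of output should show that correlations from 25-person studies scatter widely around zero, while those from 2,000-person studies stay close to it. The last two lines should show that testing several dependent measures and stopping at the first one that reaches p < .05 produces "significant" results much more often than the nominal 5%, even though no real relationship exists.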