Parents of young children might be alarmed if they read coverage of a recent study on baby IQ. According to one journalist who covered it, the study found that babies' tested IQ dropped by more than 20 points:
In the decade preceding the pandemic, the mean IQ score on standardised tests for children aged between three months and three years of age hovered around 100, but for children born during the pandemic that number tumbled to 78, according to the analysis, which is yet to be peer-reviewed.
When interviewed by the journalist, the authors of the research suggested that the stress of the pandemic was to blame, because it prevented parents from having quality interactions with their children:
The biggest reason behind the falling scores is likely the lack of stimulation and interaction at home, said Deoni. “Parents are stressed and frazzled … that interaction the child would normally get has decreased substantially.”
Any parent of small children is likely to be shocked by this finding. Twenty-two points is a huge drop in IQ.
Fortunately, Brown University economist Emily Oster provides a much-needed reality check on these results. Dr. Oster reviews scientific studies on parenting and child development and summarizes their key findings for her subscription blog. She described this study for her readers as follows:
With these scores, the authors do three things.
- First, they compare the overall scores for children between three months and three years of age evaluated between 2011 and 2021. They show huge reductions (27 to 37 IQ points) in the 2021 scores relative to earlier years.
- Second, they analyze 39 children who were born shortly before the pandemic and were evaluated both before and during the pandemic. They actually do not find any changes in IQ for this group.
- Finally, they compare children born in the year before the pandemic with those born during the pandemic year. This sample is about 400 children in total. The reductions in IQ (Model 5, Table 5) at the lowest level of maternal education range from a 49-point reduction to an 82-point reduction. An 82-point reduction in IQ is … unfathomably large.
Again, taken at face value, these results are worrisome. However — and I cannot stress this enough — they are completely implausible.
Dr. Oster's position concerns statistical validity. In general, when we interrogate a study's statistical validity, we ask about effect size ("How strong?"), the precision of the estimate, usually expressed as a 95% CI ("How wrong?"), and replication ("What else is known?").
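To make the "How wrong?" question concrete, here is a minimal sketch of how a 95% CI around a difference between two group means depends on sample size. Everything in it is an illustrative assumption rather than a value from the study: the function name `ci_for_mean_difference` is hypothetical, the standard deviation of 15 is the conventional IQ scaling, the two groups are assumed to be equal in size, and the group sizes of 20 and 200 are invented to show the contrast.

```python
import math
from statistics import NormalDist


def ci_for_mean_difference(diff, sd, n_per_group, level=0.95):
    """Normal-approximation CI for a difference between two group means.

    Assumes both groups share the same SD and the same size (a simplification).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se = sd * math.sqrt(2 / n_per_group)       # standard error of the difference
    return diff - z * se, diff + z * se


# Hypothetical group sizes; the -22 is the reported drop from 100 to 78.
for n in (20, 200):
    low, high = ci_for_mean_difference(diff=-22, sd=15, n_per_group=n)
    print(f"n per group = {n}: 95% CI [{low:.1f}, {high:.1f}]")
```

Under these assumptions, the interval runs from about -31 to -13 with 20 children per group and narrows to about -25 to -19 with 200 per group. A narrow interval tells us the estimate is precise, but not that it is valid: a precise estimate can still be distorted by how the scores were measured.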
Oster's argument that the results "are completely implausible" is, in part, addressing effect size. It does not seem reasonable that children tested during the pandemic would show such a large decrease in IQ. One way to evaluate an effect size is to compare it to other known effect sizes. For example, in your textbook, Table 8.4 presents the size of several well-known relationships, such as the difference in weight between males and females, or the correlation between people's height and their weight. That table suggests that certain results, such as a correlation of r = .40 in psychology, might be so large as to be "too good to be true". Oster is using similar reasoning in her post when she writes that the results "are completely implausible."
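One way to line the reported drop up against a benchmark like r = .40 is to put it on a standardized scale. The sketch below is a rough illustration, not the authors' analysis. It assumes the conventional IQ scaling (mean 100, SD 15), which may not match the exact instrument used in the study, and it uses the standard conversion from a standardized mean difference d to a correlation r, which assumes two equal-sized groups. The function names are hypothetical.

```python
import math

IQ_SD = 15.0  # assumption: conventional IQ scaling (mean 100, SD 15)


def standardized_difference(mean_a, mean_b, sd=IQ_SD):
    """Difference between two means, expressed in standard deviation units (d)."""
    return (mean_a - mean_b) / sd


def d_to_r(d):
    """Convert d to a correlation r; this formula assumes equal-sized groups."""
    return d / math.sqrt(d**2 + 4)


# Reported means: about 100 before the pandemic, 78 for pandemic-born children.
d = standardized_difference(100, 78)
r = d_to_r(d)
print(f"d = {d:.2f}, r = {r:.2f}")  # d = 1.47, r = 0.59
```

Under these assumptions, a 22-point drop is nearly one and a half standard deviations, equivalent to a correlation of roughly .59. That is well above the r = .40 level that already raises "too good to be true" concerns.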
Oster compares the result of the pandemic IQ study to one of the most well-known effect sizes in developmental risk:
Extremely low birth weight is among the most significant reducers of IQ, and even that is a fraction of the size of these effects.
This statement helps put the pandemic IQ result in context, and marks it as implausible. I think her perspective is valuable and helpful, and is likely to reassure parents.
So if the results are implausible, why did they occur? Oster proposes an explanation. Rather than blame pandemic-stressed parents (as the researchers do), she suggests that the masks worn by the adult testers during the IQ testing may be responsible. When an unfamiliar adult administers a verbal test to young children, a mask may inadvertently make the test much more difficult for these 0- to 3-year-olds.
Questions
a) The Guardian summarized the IQ study even though it had not yet been peer-reviewed. Is that a wise choice for a journalist?
b) Oster focused on the effect size of this study, but there are two other statistical validity questions we usually also ask. What question could you ask to get at the precision of this IQ estimate?
c) What question could you ask to get at replication of this estimate? And, why is replication important for any study?
d) Oster suggests that masked adult testers might be responsible for this unusual result. With this suggestion, which of the four big validities is she critiquing?
e) You could design a study to test whether having a masked adult tester causes lower IQ estimates in the infants being evaluated. Propose the independent and dependent variables in such a study, and indicate what the results would be if masks do lead to a distortion of IQ scores.