Researcher Joy Buolamwini of MIT has conducted a fascinating study of how accurately image-processing algorithms detect and classify the faces of people from different racial and gender backgrounds. She coined the term "the coded gaze" for a bias she found in these algorithms. Her research was inspired by a personal experience. Joy is a Black woman, and she noticed that computers were able to correctly recognize the faces of her light-skinned friends. However, when she gave the facial recognition software an image of herself, it sometimes did not detect that there was a face at all; other times, it detected her face but misgendered it as male. The tools she tried included commercially available image processors from Microsoft, Kairos, IBM, and Face++.
Joy's research is an example of how artificial intelligence (AI) tools can recreate biases in the world. Specifically, because AI is trained on a corpus of images and text that contains gender and racial biases, its algorithms will tend to reproduce those biases.
To get started with the questions, watch this 5-minute video in which she describes the overall patterns.
The research Dr. Buolamwini designed can be described as a factorial design. Let's walk through it. In addition to watching the video, you could also explore the website, www.gendershades.org.
Buolamwini's project systematically tested how accurate software from three companies (IBM, Microsoft, and Face++) was across a large sample of faces that varied in skin type and gender. Buolamwini used publicly available photos of members of parliament from 8 countries. Because she used known people, she knew ahead of time which faces belonged to men and which belonged to women.
In her analysis, the dependent variable was accuracy, and its levels were yes (the computer correctly identified the target's gender) and no (the computer did not).
Here are the accuracy results from the IBM tool, based on data presented in the video (minutes 2:42-3:00):
Darker subjects accuracy: 77.6%
Lighter subjects accuracy: 96.8%
However, these two percentages can be further broken down by target gender:
Darker male accuracy: 88%
Darker female accuracy: 65.3%
Lighter male accuracy: 99.7%
Lighter female accuracy: 92.9%
As you can see, these results mean that IBM's product failed to correctly classify the gender of darker-skinned women about one-third of the time (100% - 65.3% ≈ 35% error)! The other two products were only a little better.
Buolamwini notes that although we need to know more about why this bias exists, it can almost certainly be traced back to the fact that the companies did not train their AI tools on enough darker-skinned male and female faces.
Questions
a) If you consider only the data from IBM, what were the two independent variables in this analysis? What were the levels of each IV?
b) Convert the data above (the four categories) into a carefully labelled 2x2 grid, as you see throughout Chapter 12 of your text.
c) Use your grid (and your calculator) to estimate the marginal means for each of the IVs. (Hint: You can see two of the marginal means above! An optional code sketch for checking your work on questions c-e appears after question f.)
d) Now make a bar graph or a line graph of the four data points. Label your axes mindfully.
e) Next, assuming that differences larger than 2% are statistically significant, write a sentence for each main effect and for the interaction. Use these sentence templates:
There is/is not a main effect for _[IV #1]_, such that ____ faces are recognized more accurately than ____ faces.
There is/is not a main effect for _[IV #2]_, such that ____ faces are recognized more accurately than ____ faces.
There is/is not an interaction between _[IV #1] and [IV #2]_, such that for darker-skinned faces, ____ are recognized more accurately than ____, but for lighter-skinned faces, ____ are recognized more accurately than ____.
f) Buolamwini's research illustrates that machines are not neutral. What is one real-world application of this research that you can anticipate?
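Optional: if you would like to double-check your arithmetic for questions c-e, the short Python sketch below (assuming you have pandas and matplotlib installed; the variable names are just illustrative) arranges the four IBM cell accuracies into a 2x2 table, computes marginal means, compares the gender gap within each skin type, and draws a simple bar graph. Note that unweighted averages of the cell means may differ slightly from the overall percentages quoted earlier, because those overall figures depend on how many faces fall into each cell.

```python
# A sketch for checking questions c, d, and e, assuming Python 3 with
# pandas and matplotlib installed. The numbers are the IBM cell
# accuracies quoted above; the variable names are just illustrative.

import pandas as pd
import matplotlib.pyplot as plt

# 2x2 grid: rows = skin type (one IV), columns = target gender (the other IV)
accuracy = pd.DataFrame(
    {"Male": [88.0, 99.7], "Female": [65.3, 92.9]},
    index=["Darker", "Lighter"],
)
print(accuracy)

# Marginal means: average each IV's levels across the other IV.
# (These unweighted averages of the cell means may differ slightly from the
# overall percentages in the video, which depend on how many faces fall
# into each cell.)
print("Skin-type marginal means:\n", accuracy.mean(axis=1))
print("Gender marginal means:\n", accuracy.mean(axis=0))

# Interaction check: is the male-female gap the same for both skin types?
print("Male - Female gap by skin type:\n", accuracy["Male"] - accuracy["Female"])

# Bar graph of the four cell means (question d)
accuracy.plot(kind="bar", rot=0)
plt.xlabel("Skin type")
plt.ylabel("Gender classification accuracy (%)")
plt.ylim(0, 100)
plt.title("IBM tool accuracy by skin type and gender")
plt.tight_layout()
plt.show()
```

Running the sketch should reproduce the marginal means you calculated by hand and make the pattern behind the interaction sentence in question e easier to see.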