Racism and Diversity in Data Analysis

In light of the recent election, it is more important than ever to look at how and where we are responsible for perpetuating prejudice. In a previous article, DataCrunch introduced the concept of “Weapons of Math Destruction,” which are data models built from a limited or biased sample of data that result in toxic feedback loops. Since this description is most often applied to artificial intelligence, there is little discussion about how it could also illuminate the workings of the human mind. While many might want to think of this narrow-mindedness as beneath the mental capacity of human beings, such a viewpoint is dangerous in that it makes having a conversation about prejudice difficult. Like artificial intelligence, humans create internal models in their minds, which often lead to the creation of ideas such as stereotypes. The flaw of these models is that they are based on the past, and if they are not updated or addressed, they will continue to run on the same data they were originally fed, even if times have changed.

At an individual level, racism is a toxic feedback loop. People want to be able to predict how other people will behave, and it can be far too easy to create “a binary prediction that all people of that race [or gender, sexuality, religion, political group, etc] will behave that same way” (O’Neil, 23). If an issue like this develops in a technical model, it isn’t too difficult to go back and manually adjust the data input or change the important factors. But people with racist beliefs “don’t spend a lot of time hunting down reliable data to train their twisted models” (23). They will continue to gladly absorb the data that seems to confirm their beliefs, and will reject data that challenges them.

While we’re thinking deeply about how we can address the racism problem that this election season has pulled out from under the rug for those who don’t experience it firsthand, it’s important to acknowledge how art and technology imitate life, and vice versa. It’s important to think about “how A.I. systems can become more diverse and incorporate more diverse players” (Riley). For example, a popular form of artificial intelligence is the language analysis system built on “standardized English.” While this input might not seem like a problem when analyzing Shakespeare or Dickens, it can have a negative effect when branching out into the texts and spaces of minority communities. To demonstrate this, a group of analysts studied the activity of the black community on Twitter using one of these systems. It turns out that when “fed only articles from the Wall Street Journal, language from societal groups like Black Twitter isn’t seen by the machine as English at all,” which is a huge blunder if these machines are trying to analyze and represent the world as it actually is (Riley).
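The effect described above can be sketched with a toy example. This is not the system from the study Riley describes; it is a minimal illustration that assumes a "model" is nothing more than the set of words seen in training. The corpus, the sample sentences, and the function names are all invented for this sketch.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(corpus):
    """Collect every word that appears in the training sentences."""
    vocabulary = set()
    for sentence in corpus:
        vocabulary.update(tokenize(sentence))
    return vocabulary


def known_word_share(text, vocabulary):
    """Return the share of tokens in `text` that were seen in training."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


# Invented stand-in for a narrow, formal training corpus (assumption).
FORMAL_CORPUS = [
    "The company said that the market is going to recover",
    "Analysts said the report is not good news for the company",
    "The market is not going to like that report",
]

vocabulary = build_vocabulary(FORMAL_CORPUS)

# Two invented test sentences: one formal, one informal.
formal_text = "The company said the report is good news"
informal_text = "ngl that report finna tank the market lol"

formal_score = known_word_share(formal_text, vocabulary)
informal_score = known_word_share(informal_text, vocabulary)

print(f"formal text:   {formal_score:.2f} of words recognized")
print(f"informal text: {informal_score:.2f} of words recognized")
```

Both sentences are English, but a model that only ever saw the formal corpus recognizes far less of the informal one. Real language analysis systems are much more sophisticated, yet the underlying limitation is similar: they can only recognize what their training data contained.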

The danger of these faulty algorithms isn’t just in the scorning of Black dialects. As reliance on the internet and A.I. grows, these algorithms are capable of recreating the systemic roadblocks that African-Americans face in reality. For example, because language processing struggles to understand the syntax of African-American language, sites written in it “could actually be pushed down in search results” by search engines, reproducing the systemic issues of the non-digital world (Riley). Hate is taught, and the same is true of big data systems. What you put in is what you get out, so a limited and biased starting sample is not going to yield results that are representative of the world we aim to serve.

While big data might never have a malicious intent, it always has a social impact, and we have a responsibility to think about this as we train future technology students. Opinions on the best way to approach these issues are mixed (Gershgorn). Perhaps the answer is “better data,” but how do we achieve this? Because the data that supposedly represents an objective sample is chosen by the people creating the algorithms, it matters who is teaching these systems and what opinions and life experiences those people bring. How do we get a more diverse set of people on board? That involves getting the attention of big tech companies, who “usually don’t care about racial bias, as they just want to get to market quickly” (Pearson). It is imperative that they start to care. At companies like Google and Facebook, the percentage of minority employees is still hovering around 2% (Riley). We need to get more opinions and viewpoints into the creation process. It is hard to show diversity without accurate representation. In this two-way street, it should be our goal to monitor the impact of our algorithms and support minority involvement in technological analysis as best we can. We need to get more minorities into the room where it happens.

Language analysis systems are not the only example of this issue. This flaw in the training of A.I. systems can be seen everywhere from algorithms “found to rate white names as more ‘pleasant’ than black names” (Pearson) to image processing systems that evaluate light-skinned faces as more beautiful than dark-skinned ones. If the input data of these systems reflects the stereotypes of reality (meaning only white names and faces are fed in), “then the output of the learning algorithm also captures these stereotypes” (Riley). While changing the stereotypes held in the minds of people can be very difficult, it is certainly something we need to be working on. But in the meantime, we can focus on the much easier task of thinking critically about our algorithms, diversifying our data, and keeping our online systems from mirroring the prejudices of the offline world.
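One way to start thinking critically about an algorithm is to check who is represented in its data and whether it works equally well for everyone. The sketch below is one possible approach, not a method taken from any of the cited articles. The records, the group labels "A" and "B", and the function names are hypothetical and exist only to show the idea.

```python
from collections import Counter


def group_shares(records):
    """Return each group's share of the dataset."""
    counts = Counter(record["group"] for record in records)
    total = sum(counts.values())
    return {group: count / total for group, count in counts.items()}


def accuracy_by_group(records):
    """Return the fraction of correct predictions within each group."""
    totals = Counter()
    correct = Counter()
    for record in records:
        totals[record["group"]] += 1
        if record["predicted"] == record["actual"]:
            correct[record["group"]] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Hypothetical evaluation records (invented for illustration):
# group A makes up most of the data, group B is barely represented.
RECORDS = (
    [{"group": "A", "predicted": 1, "actual": 1}] * 7
    + [{"group": "A", "predicted": 0, "actual": 1}]
    + [{"group": "B", "predicted": 1, "actual": 1}]
    + [{"group": "B", "predicted": 0, "actual": 1}]
)

shares = group_shares(RECORDS)
accuracy = accuracy_by_group(RECORDS)

for group in sorted(shares):
    print(f"group {group}: {shares[group]:.0%} of data, "
          f"{accuracy[group]:.1%} accuracy")
```

In this invented sample, the underrepresented group is also the one the model gets wrong more often. A single overall accuracy number would hide that gap, which is why breaking results down by group is a useful first check.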

This article is part of a series inspired by Weapons of Math Destruction by Cathy O’Neil. The book chronicles O’Neil’s growing disillusionment with the data economy as she learns that data can be used to fuel toxic feedback loops. DataCrunch hopes to expand on certain examples given in her book in order to educate readers about the responsibilities that come with data analysis. 


Gershgorn, Dave. “Artificial Intelligence Judged a Beauty Contest, and Almost All the Winners Were White.” Quartz. N.p., 6 Sept. 2016. Web. <http://qz.com/774588/artificial-intelligence-judged-a-beauty-contest-and-almost-all-the-winners-were-white/>.

O’Neil, Cathy. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York: Crown, 2016. Print.

Pearson, Jordan. “Why An AI-Judged Beauty Contest Picked Nearly All White Winners.” Motherboard. N.p., 5 Sept. 2016. Web. <http://motherboard.vice.com/read/why-an-ai-judged-beauty-contest-picked-nearly-all-white-winners>.

Riley, Tonya. “A.I. Doesn’t Get Black Twitter.” Inverse. N.p., 22 Sept. 2016. Web. <https://www.inverse.com/article/21316-a-i-doesn-t-get-black-twitter-yet>.