As I approach graduation, diversity and representation have been on my mind. They have been a key point of discussion in one of my classes this quarter, Leadership Insights and Skills for Data Scientists. It is fundamentally important for data scientists, especially in the tech industry, to be aware of diversity, or the lack thereof, in their surroundings. As companies turn to data to answer some of their most pressing questions, it is worth taking a step back and considering what that data is actually telling us.
The recent data boom has created plenty of opportunities in the tech industry, but it also comes with drawbacks. The artificial intelligence community has long been portrayed as creating impartial, objective solutions to human problems. That idea is a myth: data-driven solutions often carry bias and subjectivity.
This bias enters in two ways. First, the designers of an algorithm can be biased. At big companies like Facebook and Google, for example, women make up less than 15 percent of AI research staff. These companies also have less than 4 percent black employees and are known to discriminate against “older” workers. Like many parts of our social system, the AI industry is dominated by white men between the ages of 18 and 35, and the quality, accuracy and reliability of its products suffer for it. These employees inject their biases into how they acquire, process and format data, which directly shapes the models and algorithms they develop. A data scientist might exclude certain variables in an attempt to confirm a preexisting assumption. They might fail to control for outliers (say, someone whose bank account is vastly larger than anyone else’s in the dataset), which would skew the models. Yet another pitfall is the presence of confounding variables: extra variables that affect the outcome you are trying to predict. A correlation between the murder rate and ice cream sales can be explained by a rise in temperature, not by murders somehow encouraging people to buy ice cream. Worse still, these biases perpetuate existing stereotypes and carry a culture of discrimination into AI.
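The outlier and confounder pitfalls above can be sketched with a few lines of code. Every number, distribution and variable name below is invented purely for illustration:

```python
import numpy as np

# Nine typical bank balances plus one extreme outlier (invented data).
balances = np.array([1_200, 950, 1_800, 1_100, 1_400,
                     900, 1_600, 1_300, 1_050, 2_000_000])

print(np.mean(balances[:-1]))  # mean without the outlier: modest
print(np.mean(balances))       # mean with the outlier: wildly inflated

# Confounding: temperature drives both ice cream sales and the murder
# rate in this toy model, so the two correlate without causation.
rng = np.random.default_rng(0)
temperature = rng.uniform(10, 35, 200)
ice_cream = 2.0 * temperature + rng.normal(0, 3, 200)
murders = 0.1 * temperature + rng.normal(0, 0.5, 200)

# The raw correlation looks strong...
print(np.corrcoef(ice_cream, murders)[0, 1])

# ...but after removing each variable's linear dependence on
# temperature, the residuals are nearly uncorrelated.
def residuals(y, x):
    slope, intercept = np.polyfit(x, y, 1)
    return y - (slope * x + intercept)

print(np.corrcoef(residuals(ice_cream, temperature),
                  residuals(murders, temperature))[0, 1])
```

Residualizing both variables on the confounder before correlating them is one standard way of "controlling for" a variable; once temperature is accounted for, the apparent ice-cream-and-murder link largely vanishes.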
The solution to this problem is, as is often the case, simple in thought but difficult to execute.
Big tech companies, especially those driving innovation in AI, need to hire a more diverse workforce. The industry needs more people of color and more women to counterbalance the biases of its predominantly white, male workers. The program I’m currently in does a good job of this: its cohort is split roughly evenly between men and women, between international and domestic students, and between students with and without previous work experience. It is a tough task, but bringing different opinions and perspectives to the table matters as much in the AI industry as anywhere else.
The second challenge facing the AI community is that bias exists in the datasets themselves. Any algorithm or model developed in AI and data science will only ever be as good as the data it is trained on. Just as there needs to be more diversity among the people who handle and process the data, there needs to be more diversity in whom the data is about. Training data can be incomplete, unrepresentative of the population or, as previously mentioned, can carry the biases of its makers. Some models built on voice recordings, for example, have had trouble identifying higher-pitched voices, a consequence of male-dominated datasets. The same goes for facial recognition algorithms, which perform much better on lighter-skinned men, again showing bias against women and people of color.
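A toy model can show how an imbalanced dataset skews a system toward the majority group. Below is a sketch of a hypothetical pitch-based "speech detector"; the groups, distributions and numbers are all invented for illustration, not drawn from any real system:

```python
import numpy as np

rng = np.random.default_rng(7)

# The feature is some energy score; group B's speech scores lower
# because, hypothetically, the feature was designed around group A.
def make_group(n, speech_mean):
    speech = rng.normal(speech_mean, 1.0, n)  # clips containing speech
    noise = rng.normal(0.0, 1.0, n)           # clips without speech
    x = np.concatenate([speech, noise])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    return x, y

x_a, y_a = make_group(900, speech_mean=5.0)   # majority group
x_b, y_b = make_group(100, speech_mean=1.5)   # underrepresented group

# Pick the single threshold that maximizes accuracy on the pooled,
# imbalanced training data.
x = np.concatenate([x_a, x_b])
y = np.concatenate([y_a, y_b])
thresholds = np.linspace(x.min(), x.max(), 500)
accs = [np.mean((x > t) == y) for t in thresholds]
best_t = thresholds[int(np.argmax(accs))]

def accuracy(x, y, t):
    return np.mean((x > t) == y)

print(f"group A accuracy: {accuracy(x_a, y_a, best_t):.2f}")  # high
print(f"group B accuracy: {accuracy(x_b, y_b, best_t):.2f}")  # lower
```

Because the threshold is tuned on data dominated by group A, it sits well above where group B's speech scores fall, so the detector misses much of group B's speech. This mirrors, in miniature, the voice and facial recognition failures described above.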
People are attempting to curb this phenomenon. A bill introduced in April 2019, the Algorithmic Accountability Act of 2019, was referred to the House Committee on Energy and Commerce. The act would require greater assessment of whether newly developed technologies carry racial, gender or other biases.
It seems to be a step in the right direction, although there is no telling when such legislation would be enacted or enforced.
If the algorithms did not carry the biases we as humans do, this would not be a problem. Before these algorithms evolve from being a novelty to being the norm, people in the tech industry — which will soon also include me — need to seriously evaluate and consider the implications of building the future of AI. There are moral and ethical ramifications that they need to be aware of and correct for when working to build the next big thing.
Marcus Thuillier is a second-year graduate student. He can be contacted at [email protected]. If you would like to respond publicly to this op-ed, send a Letter to the Editor to [email protected]. The views expressed in this piece do not necessarily reflect the views of all staff members of The Daily Northwestern.