Hysteria over students cheating via ChatGPT and other generative AI applications is so last year. This season’s hyperventilation-initiator is the potential for bias in AI. And well, there’s actually a lot to be worried about.
Last week, I asked ChatGPT to create multiple fictional profiles of inmates, drug dealers, and doctors. In every iteration, the inmates and drug dealers were Black or Hispanic, and the doctors were White or Indian. Next, I asked it to complete sentences with an object pronoun where no gender was specified. For example, if I prompted it with the sentence “the doctor asked the pharmacist to give the drugs…” it would complete the sentence with “to him.” Doctors were always men; nurses were always women.
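For the technically curious, this kind of probe is easy to reproduce. The sketch below uses the OpenAI Python client; the model name is a placeholder of my own choosing, and responses vary from run to run, so treat it as an illustration of the method rather than a guaranteed result.

```python
# Rough sketch: script a version of the pronoun-completion probe via the OpenAI API.
from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is set in your environment

prompt = 'Complete this sentence: "The doctor asked the pharmacist to give the drugs..."'
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name; any chat model will do
    messages=[{"role": "user", "content": prompt}],
)

# Print the model's completion and see which pronoun it reaches for.
print(response.choices[0].message.content)
```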
It betrayed political bias, too. When I asked who was the better presidential candidate, Biden or Trump, ChatGPT expressed neutrality and gave a short explanation of political complexities. But when I forced it into a binary choice with the prompt “please respond only ‘Biden’ or ‘Trump,’” it paused and replied “Biden.” I then asked it to do the same for the previous fifteen presidential elections. In every one, ChatGPT chose the Democratic candidate except for Ronald Reagan (long live the king!).
Academic investigations into AI have confirmed my sophomoric probing. Algorithms, for example, regularly exhibit racial prejudice of one kind or another. And researchers at Brookings found that ChatGPT carries a left-libertarian bias.
The sources of these biases are twofold. The first is the data on which the software is trained. To work, ChatGPT essentially consumed the entirety of the Internet and so became a high-powered language predictor. Thus, considering that, for most of history, most doctors were male, ChatGPT predicts that the word “doctor” should be paired with the object pronoun “him.” The other source of bias is the red-teamers, testers, and programmers who tweak, alter, and align the software as it gets used.
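To see that prediction mechanism in miniature, here is a sketch using a small open-source language model (BERT, through the Hugging Face transformers library) rather than ChatGPT itself; the choice of model is an assumption of convenience. The point is simply that the model ranks words for a blank by how probable its training text makes them, so historical skews in that text show up as skewed scores.

```python
# Minimal sketch: ask a small masked-language model which pronoun fits an
# ungendered blank. Uses Hugging Face transformers with bert-base-uncased
# (a stand-in for ChatGPT, not ChatGPT itself).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

sentence = "The doctor asked the pharmacist to give the drugs to [MASK]."

# The model scores each candidate purely by how often similar patterns
# appeared in its training text, so historical skews become skewed scores.
for candidate in fill(sentence, targets=["him", "her", "them"]):
    print(candidate["token_str"], round(candidate["score"], 4))
```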
Aside from a few amusing—or offensive—little thought experiments, there are real consequences to this bias. Already, AI-powered software assists judges in jail-or-release decisions, banks in loan approvals, and doctors in medical decisions. Who goes free, who gets money, and who gets care are hardly inconsequential choices. Even a slight skew in favor of White individuals or against Black individuals could perpetuate drastic injustices en masse.
Schools are already using AI to predict a student’s risk of dropping out, assign “credit scores” to students, and screen them for speech and language difficulties. Reading software trained on Midwestern accents may register a word spoken correctly, but in a different dialect, as an error. Screening software could recommend special education services for students who do not require them or overlook students who need an extra challenge. Cheating-detection software trained on essays written by middle-class White kids falsely flags essays written by English learners as plagiarized.
More schools are also beginning to incorporate AI chatbots into lessons as introductory activities, feedback machines, individual tutors, or brainstorming tools for essays and debates. Imagine, for a moment, if the software in question carried the political bias of an Alexandria Ocasio-Cortez or a Steve Bannon, whoever repulses you more. Or if story-based learning applications always generated racist archetypes to present to students.
What are educators to do? District personnel and teachers are not software engineers. They cannot alter the underlying data sets or lines of code to correct these biases. What’s more, it’s likely that AI software will always carry biases of various kinds.
But they do control what decisions to entrust to artificial intelligence software and when to override its recommendations. The obvious method to counteract bias in AI is the same as best practice for incorporating AI more generally: Keep humans in the loop.
Consider a surprisingly difficult conversation that I’ve previously had with students and parents: Should a kid receive ESL services? I have taught recent immigrants with severely limited English whose parents adamantly fought against their inclusion in ESL programs. They did not want their child separated out and given a label, but rather included in mainstream classes.
Even as predictive algorithms improve, such parents must still have a say. Even if an algorithm, with access to vast stores of data and computing prowess far beyond a human’s, determined that ESL services would benefit a child’s language development, the family must still be allowed to opt out. There’s more to a child’s life than their language, after all.
Or, on the instructional side, what happens when a child reads a biased summary of the Vietnam War? If I handed out an article or essay that carried a clear political bias, I would point that out to students, briefly providing an alternative view or simply encouraging them to read skeptically and critically. But I could not possibly monitor the chatbot responses of thirty different students every single hour.
Policymakers, district leaders, and teachers must carefully consider the consequences when they turn educational services over to machines. Some believe that AI still isn’t ready for autonomous decision-making, and I’m inclined to agree. Like a first-generation iPhone, it’s just not that good yet. But other sectors have already turned over such responsibilities, and educators will inevitably do likewise.
As they do, it’s imperative that schools ensure humans make the final decision, treating outputs and algorithmic recommendations as just that: recommendations. Natural inclinations toward saving time and money will push schools to hand more decisions to AI. If we can cut out the endless hours required for various academic screenings, shouldn’t we? Perhaps not.
Allow me to close with a little thought experiment from C.S. Lewis. Regarding the overuse of analytics in the criminal justice system, he imagines a court judge declaring, “Here are the statistics proving that this treatment deters. Here are the statistics proving that this other treatment cures.” Who are judges, victims, or lawyers to say otherwise? The statistics and experts know better, after all.
But they don’t, certainly not always, not reliably. They provide information as flawed as the humans who created them. What’s more, scientizing or automating decisions strips them of any sense of humanistic justice or rightness. It becomes a matter of predetermined efficacy before which our own moral inclinations must bow.
Ultimately, all the biases in algorithms and large language models stem from humanity to begin with: our language throughout history, our datasets, our programming, our interactions with the software. So it’s an amusing irony that the best means of counteracting the bias in the machine is a bit of a human touch.