Annette Zimmermann, Elena Di Rosa, Hochan Kim – Boston Review
We need greater democratic oversight of AI not just from developers and designers, but from all members of society.
A great deal of recent public debate about artificial intelligence has been driven by apocalyptic visions of the future. Humanity, we are told, is engaged in an existential struggle against its own creation. Such worries are fueled in large part by tech industry leaders and futurists, who anticipate systems so sophisticated that they can perform general tasks and operate autonomously, without human control. Stephen Hawking, Elon Musk, and Bill Gates have all publicly expressed their concerns about the advent of this kind of “strong” (or “general”) AI—and the associated existential risk that it may pose for humanity. In Hawking’s words, the development of strong AI “could spell the end of the human race.”
These are legitimate long-term worries. But they are not all we have to worry about, and placing them center stage distracts from ethical questions that AI is raising here and now. Some contend that strong AI may be only decades away, but this focus obscures the reality that “weak” (or “narrow”) AI is already reshaping existing social and political institutions. Algorithmic decision making and decision support systems are currently being deployed in many high-stakes domains, from criminal justice, law enforcement, and employment decisions to credit scoring, school assignment mechanisms, health care, and public benefits eligibility assessments. Never mind the far-off specter of doomsday; AI is already here, working behind the scenes of many of our social systems.
What responsibilities and obligations do we bear for AI’s social consequences in the present—not just in the distant future? To answer this question, we must resist the learned helplessness that has come to see AI development as inevitable. Instead, we should recognize that developing and deploying weak AI involves making consequential choices—choices that demand greater democratic oversight not just from AI developers and designers, but from all members of society.
The first thing we must do is carefully scrutinize the arguments underpinning the present use of AI.
Some are optimistic that weak AI systems can contribute positively to social justice. Unlike humans, the argument goes, algorithms can avoid biased decision making, thereby achieving a level of neutrality and objectivity that is not humanly possible. A great deal of recent work has critiqued this presumption, including Safiya Noble’s Algorithms of Oppression (2018), Ruha Benjamin’s Race After Technology (2019), Meredith Broussard’s Artificial Unintelligence (2018), Hannah Fry’s Hello World (2018), Virginia Eubanks’s Automating Inequality (2018), Sara Wachter-Boettcher’s Technically Wrong (2017), and Cathy O’Neil’s Weapons of Math Destruction (2016). As these authors emphasize, there is a wealth of empirical evidence showing that the use of AI systems can often replicate historical and contemporary conditions of injustice, rather than alleviate them.
In response to these criticisms, practitioners have focused on optimizing the accuracy of AI systems in order to achieve ostensibly objective, neutral decision outcomes. Such optimists concede that algorithmic systems are not neutral at present, but argue that they can be made neutral in the future, ultimately rendering their deployment morally and politically unobjectionable. As IBM Research, one of the many corporate research hubs focused on AI technologies, proclaims:
AI bias will explode. But only the unbiased AI will survive. Within five years, the number of biased AI systems and algorithms will increase. But we will deal with them accordingly—coming up with new solutions to control bias in AI and champion AI systems free of it.
The emerging computer science subfield of FAT ML (“fairness, accountability and transparency in machine learning”) includes a number of important contributions in this direction—described in an accessible way in two new books: Michael Kearns and Aaron Roth’s The Ethical Algorithm (2019) and Gary Marcus and Ernest Davis’s Rebooting AI: Building Artificial Intelligence We Can Trust (2019). Kearns and Roth, for example, write:
We . . . believe that curtailing algorithmic misbehavior will itself require more and better algorithms—algorithms that can assist regulators, watchdog groups, and other human organizations to monitor and measure the undesirable and unintended effects of machine learning.
There are serious limitations, however, to what we might call this quality control approach to algorithmic bias. Algorithmic fairness, as the term is currently used in computer science, often describes a rather limited value or goal, which political philosophers might call “procedural fairness”—that is, the application of the same impartial decision rules and the use of the same kind of data for each individual subject to algorithmic assessments, as opposed to a more “substantive” approach to fairness, which would involve interventions into decision outcomes and their impact on society (rather than decision processes only) in order to render the former more just.
Even if code is modified with the aim of securing procedural fairness, however, we are left with the deeper philosophical and political issue of whether neutrality constitutes fairness in background conditions of pervasive inequality and structural injustice. Purportedly neutral solutions in the context of widespread injustice risk further entrenching existing injustices. As many critics have pointed out, even if algorithms achieve some sort of neutrality in themselves, the data that these algorithms learn from is still riddled with prejudice. In short, the data we have—and thus the data that gets fed into the algorithm—is neither the data we need nor the data we deserve. Thus, the cure for algorithmic bias may not be more, or better, algorithms. There may be some machine learning systems that should not be deployed in the first place, no matter how much we can optimize them.
For a concrete example, consider the machine learning systems used in predictive policing, whereby historical crime rate data is fed into algorithms in order to predict future geographic distributions of crime. The algorithms flag certain neighborhoods as prone to violent crime. On that basis, police departments make decisions about where to send their officers and how to allocate resources. While the concept of predictive policing is worrisome for a number of reasons, one common defense of the practice is that AI systems are uniquely “neutral” and “objective,” compared to their human counterparts. On the face of it, it might seem preferable to take decision making power out of the hands of biased police departments and police officers. But what if the data itself is biased, so that even the “best” algorithm would yield biased results?
This is not a hypothetical scenario: predictive policing algorithms are fed historical crime rate data that we know is biased. We know that marginalized communities—in particular black, indigenous, and Latinx communities—have been overpoliced. Given that more crimes are discovered and more arrests are made under conditions of disproportionately high police presence, the associated data is skewed. The problem is one of overrepresentation: particular communities feature disproportionately highly in crime activity data in part because of how (unfairly) closely they have been surveilled, and how inequitably laws have been enforced.
It should come as no surprise, then, that these algorithms make predictions that mirror past patterns. This new data is then fed back into the technological model, creating a pernicious feedback loop in which social injustice is not only replicated, but in fact further entrenched. It is also worth noting that the same communities that have been overpoliced have been severely neglected, both intentionally and unintentionally, in many other areas of social and political life. While they are overrepresented in crime rate data sets, they are underrepresented in many other data sets (e.g. those concerning educational achievement).
Structural injustice thus yields biased data through a variety of mechanisms—prominently including under- and overrepresentation—and worrisome feedback loops result. Even if the quality control problems associated with an algorithm’s decision rules were resolved, we would be left with a more fundamental problem: these systems would still be learning from and relying on data born out of conditions of pervasive and long-standing injustice.
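The feedback loop described above can be made concrete with a small simulation. The sketch below is a toy model under stated assumptions, not a description of any deployed predictive policing system: two hypothetical neighborhoods have the same underlying crime rate, but one starts with more recorded crime because it was patrolled more heavily in the past. The function name, the `hotspot_share` parameter, and all numbers are illustrative assumptions.

```python
def simulate_patrol_feedback(recorded, true_rate=0.1, patrols=100,
                             hotspot_share=0.8, rounds=10):
    """Track neighborhood 0's share of recorded crime over time.

    recorded      -- starting recorded-crime counts for two neighborhoods
    true_rate     -- crimes discovered per patrol (identical in both places)
    hotspot_share -- fraction of patrols sent to whichever neighborhood
                     the historical record flags as the "hot spot"
    """
    recorded = list(recorded)
    shares = [recorded[0] / sum(recorded)]
    for _ in range(rounds):
        # The "prediction": flag the neighborhood with more recorded crime.
        hot = 0 if recorded[0] >= recorded[1] else 1
        allocation = [patrols * (1 - hotspot_share)] * 2
        allocation[hot] = patrols * hotspot_share
        # Discoveries depend on patrol presence, not on any real
        # difference in crime, because true_rate is the same everywhere.
        for i in range(2):
            recorded[i] += allocation[i] * true_rate
        shares.append(recorded[0] / sum(recorded))
    return shares


shares = simulate_patrol_feedback([60, 40])
print([round(s, 2) for s in shares])
```

With these illustrative numbers, the first neighborhood's share of recorded crime climbs from 0.60 to 0.70 over ten rounds, although both neighborhoods have identical true crime rates. The record drifts further from reality only because of where officers were sent.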
Conceding that these issues pose genuine problems for the possibility of a truly neutral algorithm, some might advocate for implementing countermeasures to correct for the bias in the data—a purported equalizer at the algorithmic level. While this may well be an important step in the right direction, it does not amount to a satisfactory solution on its own. Countermeasures might be able to help account for the over- and underrepresentation issues in the data, but they cannot correct for the problem of what kind of data has been collected in the first place.
Consider, for instance, another controversial application of weak AI: algorithmic risk scoring in the criminal justice process, which has been shown to lead to racially biased outcomes. As a well-known study by ProPublica showed in 2016, one such algorithm classified black defendants as having a “high recidivism risk” at disproportionately higher rates in comparison to white defendants even after controlling for variables such as the type and severity of the crime committed. As ProPublica put it, “prediction fails differently for black defendants”—in other words, algorithmic predictions did a much worse job of accurately predicting recidivism rates for black defendants, compared to white defendants, given the crimes that individual defendants had previously committed; black defendants who were in fact low risk were much more likely to receive a high-risk score than similar white defendants. Often, such algorithmic systems rely on socio-demographic data, such as age, gender, educational background, residential stability, and familial arrest record. Even though the algorithm in this case does not explicitly rely on race as a variable, these other socio-demographic features can function as proxies for race. The result is a digital form of redlining, or, as computer scientists call it, “redundant encoding.”
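One way to check for the kind of disparity ProPublica reported is to compare, group by group, how often people who did not go on to reoffend were nonetheless scored as high risk (the false positive rate). The sketch below shows that calculation on a handful of made-up records. The record layout, the function name, and the data are hypothetical; they are not ProPublica's figures or method.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return, per group, the share of non-reoffenders flagged as high risk.

    records -- iterable of (group, flagged_high_risk, reoffended) tuples
    """
    non_reoffenders = defaultdict(int)
    wrongly_flagged = defaultdict(int)
    for group, flagged_high_risk, reoffended in records:
        if not reoffended:
            non_reoffenders[group] += 1
            if flagged_high_risk:
                wrongly_flagged[group] += 1
    return {g: wrongly_flagged[g] / n for g, n in non_reoffenders.items()}


# Invented toy records: (group, flagged_high_risk, reoffended)
toy_records = [
    ("A", True, False), ("A", True, False), ("A", False, False),
    ("A", False, False), ("A", True, True),
    ("B", True, False), ("B", False, False), ("B", False, False),
    ("B", False, False), ("B", True, True),
]
print(false_positive_rates(toy_records))  # {'A': 0.5, 'B': 0.25}
```

In this toy data, non-reoffenders in group A are flagged at twice the rate of those in group B, which is the shape of the disparity at issue.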
In response to this problem, some states—including New Jersey—recently implemented more minimalist algorithmic risk scoring systems, relying purely on behavioral data, such as arrest records. The goal of such systems is to prevent redundant encoding by reducing the amount of socio-demographic information fed into the algorithm. However, given that communities of color are policed disproportionately heavily, and, in turn, arrested at disproportionately high rates, “purely behavioral” data about arrest history, for example, is still heavily raced (and classed, and gendered). Thus, the redundant encoding problem is not in fact solved. New Jersey’s example, among many others, shows that abstracting away from the social circumstances of defendants does not lead to true impartiality.
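Why arrest-only scoring does not escape the problem can be seen in a few lines. In the sketch below, two hypothetical people commit the same number of offenses, but one lives where policing is heavier, so more of those offenses end up on the record. The scoring rule reads nothing but the arrest count, yet it assigns different labels. The detection rates, threshold, and names are invented for illustration and do not describe New Jersey's system.

```python
def recorded_arrests(offenses, detection_rate):
    """The world: how many offenses end up on a person's record."""
    return offenses * detection_rate


def risk_label(arrests, threshold=2.0):
    """The model: sees only the arrest count, never neighborhood or race."""
    return "high" if arrests >= threshold else "low"


# Hypothetical detection rates reflecting unequal police presence.
DETECTION_RATE = {"heavily_policed": 0.75, "lightly_policed": 0.25}

# Two people with identical behavior: (name, area, offenses committed)
people = [("P1", "heavily_policed", 4), ("P2", "lightly_policed", 4)]

labels = {
    name: risk_label(recorded_arrests(offenses, DETECTION_RATE[area]))
    for name, area, offenses in people
}
print(labels)  # {'P1': 'high', 'P2': 'low'}
```

The model's input is "purely behavioral," but the arrest count already carries the imprint of where each person was policed, so the proxy survives the removal of every socio-demographic variable.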
In light of these issues, any approach focused on optimizing for procedural fairness—without attention to the social context in which these systems operate—is going to be insufficient. Algorithmic design cannot be fixed in isolation. Developers cannot just ask, “What do I need to do to fix my algorithm?” They must rather ask: “How does my algorithm interact with society at large, and as it currently is, including its structural inequalities?” We must carefully examine the relationship and contribution of AI systems to existing configurations of political and social injustice, lest these systems continue to perpetuate those very conditions under the guise of neutrality. As many critical race theorists and feminist philosophers have argued, neutral solutions might well secure just outcomes in a just society, but only serve to preserve the status quo in an unjust one.
What, then, does algorithmic and AI fairness require, when we attend to the place of this technology in society at large?
The first—but far from only—step is transparency about the choices that go into AI development and the responsibilities that such choices present.
Some might be inclined to absolve AI developers and researchers of moral responsibility despite their expertise on the potential risks of deploying these technologies. After all, the thought goes, if they followed existing regulations and protocols and made use of the best available information and data sets, how can they be held responsible for any errors and accidents that they did not foresee? On this view, such are the inevitable, necessary costs of technological advancement—growing pains that we will soon forget as the technology improves over time.
One significant problem with this quietism is the assumption that existing regulations and research are adequate for ethical deployment of AI—an assumption that even industry leaders themselves, including Microsoft’s president and chief legal officer Brad Smith, have conceded is unrealistic.
The biggest problem with this picture, though, is its inaccurate portrayal of how AI systems are developed and deployed. Developing algorithmic systems entails making many deliberate choices. For example, machine learning algorithms are often “trained” to navigate massive data sets by making use of certain pre-defined key concepts or variables, such as “creditworthiness” or “high-risk individual.” The algorithm does not define these concepts itself; human beings—developers and data scientists—choose which concepts to appeal to, at least as an initial starting point. It is implausible to think that these choices are not informed by cultural and social context—a context deeply shaped by a history of inequality and injustice. The variables that tech practitioners choose to include, in turn, significantly influence how the algorithm processes the data and the recommendations it ultimately makes.
Making choices about the concepts that underpin algorithms is not a purely technological problem. For instance, a developer of a predictive policing algorithm inevitably makes choices that determine which members of the community will be affected and how. Making the right choices in this context is as much a moral enterprise as it is a technical one. This is no less true when the exact consequences are difficult even for developers to foresee. New pharmaceutical products often have unexpected side effects, but that is precisely why they undergo extensive rounds of controlled testing and trials before they are approved for use—not to mention the possibility of recall in cases of serious, unforeseen defect.
Unpredictability is thus not an excuse for moral quiescence when the stakes are so high. If AI technology really is unpredictable, this only presents more reason for caution and moderation in deploying these technologies. Such caution is particularly called for when the AI is used to perform such consequential tasks as allocating the resources of a police department or evaluating the creditworthiness of first-time homebuyers.
To say that these choices are deliberate is not to suggest that their negative consequences are always, or even often, intentional. There may well be some clearly identifiable “bad” AI developers who must be stopped, but our larger point is that all developers, in general, know enough about these technologies to be regarded as complicit in their outcomes—a point that is obscured when we act as if AI technology is already escaping human control. We must resist the common tendency to think that an AI-driven world means that we are freed not only from making choices, but also from having to scrutinize and evaluate these automated choices in the way that we typically do with human decisions. (This psychological tendency to trust the outputs of an automated decision making system is what researchers call “automation bias.”)
Complicity here means that the responsibility for AI is shared by individuals involved in its development and deployment, regardless of their particular intentions, simply because they know enough about the potential harms. As computer scientist Joshua Kroll has argued, “While structural inscrutability frustrates users and oversight entities, system creators and operators always determine that the technologies they deploy are fit for certain uses, making no system wholly inscrutable.”
The apocalypse-saturated discourse on AI, by contrast, encourages a mentality of learned helplessness. The popular perception that strong AI will eventually grow out of our control risks becoming a self-fulfilling prophecy, despite the present reality that weak AI is very much the product of human deliberation and decision making. Avoiding learned helplessness and automation bias will require adopting a model of responsibility that recognizes that a variety of (even well-intentioned) agents must share the responsibility for AI given their role in its development and deployment.
In the end, the responsible development and deployment of weak AI will involve not just developers and designers, but the public at large. This means that we need, among other things, to scrutinize current narratives about AI’s potential costs and benefits. As we have argued, AI’s alleged neutrality and inevitability are harmful, yet pervasive, myths. Debunking them will require an ongoing process of public, democratic contestation about the social, political, and moral dimensions of algorithmic decision making.
This is not an unprecedented proposal: similar suggestions have been made by philosophers and activists seeking to address other complex, collective moral problems, such as climate change and sweatshop labor. Just as their efforts have helped raise public awareness and spark political debate about those issues, it is high time for us as a public to take seriously our responsibilities for the present and looming social consequences of AI. Algorithmic bias is not a purely technical problem for researchers and tech practitioners; we must recognize it as a moral and political problem in which all of us—as democratic citizens—have a stake. Responsibility cannot simply be offloaded and outsourced to tech developers and private corporations.
This also means that we need, in part, to think critically about government decisions to procure machine learning tools from private corporations—especially because these tools are subsequently used to partially automate decisions that were previously made by democratically authorized, if not directly elected, public officials. But we will also have to ask uncomfortable questions about our own role as a public in authorizing and contesting the use of AI technologies by corporations and the state. Citizens must come to view issues surrounding AI as a collective problem for all of us rather than a technical problem just for them. Our proposal is aligned with an emerging “second wave” of thinking about algorithmic accountability, as legal scholar Frank Pasquale puts it: a perspective which critically questions “whether [certain algorithmic systems] should be used at all—and, if so, who gets to govern them,” rather than asking how such systems might be improved in order to make them more fair. This “second” perspective has immediate implications for how we ought to think about the relationship between democratic power and AI. As Julia Powles and Helen Nissenbaum emphasize, “Any AI system that is integrated into people’s lives must be capable of contest, account, and redress to citizens and representatives of the public interest.”
If using algorithmic decision making means making deliberate choices for which we are all on the hook, how exactly should we as a democratic society respond? What is our role, as citizens, in shaping technology for a more just society?
One might be tempted to think that the unprecedented novelty of machine learning can be regulated effectively only if we manage to create entirely new democratic procedures and institutions. A range of such measures have been proposed: creating new governmental departments (such as the new Department of Technology proposed by Democratic presidential candidate Andrew Yang), new laws (such as consumer protection laws, enforced by a kind of “FDA for algorithms”), as well as (rather weak) new measures for voluntary individual self-regulation, such as a Hippocratic oath for developers.
One problem with these proposals is that they reinforce learned helplessness about algorithmic bias; they suggest that until we implement large-scale, complex institutional and social change, we will be unable to address algorithmic bias in any meaningful way. But there is another way to think about algorithmic accountability. Rather than advocating for entirely new democratic institutions and procedures, why not first try to shift existing democratic agendas? Democratic agenda-setting means enabling citizens to contest and control the concepts that underpin algorithmic decision rules and to deliberate about whether algorithmic decision-making ought to be used in a particular domain in the first place.
San Francisco, for example, recently banned the use of facial recognition tools in policing due to the increasing amount of empirical evidence of algorithmic bias in law enforcement technology; the city’s board of supervisors passed the “Stop Secret Surveillance” ordinance in an 8-1 vote. The ordinance, which states that “the technology will exacerbate racial injustice and threaten our ability to live free of continuous government monitoring,” also establishes more wide-ranging accountability mechanisms beyond the use of facial recognition technology in policing, such as a requirement that city agencies obtain approval before purchasing and implementing other types of surveillance technology.
Broaching questions of algorithmic justice via the democratic process would give members of communities most impacted by algorithmic bias more direct democratic power over crucial decisions concerning weak AI—not merely after its deployment, but also at the design stage. San Francisco’s new ordinance exemplifies the importance of providing meaningful opportunities for bottom-up democratic deliberation about, and democratic contestation of, algorithmic tools ideally before they are deployed: “Decisions regarding if and how surveillance technologies should be funded, acquired, or used, and whether data from such technologies should be shared, should be made only after meaningful public input has been solicited and given significant weight.”
Similar local, bottom-up procedures akin to San Francisco’s model can and should be implemented in other communities, rather than simply waiting for the creation of more comprehensive, top-down regulatory institutions. Significant public oversight over AI development and deployment is already possible. Rather than allowing tech practitioners to navigate the ethics of AI by themselves, we the public should be included in decisions about whether and how AI will be deployed and to what ends. Furthermore, even once we do create new, larger-scale democratic institutions empowered to legally regulate AI in a meaningful way, bottom-up democratic procedures will still be essential: they play a crucial role in identifying which agendas such institutions ought to pursue, and they can shine a light on whose interests are most affected by emerging technologies.
Moving to this agenda-setting approach means incorporating an ex ante perspective when we think about algorithmic accountability, rather than resigning ourselves to a callous, ex post, “wait and see” attitude. When we wait and see, we let corporations and practitioners take the lead and set the terms. In the meantime, we expect those who are already experiencing significant social injustice to continue bearing its burden.
Of course, shifting democratic agendas toward decisions about algorithmic tools will not entirely resolve the problem of algorithmic bias. To take full responsibility for how technology shapes our lives going forward, we will have to make the deployment of weak AI democratically contestable by putting it on our democratic agendas. That being said, we will eventually have to combine that strategy with an effort to establish institutions that can enforce the just and equitable use of technology after it has been deployed.
In other words, a democratic critique of algorithmic injustice requires both an ex ante and an ex post perspective. In order for us to start thinking about ex post accountability in a meaningful way—that is, in a way that actually reflects the concerns and lived experiences of those most affected by algorithmic tools—we need to first make it possible for society as a whole, not just tech industry employees, to ask the deeper ex ante questions (e.g. “Should we even use weak AI in this domain at all?”). Changing the democratic agenda is a prerequisite to tackling algorithmic injustice, not just one policy goal among many.
Democratic agenda-setting can be a powerful mechanism for exercising popular control over state and corporate use of technology, and for contesting technology’s threats to our rights. Effective agenda-setting, of course, will mean coupling the public’s agenda-setting power with tangible bottom-up decision making power, rather than merely exercising our rights of deliberation and consultation.
This is where we can learn from other recent democratic innovations, such as participatory budgeting, in which local and municipal decisions about how to allocate resources for infrastructure, energy, healthcare, and environmental policy are being made directly by residents themselves after several rounds of collective deliberation. Enabling more robust democratic participation from the outset helps us identify the kinds of concerns and problems that we ought to prioritize. Rather than rushing to quick, top-down solutions aimed at quality control, optimization, and neutrality, we must first clarify what particular kind of problem we are trying to solve in the first place. Until we do so, algorithmic decision making will continue to entrench social injustice, even as tech optimists herald it as the cure for the very ills it exacerbates.
Annette Zimmermann is a political philosopher working on the role of risk and uncertainty for the ethics of artificial intelligence, and their impact on democratic values like equality and justice. They are currently a postdoctoral research fellow at Princeton University. Elena Di Rosa is a PhD student in Philosophy at Princeton University. She works primarily on topics in ethics, contemporary social and political philosophy, and moral psychology. Hochan “Sonny” Kim is a PhD student in Politics at Princeton University. His research interests lie primarily in contemporary political philosophy, social philosophy, and philosophy of law.