CHAPTER 4 We’ve Been Here Before: Historical Precedents for Managing Artificial Intelligence
Marc Aidinoff and David I. Kaiser
Introduction
Scientific and technological innovations are made by people, and so they can be governed by people. Notwithstanding breathless popular descriptions of disempowered citizens cowed by technical complexity or bowing to the inevitable march of the new, history teaches that novel technologies like artificial intelligence can—indeed, must—be developed with ongoing and meaningful democratic oversight. Self-policing by technical experts is never enough to sustain an innovation ecosystem worthy of public trust. Contemporary artificial intelligence (AI) and related computing techniques might be distinct technological phenomena, but they too can be governed in the public interest.
Rather than treat AI governance as an abstract problem, we urge policymakers to rely on the rich, empirical record of past engagements to conceptualize and respond appropriately to present-day challenges. History offers a repository of multiple, overlapping, real-world instances in which technical experts, policymakers, and broader publics have grappled with once-new technologies. Commentators and policymakers too often focus narrowly on one historical episode or analogy when thinking about the challenges of novel technologies—most commonly turning to the sprawling Manhattan Project during the Second World War or life scientists’ famous meeting at Asilomar in the mid-1970s. Yet considering multiple analogies and disanalogies can elucidate complementary axes along which to assess likely harms and potential benefits.
In this brief paper, we consider three historical episodes: the early nuclear weapons complex during the 1940s and 1950s; biotechnology, biomedicine, and the implementation of various safeguards in the 1970s; and the adoption and oversight of forensic technologies within the US legal and criminal justice systems over the course of the past century. Each example offers distinct insights for understanding opportunities and risks associated with AI today. As we discuss, each of the past examples required a broad range of actors to think at different scales: national and global security, the health of local communities, and individuals’ civil rights. No example offers a perfect analogy with present-day challenges; yet even the disanalogies can help clarify realistic options for decision-making today.
As each of the previous historical episodes makes clear, the scientific and technical communities have often taken on special roles in establishing norms regarding how to define and protect the public interest. Yet in none of these previous instances did scientists and technologists hold unilateral sway over how the new technologies would be assessed, deployed, or governed. History offers the opportunity to consider how each previous effort succeeded in some ways but fell short in others. Across each example, we therefore identify three key themes for thinking about the governance of AI today: the inadequacy of researchers’ self-policing to produce meaningful safeguards on impactful technologies that move beyond controlled laboratory settings; the necessity of broad-gauge input and oversight to sustain an innovation ecosystem; and finally, the need for recurring reviews to regularly reassess evolving technologies and the shifting social practices within which they are embedded.
Part 1: Nuclear Secrets
The path from basic discoveries in nuclear science to sprawling weapons programs was dizzyingly short. The first indication of nuclear fission caught chemists Otto Hahn and Fritz Strassmann by surprise in their Berlin laboratory in December 1938. Immediately upon receiving an update from Hahn by letter, the recently exiled theoretical physicist Lise Meitner and her nephew, Otto Robert Frisch, developed a remarkable interpretation: Under certain circumstances, bombardment of a heavy nucleus such as uranium by neutrons could split the nucleus and release additional energy.1 Hahn, Strassmann, Meitner, and Frisch each communicated their results in rapid-fire scientific publications as well as via informal discussions with colleagues; within weeks, scientists around the world began pursuing follow-up studies. Frisch’s mentor, Niels Bohr, teamed up with another protégé, American physicist John Wheeler, to produce a detailed theoretical analysis of nuclear fission. Their landmark article was published in the Physical Review on September 1, 1939, just as Nazi tanks invaded Poland, triggering the start of the Second World War.2
Even before the Bohr–Wheeler paper had been published, scientists in at least five countries had recognized the possibility that nuclear fission could be used to create a new type of weapon and had initiated discussions with government officials. In April 1939, the German Reich Ministry of Education held a secret meeting on military applications of nuclear fission and banned uranium exports. That same month, the Japanese government launched “Project Ni” to study possible weapons effects of fission, while, independently, several physicists in Britain urged their government to jump-start a nuclear weapons project by securing uranium ore from the Belgian Congo. In August 1939, Albert Einstein signed a letter to US President Franklin Roosevelt—which had been written by concerned émigré physicists Leo Szilard and Eugene Wigner—alerting Roosevelt to the possibility that nuclear weapons could exploit runaway fission chain reactions. A few weeks later, Leningrad physicist Igor Kurchatov informed the Soviet government about possible military applications of nuclear fission.3
Given the plausible connections between nuclear fission and new types of weapons—and set against the drumbeat of worsening international relations—some scientists sought to control the flow of information about nuclear fission. Beginning in spring 1939, Hungarian physicist Leo Szilard, who had fled Europe and landed in New York City, urged his colleagues to adopt a voluntary moratorium on publishing new results. When some physicists refused to withhold their latest findings, Szilard concocted a new plan that would allow researchers to submit their articles to scientific journals—which would enable clear cataloging of priority claims—while coordinating with the journal editors to hold back publication of certain papers until their release could be deemed safe. This scheme, too, proved difficult to implement in practice, not least because it depended upon voluntary compliance, with no means of enforcement.4 It also had some unintended consequences. When Kurchatov and his colleagues in the Soviet Union noticed a distinct falloff of publications in the Physical Review regarding nuclear fission, they considered their suspicions confirmed and doubled down on their efforts to convince Soviet officials to take the matter seriously.5
Szilard’s proposals focused on controlling the flow of information rather than regulating research itself. That distinction disappeared once the Allied efforts on nuclear weapons became more formalized, scaling up from lackluster study groups to the Manhattan Engineer District in June 1942. Under the auspices of the newly formed Office of Scientific Research and Development (OSRD) and administered by the US Army Corps of Engineers, officials in the Manhattan Project imported older procedures for military secrecy and provisioning—some dating to the 1917 US Espionage Act, enacted in a hurry after the United States had entered the First World War—to exert control over the circulation of information, materials, and personnel. The US Federal Bureau of Investigation (FBI) and the Military Intelligence Division conducted background checks on researchers; General Leslie Groves imposed strict compartmentalization rules to try to limit how much information any single individual could glean about the sprawling project; massive infrastructure was devoted to producing fissionable materials within secret facilities at places like Oak Ridge, Tennessee, and Hanford, Washington; while more mundane materials, such as rubber and gasoline—by then under strict wartime rationing—were diverted to the high-priority project.6
Over the course of the war, older conventions regarding secrecy and classification were updated and specialized to the case of nuclear weapons. These newer routines were formalized with passage of the US Atomic Energy Act in August 1946. Although the Act transferred control over the nuclear complex from the War Department to a new civilian agency—the Atomic Energy Commission (AEC)—in many ways the AEC reinforced wartime procedures. Under the new law, for example, whole categories of information about nuclear science and technology were deemed to be “born secret,” that is, classified by default and only released following careful review. The Act also established a government monopoly over the development and circulation of various fissile materials within the US, effectively foreclosing efforts by private companies to pursue civilian nuclear power generation. (Several of these provisions of the Act were amended in 1954, with the explicit goal of fostering private-sector efforts in nuclear power, but with mixed results.7)
Policymakers crafted these regulatory developments amid specific domestic and international considerations. On the international front, mutual suspicions between officials in the United States and the Soviet Union—exacerbated by the shocking revelation in February 1946, following the defection of a Soviet cipher clerk, that the Soviets had conducted espionage at several Manhattan Project sites during the war—derailed early efforts to establish international control of nuclear science and technology. Domestically, long-standing rivalries between various military branches shaped debates over nuclear weapons policies, including whether the United States should pursue next-generation weapons such as thermonuclear (or fusion) bombs.8
Much as Szilard had done as early as 1939, after the war many scientists and engineers worked hard to help shape the evolving landscape of practices and norms around nuclear science and technology. Some, like J. Robert Oppenheimer, moved from leadership positions in the wartime program into influential consulting roles after the war. Oppenheimer helped draft several proposals for postwar nuclear policies and chaired the new General Advisory Committee of the AEC. Others, especially younger colleagues, formed new organizations like the Federation of Atomic Scientists to lobby lawmakers for their preferred policy outcomes, such as civilian (rather than military) control of the postwar nuclear complex, and in support of nuclear disarmament.9
Before long, however, the scientists’ illusions of control collapsed amid Cold War realities. Right on the heels of their major legislative victory—ensuring passage of the Atomic Energy Act that enshrined civilian oversight—groups like the Federation of Atomic Scientists became targets of a concerted campaign. The FBI and the US House Committee on Un-American Activities pursued the Federation and several of its individual members, smearing them with selective leaks and high-profile hearings, alleging Communist sympathies.10 Oppenheimer’s infamous 1954 hearing before an AEC personnel security board was a late example of what had long since become routine. In fact, during the decade after the end of the Second World War, younger, more vulnerable nuclear physicists were affected by domestic anti-communism in greater numbers than representatives of any other academic discipline. The elaborate system of nuclear classification became a cudgel with which to silence critics, whose attorneys were often denied access to information under the guise of protecting national security.11
Beyond the impact on individuals and groups, the postwar nuclear classification regime strained relationships with US allies—most notably the United Kingdom—while remaining relatively ineffective at halting nuclear proliferation. Within a few years after the war, the Soviet Union built both fission and fusion bombs with a speed that caught many US authorities off guard; those efforts were aided, in part, by wartime espionage that had pierced military control. Arguably, overzealous efforts at nuclear secrecy helped to accelerate the arms race, exacerbating the precarious brinksmanship of a protracted Cold War and triggering all-too-hot proxy wars around the globe.12
As policymakers ask questions today about allowing researchers to deploy, withhold, or partially disclose new computational models and techniques, the example of nuclear secrecy infrastructure provides important cautions about bureaucratic overreach and political abuse. During the postwar years, few scientists, engineers, or policymakers suggested that all information about nuclear weapons or related technologies should be openly shared—proliferation concerns were real and some safeguards were clearly appropriate. Yet the complex system of nuclear classification and control quickly grew so byzantine that legitimate research inquiries were cut off, responsible private-sector investment was stymied, and open political debate was squashed.13 As the secrecy regimes grew in complexity and extensiveness, the academic community often served as a weak but crucial counterbalance to maintain, or at least seek to maintain, the levels of openness necessary for robust scientific progress and democratic oversight.
Part 2: Biotechnology and Biomedicine
Leo Szilard’s first impulse, upon learning about nuclear fission in 1939, had been to try to convince his fellow scientists to adopt a voluntary moratorium on publishing certain findings. Several decades later, in the mid-1970s, a group of molecular biologists followed a similar route, urging their colleagues to pause research involving the new techniques of recombinant DNA (rDNA). The call by Stanford biologist Paul Berg, together with colleagues from several other elite US universities and research sites, moved beyond Szilard’s earlier intervention: They pressed for a voluntary moratorium on certain types of research, not only on publication.14
By the spring of 1974, Berg and his colleagues had grown concerned about potential risks of rDNA research, even as they anticipated many beneficial outcomes. What if pathogenic bacteria acquired antibiotic-resistant genes, or carcinogenic genes were transferred to otherwise harmless microorganisms? Unlike the massive, top secret industrial sites of the wartime Manhattan Project, rDNA experimentation involved relatively small-scale, benchtop apparatus, and hence could be pursued within nondescript laboratories in urban centers—such as at Berg’s and colleagues’ universities. What types of containment facilities and safety protocols could protect researchers as well as their neighbors from possible leaks of dangerous biological materials? How could the risks of various research projects be assessed and mitigated?15 As MIT’s David Baltimore recalled soon after Berg and colleagues met in his office to brainstorm about their concerns, “we sat around for the day and said, ‘How bad does the situation look?’ And the answer that most of us came up with was that … just the simple scenarios that you could write down on paper were frightening enough that, for certain kinds of limited experiments using this technology, we didn’t want to see them done at all.”16 Berg, Baltimore, and their small group published a brief, open letter calling for a voluntary moratorium on rDNA research—it appeared in Science, Nature, and the Proceedings of the National Academy of Sciences—until the scientific community could address such concerns.17
By the time their letter appeared in print, the Berg group had been deputized by the US National Academy of Sciences to convene a meeting of colleagues and develop recommendations for the US National Institutes of Health (NIH). Famously, that meeting was held in February 1975 at the Asilomar Conference Grounds in Pacific Grove, California. Berg, Baltimore, and their original discussion mates were joined by other eminent biologists, including Maxine Singer and Sydney Brenner. Much like the group that had met at MIT the previous spring, the Asilomar group consisted almost entirely of researchers in the life sciences.18 They recommended a temporary extension of the voluntary research moratorium combined with a framework for assessing risks and appropriate containment facilities for various types of rDNA experiments. In late June 1976, the US Department of Health, Education, and Welfare released the official guidelines that would govern rDNA research by NIH-funded researchers throughout the United States, which drew extensively upon the Asilomar recommendations.19
To this day, the Asilomar meeting is routinely hailed as the preeminent example of how scientists can successfully and responsibly govern risky research: Concerned scientists spoke up, urged restraint upon their colleagues, and forged new guidelines among themselves. Yet much like Szilard’s calls for nuclear scientists to self-censor during the early days of nuclear fission, the biologists’ self-policing around rDNA was a small part of what grew into a much larger process—one that involved input and negotiation among a much wider set of stakeholders.20 On the very evening in June 1976 that federal officials announced the new NIH guidelines, the mayor of Cambridge, Massachusetts—home to famously difficult-to-govern research institutions like Harvard University and MIT—convened a special Hearing on Recombinant DNA Experimentation. As Mayor Alfred Vellucci announced upon opening the special session, “No one person or group has a monopoly on the interests at stake. Whether this research takes place here or elsewhere, whether it produces good or evil, all of us stand to be affected by the outcome. As such, the debate must take place in the public forum with you, the public, taking a major role.”21 And so began a remarkable months-long effort by local university researchers, private-practice physicians, city officials, and other concerned citizens to devise an appropriate regulatory framework that would govern rDNA research within Cambridge city limits—under threat of a complete ban if the new Cambridge Experimentation Review Board (CERB) failed to converge on rules that could pass muster with the city council.22
The CERB group held open, public meetings twice weekly throughout the autumn of 1976. During the sessions, Harvard and MIT researchers had opportunities to explain details of their proposed research to nonspecialists; on other evenings, CERB hosted public debates over proposals for competing safety protocols. Similar civic groups met to hash out local regulations in cities across the United States, including Ann Arbor, Michigan; Bloomington, Indiana; Madison, Wisconsin; Princeton, New Jersey; as well as Berkeley and San Diego in California. In none of these jurisdictions did citizens simply adopt the scientists’ Asilomar recommendations without thorough discussion, scrutiny, and debate. For example, the CERB group called for the formation of a new five-person Cambridge Biohazards Committee plus regular site inspections of rDNA labs within city limits, exceeding the requirements of the federal NIH guidelines. Only after CERB’s extensive, at times thorny, negotiations did the Cambridge city council vote unanimously, in early February 1977, to adopt the locally written Ordinance for the Use of Recombinant DNA Molecule Technology within the city—two years after the Asilomar meeting.23
With the carefully negotiated Cambridge ordinance in place, the city quickly became a biotechnology juggernaut, earning the nickname “Genetown.” City officials, university administrators, laboratory scientists, and neighboring nonscientists had worked together to construct a clear regulatory scheme within which new types of scientific research could thrive—both within university settings and quickly within spin-off biotech companies as well.24 The extended effort of public participation and debate helped to establish a new level of public trust, while avoiding Manhattan Project–style monopolies.
In parallel with the rDNA efforts, biomedical researchers, policymakers, and regulators across the United States forged a separate regulatory framework during the 1970s, which likewise required life scientists to work closely with concerned nonscientists. Following headline-grabbing revelations of egregious abuses of participants in previous biomedical studies—including the long-running Tuskegee Syphilis Study on Black men in rural Alabama—the US Congress passed the National Research Act in 1974. The Act stipulated the creation of a new national commission that would recommend uniform requirements to protect individuals who were involved in research studies.25
In 1979, the commission published the Belmont Report, articulating general principles and specific practices regarding the treatment of “human subjects” in federally funded research. Among the new requirements: ensuring that participants in research studies gave “informed consent,” and that potential risks to individual participants were appropriately balanced by potential benefits of a given study. To evaluate and oversee such requirements, the National Research Act codified that federally funded research involving human subjects must be reviewed by a local “institutional review board,” or IRB, whose membership had to include individuals with a range of experiences and expertise. At least one member of each IRB had to represent “nonscientific” concerns.26
Much like the CERB process in Cambridge, the National Research Act required biomedical researchers—at least those working with federal funds—to negotiate safe and effective research practices, with input and oversight extending beyond the research community itself. Imperfect and at times frustratingly bureaucratic, the new IRB infrastructure did not force all research to grind to a halt. Rather, it formalized a set of practices that had been honed within NIH’s own research centers to mitigate real harms.27
The 1974 National Research Act and the 1979 Belmont Report were forged in response to specific concerns at the time. Although the so-called Common Rule (US federal law 45 C.F.R. 46), which governs research on human subjects, has been updated as recently as 2017, the current provisions still do not map effectively onto more recent forms of research involving human-sourced data and information, especially those for which potential harms need not arise at the point of data collection.
As technical systems come to depend on more and more sensitive data, these regulatory regimes are clearly insufficient, especially when researchers are separated from data collection by relying on third-party vendors.28 Few if any individuals have granted consent (informed or otherwise) for their personal data, medical records, or facial images to be used as training data for such massive algorithmic projects. Likewise, although the Common Rule includes clear definitions of “identifiable private information” that is to be protected for study subjects, recent computational projects that rely upon amassing and analyzing large datasets routinely violate stated privacy protections, even when manipulating “deidentified” datasets.29
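The mechanics behind such privacy failures are easy to illustrate. The brief sketch below, written in Python with entirely synthetic data (none of it drawn from the studies cited here), shows a toy version of the kind of “linkage attack” analyzed in the re-identification literature: removing names accomplishes little when the remaining quasi-identifiers can be joined against another, public dataset.

```python
# A minimal, illustrative sketch (synthetic data only) of a "linkage attack":
# names have been stripped from one dataset, but joining on shared
# quasi-identifiers with a public dataset re-attaches identities.
import pandas as pd

# "Deidentified" study data: no names, but quasi-identifiers remain.
deidentified = pd.DataFrame({
    "zip_code":   ["02139", "02139", "48104"],
    "birth_date": ["1960-07-31", "1985-02-14", "1972-11-02"],
    "sex":        ["F", "M", "F"],
    "diagnosis":  ["hypertension", "asthma", "diabetes"],
})

# A separate, publicly available dataset (e.g., a voter roll) that lists names
# alongside the same quasi-identifiers.
public_records = pd.DataFrame({
    "name":       ["A. Smith", "B. Jones", "C. Rivera"],
    "zip_code":   ["02139", "02139", "48104"],
    "birth_date": ["1960-07-31", "1985-02-14", "1972-11-02"],
    "sex":        ["F", "M", "F"],
})

# Joining on the shared quasi-identifiers re-identifies every "anonymous" record
# in this toy example.
reidentified = deidentified.merge(public_records, on=["zip_code", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

Real-world re-identification works the same way, only at scale and with far richer auxiliary data, which is why removing “identifiable private information” offers limited protection on its own.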
Part 3: Forensic Science
Whereas scientists like Leo Szilard and Paul Berg tried to quickly craft guardrails around the scientific work they were developing, John Larson was eager to deploy his latest innovation: the cardio-pneumo-psychograph device, or “polygraph.” Larson’s goal was not new; uncovering submerged human truths had long been a goal of physiological inquiry. Nineteenth-century physicians were particularly interested in the way the body could betray the mind. Étienne-Jules Marey, for example, recorded minute physiological changes as indicators of stress, convinced that such measurements could reveal a hidden interior truth.30 By the early twentieth century, leading psychologists were working to operationalize the emerging consensus that emotions were bodily. In 1917, William Moulton Marston and his wife Elizabeth Holloway Marston invented a form of the polygraph, but within a few years, Larson had added two crucial insights. The first was to take continuous measurements of blood pressure and record them as one running line, monitoring change relative to a baseline. The second was to partner with law enforcement.31
In the spring of 1921, Larson tried out his technology to solve a real crime, a potboiler-style drama of a missing diamond presumed stolen by one of ninety women living in a boarding house. The thief, whose recorded blood pressure did drop precipitously during her interrogation, eventually confessed after days of further questioning. With journalists eager for gripping narratives about the latest crime, the cardio-pneumo-psychograph made great copy, but to Larson’s chagrin, it was renamed the “lie detector.” Historical accounts even credit newspapers with pressuring police in other jurisdictions to adopt the tool as well. For August Vollmer, the chief of police in Berkeley, California, the cardio-pneumo-psychograph was particularly appealing because it could help professionalize law enforcement. Concerned with perceptions of a corrupt police force that relied on personal relations and intuitions, Vollmer was eager to experiment with new “scientific” policing. Although the methods were unproven, Vollmer believed that the patina of scientific expertise gained by enrolling Larson would bolster public support for local law enforcement.32
From the beginning, the polygraph was a “charismatic” technology that captured public interest.33 It inspired popular depictions that led to the polygraph’s widespread deployment beyond routine police work or formal legal settings. Some of these uses were relatively banal, such as trying to understand what drew certain audiences to films or actors. But the stakes of this unreliable technology grew in more impactful domains like employment. For example, adherence to the Cold War nuclear secrecy regime was policed through polygraph tests for adjudicating and maintaining security clearances. Beyond the nuclear complex, employers saw the polygraph as a useful screen for job suitability, despite its unreliability and recurring biases.34
Judges proved less willing to accept the polygraph, beginning in 1922 with the trial of James Frye. Frye had previously confessed to the murder in question but claimed that his confession had been coerced. William Marston performed a polygraph test to validate Frye’s claim. After a cursory review, the judge rejected the polygraph as evidence. The subsequent “Frye Rule” was designed to prevent scientific techniques that were still in development from entering the courtroom. Instead, a methodology like polygraph testing would require “general acceptance” by the scientific community. This standard encoded a belief that juries would be distinctly swayed by supposedly objective scientific evidence produced by a machine like the polygraph.35
In practice, the Frye Rule did not prevent deeply questionable evidence from entering court proceedings—let alone from circulating beyond formal legal settings. In 1993, Daubert v. Merrell Dow Pharmaceuticals offered a new standard to replace the Frye Rule by further empowering judges to act as gatekeepers of expert testimony about novel technologies. It asked judges to think like scientists evaluating peer-reviewed expertise. The goal remained the same: The courts would not be a place for radical experimentation with novel technologies.36 (As recently as December 2023, amendments to the US Federal Rule of Evidence 702 have aimed to clarify that expert testimony at trial is to be treated as the expert’s opinion, and that the party introducing such testimony must meet a burden-of-evidence standard for it to be admissible.37)
Whether relying on their own judgment or gauging the consensus views, judges needed to assess basic validity claims about lie detection. In turn, the scientific community repeatedly mobilized to limit the use of polygraphs in court. The US Office of Technology Assessment (OTA) concluded in a 1983 report that there was “only limited scientific evidence for establishing the validity of polygraph testing.”38 Again, the discrepancy between criminal law within a courtroom and deployment of the technology in other high-stakes arenas remained stark: resistance to the polygraph from multiple experts in the most regulated legal sphere of criminal law was matched by an unchecked spread of the technology in other important spheres. The same OTA report estimated that outside of the federal government, more than one million polygraph tests were administered annually within the United States just to screen candidates for employment.39 More recently, the US National Academies led efforts to (again) scrutinize evidence on the reliability of the polygraph.40 The resulting 2003 report has played a crucial role in keeping the polygraph out of courtrooms.
In contrast to polygraph evidence, other science-based techniques have long been incorporated within legal proceedings in the United States, such as fingerprint analysis. Although far from perfect, the use of fingerprint identification techniques within law enforcement and legal settings has been subject to expert review, training, and standardization for decades.41 Moreover, high-profile misidentifications—such as the one in 2004 that led to the wrongful imprisonment of an American lawyer living in Oregon on charges related to the terrorist bombing of commuter trains in Madrid, Spain—catalyzed multiple reviews by expert panels to reassess the underlying scientific bases for fingerprint identifications and to update best-practice procedures for their use, including new types of training for practitioners.42
Algorithmic facial recognition technology has followed a trajectory more like the polygraph than like fingerprinting. Despite its significant, well-documented flaws, facial recognition technology has become ubiquitous in high-stakes contexts outside the courtroom.43 A few years ago, the US National Institute of Standards and Technology (NIST) conducted a detailed evaluation of nearly 200 distinct facial recognition algorithms, from around 100 commercial vendors. Nearly all of the machine-learning algorithms demonstrated enormous disparities, yielding false-positive rates more than 100 times higher when applied to images of Black men from West Africa than to images of white men from Eastern Europe; the NIST tests also found systematically higher false-positive rates for images of women than for images of men across all geographical regions.44 In the face of such clear-cut biases, some scholars have called for increased inclusion in the datasets—in theory broadening the types of faces that can be recognized.45 Others have argued that inclusion is the problem rather than the solution, and that the imperative to include more data puts identified and misidentified citizens at increased risk, without legal recourse.46
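To make the scale of that disparity concrete, the short sketch below shows how a per-group false-positive rate is computed for one-to-one face comparisons. The scores, threshold, and group labels are invented for illustration and do not reproduce NIST’s data or protocol.

```python
# Illustrative only: per-group false-positive rates for a face-matching system.
# All scores, group labels, and the threshold below are made up for this sketch.
from collections import defaultdict

# Each record: (demographic_group, is_genuine_pair, similarity_score)
comparisons = [
    ("group_A", False, 0.91), ("group_A", False, 0.42), ("group_A", True, 0.97),
    ("group_B", False, 0.33), ("group_B", False, 0.28), ("group_B", True, 0.95),
]
THRESHOLD = 0.80  # declare a "match" when the similarity score reaches this value

false_positives = defaultdict(int)
impostor_pairs = defaultdict(int)
for group, is_genuine, score in comparisons:
    if not is_genuine:               # only impostor (different-person) pairs count toward FPR
        impostor_pairs[group] += 1
        if score >= THRESHOLD:       # the system wrongly declares a match
            false_positives[group] += 1

for group, total in impostor_pairs.items():
    print(f"{group}: false-positive rate = {false_positives[group] / total:.2f}")
```

A gap between these per-group rates, measured over millions of impostor comparisons rather than the handful shown here, is the kind of disparity the NIST evaluation reported.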
Although the research community has identified these stark demographic biases and thousands of research papers have focused on ways to mitigate such disparities under pristine laboratory conditions, the commercially available algorithms have already moved well beyond research spaces and into impactful real-world settings.47 Within the United States alone, thousands of distinct law enforcement jurisdictions can purchase commercial facial recognition technologies, subject to no regulation, standardization, or oversight. This free-for-all has led to multiple reports of Black men being wrongfully arrested due to a combination of failures: inadequate technical calibration of the various algorithms, together with human failures to follow recommended procedures after a putative facial-image match within an active police investigation, such as seeking additional eyewitness testimony or forensic evidence from the crime scene.48
These continuing real-world failures—which exacerbate long-standing inequities within existing institutional frameworks—are likely to worsen in the absence of any oversight or regulation.49 There already exist more than one billion surveillance cameras across fifty countries. Within the United States alone, facial images of half the adult population are already included in databases accessible to law enforcement.50
Like the polygraph, these faulty, unregulated technologies have already moved far beyond both laboratory and law enforcement settings. In some cases, they have generated sensational claims that far outstrip technical feasibility, such as a widely covered 2018 study that claimed that algorithmic analysis of facial images could determine an individual’s sexual orientation.51 Meanwhile, private vendors continue to scoop up as many facial images as they can, almost always from platforms for which the people depicted neither granted permission for such uses nor were aware of the third-party data collection.52 In turn, facial surveillance is now deployed in all sorts of new contexts, including monitoring students’ behavior in school, preventing access to venues, and even screening job applicants.53
Conclusions
AI policy is marked by a recurring problem: a sense that AI itself is difficult or even impossible to fully understand. Scholars have shown how machine learning relies on several forms of opacity: corporate secrecy, technical complexity, and unexplainable processes.54 Scientists have a special obligation to push against that opacity. In fact, as these examples show, at its best the scientific community has worked closely with diverse communities to build broad coalitions of researchers and nonresearchers to assess and respond to risks. History offers both hope that building such collective processes is possible and repeated notes of caution about the difficulties of sustaining such necessary work. Three principles that emerge from across these historical case studies should inform how the scientific community leads present-day AI governance.
1. Self-policing is not enough: Researchers’ voluntary moratoriums on publication or on specific research practices have rarely (if ever) proven sufficient, especially once impactful technologies have moved beyond controlled laboratory settings. Scientists and engineers have been particularly poorly equipped to anticipate the ways in which public narratives about technologies would shape expectations and uses.
2. Oversight must extend beyond the research community: Broad-gauge input and oversight have repeatedly proven necessary to sustain an innovation ecosystem. Extended debate and negotiation among researchers and broader groups of nonspecialists can build public trust and establish clear regulatory frameworks, within which research can expand across academic and private-sector spaces.
3. Recurring reviews are necessary: In-depth reviews, conducted by reviewers who include specialists and broader communities of concerned stakeholders, should regularly reassess both the evolving technologies and the shifting social practices within which they are embedded. Only then can best practices be identified and refined. These reviews are most effective when they build on existing civic infrastructures and civil rights.
In all three historical examples, scientists and engineers were eager to act justly and to put bounds around novel technologies to mitigate potential risks. Yet these experts could not anticipate the ways in which official or popular enthusiasm would lead these innovations to spread in unexpected ways. For example, researchers did not predict the rise of an elaborate Cold War national security secrecy infrastructure, the reactions from Cambridge residents to fears of accidents or leaks involving dangerous pathogens, or the popular enthusiasm (despite legal skepticism) for the polygraph. These off-label uses, far beyond the reach of laboratory controls or formal legal strictures, have posed particular dangers to broader communities.
Nonetheless, by speaking decisively about risks, articulating clear gaps in knowledge, and identifying faulty claims, scientists and technologists—working closely with colleagues beyond the research community—have successfully established regulatory and governance frameworks within which new technologies have been developed, evaluated, and improved. The same commitment to genuine partnerships beyond the research community must guide governance of exciting—yet risky—AI technologies today.
Notes
1. Ruth Lewin Sime, Lise Meitner: A Life in Physics (Berkeley: University of California Press, 1996), chap. 10.
2. Niels Bohr and John A. Wheeler, “The Mechanism of Nuclear Fission,” Physical Review 56 (September 1, 1939): 426–450. See also Sime, Lise Meitner, chap. 11.
3. Mark Walker, German National Socialism and the Quest for Nuclear Power (New York: Cambridge University Press, 1989), 17–18; Walter E. Grunden, Mark Walker, and Masakatsu Yamazaki, “Wartime Nuclear Weapons Research in Germany and Japan,” Osiris 20 (2005): 107–130; Margaret Gowing, Britain and Atomic Energy, 1939–1945 (London: Macmillan, 1964), chap. 1; Richard G. Hewlett and Oscar E. Anderson, Jr., A History of the United States Atomic Energy Commission, vol. 1, A New World, 1939–1946 (University Park: Pennsylvania State University Press, 1962), 15–17; David Holloway, Stalin and the Bomb: The Soviet Union and Atomic Energy, 1939–1956 (New Haven: Yale University Press, 1994), chap. 3.
4. Alex Wellerstein, Restricted Data: The History of Nuclear Secrecy in the United States (Chicago: University of Chicago Press, 2021), chap. 1.
5. Alexei Kojevnikov, Stalin’s Great Science: The Times and Adventures of Soviet Physicists (London: Imperial College Press, 2004), 132.
6. Wellerstein, Restricted Data, chap. 2; Hewlett and Anderson, A New World, chaps. 3–6.
7. Wellerstein, Restricted Data, chap. 4; Brian Balogh, Chain Reaction: Expert Debate and Public Participation in American Commercial Nuclear Power, 1945–1975 (New York: Cambridge University Press, 1991), chaps. 4–5.
8. Lawrence Badash, Scientists and the Development of Nuclear Weapons (Atlantic City: Humanities Press, 1995), chaps. 5–6; Hewlett and Anderson, A New World, chaps. 12–16.
9. Kai Bird and Martin Sherwin, American Prometheus: The Triumph and Tragedy of J. Robert Oppenheimer (New York: Vintage, 2005), part 4; Jessica Wang, American Science in an Age of Anxiety: Scientists, Anticommunism, and the Cold War (Chapel Hill: University of North Carolina Press, 1999), chap. 1; David Kaiser and Benjamin Wilson, “American Scientists as Public Citizens: 70 Years of the Bulletin of the Atomic Scientists,” Bulletin of the Atomic Scientists 71 (January 2015): 13–25.
10. Wang, American Science in an Age of Anxiety, chaps. 2, 5.
11. Bird and Sherwin, American Prometheus, part 5; Priscilla J. McMillan, The Ruin of J. Robert Oppenheimer and the Birth of the Modern Arms Race (New York: Viking, 2005), part 4; David Kaiser, “The Atomic Secret in Red Hands? American Suspicions of Theoretical Physicists During the Early Cold War,” Representations 90 (Spring 2005): 28–60.
12. Michael Gordin, Red Cloud at Dawn: Truman, Stalin, and the End of the Atomic Monopoly (New York: Farrar, Straus, and Giroux, 2009); Francis J. Gavin, Nuclear Statecraft: History and Strategy in America’s Atomic Age (Ithaca: Cornell University Press, 2012).
13. See also Peter Galison, “Removing Knowledge,” Critical Inquiry 31 (Autumn 2004): 229–243.
14. See especially Sheldon Krimsky, Genetic Alchemy: The Social History of the Recombinant DNA Controversy (Cambridge, MA: MIT Press, 1982); and Susan Wright, Molecular Politics: Developing American and British Regulatory Policy for Genetic Engineering, 1972–1982 (Chicago: University of Chicago Press, 1994).
15. Charles Weiner, “Drawing the Line in Genetic Engineering: Self-Regulation and Public Participation,” Perspectives in Biology and Medicine 44 (Spring 2001): 208–220; John Durant, “ ‘Refrain from Using the Alphabet’: How Community Outreach Catalyzed the Life Sciences at MIT,” in David Kaiser, ed., Becoming MIT: Moments of Decision (Cambridge, MA: MIT Press, 2010), 145–163.
16. David Baltimore, unpublished lecture in MIT Technology Studies Workshop, November 6, 1974, as quoted in John Durant, “ ‘Refrain from Using the Alphabet,’ ” 146.
17. Paul Berg, David Baltimore, Herbert W. Boyer, Stanley N. Cohen, et al., “Potential Biohazards of Recombinant DNA Molecules,” Science 185 (July 26, 1974): 303.
18. The impact of restricting participation at Asilomar to life scientists is emphasized in Shobita Parthasarathy, “Governance Lessons for CRISPR/Cas9 from the Missed Opportunities at Asilomar,” Ethics in Biology, Engineering & Medicine: An International Journal 6, nos. 3–4 (2015): 305–312; and J. Benjamin Hurlbut, “Remembering the Future: Science, Law, and the Legacy of Asilomar,” in Sheila Jasanoff and Sang-Hyun Kim, eds., Dreamscapes of Modernity: Sociotechnical Imaginaries and the Fabrication of Power (Chicago: University of Chicago Press, 2015): 126–151.
19. Wright, Molecular Politics, chap. 4; Durant, “ ‘Refrain from Using the Alphabet,’ ” 150.
20. Weiner, “Drawing the Line in Genetic Engineering”; Durant, “ ‘Refrain from Using the Alphabet’ ”; David Kaiser and Jonathan D. Moreno, “Self-Censorship Is Not Enough,” Nature 492 (20 December 2012): 345–347.
21. Mayor Alfred Vellucci, “Hearing on Recombinant DNA Experimentation, City of Cambridge,” June 23, 1976, as quoted in Durant, “ ‘Refrain from Using the Alphabet,’ ” 150.
22. Weiner, “Drawing the Line in Genetic Engineering”; Durant, “ ‘Refrain from Using the Alphabet.’ ”
23. Durant, “ ‘Refrain from Using the Alphabet,’ ” 150–156; see also Wright, Molecular Politics, 222.
24. Durant, “ ‘Refrain from Using the Alphabet,’ ” 156–160. In other ways, the Cambridge-area biotech boom highlights enduring inequalities in the distribution and access to the advances of biotechnology. See Robin Scheffler, Genetown: The Greater Boston Area and the Rise of Biotechnology in America (Chicago: University of Chicago Press, forthcoming).
25. James H. Jones, Bad Blood: The Tuskegee Syphilis Experiment, rev. ed. (New York: Free Press, 1993); Eileen Welsome, The Plutonium Files: America’s Secret Medical Experiments in the Cold War (New York: Dial, 1999); Susan M. Reverby, “Ethical Failures and History Lessons: The US Public Health Service Research Studies in Tuskegee and Guatemala,” Public Health Reviews 34, no. 1 (2012): article 13.
26. Laura Stark, Behind Closed Doors: IRBs and the Making of Ethical Research (Chicago: University of Chicago Press, 2012).
27. Stark, Behind Closed Doors.
28. Richard van Noorden, “The Ethical Questions That Haunt Facial-Recognition Research,” Nature 587 (November 19, 2020): 354–358; Casey Fiesler, Nathan Beard, and Brian C. Keegan, “No Robots, Spiders, or Scrapers: Legal and Ethical Regulation of Data Collection Methods in Social Media Terms of Service,” Proceedings of the International AAAI Conference on Web and Social Media 14, no. 1 (2020): 187–196.
29. Paul Ohm, “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization,” UCLA Law Review 57 (2010): 1701–77; Jacob Metcalf and Kate Crawford, “Where Are Human Subjects in Big Data Research? The Emerging Ethics Divide,” Big Data & Society (June 2016): 1–14; Mary L. Gray, “Big Data, Ethical Futures,” Anthropology News, January 13, 2017; Laura Stark, “Protections for Human Subjects in Research: Old Models, New Needs?,” MIT Case Studies in Social and Ethical Responsibilities of Computing, no. 3 (Winter 2022); Simson Garfinkel, “Differential Privacy and the 2020 US Census,” MIT Case Studies in Social and Ethical Responsibilities of Computing, no. 3 (Winter 2022).
30. Jimena Canales, A Tenth of a Second: A History (Chicago: University of Chicago Press, 2009), chap. 3.
31. Ken Alder, The Lie Detector: The History of An American Obsession (New York: Free Press, 2007).
32. Alder, The Lie Detector.
33. On “charismatic technologies,” see Morgan Ames, The Charisma Machine: The Life, Death, and Legacy of One Laptop per Child (Cambridge, MA: MIT Press, 2019).
34. Alder, The Lie Detector.
35. Sheila Jasanoff, “Science on the Witness Stand,” Issues in Science and Technology 6, no. 1 (Fall 1989): 80–87; David E. Bernstein, “Frye, Frye, Again: The Past, Present, and Future of the General Acceptance Test,” Jurimetrics 41, no. 3 (2001): 385–408.
36. Simon A. Cole, “Toward Evidence-Based Evidence: Supporting Forensic Knowledge Claims in the Post-Daubert Era,” Tulsa Law Review 43, no. 2 (2007): 263–84.
37. Chief Justice of the US Supreme Court, Amendments to the Federal Rules of Evidence, 118th Cong., 1st Sess., House Document 118-33, https://www.govinfo.gov/content/pkg/CDOC-118hdoc33/pdf/CDOC-118hdoc33.pdf. For additional discussion of the new rules, see, for example, https://www.law.cornell.edu/rules/fre/rule_702.
38. Office of Technology Assessment, Scientific Validity of Polygraph Testing: A Research Review and Evaluation (Washington, DC: Government Printing Office, 1983), 4.
39. Office of Technology Assessment, Scientific Validity of Polygraph Testing, 25.
40. Committee to Review the Scientific Evidence on the Polygraph, The Polygraph and Lie Detection (Washington, DC: National Academies Press, 2003).
41. See, for example, Andre A. Moenssens and Stephen B. Meagher, “Fingerprints and the Law,” US Department of Justice, Office of Justice Programs, 2011, https://www.ojp.gov/library/publications/fingerprint-sourcebook-chapter-13-fingerprints-and-law.
42. Robert B. Stacey, “Report on the Erroneous Fingerprint Individualization in the Madrid Train Bombing Case,” Forensic Science Communications 7, no. 1 (January 2005); National Research Council of the National Academies, Strengthening Forensic Science in the United States: A Path Forward (Washington, DC: National Academies Press, 2009); President’s Council of Advisors on Science and Technology, Forensic Science in Criminal Courts: Ensuring Scientific Validity of Feature-Comparison Methods (Washington, DC: Executive Office of the President, September 2016).
43. Committee on Facial Recognition, Facial Recognition Technology: Current Capabilities, Future Prospects, and Governance (Washington, DC: National Academies Press, 2024).
44. Patrick Grother, Mei Ngan, and Kayee Hanaoka, “Face Recognition Vendor Test (FRVT), Part 3: Demographic Effects,” Report NISTIR 8280 (Washington, DC: National Institute of Standards and Technology, December 2019); Sidney Perkowitz, “The Bias in the Machine,” MIT Case Studies in Social and Ethical Responsibilities of Computing, no. 1 (Winter 2021).
45. Joy Buolamwini and Timnit Gebru, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proceedings of the 1st Conference on Fairness, Accountability and Transparency 81 (2018): 77–91, https://proceedings.mlr.press/v81/buolamwini18a.html; Martins Bruveris, Jochem Gietema, Pouria Mortazavian, and Mohan Mahadevan, “Reducing Geographic Performance Differentials in Face Recognition,” arXiv, February 27, 2020, https://arxiv.org/abs/2002.12093; Philipp Terhörst, Mai Ly Tran, Naser Damer, Florian Kirchbuchner, et al., “Comparison-Level Mitigation of Ethnic Bias in Face Recognition,” Proceedings of the IEEE International Workshop on Biometrics and Forensics (IWBF) (April 2020): 1–6.
46. Clare Garvie, Alvaro Bedoya, and Jonathan Frankel, The Perpetual Lineup: Unregulated Police Face Recognition in America (Washington, DC: Georgetown Law Center on Privacy and Technology, 2016).
47. See, for example, the literature review in Pawel Drozdowski, Christian Rathget, Antitza Dantcheva, Naser Damer, et al., “Demographic Bias in Biometrics: A Survey on an Emerging Challenge,” IEEE Transactions on Technology and Society 1, no. 2 (June 2020): 89–103.
48. See, for example, Robert Williams, “I Was Wrongfully Arrested Because of Facial Recognition. Why Are Police Allowed to Use It?,” Washington Post, June 24, 2020; Kashmir Hill, “Wrongfully Accused by an Algorithm,” New York Times, June 24, 2020, updated August 3, 2020; Elaisha Stokes, “Wrongful Arrest Exposes Racial Bias in Facial Recognition Technology,” CBS News, November 19, 2020; Kashmir Hill, “Another Arrest, and Jail Time, due to a Bad Facial Recognition Match,” New York Times, December 29, 2020, updated January 6, 2021; Editorial Board, “Unregulated Facial Recognition Must Stop Before More Black Men Are Wrongfully Arrested,” Washington Post, December 31, 2020.
49. Andrew G. Ferguson, The Rise of Big Data Policing: Surveillance, Race, and the Future of Law Enforcement (New York: NYU Press, 2017); Ruha Benjamin, Race After Technology: Abolitionist Tools for the New Jim Code (Medford, MA: Polity, 2019); Brian Jefferson, Digitize and Punish: Racial Criminalization in the Digital Age (Minneapolis: University of Minnesota Press, 2020).
50. Elly Cosgrove, “One Billion Surveillance Cameras Will Be Watching Around the World in 2021, a New Study Says,” CNBC, December 6, 2019; M. Melton, “Government Watchdog Questions FBI on Its 640-Million-Photo Facial Recognition Database,” Forbes, June 4, 2019; Perkowitz, “The Bias in the Machine.”
51. Yilun Wang and Michal Kosinski, “Deep Neural Networks Are More Accurate than Humans at Detecting Sexual Orientation from Facial Images,” Journal of Personality and Social Psychology 114, no. 2 (2018): 246–57, https://doi.org/10.1037/pspa0000098; cf. Jacob Metcalf, “ ‘The Study Has Been Approved by the IRB’: Gayface AI, Research Hype, and the Pervasive Data Ethics Gap,” Medium, November 30, 2017, https://medium.com/pervade-team/the-study-has-been-approved-by-the-irb-gayface-ai-research-hype-and-the-pervasive-data-ethics-ed76171b882c. For a sample of news coverage of the Wang and Kosinski study, see, for example, Alan Burdick, “The A.I. ‘Gaydar’ Study and the Real Dangers of Big Data,” New Yorker, September 15, 2017; Heather Murphy, “Why Stanford Researchers Tried to Create a ‘Gaydar’ Machine,” New York Times, October 9, 2017; Brian Resnick, “This Psychologist’s ‘Gaydar’ Research Makes Us Uncomfortable; That’s the Point,” Vox, January 29, 2018; and Paul Lewis, “ ‘I Was Shocked It Was So Easy’: Meet the Professor Who Says Facial Recognition Can Tell If You’re Gay,” The Guardian, July 7, 2018.
52. van Noorden, “Ethical Questions,” 354–358; Fiesler, Beard, and Keegan, “No Robots, Spiders, or Scrapers,” 187–196.
53. Kashmir Hill, Your Face Belongs to Us (New York: Penguin Random House, 2023). The trend is not limited to the United States. See, for example, Kai Strittmatter, We Have Been Harmonised: Life in China’s Surveillance State (London: Old Street, 2019); and Tristan G. Brown, Alexander Statman, and Celine Sui, “Public Debate on Facial Recognition Technologies in China,” MIT Case Studies in Social and Ethical Responsibilities of Computing, no. 2 (Summer 2021).
54. Jenna Burrell, “How the Machine ‘Thinks’: Understanding Opacity in Machine Learning Algorithms,” Big Data & Society 3, no. 1 (June 1, 2016); see also Ho Chit Siu, Kevin J. Leahy, and Makai Mann, “STL: Surprisingly Tricky Logic (for System Validation),” 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2023): 8613–8620, https://ieeexplore.ieee.org/document/10342290.