CHAPTER 3 Science in the Context of AI

Jeannette M. Wing

Generative AI took even the computer science community by surprise. To put this disruption in context, let’s start at the beginning, surf through two waves of AI, and then situate science in this time line. With selective highlights, I offer a compressed history of AI; a simplified view of the transformer architecture, which underlies generative AI; and a bird’s eye view of how AI can benefit science.

Part 1: AI Time Line

In 1950, Alan Turing, considered the father of modern computer science, proposed the Turing Test: If a human interacts with a machine, and the human cannot tell the difference between interacting with the machine and interacting with another human being, then the machine passes the Turing Test. We could thus consider this machine as exhibiting, in some sense, human intelligence.

The year 1956 marks not only when the term artificial intelligence was born but also the start of AI as an academic pursuit. At a summer conference held at Dartmouth College, the participants asked whether we could build a machine that mimics the behavior of humans. This grand goal of AI was recognized early on as too difficult to achieve. Thus, subfields of AI splintered off, representing subtasks of human intelligence: computer vision for vision, speech recognition and natural language processing for language, and robotics for mobility and manipulation. Other subtasks such as logical reasoning and abstract reasoning found common ground with subfields of computer science, including theorem proving, formal methods, and programming languages. The first wave of AI, during the last half of the last century, is signified by representing knowledge symbolically, not numerically, and representing reasoning by rules. For example, with the following rule:

Man(X) ⇒ Mortal(X)

if Socrates is a man, then I can conclude Socrates is mortal.
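To make this style of symbolic, rule-based reasoning concrete, here is a minimal sketch in Python of forward chaining: repeatedly applying rules to known facts until nothing new can be derived. The encoding of facts and rules is my own illustrative choice, not taken from any particular expert system.

```python
# A minimal sketch of first-wave, symbolic AI: facts and rules are
# hand-encoded, and reasoning is forward chaining over those rules.
facts = {("Man", "Socrates")}
rules = [
    # Man(X) => Mortal(X)
    ("Man", "Mortal"),
]

def forward_chain(facts, rules):
    """Apply rules repeatedly until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived

print(forward_chain(facts, rules))
# {('Man', 'Socrates'), ('Mortal', 'Socrates')}
```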

These rule-based methods led to what are called “expert systems,” which by the late 1990s found their way into scientific domains. The first expert system project, Dendral,1 began in 1965 with the goal of capturing knowledge about organic chemistry so chemists could identify unknown organic molecules “by analyzing their mass spectra and using knowledge of chemistry.”2 Expert systems were limited by the need to enter and maintain rules manually; they were constrained to the vocabulary of the rules and did not scale to learn new domain knowledge automatically.

The second wave in AI hit by the end of the last century, and we are still riding it. It is signified by machine learning, where one trains a computational model on data, and upon deployment of the model in the real world, it can act on data it has never seen before. Moreover, the models learn from these interactions, thereby improving the model over time. The second wave is distinct from the first in that it is driven by the plethora of digital data, especially data that represents human behavior, for example, what our movie preferences are, when and how we commute to work every day, and what groceries we buy.

One form of machine learning model is the deep neural network (DNN),3 which is characterized by multiple layers of nodes, where each layer is connected to the next by weighted edges. Each node at each layer computes a function of its weighted inputs and passes its output along weighted edges to nodes in the next layer. Overall, a DNN transforms input data (e.g., an image) into an abstract representation of the data (e.g., a classification of that image). As a simple example, suppose a deep neural network was trained to classify images. Then if we feed it a picture of a cat that it has never seen before, it will output that it is a cat. More precisely, it will output the classification label “cat” with an associated high probability, say 0.95, and perhaps a different label, for example, “tiger,” with a lower probability, say 0.01. The DNN boom took off when, in 2012, AlexNet won the ImageNet contest on 1.2 million images and 1,000 classes.4 DNNs showed how with Big Data and Big Compute, machines could perform certain tasks as well as or better than humans. They are part of computer vision systems in self-driving cars; they enable voice recognition in personal assistants on our phones and tabletop devices in our living rooms; and they were at the core of the computing system that in 2016 beat the best human Go player in the world.
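To connect this to the cat example, here is a minimal sketch (assuming NumPy, with made-up output scores rather than a trained network) of how the final layer of such a classifier turns raw scores into label probabilities via a softmax:

```python
import numpy as np

# Minimal sketch: turning a network's raw output scores (logits) into
# class probabilities with a softmax. The logit values here are made up.
labels = ["cat", "tiger", "dog"]
logits = np.array([4.2, -0.3, 0.1])   # hypothetical scores for one input image

def softmax(z):
    z = z - z.max()                   # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

probs = softmax(logits)
for label, p in sorted(zip(labels, probs), key=lambda x: -x[1]):
    print(f"{label}: {p:.2f}")        # cat: 0.97, dog: 0.02, tiger: 0.01
```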

Fast-forward ten years into the Age of Generative AI, where we can generate new data, such as text that has never been written before or images that have never been created before. One generative AI technique is based on large-language models (LLMs).5 Authors of an early exploratory paper contrasting ChatGPT with GPT-4 show how GPT-4 is able to generate a proof, requiring knowledge of calculus, of a simplified version of a problem statement that appeared in the 2022 International Mathematics Olympiad.6 Another generative AI technique is based on diffusion models,7 which are especially good at generating images. A diffusion model first iteratively adds noise to the original picture, say of a dog, and then iteratively denoises to get a new image, that is, a brand-new picture of a dog.

To put science in the context of this AI time line, by the 1980s, supercomputers became the workhorse of science, for example, performing enormous numerical calculations and running complex simulations of physics-based models. The explosion of scientific data generated by devices and instruments enabled the age of data-driven science. From embedded microchips to space telescopes, scientists could sense the world, take measurements, record dynamics, and produce images of natural systems at unprecedented scale and speed.

At about the same time, statistical machine learning pulled the splintered subfields of AI back together, bringing vision, language, and robotics closer to each other, and even teasing us about the eventuality of artificial general intelligence (AGI).

It is the convergence of Big Compute, Big Data, and advanced AI that provides the context of this panel on using AI to make scientific discoveries. For science, the real breakthrough event came in 2018 with AlphaFold, an AI system built on deep learning that could predict protein structure.8

In just the past 10 years, the second wave has turned into a tsunami. The computational power used to create machine learning models (measured in petaflop/s-days) has been doubling every 3.4 months (Figure 3.1). In contrast, the 2-year doubling due to Moore’s Law looks like the lower part of the curve. Note that the y-axis is a log scale. Since 2010, we have also seen a 2.2× growth rate in training data size.9 In just the past 4 years, we have gone from talking about billions of words with OpenAI’s GPT-3 to trillions with Databricks’s DBRX.
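As a back-of-the-envelope comparison of those two growth rates, the following sketch (assuming only the doubling periods quoted above) computes the growth factors they imply over a single decade:

```python
# Back-of-the-envelope: growth over 10 years under two doubling periods.
months = 10 * 12

ml_compute_growth = 2 ** (months / 3.4)    # 3.4-month doubling (ML training compute)
moores_law_growth = 2 ** (months / 24)     # 2-year doubling (Moore's Law)

print(f"ML training compute grows by ~{ml_compute_growth:.1e}x")  # roughly 4e10
print(f"Moore's Law grows by ~{moores_law_growth:.0f}x")          # roughly 32
```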

Figure 3.1. Compute versus machine learning model. Graphic from “AI and compute,” OpenAI, May 16, 2018, https://openai.com/index/ai-and-compute/.

Part 2: Generative AI Architecture

How can scientists ride this tsunami? It is worth understanding a few basic concepts underlying generative AI, the cause of the recent disruption in the AI time line, whose impact will be felt by all fields of endeavor long into the future. There is no going back. Today, generative AI is particularly remarkable for generating text and images. Tomorrow, who knows?

For example, to generate text,10 if we feed a large-language model a sequence of words, then the LLM will predict the next word. That is all it does! It will produce the word with the highest probability of occurring next. More formally, given the initial i−1 words, we draw the next word w_i from the distribution of possible next words:

P(w_i | w_1, …, w_{i−1})

For example, if we feed in the input sequence “The cute dog begged for a,” it will output “bone,” assuming no other word has a higher associated probability.

Example                                      Probability
The cute dog begged for a bone               0.85
The cute dog begged for a promotion          0.02
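The following is a minimal sketch of that selection step, assuming NumPy and a made-up handful of candidate words and scores; a real LLM computes this distribution over its entire vocabulary with a deep neural network.

```python
import numpy as np

# Minimal sketch of next-word prediction: the model assigns a probability
# to every candidate next word and we pick (or sample) from that distribution.
# The candidate words and scores below are made up for illustration.
prompt = "The cute dog begged for a"
candidates = ["bone", "walk", "treat", "promotion"]
logits = np.array([3.0, 1.5, 2.0, -1.0])           # hypothetical model scores

probs = np.exp(logits - logits.max())
probs /= probs.sum()                               # P(w_i | w_1, ..., w_{i-1})

next_word = candidates[int(np.argmax(probs))]      # greedy choice
print(prompt, next_word)                           # The cute dog begged for a bone
for w, p in zip(candidates, probs):
    print(f"{w}: {p:.2f}")
```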

As mentioned earlier, we can generate new images using not just LLMs but diffusion models. The diffusion model was inspired by nonequilibrium statistical physics;11 natural diffusion processes are found in physics, chemistry, and biology. In a diffusion model, the forward process systematically adds noise to an image; we then learn a reverse process by denoising, and in this process generate a new image (Figure 3.2).
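As a minimal sketch of the forward (noising) half of this process, the code below assumes NumPy and the linear variance schedule commonly used in denoising diffusion models; the reverse, denoising half is what the trained neural network learns and is not shown.

```python
import numpy as np

# Minimal sketch of the forward (noising) process of a diffusion model.
# x0 stands in for an image; here it is just a small random array.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))                  # stand-in for an 8x8 image

T = 1000                                          # number of noising steps
betas = np.linspace(1e-4, 0.02, T)                # a common linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)              # cumulative signal retention

def noise_at_step(x0, t):
    """Sample x_t from x_0: x_t = sqrt(a_bar_t)*x_0 + sqrt(1 - a_bar_t)*eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

x_mid = noise_at_step(x0, 500)                    # partially noised
x_end = noise_at_step(x0, T - 1)                  # nearly pure noise
print(np.corrcoef(x0.ravel(), x_end.ravel())[0, 1])  # close to 0: signal destroyed
```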

What is fundamental to both techniques is that the underlying probability distributions are learned from billions of examples and represented as deep neural networks. Consider the transformer architecture,12 which is shared by state-of-the-art large-language models. The transformer architecture has two steps, and both steps build on DNNs.13 In the first step, through a series of transformations, we encode the input into a multidimensional embedding space (Figure 3.3).

Figure 3.2. Generating a new dog image. Example from Lin Yang, Zhilong Zhang, Yang Song, Shenda Hong, et al., “Diffusion Models: A Comprehensive Survey of Methods and Applications,” ACM Computing Surveys 56, no. 4 (November 9, 2023): 1–39, https://doi.org/10.1145/3626235.

Figure 3.3. Transformer architecture, step 1: encode the input. Drawn from John Launchbury, “The Trajectory of AI,” Presentation at Galois, Portland, OR, December 2023.

The manifold hypothesis states that real-world high-dimensional data lie on low-dimensional manifolds embedded in the high-dimensional space.14 Thus, we can imagine that each transformer layer, via linear and nonlinear operations, stretches and squashes manifolds in this space, passing the transformed manifold on to the next layer. While we do not know if the manifold hypothesis is true, minimally it provides good intuition as to what happens at each layer. The result of this step is that each input word is embedded in this space. Embeddings make it possible to represent symbolic information numerically. Each word gets represented as a vector of floating point numbers, each of which represents some feature of the word (see left-hand side of Figure 3.4). Embeddings convert high-dimensional data into a low-dimensional space.

Figure 3.4. Embeddings represent knowledge abstractly.

What is interesting about embedding spaces is that both distance and direction have meaning; hence we often use the term vector embeddings. From embeddings we can learn abstract concepts not explicitly represented. In the embedding shown on the right-hand side of Figure 3.4, “king” is to “queen” as “man” is to “woman” represents the abstract concept of gender. And “king” is to “queen” as “kings” is to “queens” represents plurality. (These examples are taken from Figure 2 in Mikolov, Yih, and Zweig 2013.15)
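The following minimal sketch illustrates that vector arithmetic with tiny made-up embedding vectors; real embeddings are learned from data and have hundreds of dimensions.

```python
import numpy as np

# Minimal sketch of analogy arithmetic in an embedding space.
# These 3-dimensional vectors are made up for illustration only.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.1]),
    "man":   np.array([0.1, 0.8, 0.0]),
    "woman": np.array([0.1, 0.1, 0.0]),
    "apple": np.array([0.0, 0.0, 0.9]),
}

def nearest(vec, exclude=()):
    """Return the vocabulary word whose embedding is closest (cosine) to vec."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(vec, emb[w]))

# "king" - "man" + "woman" should land near "queen": the gender direction.
print(nearest(emb["king"] - emb["man"] + emb["woman"], exclude={"king", "man", "woman"}))
```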

In the second step, we decode the input sequence of words in the embedding space and eventually output the word with the highest probability of occurring next.16 Unlike the DNNs of the past, and critical to the success of this architecture, we also add attention layers in between the transformer layers (Figure 3.5).

After all these transformations, we finally output “bone” (with an associated probability, say 0.85), which is then appended to the previous output tokens and used as input for the next iteration. This iterative process is how ChatGPT works.
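A minimal sketch of that loop, using a hypothetical predict_next_word function as a stand-in for the full model, looks like this:

```python
# Minimal sketch of autoregressive generation: each predicted word is
# appended to the prompt and fed back in. predict_next_word is a hypothetical
# stand-in for the full transformer; here it just uses a canned lookup.
canned = {
    "The cute dog begged for a": "bone",
    "The cute dog begged for a bone": "and",
    "The cute dog begged for a bone and": "wagged",
}

def predict_next_word(prompt):
    return canned.get(prompt, "<end>")

prompt = "The cute dog begged for a"
for _ in range(5):
    word = predict_next_word(prompt)
    if word == "<end>":
        break
    prompt = prompt + " " + word
print(prompt)   # The cute dog begged for a bone and wagged
```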

Attention layers provide context for the words being processed. Context constrains the possibilities. Each node provides key information about itself to others, for example, “I’m a noun.” And each node can query for information from its neighbors, for example, “I need a color.” (Key and query are terms used in information retrieval, also used in Vaswani et al. 2017.17)
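A minimal sketch of the attention computation itself, following the scaled dot-product form in Vaswani et al., is below; the query, key, and value matrices here are random stand-ins for what would be learned projections of the token embeddings.

```python
import numpy as np

# Minimal sketch of scaled dot-product attention: each token queries the
# others, and the attention weights say how much context to pull from each.
# Q, K, V would normally be learned projections of token embeddings;
# here they are random stand-ins.
rng = np.random.default_rng(0)
n_tokens, d = 6, 8                       # e.g., 6 tokens, 8-dimensional vectors
Q = rng.standard_normal((n_tokens, d))   # queries ("I need a color")
K = rng.standard_normal((n_tokens, d))   # keys ("I'm a noun")
V = rng.standard_normal((n_tokens, d))   # values (information passed along)

scores = Q @ K.T / np.sqrt(d)            # relevance of each token to each query
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the context
output = weights @ V                     # context-weighted mixture of values

print(weights.shape, output.shape)       # (6, 6) (6, 8)
```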

Figure 3.5. Transformer architecture, step two: decode to output next token; attention layers provide context. Drawn from John Launchbury, “The Trajectory of AI,” presentation at Galois, Portland, OR, December 2023.

This context can help us resolve ambiguity in language. For example, Winograd schemas,18 which are considered mini-Turing tests, are easily solved by large-language models, which makes it seem as if LLMs can do common-sense reasoning. In this example,

The trophy doesn’t fit in the brown suitcase because it is too [large/small].

it is ambiguous whether “it” refers to “trophy” or “suitcase.” We would know how to resolve the ambiguity if we know whether the last word is “large” or “small,” which might be in the context of a longer input sequence.

Part 3: AI for Science

Let’s take a step back from AI and explore how science can benefit from AI. Scientists can use generative AI to generate synthetic data, generate simulations, and more interestingly generate new hypotheses. The novelty of these new hypotheses is that because the computer has access to an enormous amount of data, it can find patterns or correlations that would never occur to a human or would take more than a lifetime for a human to uncover.

Even with more established techniques, such as DNNs, scientists can use AI for identification and discovery. We can classify and predict objects, recognize and discover new patterns, and detect anomalies and rare events. We can design and optimize experiments with AI recommending what control parameters to try for the next experiment. Automated experimental design is especially cost-effective and time-saving when running experiments on large, expensive instruments (e.g., a cyclotron, telescope, or neutrino detector). We could even use AI to propose new experiments and protocols to run. And it is a given that by using current AI techniques (e.g., LLMs), scientists can automatically pore through scientific literature, summarize results quickly, and create compelling visualizations. Finally, because AI techniques are agnostic as to what field of science we work in, by using AI we have the potential to expedite cross-disciplinary work.

Cutting across these categories of application, we can draw on a multitude of diverse sources of data to train and test new AI models for science:

  • Scientific publications, preprints, lab notebooks
  • Databanks, shared repositories, GitHub
  • Data from experiments
  • Data from devices and scientific instruments (small to large)
  • Data from simulations
  • Data from the internet/web

These sources can be multimodal, structured and unstructured: text, images, graphs, tables, audio, video, clinical records, software, and so forth.

To be more specific, consider two scientific disciplines, not covered by the panelists, to see how AI has already been helping to make new discoveries.

In astronomy,19 scientists used DNNs to recognize galaxies and now can classify galaxies with an accuracy of 98 percent. Astronomers used AI to detect new exoplanets, to predict signatures of new types of gravitational waves, and to find a unique object that may be a remnant of two black holes merging. They used generative AI to produce a sharper image of the very first image of a black hole.

In materials science,20 the design space is huge. A short polymer with 100 amino acids has on the order of 10¹³⁰ possible designs (20 choices of amino acid at each of 100 positions gives 20¹⁰⁰ ≈ 10¹³⁰), more than the number of atoms in the universe. Materials scientists are using generative AI to create new material designs. For example, Markus Buehler used LLMs to create a never-before-seen design of a hierarchical mycelium-based composite.21 Materials scientists are exploring how to use AI to identify new equations and algorithms, to synthesize complex novel proteins that do not exist in nature, to visualize complex systems, and to predict how a new material will behave.

What’s in the Future?

Looking ahead, the scientific community is already exploring how to build foundational models for specific scientific domains, which can later be fine-tuned to a specific problem or even to other domains. For example, Shirley Ho of the Flatiron Institute is leading the Polymathic AI initiative, an international and multidisciplinary team of collaborators, including experts from physics, astrophysics, mathematics, artificial intelligence, and neuroscience, to build foundational models that could be applied to a wide range of scientific problems.22 As another example, Prov-GigaPath is a whole-slide pathology foundation model pretrained on 1.3 billion image tiles in 171,189 whole slides from Providence, a large US health network.23

Scientists can tailor concepts from the transformer architecture and apply them to their domain. What is the analogy to predicting the next word? What are analogies to abstractions such as language, grammar, embeddings, and context? For example, a collaboration between astronomers and computer scientists is exploring “planetary linguistics” to determine whether planetary systems fall into natural categories following grammatical rules.24

Although Big Data and Big Compute have been responsible for driving the second wave of AI, for many reasons the scientific community, including computer science and AI researchers, should pursue what can be done with Small Data and Small Compute.25 Currently only those working in a handful of big technology companies have access to the large amounts of data and compute needed to train and build state-of-the-art AI models; the academic community is impoverished. Can we get similar functionality with less, perhaps through cleverer algorithms? Moreover, we may not have an abundance of data in some scientific domains. Finally, building today’s models incurs enormous energy usage; building smaller models with less data could be more energy-efficient.

One direction the AI community could pursue is combining symbolic models of the past with statistical models of today.26 A different hybrid approach for science is to combine machine learning with physics-based models (e.g., for simulations). For example, one aim of the National Science Foundation (NSF) Science and Technology Center “Learning the Earth with AI and Physics (LEAP)” is to reduce the uncertainty envelopes of climate model predictions using machine learning.27

There are challenges as well, the first of which is having enough reliable scientific data.28 AI also raises a new challenge to ensuring scientific integrity. To address this challenge, we need not only to educate scientists to check the accuracy of AI outputs29 but also to do more research on trustworthy AI.30

AI has been around for decades, but today’s AI craze has captured the fascination of the public and media. Is this another technology fad? Definitively not. The next generation will not know a world without generative AI as part of their lives, much as the current generation cannot imagine a world without the internet or smartphones. How should the scientific community respond? Understand it, embrace it, and explore with it.

Acknowledgments

My contribution on this topic is primarily in the compilation of ideas from the literature and from content in talks of others (see footnotes). I am grateful to Chris Impey for his permission to use content from his paper on AI and astrophysics, and to John Launchbury and Rebecca Willett for their permission to use content from their presentations for my talk and for this article. Additionally, I borrowed heavily from the talk by Markus Buehler in my discussion on AI and materials science.

Notes

  1. Robert K. Lindsay, Bruce G. Buchanan, Edward A. Feigenbaum, and Joshua Lederberg, “DENDRAL: A Case Study of the First Expert System for Scientific Hypothesis Formation,” Artificial Intelligence 61, no. 2 (1993): 209–261, https://doi.org/10.1016/0004-3702(93)90068-M.

  2. “Dendral,” Wikipedia (n.d.), https://en.wikipedia.org/wiki/Dendral.

  3. Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, “Deep Learning,” Nature 521 (May 27, 2015): 436–444, https://www.nature.com/articles/nature14539.

  4. Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” Communications of the ACM 60, no. 6 (May 24, 2017): 84–90, https://doi.org/10.1145/3065386.

  5. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, et al., “Attention Is All You Need,” Proceedings of the 31st International Conference on Neural Information Processing Systems (December 4, 2017): 6000–6010.

  6. Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, et al., “Sparks of Artificial General Intelligence: Early Experiments with GPT-4,” arXiv, April 2023, arXiv:2303.12712.

  7. Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli, “Deep Unsupervised Learning using Nonequilibrium Thermodynamics,” Proceedings of the 32nd International Conference on Machine Learning (2015), https://proceedings.mlr.press/v37/sohl-dickstein15.html.

  8. John Jumper, Richard Evans, Alexander Pritzel, Tim Green, et al., “Highly Accurate Protein Structure Prediction with AlphaFold,” Nature 596 (July 15, 2021): 583–589, https://www.nature.com/articles/s41586-021-03819-2.

  9. Cade Metz, Cecilia Kang, Sheera Frenkel, Stuart A. Thompson, et al., “How Tech Giants Cut Corners to Harvest Data for AI,” New York Times, April 6, 2024, https://www.nytimes.com/2024/04/06/technology/tech-giants-harvest-data-artificial-intelligence.html.

  10. The dog LLM example is from Rebecca Willett’s talk at CCC@AAAS 2024 (Part One): “Generative AI in Science: Promises and Pitfalls.”

  11. Sohl-Dickstein et al., “Deep Unsupervised Learning using Nonequilibrium Thermodynamics.”

  12. Vaswani et al., “Attention Is All You Need.”

  13. The high-level depiction of the transformer architecture and the embedding example are drawn from John Launchbury’s “The Trajectory of AI” talk on December 1, 2023: https://galois.com/blog/2023/12/the-trajectory-of-ai/.

  14. “Manifold Hypothesis,” Wikipedia (n.d.), https://en.wikipedia.org/wiki/Manifold_hypothesis.

  15. Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig, “Linguistic Regularities in Continuous Space Word Representations,” Proceedings of NAACL-HLT (2013): 746–751, https://aclanthology.org/N13-1090.pdf.

  16. LLMs technically operate over “tokens,” not “words”; one can think of a “token” as a sequence of contiguous characters, including letters, punctuation, and perhaps other delimiters. In this paper, we use “word” and “token” interchangeably.

  17. Vaswani et al., “Attention Is All You Need.”

  18. Terry Winograd, “Understanding Natural Language,” Cognitive Psychology 3, no. 1 (January 1972): 1–191, https://doi.org/10.1016/0010-0285(72)90002-3.

  19. All astronomy examples are from Chris Impey, “AI Is Helping Astronomers Make New Discoveries and Learn About the Universe Faster than Ever Before,” The Conversation, May 3, 2023, https://theconversation.com/ai-is-helping-astronomers-make-new-discoveries-and-learn-about-the-universe-faster-than-ever-before-204351.

  20. The polymer example and exploratory ideas for materials science are from a talk by Markus Buehler at CCC@AAAS 2024 (Part Two): “Generative AI in Science: Promises and Pitfalls.”

  21. Markus J. Buehler, “Accelerating Scientific Discovery with Generative Knowledge Extraction, Graph-Based Representation, and Multimodal Intelligent Graph Reasoning,” arXiv, June 10, 2024, https://arxiv.org/abs/2403.11996.

  22. Michael McCabe, Bruno Régaldo-Saint Blancard, Liam Holden Parker, Ruben Ohana, et al., “Multiple Physics Pretraining for Physical Surrogate Models,” arXiv, October 4, 2023, https://arxiv.org/abs/2310.02994.

  23. Hanwen Xu, Naoto Usuyama, Jaspreet Bagga, Sheng Zhang, et al., “A Whole-Slide Foundation Model for Digital Pathology from Real-World Data,” Nature 630 (May 22, 2024): 181–188, https://doi.org/10.1038/s41586-024-07441-w.

  24. Emily Sandford, David Kipping, and Michael Collins, “On Planetary Systems as Ordered Sequences,” Monthly Notices of the Royal Astronomical Society 505, no. 2 (August 2021): 2224–2246, https://doi.org/10.1093/mnras/stab1480.

  25. Jeannette M. Wing and Michael Wooldridge, Findings and Recommendations of the May 2022 UK-US AI Workshop (National Science Foundation and Engineering and Physical Sciences Research Council, May 3–4, 2022), https://www.cs.columbia.edu/~wing/publications/WingWooldridge2022.pdf.

  26. Wing and Wooldridge, Findings and Recommendations of the May 2022 UK-US AI Workshop.

  27. “About Learning the Earth using Artificial Intelligence and Physics (LEAP),” National Science Foundation Science and Technology Center Program, 2021, https://leap.columbia.edu/about/.

  28. Jennifer Listgarten, “The Perpetual Motion Machine of AI-Generated Data and the Distraction of ChatGPT as a ‘Scientist,’ ” Nature Biotechnology 42 (January 25, 2024): 371–373, https://doi.org/10.1038/s41587-023-02103-0.

  29. Wolfgang Blau, Vinton G. Cerf, Juan Enriquez, Joseph S. Francisco, et al., “Protecting Scientific Integrity in an Age of Generative AI,” Proceedings of the National Academy of Sciences 121, no. 22 (May 2024).

  30. Jeannette M. Wing, “Trustworthy AI,” Communications of the ACM 64, no. 10 (October 2021): 64–71; David “davidad” Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, et al., “Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems,” arXiv, May 10, 2024, https://arxiv.org/abs/2405.06624.
