
Concrete Problems In AI Safety With Dario Amodei And Seth Baum

Future of Life Institute Podcast
Interview with Dario Amodei of OpenAI and Seth Baum of the Global Catastrophic Risk Institute about studying short-term vs. long-term risks of AI, plus lots of discussion about Amodei's recent paper, Concrete Problems in AI Safety.
Transcript

Introduction to AI Research Focus

00:00:04
Speaker
from the FLI audio files. I'm Ariel Conn with the Future of Life Institute. This summer I watched all four of the White House symposia on AI research, which were designed to give AI researchers and others in the field a chance to discuss what the status of AI research is now, how it can help move society forward, and what risks we need to figure out how to address. There were many, many hours of talks and discussions, and I couldn't help but notice how often researchers got up
00:00:35
Speaker
and were very adamant that talks should only be about short-term AI issues. They insisted that long-term AI concerns weren't something that needed to be addressed right now. They didn't want to worry about advanced AI and they definitely didn't want to think about superintelligence. But then, as soon

Long-term AI Concerns: Necessary or Not?

00:00:51
Speaker
as they started talking about their research, many of the issues that came up were related to things like control and transparency and bias.
00:00:58
Speaker
Now these are obviously issues that need to be addressed for short-term, narrow AI, but they're also concerns that need to be addressed for long-term, more advanced AI. And so I started to wonder why we were so worried about focusing on short-term versus long-term artificial intelligence. Now somewhat famously, in the AI world at least, Andrew Ng, who is with Baidu, has compared worrying about long-term artificial intelligence issues to worrying about overpopulation on Mars.
00:01:25
Speaker
And I guess my reaction is that overpopulation is an issue, and it's an issue that we need to address. And if we can solve overpopulation issues now so that we don't have to worry about them later, why would we not do that? I realize there

Introducing Key Figures in AI Safety

00:01:40
Speaker
are probably some cost issues or other reasons that planning ahead is more difficult, but it seems like a really strange stance to me that we shouldn't try to solve a current problem now so that it doesn't crop up again in the future.
00:01:53
Speaker
I figured the best way to try to understand what's happening would be to turn to two people in the field. So I have with me Dario Amodei, who had been working at Google Brain, where he published a well-received paper titled Concrete Problems in AI Safety. And he's recently moved to OpenAI, which is focused on AI research and safety issues, as well as on understanding the social implications of AI development.
00:02:16
Speaker
I also have Seth Baum, executive director of the Global Catastrophic Risk Institute. Much of Seth's own research is on ensuring AI safety. Seth, I'll turn this over to you now. Thanks Ariel, and these are really good questions. We saw it at the White House symposia, and I attended one of those. We also see it in a lot of different conversations across the AI world. I think it's a really important one because while we might have some reasons
00:02:44
Speaker
for being especially interested in or concerned about the long-term AI issues. At the same time, a lot of people are just more focused on short-term questions. And so it's fair to ask what we can do to get action that works for both short-term and long-term AI issues. And so that's why I think it's good to have this conversation, especially good to be having this conversation with Dario Amodei, who has
00:03:11
Speaker
recently published

Challenges in AI Safety: Real-world Examples

00:03:12
Speaker
a paper on this topic or rather on the topic of AI safety in general and the issues that come up in AI safety, whether it's short-term or long-term safety. Now Dario, I want to get to the paper in a second, but first as a little background for our listeners, maybe you could say a little bit about how you became interested in the AI safety topic in the first place.
00:03:36
Speaker
Sure. I've actually been working in the AI field itself for only a couple of years. And, you know, before coming to OpenAI, I worked at both Baidu and Google and, you know, was drawn to the field by what I see as kind of the incredible advances, particularly in using deep neural networks to solve problems like vision, speech, and language.
00:03:54
Speaker
And I was involved in research in all of these areas and found it very exciting. But one thing I definitely noticed, I think particularly with deep neural networks and with powerful ML systems in general, is that there are many ways in which they can be very accurate. A speech recognizer can transcribe what you say almost as well as a human, but these systems can also be somewhat brittle. If I train a speech system
00:04:16
Speaker
on humans speaking in a clean background with unaccented American speakers, it'll perform great. But then if I test the same system on accented speech or noisy data, it performs terribly. And as these systems get deployed more into the world, having systems that fail unpredictably is not a good thing. And I think that impression was reinforced

Risks of Reinforcement Learning in AI

00:04:36
Speaker
as I continued my work at Google, where there was the issue with the Google Photos system, which was based on a neural net classifier and ended up
00:04:44
Speaker
accidentally classifying people of color as gorillas, which of course is an incredibly offensive thing to do. Now, the neural net didn't know that it was offensive. You know, it was a combination of a problem with the classifier and a problem with the training data, but it shows that machines can lack context
00:04:59
Speaker
for what they're doing, and the classifier that's produced by the machine can be something that has some very bad real-world impacts. And if it does something that is not what we intended it to do, it can be very harmful. And then I think in the last year, I've become particularly interested in reinforcement learning, which is
00:05:15
Speaker
a branch of machine learning that's concerned with interacting with the environment in a more intertwined way, and is often used in things like robotics, self-driving cars, and autonomous systems. There was a recent announcement from Google that it's used to control the power usage in their data centers. So once you're actually interfacing with the world directly and controlling physical things, I think the potential for things to go wrong, which are often quite mundane, starts to increase. So I became more and more interested in whether there were principled ways
00:05:44
Speaker
to reduce the risk, or find some theoretical basis for guaranteeing that something bad won't happen, as we start to deploy these systems more and more into the world. That was kind of where my interest in AI safety started. And certainly there is the thought that these systems are advancing very quickly, and the more powerful they get, the higher the stakes are for something that might potentially go wrong. So as someone who really, really wants to think about the social impacts of what I'm working on, this seemed like just a very important area.
00:06:15
Speaker
Yeah, that makes a lot of sense. And it does seem that if you look at the systems themselves and what they're doing, that it just naturally follows that they can fail in these types of ways. And we saw it with the Google Gorilla incident, which is kind of a classic example of,

Consolidating AI Safety Research

00:06:34
Speaker
as you put it, the AI system failing unpredictably. And a lot of my research is on the risk and especially the policy end of AI.
00:06:43
Speaker
And the same issue comes up in that context, because who do you hold liable? Do you really hold liable the company or the computer programmers who built this code, when they didn't want it to do that? And so for the legal management of these types of software, it's a challenge, because you don't want to hold these people liable. They weren't intentionally causing these harms, yet at the same time, because these are systems that are almost by design
00:07:11
Speaker
bound to behave in unpredictable ways, we're just going to see this happen more and more. My impression, at least, is that this is a little bit of your motivation: you wrote a paper recently called Concrete Problems in AI Safety that seems to take on some of these topics. Maybe you could say a little bit more about the paper itself, why you wanted to write it, and how it contributes to these issues.

Categorizing AI Safety Issues

00:07:35
Speaker
Yeah. So as I got more and more interested in these safety issues, you know, I did what any researcher does and looked into what had been written in the machine learning literature so far about these problems. And there actually was a fair amount of literature on various subsets of the set of things I was worried about.
00:07:56
Speaker
And, you know, in many cases there was a substantial literature, but there was, I would say, a scattered nature to it. I'll get into some of the specific problems that I talked about in the paper in a bit, but the four or five different problems that I was thinking of in my head as ways to classify things that could go wrong with an ML system were often written about in parts of the literature that were not very related to one another, and often were very specific to particular applications. And so,
00:08:25
Speaker
I felt that what would help a lot would be something that was kind of a combination of reviewing all the existing work that has been done in one place, and also kind of having an agenda that talks about what needs to be done in the future. In particular, a lot of this work had been done quite a while ago, and so really writing this review and agenda with a view towards
00:08:48
Speaker
the cutting-edge ways in which machine learning has advanced in the last three or four years, with neural nets doing vision, speech, language, game playing, autonomous driving, and a bunch of other applications. I felt like a lot of the thinking about making systems safe and controllable and robust and predictable could really use an update in light of these advances. And then there was kind of another stream, where for a while I'd been aware of
00:09:14
Speaker
the work of people like Nick Bostrom and Eliezer Yudkowsky, who come from outside the ML field and have been warning about these very long-term considerations involving AIs that are smarter than humans. I've read their work, and my attitude towards it has always been that I found it thought-provoking. But as a researcher, if I wanted to think about ways in which machine learning systems can go wrong,
00:09:39
Speaker
it's very important, at least for now, to stick to the kinds of systems we can build today. And then if future scenarios do come up,
00:09:46
Speaker
then we're better equipped to think about those scenarios. So between those two poles, there was a lot of existing literature that I felt needed to be drawn together a little bit, and there was more high-level, visionary thinking about the far future. I felt there's kind of a middle space: thinking in a principled but much more concrete way about the systems that we're building now or are likely to build in the next few years, and asking what general lessons we can learn about how ML systems go wrong. And that was the thought
00:10:14
Speaker
with which I sat down and wrote the paper. And then it turned out that it was not just me; there were several other authors on the paper. My main co-author was Christopher Olah, also on the Google Brain team, who's done a lot of work on visualization and blogs about teaching machine learning to a wide audience.
00:10:31
Speaker
Then we had some collaborators: from Berkeley, Paul Christiano; from Stanford, Jacob Steinhardt; from OpenAI, before I joined, John Schulman; and another Google Brain researcher, Dan Mané. We all found that we have the same general perspective and vision on it relative to what had happened in this space before. So we all got together, spent a bunch of time, and eventually we produced this paper. Good. So perhaps you could take us into the paper a little bit. There are five concrete problems. You have
00:11:01
Speaker
avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift, which make sense when you get into the paper, but just hearing them spoken like this, they're kind of vague, kind of ambiguous. Maybe we could go one at a time through each of these five problems and you could explain just in basic terms what they mean. So, starting with avoiding negative side effects.
00:11:29
Speaker
So actually, before going into the five problems in detail, one frame that I thought of as a little useful for thinking about the problem that we describe a little bit in the paper is that we actually split the five problems into three general categories. And I at least have found it useful to start by thinking about those categories and then going into the problems. If you're trying to build a machine learning system, one of the most important pieces of a machine learning system is the objective function, which really defines
00:11:56
Speaker
the goal of the system or the way of judging whether it's doing what you want it to do. For example, if you're building a system that's supposed to recognize speech, then you might measure, you know, what fraction of words it gets correct versus incorrect, the word error rate. If you're building an image classification system, the objective function is the fraction of time that it identifies the image as being in the correct class. If you're

Exploring AI Safety Problems

00:12:19
Speaker
building a system to play Go, like the AlphaGo system that DeepMind worked on, then it's a fraction of games that you win or the probability
00:12:26
Speaker
that you win a game. So when you build a machine learning system and, for whatever reason, the machine learning system behaves in some way that you didn't intend and don't like, one of the things I found it useful to think about is exactly where in the process things went wrong. So I think the first place that things could go wrong is if
00:12:44
Speaker
your objective function wasn't actually right. And I'll go into that in a little bit of detail later. But the idea is that, you know, you were putting pressure on the machine learning system to do a particular thing and rewarding it for doing a particular thing. And unbeknownst to you, that was actually the wrong thing to do. And the system ends up behaving in a harmful way because you had in mind that it would behave a certain way. You tried to formalize that with an objective function. The objective function was the wrong objective function.
00:13:14
Speaker
The second class is if you do know the objective function, but it's very expensive to evaluate. It might be human judgment, or we have a limited budget of a human supervising and checking in with the AI system, and the AI system has to extrapolate. And so it might do the wrong thing because it hasn't fully seen
00:13:34
Speaker
what the correct objective function is. And the third problem is that a system has the right objective function, but all machine learning systems, as indicated by the phrase machine learning, have to learn. And there is a concern that
00:13:47
Speaker
While a system is in the process of learning, while it doesn't understand the world around it or the tasks that it's supposed to do, it could do something harmful while it's not fully trained. That at least to me from the perspective of a researcher has been kind of a natural way to decompose the problem. So now maybe with that, we can kind of go into the five problems. Before we dive into the five, let me just try speaking these back at you to make sure that I understood them correctly. The first one was just the wrong objective function that's
00:14:14
Speaker
You gave it goals that in retrospect turned out to be goals that you're not happy with. And it's fulfilling those goals despite the fact that you wish it was fulfilling some other goals. Correct. Now the second one you said the objective function is expensive to evaluate. That means you gave it good goals and it is working towards those goals, but it's struggling. And instead of fulfilling those goals, it's doing something else that's causing harm.
00:14:44
Speaker
Right, because it has a limited ability to assess what the right goal is. So the right goal might be, is a human happy with what I'm doing? But I can't, in every little action that I'm taking, ask if a human is happy with what I'm doing. And so I might need some kind of cheaper proxy that I think predicts whether a human is happy. And if that goes wrong, then the system could do something unpredictable. Okay, good. And the third one, you said harms that occur during the training process.
00:15:13
Speaker
Yes. Maybe say what is this training process and how can you get harms during it?
00:15:18
Speaker
Sure. So I mean, we can go into that more with the two problems in that subcategory, but the general idea is that you can kind of think of a child when they don't really understand the world around them. If they don't know that if they press this button on a stove, a fire will turn on, it might burn them. It might burn someone else. The child might be using the right process to learn about the world, but if they've never touched the stove before, they don't know what's going to happen, and they don't know enough to know that they might hurt someone. Okay. Makes sense to me.
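To make the earlier objective-function idea concrete: Dario mentioned word error rate as the objective for a speech system. Below is a minimal Python sketch of that metric; it's a generic edit-distance implementation, not anything from the paper, and the point is simply that even a correctly computed metric only measures what it measures, which is why the training data matters so much.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance between word sequences, normalized
    by reference length. A typical objective for speech recognition."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance (insertions, deletions, substitutions).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# A system trained only on clean, unaccented audio can score well on this
# metric in the lab and still fail badly on accented or noisy speech.
print(word_error_rate("turn on the lights", "turn off the light"))  # 0.5
```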
00:15:47
Speaker
Shall we go ahead and dive into the five? Yeah. So the first one that you asked about was avoiding negative side effects. So this is one problem under the subcategory of having the wrong objective function. One way to introduce this is to say that in some sense, machine learning systems are very literal minded, right? If you tell them to do X, they'll do kind of exactly and literally X. So the example we gave in the paper was, let's say I have a cleaning robot that is trying to, you know, move a box from one side of a room to another.
00:16:16
Speaker
And just in this kind of very simple example, if all it was trying to do was move the box, I might give it an objective function that basically said you get points, you get reward, for moving the box from one side of the room to the other, and that's all that matters to it. But if we literally just give it that one objective, then we're implicitly telling it that it doesn't care about anything else in its environment. So if there's a vase in its path,
00:16:41
Speaker
it doesn't in any way, according to its objective function, get penalized for knocking over the vase. So it's likely to just walk from one side of the room to the other and may just knock over the vase and kind of won't care about this, because it's not in its objective function. And you can generalize this: you know, if I have a robot and it's really very focused on accomplishing a particular task, moving a particular thing, but the world is big, and
00:17:05
Speaker
as humans, when we walk around and perform a task, I have to make very sure that while I'm driving my child to school, I don't run someone over or break the car or do something else, right? I'm never explicitly just doing one task. I'm always doing one task and making sure that the rest of the world is okay, and that I don't do anything really damaging while I'm doing it. And as a human, I have common sense and I know this, but our machine learning systems, at least at present, are not at the place where they have common sense. And so
00:17:32
Speaker
I see a potential for various things to go wrong in this case. That to me sounds like the classic genie story, right? There's the genie in a bottle, you rub it, you get your wish, and it gives you your wish, exactly what you ask for, whether you like it or not. And what you're saying is that a machine learning AI system has that same way of taking things very literally.
00:17:58
Speaker
Or could. At least, the systems that we build today often have that property. I mean, I'm hopeful that someday we'll be able to build systems that have more of a sense of common sense. We talk about possible ways to address this problem, but yeah, I would say it is like this genie problem. One of the specific things

AI Adaptability and Learning Challenges

00:18:16
Speaker
that can go wrong is that the world is big and that it's very easy when you're training the machine learning system to focus on only a small aspect of it. And that gives you a whole class of things that can go wrong.
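To make the box-and-vase example a bit more concrete, here is a toy Python sketch; the State fields and the penalty weight are made up for illustration and are not from the paper. A reward that only mentions the task implicitly assigns zero value to everything else, and one mitigation direction the paper discusses is penalizing impact on the rest of the environment.

```python
from dataclasses import dataclass

@dataclass
class State:
    box_delivered: bool   # did the robot get the box across the room?
    vases_broken: int     # everything the objective doesn't mention

def naive_reward(state: State) -> float:
    # Only the task is rewarded; breaking a vase costs literally nothing,
    # so the robot has no incentive to walk around it.
    return 1.0 if state.box_delivered else 0.0

def reward_with_impact_penalty(state: State, penalty: float = 0.5) -> float:
    # Subtract a cost for side effects the task didn't require.
    return naive_reward(state) - penalty * state.vases_broken

careless = State(box_delivered=True, vases_broken=1)
careful = State(box_delivered=True, vases_broken=0)
print(naive_reward(careless) == naive_reward(careful))  # True: both look equally good
print(reward_with_impact_penalty(careless) < reward_with_impact_penalty(careful))  # True
```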
00:18:28
Speaker
Okay. Sounds good. So let's move on to the second one, avoiding reward hacking. What's that all about? So reward hacking is this kind of situation where you write down an objective function, and it turns out that your objective function was trying to capture the ability to do something hard, where you felt like, in order to achieve this objective, you need to do some hard task, and that is the task you're trying to get the machine learning system to do.
00:18:57
Speaker
But often there's some way of cheating the objective function that's been written down. So the example we give in the paper is if you have a cleaning robot that's kind of trying to clean up all the messes it can see, and you decide to give it an objective function that says, well, how much dirt can you see around you, right? You might think that would be a good measure of how clean the environment is.
00:19:18
Speaker
Then the cleaning robot, if it were designed in a certain way, could just decide to close its eyes. Or it could decide to kind of shovel a bunch of messes under a desk or into a closet. And in fact, this isn't limited to machine learning systems. This is a problem we have with humans as well, right? If I hire a cleaner, most cleaners are honest, but you know, in theory, if I didn't check, if I hired a dishonest cleaner, they might find it easier to just shove all the messes in my house into some closet that they think I won't look in.
00:19:45
Speaker
And, you know, again, there's this thing that machine learning systems can be literal-minded, at least in the way we design them today, and there are all types of things that can go wrong. In the paper, we discuss some general factors that lead to this. One factor is when the robot or the machine learning system is not able to see everything in its environment; it's very easy for the wrong kind of objective function to give it incentives to hide aspects of the environment from itself or others,
00:20:12
Speaker
like the shoveling-things-into-the-closet case. So we discuss a few of the general ways this can happen, and a few thoughts and recommendations for designing objective functions where this is less likely to happen. I'm reminded of those cute pictures you can see on the internet of kids taking quizzes, where there'll be this long, difficult math problem, and then below it it says write the answer here, and the student just writes the words the answer below it
00:20:40
Speaker
and tries to pass it in because they don't know how to actually do the math problem. Yeah. We need to get our AI systems to not behave like mischievous little kids. Yeah. There was a blogger who wrote about our work, I think it was Cory Doctorow.
00:20:56
Speaker
And he said, a lot of the problems that I've read in this paper remind me of issues in, you know, child development and child psychology. And there is a sense in which a lot of these systems are kind of a bit like savants, right? They're like small children who don't have enough context or common sense to know how to do quite the right thing, but are at the same time very, very voracious learners who can process a lot of information. So I do see a lot of commonalities there.
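The "close its eyes" failure can be sketched with a toy proxy objective. This is illustrative only, not from the paper's experiments: the proxy rewards seeing less dirt, so a policy that simply stops looking scores perfectly without cleaning anything.

```python
def proxy_reward(visible_dirt: int) -> float:
    # Proxy objective: "how much dirt can you see around you?"
    # Less visible dirt earns more reward.
    return -float(visible_dirt)

def honest_policy(actual_dirt: int) -> int:
    # Cleans half the mess each step; what it sees reflects reality.
    return actual_dirt // 2

def hacking_policy(actual_dirt: int) -> int:
    # "Closes its eyes" (or shovels the mess into a closet):
    # the sensor reading goes to zero while the room stays dirty.
    return 0

actual_dirt = 10
print(proxy_reward(honest_policy(actual_dirt)))   # -5.0: honest progress
print(proxy_reward(hacking_policy(actual_dirt)))  # -0.0: perfect score, nothing cleaned
```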
00:21:21
Speaker
Makes sense. Okay. So let's move on. The next one is scalable oversight. Yeah. So the example we give there is:
00:21:31
Speaker
Let's say that we have a cleaning robot again and it's trying to clean up a room, but there are some objects in the room that might belong to a human. And there's a surefire way to get the right objective function, which is if, every time you find an object, you ask a human, does this object belong to you? You ask every human who could possibly own it. Then you'll always do the right thing.
00:21:52
Speaker
But a robot that does that is impractical. No one would be able to sell a robot that does that, because if that's going to happen, I might as well just clean it myself, right? I don't want my robot asking me questions every two minutes while it's trying to clean my house. So can we find ways where the robot maybe only asks me the first few times and gets a good sense of what kind of stuff I would actually want to keep versus what kind of stuff is okay to throw away? Maybe it looks at cues of where I leave the stuff. And so the way to state the problem is, if the robot tries to do this, there is the risk that it will throw away things
00:22:22
Speaker
that I really would have wanted. And the question is: are there ways we can get the robot, from repeated experience, to predict the true objective function, which is what really belongs to me and what I really want thrown away, without actually having to ask me every time, which might destroy the economic value of the system? Yeah, that seems like something that
00:22:43
Speaker
humans face all the time, right? You're cleaning for someone. Do you really know whether they want that thrown away or not? I mean, sometimes it's a candy wrapper and it should be pretty obvious, unless maybe it's a candy wrapper with a winning lottery ticket printed on the inside from some prize competition that the candy company had. But it seems really easy for robots or AI systems to make the same sorts of mistakes. Yeah.
00:23:07
Speaker
And do you think it's particularly difficult to train an AI to get those sorts of questions right? I mean, that seems like maybe, maybe I don't know the systems well enough, but that seems like something that we should be able to train an AI to figure out without too much difficulty. In the particular case of the cleaning robot, I'm pretty optimistic, but I think, you know, it's designed more to be kind of a parable that illustrates that often
00:23:33
Speaker
there are aspects of human preferences that are quite subtle, and getting all of them right without an unacceptable amount of communication with humans, or without halting the workflow of the machine learning system, can often be quite hard. A human might be able to look at something like a concert ticket, and if the date on the ticket was yesterday, they might just know that it was okay to throw it away. But if the date was tomorrow, then they would say, oh, that's really valuable. This is something someone's going to use.
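A minimal sketch of the scalable-oversight idea described here: consult the expensive true objective (asking the human) only when a cheap learned proxy is unsure, and trust the proxy the rest of the time. The function names, the stubbed predictions, and the confidence threshold are all hypothetical, just to show the shape of the idea.

```python
def human_judgment(item: str) -> bool:
    """The true but expensive objective: interrupt the owner and ask."""
    print(f"Asking the human about: {item}")
    return item != "candy wrapper"

def learned_proxy(item: str) -> tuple[bool, float]:
    """A cheap predictor of the human's answer, with a confidence score.
    Stubbed here; in practice it would be trained on past answers."""
    guesses = {"candy wrapper": (False, 0.95), "concert ticket": (True, 0.55)}
    return guesses.get(item, (True, 0.50))

def should_keep(item: str, confidence_threshold: float = 0.9) -> bool:
    keep, confidence = learned_proxy(item)
    if confidence >= confidence_threshold:
        return keep                 # trust the proxy: no interruption
    return human_judgment(item)     # uncertain: spend one query on the human

for thing in ["candy wrapper", "concert ticket"]:
    print(thing, "->", "keep" if should_keep(thing) else "throw away")
```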
00:24:02
Speaker
So there are just a lot of subtle things like that, that I think take some work to get right. Okay. Sure. Now let's move on. The next one is safe exploration. What's that? Yeah. So this is actually a problem that's been worked on a lot in the machine learning community, and so our work here was more to summarize prior work, but also to point towards how work in this area could be integrated with a lot of the advances that we're seeing in robotics, and whether it's possible to step up the reach of work in this area. So.
00:24:31
Speaker
The basic idea here is that, particularly in reinforcement learning, which I mentioned earlier is a branch of machine learning that deals with
00:24:38
Speaker
systems that interact with the environment in a very intertwined way. There's a trade-off between exploring and exploiting: between doing the thing that I think is best right now and exploring my environment, which might lead to me understanding it better and realizing that there are even better things that I can do. But the problem is that when I'm exploring an unknown environment, often, you know, there are aspects of it I've never dealt with before, so I can do something dangerous without knowing what I'm doing.
00:25:05
Speaker
The example we gave with the cleaning robot is, you know, maybe it's never seen an electrical outlet before, and so
00:25:10
Speaker
it wants to experiment with cleaning strategies and it tries to stick a wet mop in the electrical outlet. So obviously this is just going to be really bad for the robot. Another example that has come up in actual robots people have built is robot helicopters. The idea is, you know, if I want to train my robot helicopter to fly properly, I want to use reinforcement learning to teach it how to fly. One problem it can have is that, you know, if it's experimenting with spinning its propellers, it doesn't really understand the dynamics of flying very well.
00:25:39
Speaker
If it does something bad and ends up crashing, it could break its propeller or break its control system or something. And then you can't use the robot helicopter anymore, right? The system is broken, you need to get a new one, and the designers of the system won't be very happy. And yet the system needs to learn somehow. And so again, this is a problem children encounter, right? Children need to try things on their own to understand what works and what doesn't. But it's also very important that they don't do things that are truly dangerous that they couldn't recover from if it went wrong.
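One very simple way to picture the exploration trade-off with a safety constraint is an epsilon-greedy learner that only ever samples from a whitelist of known-safe actions. The action names, whitelist, and toy rewards below are made up for illustration; the paper discusses richer approaches such as simulated exploration, bounded exploration, and human oversight.

```python
import random

ACTIONS = ["mop floor", "dust shelf", "vacuum rug", "mop electrical outlet"]
SAFE_ACTIONS = {"mop floor", "dust shelf", "vacuum rug"}  # known-safe whitelist

estimated_value = {a: 0.0 for a in ACTIONS}  # learned value of each action

def choose_action(epsilon: float = 0.1) -> str:
    """Epsilon-greedy, but exploration is restricted to the safe set,
    so the learner never 'tries out' sticking a wet mop in an outlet."""
    candidates = [a for a in ACTIONS if a in SAFE_ACTIONS]
    if random.random() < epsilon:
        return random.choice(candidates)             # explore, safely
    return max(candidates, key=estimated_value.get)  # exploit best estimate

def update(action: str, reward: float, lr: float = 0.1) -> None:
    estimated_value[action] += lr * (reward - estimated_value[action])

for _ in range(100):
    a = choose_action()
    r = 1.0 if a == "vacuum rug" else 0.2  # toy environment feedback
    update(a, r)

# Typically converges on "vacuum rug" once it has been explored a few times.
print(max(estimated_value, key=estimated_value.get))
```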
00:26:08
Speaker
To some extent, children have an instinct for this. Part of the role of parents is to keep children from going into truly dangerous situations, but it's something that our machine learning systems currently grapple with, and I think they are going to need to grapple with it more and more. That seems like something that we all grapple with on a pretty regular basis. For myself as an academic, I'm constantly worrying about whether I'm spending too much time thinking about stuff
00:26:37
Speaker
and researching it and learning more and so on, versus just going ahead and writing what I have, forming an opinion, and getting out there and saying what I have to say on the topic. And it's a dilemma, right? How hard do we try to figure things out before we do things? The tech community has that old Facebook saying, move fast and break things, which
00:27:02
Speaker
for some contexts might work well and for other contexts does not work so well. And actually Facebook has changed it; it's now move fast with stable infrastructure, something that sounds more responsible but not nearly as catchy. And so yeah, I guess an AI would have to face the same sorts of issues, right?
00:27:25
Speaker
Yeah. I mean, I think the problem of machine learning and AI is to get machines to do some of the same tasks that humans do. So in some sense, I

Impact and Reception of AI Safety Research

00:27:33
Speaker
think it's not surprising that in doing some of the same tasks humans do, they run into a lot of the same problems that humans do. Okay. So we got one more of the concrete problems and it's called robustness to distributional shift.
00:27:47
Speaker
Yeah, this is the idea that often if you have a machine learning system, you know, we have the notion of kind of training data and test data. So a machine learning system often gets trained on one particular type of data, but then when it's deployed in the real world, it often finds itself in situations or faced with data that might be different from the data that it was trained on. So our example with
00:28:10
Speaker
the robot is, let's say we have a robot that's been trained to clean factory work floors, and it's kind of learned that, you know, it should use harsh chemicals to do this and that it needs to avoid lots of these metal obstacles. If you then deploy it in an office, it might engage in some behavior that's inappropriate, right? It might use chemicals that are too harsh. It might not understand how the office is set up,
00:28:32
Speaker
et cetera, et cetera. I think another example of this actually is the gorilla example that occurred with Google a year ago, where one of the problems with that photo captioning app was that it had been trained on a lot of data with Caucasian individuals, and it had seen monkeys, but it had never seen individuals with a different skin color. And so it made a very inappropriate inference
00:28:56
Speaker
based on insufficient training data. Our interest in robustness to distributional shift is in trying to both detect and remedy situations where you're seeing something that is different than what you've seen before, right? The photo caption system should have said, this is something that
00:29:15
Speaker
doesn't actually look like anything that I've seen before in any of the classes that I've seen before. It's actually something different and I should be very careful of what class I assign this to because I don't have high confidence about the situation and I'm aware that I'm facing data that's different from the data that I was trained on. It's not always possible to respond appropriately to a totally new situation or
00:29:39
Speaker
totally new percepts that I might receive, but it seems like it is possible to recognize that what I'm seeing is different from what I've seen before. So that section of the paper discusses how to recognize that and how to be appropriately cautious once you've recognized it. It's really interesting to me, just listening to you talk through these different challenges of designing an AI to behave the way we would want it to behave, how similar, to me at least, it sounds to
00:30:09
Speaker
child development and human behavior and challenges that we all face. It makes me feel like these artificial intelligence systems are already not so different from us, at least in these types of ways. Well, I would definitely say that certainly the systems that we're building today are very limited systems. I do want to emphasize that we are nowhere near building systems that can replicate the incredible range of behaviors that humans are capable of.
00:30:37
Speaker
However, I would say that within the particular tasks that we assign to machine learning systems, yeah, I think many of the problems that they face in learning those specific tasks often, though not always, have analogies to, you know, the challenges that humans face in learning those tasks. Okay. So they're still not at all as capable as we are across the board, but they still face some of the same challenges we do. Yeah. Okay. Very good.
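The "be careful what class I assign this to" behavior can be sketched as prediction with an abstain option. The code below is a toy example with made-up data, an arbitrary confidence threshold, and scikit-learn as an assumed dependency; a raw confidence score is a weak out-of-distribution signal on its own (a model can be confidently wrong far from its training data), but it shows the basic shape of noticing "this doesn't look like what I was trained on" and declining to guess.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: two features, two classes the model knows about.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)

clf = LogisticRegression().fit(X_train, y_train)

def classify_or_abstain(x, threshold: float = 0.9):
    """Only commit to a label when the model is confident; otherwise flag
    the input as possibly unlike the training distribution."""
    probs = clf.predict_proba([x])[0]
    if probs.max() < threshold:
        return "abstain: input looks unlike my training data"
    return int(np.argmax(probs))

print(classify_or_abstain([0.1, -0.2]))  # deep inside class 0: confident label
print(classify_or_abstain([2.5, 2.5]))   # near the decision boundary: likely abstains
```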
00:31:07
Speaker
So I want to bring it back to the conversation we started out with, which is about what we saw at the White House symposia, and what we see in other contexts, about this short-term versus long-term AI issue, and whether people care more about short-term or long-term concerns and so on. With that in mind, I'm curious what the reaction has been to your paper, right? Do people say, oh, this is crazy? Or do these ideas seem reasonable to them?
00:31:34
Speaker
Yeah, actually the reaction has been extremely positive, more so than I even anticipated. There were a few different communities that read and looked at our work. The first was the media, because the paper was published on the Google Research blog and ended up getting covered by the press, and their reaction was actually quite positive. I mean, most of the stories had titles like Google gets practical about AI concerns, or Google addressing AI concerns responsibly. The idea was that it was kind of a set of serious engineers who
00:32:02
Speaker
were really sitting down to think about, just very specifically, what can go wrong in machine learning systems and how we can prevent those things from happening so that machine learning systems can benefit everyone. And it was very much that, and not the kind of alarmist, terminator-robots-are-going-to-kill-us-all kind of thing. So I feel the media actually understood it pretty well, which surprised me a little bit.
00:32:22
Speaker
What about from the AI community? Yeah. So the AI community was also extremely positive. Even people like Oren Etzioni, who has been a vocal spokesperson against more long-term concerns about AI and risks of AI, you know, were very positive about this paper. Oren Etzioni was quoted in one of the news articles as saying, these are the right people asking the right questions. And I think a lot of the doubts that people like Etzioni have had about
00:32:50
Speaker
long-term AI risk have just been about a kind of vagueness: well, how do you work on this? And I think the reaction to our paper was very positive because it posed problems in a way where you could actually sit down and write a paper on them. And actually, by the way, we intend to follow up the paper. This was more of an agenda paper, but we intend to follow it up by writing papers trying to address these problems in actual real systems. And you know, that's one of the things I'm working on at OpenAI, and we're going to continue to collaborate with Google on it. So I think that
00:33:18
Speaker
the concreteness part of it and the practicality and the promise of real empirical work that I hope we can deliver on made a lot of the AI community seem pretty excited about it. And then finally, the community of people that have been worried about long-term AI risks.
00:33:36
Speaker
I think the reaction there was pretty positive as well, even though the focus of the paper was on shorter-term issues. As Ariel pointed out at the beginning of this, conceptually, a lot of the longer-term risks that they're worried about can be seen as instances of some of the problems we've talked about, particularly negative side effects and reward hacking, but actually all of them
00:34:03
Speaker
are things that, when I try to think about what someone like Nick Bostrom is talking about, I think they're describing the kinds of problems we talk about in the Concrete Problems paper, if you had those problems with an extremely powerful AI system that was even more powerful than humans,
00:34:20
Speaker
then I think that's how you get to some of the scenarios that Bostrom is describing. And I

Social Impact and Responsibility in AI Research

00:34:25
Speaker
definitely have a disagreement with the AI safety community, which is this: it's not that I don't think we may face these extreme scenarios eventually, and I'm glad that there's someone thinking about them, but that I at least am most interested in thinking about
00:34:40
Speaker
problems that we can attack empirically today. And I hope that the problems we attack today will start to shed light on the longer-term issues that we have. Again, if we really work on things like reward hacking and avoiding negative side effects, and if we work on them in the right way,
00:34:56
Speaker
there'll be a lot of relevance to the scenarios that people worried about AI risk are concerned with, and eventually many of the things they're talking about and writing about may become very relevant. But my difference is more tactical, in that I just see a great deal of importance in having an empirical feedback loop: in saying, this is a problem I think a system might have, let me test it. Oh, it has this part of the problem, but not that part; let me do another iteration on it.
00:35:22
Speaker
Just in research and science in general, I feel we've gotten a lot of mileage out of the empirical feedback loop. And so that's something that I emphasize a lot. I'm really glad to hear that the response has been so positive. This seems to me like the sort of clever solution that we need for problems like AI safety that can resonate across seemingly disparate audiences. And we've had a lot of disagreement between the people who are worried about the future superintelligence risk
00:35:51
Speaker
versus the AI researchers who are out there building the new systems today. And my impression is that the difference of opinion between these two groups only goes so far in that both of them, as far as I can tell, genuinely do care about the social impacts of their work. It might not be the core focus of everyone's attention. And I think this is an issue that needs to be addressed within the AI community.
00:36:20
Speaker
Stuart Russell has a great line about how the AI community needs to take social impacts more seriously; he compares it to civil engineering. He says, no one in civil engineering talks about building bridges that don't fall down. They just call it building bridges, right? Because in civil engineering, everyone takes it for granted that the social impact of their work really matters. Whereas in AI, according to Stuart Russell, that's less the case. But the impression I've had
00:36:49
Speaker
listening to AI researchers is that a lot of them do actually care about the social impacts of their work. They're just not sure about the superintelligence thing. It's remote in the future. It's speculative. Maybe it even sounds a little odd. And it's just so far removed from the systems that they're working on. And so to see opportunities to address general AI safety concerns that may also
00:37:15
Speaker
be relevant to superintelligence, but are very much relevant to the systems that people are building today, it makes sense to me that they would respond positively to that sort of message. And I wonder if there's anybody, especially within the AI research community, who is pushing back against it and saying, no, we should just focus on building AI systems that are more capable and we shouldn't worry about these safety problems. Have you gotten that at all?
00:37:42
Speaker
I don't think I've ever had someone say that specifically to me. I think there is probably a healthy debate to some extent in the machine learning community about which social impacts we should care the most about. Some of my colleagues are very interested, and I am actually too, in kind of things like the economic impact of machine learning systems or in fairness. And I think definitely people differ in
00:38:06
Speaker
how much they choose to focus on each of these issues. But I haven't really encountered anyone who says, oh, we shouldn't think about any of these issues, or who says, this is the only issue that we should think about. I think when the risk of AI systems doing things that we didn't intend is properly explained, everyone says, yes, that's something we should prevent. The risk of AI systems treating people unfairly? Yes, that's something we should prevent.
00:38:29
Speaker
The risk of bad economic impacts of AI? Everyone says, yes, that is something we should prevent. Internet security issues that could arise with ML? Everyone says, yes, that's something we should definitely prevent. Different people are interested in working on these to different extents, and for some people, this isn't a personal research interest of theirs. But I actually haven't encountered anyone who says, no, I don't think anyone should work on these things. Maybe such people do exist, but I haven't met any. And maybe that's the bigger challenge with this: it's not people who
00:38:59
Speaker
actively push back against work on these problems, but people who just essentially ignore it. I remember from my own engineering days, I really liked to bring up social impacts and social issues related to our research. My fellow engineers would listen to me and then they would basically be like, okay, that's nice, now get back to work. Because in their minds, thinking about the social aspects of it was someone else's job.
00:39:27
Speaker
And so it's easy to imagine AI researchers not really disagreeing with the sorts of things that you're saying, but just thinking maybe it's somebody else's responsibility to worry about it. Do you see that at all? I have definitely heard people say that, but to be fair, I don't have a huge objection to some fraction, maybe even a large fraction, of the field having that attitude, right? I mean, I think research is
00:39:51
Speaker
a process of specialization, and not everyone can work on everything. If your attitude is, I just want to make a better speech system, I know that machine learning has social impacts and someone else is working on that, and
00:40:06
Speaker
a decent fraction of the field takes that attitude, I'm fine with it. My concern is more that we as a field are collectively on the issue, that we have enough people within the field who do want to think about these issues. If there's no one, or too few people, in the field who wants to think about these issues, then I think that's a problem, because I think it's our responsibility as researchers to think about the impact of the research that we're doing.
00:40:30
Speaker
But if a particular person says, that's not my focus area, I'm fine with that. I think that's the way research works. But I would say that the fraction of people doing this, at least a year ago, was too low. Now I think, thankfully, we're starting to get more and more people into this and maybe getting to a healthier place. Okay. That was what I was going to ask you, because I presume, based on the fact that you're speaking up on this topic, that
00:40:57
Speaker
you do sense that there should be more work going on on this. And it seems like the paper you wrote was, as you put it, an agenda, a call for action, a call for research on these topics. But it's very encouraging to hear that you think that the fraction of people working on these safety problems is going up. Would you still say that it should be going up more? Or do you think that we're actually reaching a pretty comfortable place right now?
00:41:22
Speaker
I mean, it's all kind of coming into place, right? Since I joined OpenAI, I've had a number of people say, you know, I'm interested in these topics, I want to work on these topics, both new people coming into the ML field and people who've been in it for a while. So I actually don't know where things will end up.
00:41:38
Speaker
And my main goal is just to get some good technical research done on this topic. Then we'll see if there need to be more people in the field. My hope is in the usual dynamics: if there are a lot of interesting results to be found in one place, then more people come into the field, and if there are too many people working on something, then some people go somewhere else. I'm hopeful that those normal dynamics will get us to a place where we're thinking responsibly. That may be too optimistic, but that's my hope.
00:42:03
Speaker
That makes sense to me, and let's hope that things do balance out there. In my experience, researchers don't necessarily always gravitate towards just the right topics, the research that most needs to be done. You can end up with too many people crowding into one seemingly popular area. But we'll see. I'm really glad to hear this, and it would be great if these safety problems could then just get solved as more AI researchers work on them. Yeah. That's what I hope will happen.
00:42:32
Speaker
Okay. Thank you. Any final thoughts you'd like to add to this conversation before we sign off? You know, I think my perspective is that empirical and testable work on the unintended consequences of machine learning systems is the best way to illuminate these problems and figure out where to go next. Okay. Thank you. Ariel? Yeah, I want to thank you both for sitting down and having this discussion.
00:42:58
Speaker
I think this helps shed a lot of light, at least on the issues I saw at the White House symposia. And it's been a really great overview of where we're at with AI safety research today. So Dario and Seth, thank you very much. Thank you. Thank you for having me. To learn more, visit futureoflife.org.