
Darren McKee on Uncontrollable Superintelligence

Future of Life Institute Podcast
Darren McKee joins the podcast to discuss how AI might be difficult to control, which goals and traits AI systems will develop, and whether there's a unified solution to AI alignment. Timestamps: 00:00 Uncontrollable superintelligence 16:41 AI goals and the "virus analogy" 28:36 Speed of AI cognition 39:25 Narrow AI and autonomy 52:23 Reliability of current and future AI 1:02:33 Planning for multiple AI scenarios 1:18:57 Will AIs seek self-preservation? 1:27:57 Is there a unified solution to AI alignment? 1:30:26 Concrete AI safety proposals
Transcript

Introduction & Guest Background

00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker and I'm here with Darren McKee. Darren is the author of the upcoming book Uncontrollable and he's the host of the Reality Check podcast.
00:00:14
Speaker
He's also been a senior policy advisor for 15 years. Darren, welcome to the podcast. Hi, Gus. Pleasure to be

The Goal of 'Uncontrollable'

00:00:21
Speaker
here. Fan of the show. I had a great time reading your book and one of your goals with the book is to take a complex topic like
00:00:30
Speaker
machine learning, AI research, and in particular, alignment research, and then present it in an accessible way. How do you go about a task like that? Well, it was a bit difficult. And I think clarity above all else. I've been following the AI developments for, we'll say, a decade or two, sometimes deeply, sometimes loosely. And I would read the other popular books and some of the articles and have discussions about all these things. But it was really around April, May, 2022,
00:00:58
Speaker
when I thought, oh wow, things are really picking up. And I think a lot of people felt that when the Metaculus projection for AGI dropped by about 10 years, everyone's like, oh, things are happening, right?

Bridging the AI Knowledge Gap

00:01:10
Speaker
And at that point, I thought, Okay, I think there's a gap between readily available materials that reach a wider audience, and the speed at which AI is progressing. And so I thought there's an opportunity here to write a book.
00:01:22
Speaker
It's not that materials didn't exist at all. There's forum posts, there's blogs, there's lots of podcasts, there's videos and so on. But I thought it would be nice if it were pulled together in a book, we'll say, for people with no technical background, even no science background. And so that's where some of my experience with the podcast or policy comes in, where you're trying to do knowledge translation: take a complicated idea, try to phrase it simply and accessibly, and relate it to an audience.
00:01:47
Speaker
And with that as a context, the journey began, so to speak. So since last June 2022, I've been trying to work on this book and I'm happy that it's done. And I've tried to make a really concerted effort from the design of the cover to the table of contents to the chapters to how they flow.
00:02:05
Speaker
It really is trying to put yourself in the mind of someone who is curious about AI. They've seen these news headlines, they're interested, maybe concerned, confused: what the heck is going on with AI? And that's who this book is for. So it's not so much for the technical people. It's not for the people who've been in this AI safety debate for many years, although hopefully they'll get some value. It's more, in a way, for the people they know, so they can perhaps help explain some of the issues to them.
00:02:31
Speaker
Yeah, I'll just say, I've been reading about AI, I've been interested in AI for a long time, and this book still provided value to me. So there's value in going over the basics again, I would say.

Balancing Accessibility and Accuracy

00:02:45
Speaker
We're also in this conversation going to go over some of the basics again, and there are interesting choices about how you frame different issues, which analogies you use, and so on.
00:02:57
Speaker
I'm interested in how do you balance accessibility with accuracy? So I imagine that an expert is going to read this book. How do you deal with that fact when you're writing for a broader audience, but you might get a nitpicky expert critiquing your book?
00:03:13
Speaker
I think there's just going to be some inherent trade-offs. The goal is to reach, as I said, as many people as possible because the experts, they already know these things. They already have materials that are available more readily to them. But for the average person, to have something really explained in a book form, I think this is kind of the first of its kind.

Navigating Expert Disagreements

00:03:31
Speaker
It is entirely dedicated to the AI safety argument and tries to reach people as accessibly as possible.
00:03:37
Speaker
Given my own background in science and academia, I am also inclined towards accuracy and even precision, and trying to understand the difference between accuracy and precision, but also understanding that for the audience I'm trying to reach, the difference between accuracy and precision isn't what matters.
00:03:52
Speaker
What is the main idea? Is there rigor? Like, is what I'm saying generally true or understood to be true? Is there evidence to support it? Does it make sense? But I did try to be, we'll say, a bit more flexible about how I might phrase certain things and how I might use certain analogies to try to meet the audience where they are. As you said, this field is very, very complicated, and you have to make concessions somewhere when you're explaining things to people.

AI Risk Debates & Imagination

00:04:18
Speaker
What did you do when in your research you came upon a topic on which the experts disagreed? This is often the case in alignment research. For example, the experts might disagree about the basics. What do you do then when you're trying to explain the basics?
00:04:34
Speaker
So I think there's a couple of options there. My approach was generally to try to give, I won't say the generally accepted consensus, because while there is a consensus of sorts, there's not unanimity. Not everyone fully agrees, and no one ever does. And that's true, let's be honest, for everything.
00:04:50
Speaker
When you look at the surveys of what the philosophers believe, there's fundamental differences. Same thing with economists, same thing with physicists, same thing with pretty much every discipline. Understanding that, the book is trying to be a bit of a neutral observer, but at the same time, I have a perspective.
00:05:06
Speaker
I've looked at these issues. I am concerned. We're trying to figure it out together, but I'm giving reasons why I think AI safety is a concern. To be more specific, I kind of tried to be sympathetic to all sides, but at the same time, I probably didn't get too much into the weeds. If people thought there's no concern at all, I might mention the uncertainty around everything, but not so much giving that a lot of weight because I think there are reasons to be concerned.
00:05:30
Speaker
To take a step back, the structure of the book, to try to give people a sense of how I did it, is: you look at the debates or discussions that are occurring in the AI risk, AI safety space, and what do people seem to get stuck on? What are their disagreements? What are their cruxes? What might be the underlying, we'll say, intellectual or even emotional dispositions that are leading people to have certain beliefs? And then identifying that, thinking about which part of the AI issue relates to it,
00:05:59
Speaker
and then trying to think about an analogy or a simple way to explain the thing related to AI so it addresses the underlying issue in the future debate. It's a bit complicated, but it's kind of like working backwards and then forwards. So if I think one of the issues is that it's really hard to imagine how powerful something like an advanced AI or superintelligence could be, the first chapter of the book is a lot about how powerful intelligence is, or the importance of imagination.
00:06:27
Speaker
Like, is it really that it's a failure of imagination that's driving a lot of this? Well, not everything, of course, but I think it's a factor. So then I'm trying to address aspects of how imagination might work or just open up people's minds. Why don't we put it that way? About what's possible before I even talk about AI, because it's a general issue about trying to be open minded about what could happen.
00:06:46
Speaker
And with that in place, hopefully later when there's more AI stuff, it's just less likely to be a problem. It is still hard to imagine something as smart as an ASI or artificial super intelligence, but we can try to at least acknowledge that there's something there. There's something to be understood or even that we don't know exactly what it could be and that's its own value.
00:07:06
Speaker
So you write about how the range of intelligence extends much further than we might normally imagine. What intuitions do you rely on

Imagining Intelligence Beyond Humans

00:07:16
Speaker
here? How do you explain that? Why is that the case? Sure, I think there's a, you know, the human brain is great, right? I like mine, even though it's flawed; I don't know what I'd do without it.
00:07:25
Speaker
But when we interact in the world, we have our kind of default settings. We know there's various biases and the availability bias and other things that when you're asked a question or you're quickly reading something, whatever pops in seems to be how you might think or feel about a subject. If given a direct question by someone, you might reflect a bit more and think a bit differently. But I'm trying to shift a bit out of that default. So when we navigate the world, we kind of think of the intelligence mostly of our friends, or I ask people to imagine like the smartest person they know.
00:07:54
Speaker
and they have that person in mind. I'm like, okay, now imagine the smartest person you think ever existed. Maybe it's Einstein or Marie Curie or Von Neumann or whoever it is. And so right away, you're like, okay, well, I had the smartest person that I know personally and the smartest person ever; well, could there be even people who are smarter? And I give some examples of savants who have incredible abilities, whether it's memorization or processing information, visually or auditorily, whatever it is. And like, well, look, humans already can do some amazing things,
00:08:24
Speaker
which that smartest person you know, Von Neumann may be an exception, can't really do. It's trying to show, let's think about what might be possible and then giving examples of what already is possible to help people understand, okay, if this is possible and it's already happened, could we imagine a little bit more, a little bit higher? I also go the other way, looking at capacities of animals.
00:08:47
Speaker
And we look at, you know, briefly, like in a page, it's, you know, birds and you have fish and you have dolphins and so on. And we know broadly humans are more intelligent than these other creatures. Not to say all humans, right, at all times of their lives than all the other creatures, of course, but broadly as a general truth, that's the case. And when we think about, say, why gorillas are in our zoos, and we're not in theirs, it's not strength, right?
00:09:10
Speaker
It's not our good looks. It's not our

Transferring Human Knowledge to AI

00:09:12
Speaker
charm. It's because, again, collectively, we have an intelligence capability that is beyond theirs. That's another nuance: when I use intelligence, it's a bit more like anthropologist Joseph Henrich's cumulative cultural, collective capital sense of intelligence. It's very broad and encompassing, because I thought that was probably the best way to try to communicate how important it is.
00:09:33
Speaker
Why do you think we forget, or at least this is my experience? I'm intellectually aware of all of these examples you just gave. I know there are people that are much smarter than me, and I know there are savants and so on. But it seems to me that when I think in normal life, I deceive myself into thinking that the range of intelligence ends at my height, basically, that I represent the basic range of intelligence.
00:10:00
Speaker
Do you think this is common? Why is it hard to imagine abilities that are beyond us? That's a great question. I'm not sure of detailed research, but I do think it is somewhat common. We kind of think we're the example, right? This sort of mind projection fallacy or typical mind fallacy, as it's called: I'm sort of the baseline. If things are different than me, then that's how I calibrate them, right? If we say someone is tall or short, sometimes that's in reference to ourselves, but often it's sort of some general vague notion of the population.
00:10:30
Speaker
Of course, if you're 6'5", that makes you very tall, unless you're in the NBA, then it's average. And so we do understand there is a frame of reference, but again, the default setting of human brains, again, they're great, but we can't be thinking in that much complexity all the time.
00:10:46
Speaker
Our working memory is tapped. So if you're just, you know, trying to, I don't know, make dinner, or you want to read an article, to then think, oh, there's 15 to 30 things I should always have in mind while I'm doing this. That's great. I applaud the effort, and I try to have a couple in mind, but it's almost impossible to do. So in that sense, the book can also serve to just remind people of things that they already know.
00:11:06
Speaker
It's very hard even for me; like, well, do I have a 350-page book memorized yet? Not entirely, but the ideas are more frequently in there and they're more likely to come to mind. So with intelligence, I think it really is just almost asking yourself more often: am I making the right projection? Is this a fair generalization? Am I inadvertently benchmarking to the wrong set or the wrong baseline?
00:11:29
Speaker
And over time, through human history is what I mean to say, humans now in various places are on average much more intelligent than they were in the past. And that's mainly due to education and socialization, nutrition, diet, these sorts of things. I think if you look approximately 100,000 or 200,000 years ago, genetically humans are very similar.
00:11:49
Speaker
Of course, if you go, you know, a million years ago, then it's quite different. But it's not staggeringly

Understanding AI Systems & Safety

00:11:53
Speaker
different compared to the other species entirely. All that to say, why humans are so capable now is because of our nutrition, our socialization, and the fact we live in a world
00:12:04
Speaker
that does a lot of the work for us. As I say, can you build a fire? Well, many people can and many people can't. What's worse is some of us can't build a fire even with matches. We know what we're supposed to do. You get some kindling, you get some paper, you get some lighter wood, you get some thicker wood, and you have the matches. You do a decent job, it starts, and then it fizzles out.
00:12:25
Speaker
And it's like, oh, I can't even build fire despite knowing how to do it and starting with matches. Well, that's embarrassing. Again, this is easily rectified if you practiced, but I don't necessarily have to know this because there are matches and there are lighters and other people have figured this out for me so I can leverage the intelligence and the efforts of other people. So I don't have to worry as much.
00:12:43
Speaker
And perhaps similarly, it feels like you as an individual are super smart navigating through the world. Well, that's because everything else has been done for you. Right now, with computers and phones, most of the complexity is behind the scenes. And that's great for us. You press a button, something works. What actually happens, the staggering, mind-boggling complexity of information and ones and zeros going across space and time, it's easy to just not think about it. In a way, why would you? If you're trying to watch a movie,
00:13:10
Speaker
You're not going to think of the matrix decoded. You could, but then you're going to ruin the experience for yourself. Do you think we are giving our collective knowledge to AIs by training them on all available data online, for example? Are we thereby transferring the collective knowledge that we've built up that allows us to succeed in daily life?
00:13:33
Speaker
I think we are. It's almost like having a new child of sorts, and you're socializing it the way you might a person. Let's look at all that humanity has learned and try to pass that on to a new generation. We're doing the same thing with AI, just in a much more dramatic,
00:13:48
Speaker
complicated, and again staggeringly vast, way. As you know, the amounts of data, the exabytes or zettabytes, I can't remember exactly what the number is, that's going into these systems, or would or could go into these systems, I should say, since it's not quite that high yet. But why not have all the world's knowledge available?
00:14:07
Speaker
And it's only going to get better and better. I think there are vast data sources that haven't been fully tapped, right? So you think of all the video, which programs are now starting to mine. You can just take the audio from it, which has its own value, but there are also the movements: how people move, what they show, and how that sort of thing happens. That could be fixing a chair or the sink, but it's also how people navigate, what they pay attention to and what they don't.

AI Goals & The Virus Analogy

00:14:30
Speaker
Once all the video gets recorded and all the radio, and there's all these different data sources, I think it'll be even more staggering. So yes, we're kind of giving everything humanity has ever done that's on record over to AIs, and that will have, of course, very fantastic, wonderful things, one hopes, but also a lot of risks and concerns. Given our troubles with understanding other people's minds, how can we hope to understand AI systems?
00:14:56
Speaker
Yes, I think that's a great question. I am optimistic, but I think it's going to be definitely a challenge. I mean, the broader idea here is that this is what we need to work on. There are many things related to AI that are going to be very difficult, and whether we're hopeful or not, we have to realize that we have to try. We have to try to figure out how they work, why they're doing what they're doing. And yeah, it might be complicated, but humanity has figured out the impossible before. That goes back to the imagination issue.
00:15:22
Speaker
If you look at what we kind of take for granted now that humans have gone to the moon, that we're having this conversation, again, across space and time relatively easily, this is not only odd or unlikely or improbable. This is impossible to people from the past. And beyond that, I actually want to argue that it's unthinkable. If you go far enough back, they didn't have the conceptual apparatus to even consider how difficult this could be.
00:15:47
Speaker
And that's also a shift where you're thinking, okay, imagine gazing up at the moon, as a lot of humanity has done throughout our history. Well, how do you get there? You can't just climb a tree. You can build a tall structure, as many did: a mountain, a temple, these sorts of things, a tall ladder. But the idea that you could build a ship or a rocket and actually fly there,
00:16:08
Speaker
it wouldn't even occur to people. It just would have been out of their reach at the time. It's within our reach, again, just because we happen to be born at this time; that's what allows for what's possible. Well, with AI at the moment, it seems very, very difficult to know exactly what's going on, given how these systems are, as people say, almost grown rather than built, given the complexity of artificial neural networks
00:16:29
Speaker
and the number of parameters in all these models. That said, I think there are a lot of good efforts to try to figure these things out, and that's exactly why we need more investment in AI safety and more people to try to help us. One hang-up people have about AI safety is thinking about the goals of AI systems. So why is it that AI systems might develop goals that are contrary to ours? Why would they suddenly turn evil? It's a question you could ask.
00:16:58
Speaker
There you use the virus analogy, so maybe you could explain how you use that. So I think there's kind of a two-step process here, whether AI systems have goals or not, and I'll take each in turn. With the virus, why don't we just do that first? A virus doesn't have goals, but it can still harm you.
00:17:15
Speaker
And I think this is a great analogy. You can think of the biological virus, but there's also computer viruses. Biological viruses, depending on what they are, we know they're very hard to contain. They can cause pandemics, they can cause illness, the common flu, these sorts of things. With computer viruses, people not in tech may not be aware that there are some still crawling the internet that were developed years and years ago that we can't really stop, but that's the default world we're in. So these things are created and they get out of control.
00:17:41
Speaker
These things, computer and biological viruses, don't need a goal to harm you. They don't need an intention. A virus is kind of just following a process, in the biological sense an evolved mechanism. Whether viruses are alive is a debate we don't need to get into, but it's trying to achieve certain objectives, and you can describe it as if it has a goal because that makes it easier to navigate the world. It reduces complexity.
00:18:04
Speaker
But you could also argue, of course, it doesn't have goals. It's a virus. It's a little thing. Why would a computer virus have a goal? It doesn't. And in the book, I acknowledge this, but I kind of just go with the as if goals. Let's not get into a protracted philosophical debate about whether something has goals or not. If it acts as if it has goals and it's useful to treat it that way, then let's just do that. It's much easier. And this is for anyone curious. This is totally Dennett's intentional stance type stuff coming in here because I'm a huge fan of philosopher Daniel Dennett.
00:18:32
Speaker
And what are we trying to do here? We're trying to protect ourselves from computer viruses or biological viruses. And you can think, oh, the virus is trying to get you. The flu is trying to infect you. When someone coughs or sneezes, it's trying to spread. Well, of course, it's not trying to spread. The organism has engaged in activity, which makes it more likely to replicate.
00:18:51
Speaker
But to say that every time, it just becomes very burdensome. So sentences become paragraphs, paragraphs become many paragraphs, and that's why we sort of circumvent it. So that's the first bit about why it might have goals in an as-if sense. And then there's the other part where, OK, so if something becomes a bit more autonomous, does it come to have goals? And I think we can look at different organisms again as an understanding that autonomy falls on a continuum.
00:19:17
Speaker
Does something have a goal or not? Well, in some ways, yes; in some ways, no. The chess-playing program, does it want to win? Well, it sure acts like it wants to win. And when I lose, I feel like it beat me, versus some computer code that some people designed. The designers definitely designed it to have a range of skills that could beat me.
00:19:34
Speaker
Do worms have goals? Do cats have goals? Do dogs have goals? Do orangutans have goals? Well, of course, in some ways, they definitely engage in goal-directed behavior because they're trying to achieve a certain objective. And similarly, with AI systems, we already have various, we'll say, automated programs that have autonomy, and they need to have autonomy in some way to act in the world.
00:19:54
Speaker
This would be the stock trading platforms that already function all the time, credit card and bank fraud detection, numerous other system-monitoring algorithms and whatnot that function in an automatic process because they couldn't keep checking with a human, and even in the military.
00:20:09
Speaker
Typically, there is a human in the loop, but we know for, say, missile defense, these things are often automated because there's no way a human could react quickly enough to stop incoming missiles. So it's not a question of whether computers become more autonomous at all; it's to what degree they are going to become more autonomous.
00:20:26
Speaker
And I think that is a much larger conversation, but at the same time, you could see why there would be incentives to have things become more automatic, which means a bit more empowered, which will then translate to seeming like they have goals. And at one point, the seeming goals blend enough into "well, that looks like it has goals" that I'm just going to say it has goals. If we return to the virus analogy, one thought I have there is about
00:20:52
Speaker
the existence of the common cold, for example. The common cold is not very smart. It's pretty dumb, I imagine. And it's also very harmful and costly to society. Does this mean that intelligence is not
00:21:06
Speaker
necessary to cause harm? Is this a decoupling of intelligence and power in the world? A great point. So I think as you've just said, it's not a necessary condition in that case. But that's different from whether intelligence is a factor that becomes correlated with the ability to cause more harm.
00:21:24
Speaker
So, you're right that a virus, we'll say, is pretty simple, pretty straightforward, not that many moving pieces compared to other organisms, and it can cause drastic amounts of harm. But, you know, you're not going to go watch a movie with a virus and then talk about its thoughts after. It's just not that type of thing.
00:21:39
Speaker
But as you scale up the abilities or the intelligence capabilities of different entities, they do have a different capability for harm. So viruses can just infect you, but something with more intelligence could design viruses to infect you, design even worse viruses. And even more dramatically, if something's smart enough, it might be able to deflect an incoming asteroid.
00:21:59
Speaker
and save us. So in the same way that it can be harmful, there's the other side, the benefits. So I think intelligence is a factor, and in some cases it becomes a very significant factor, in the ability to cause harm versus not. As anyone who has taken care of a child knows, a very young child can cause a lot of harm,
00:22:14
Speaker
but they're relatively containable. If you have the right crib, if you have the right gates, they can usually be at least kept in a certain place. That's partly due to mobility and partly due to intelligence. Same thing with various other animals. But as you ratchet up the intelligence, to contain something or control it becomes increasingly difficult, to the point where it's very, very difficult to contain some very intelligent entities.
00:22:36
Speaker
Now, there are lots of definitions of artificial general intelligence out there.

Defining Artificial General Intelligence

00:22:42
Speaker
And you use one in the book that I quite like. Maybe you could talk about the average office worker, the average remote worker.
00:22:48
Speaker
So artificial general intelligence, or AGI as people say, is very much in the news, right? How long till AGI, what is AGI, and so on. And the approach I took in the book, again for accessibility, I'm not trying to make the definitive definition. I just want a good enough definition so people can understand what we're talking about, and I thought about it in the most accessible way. With those factors in mind, okay, what are we talking about here? It's usually that someone's concerned about employment, primarily. And so I was thinking, could someone replace me at work?
00:23:16
Speaker
And so if you take the average coworker idea, that's the foundation. If you're interacting with a coworker, say it's a remote coworker, it's still focused on intellectual tasks. So it's a computer system that does intellectual tasks as good as an average person. And that's really the foundation where you could imagine if someone's performing as an average coworker level, they're doing general tasks to a certain capability. And therefore, you could see why that entity may be employable or why it might affect employment.
00:23:46
Speaker
And it seemed to, I guess, as I said, hit home in the best way. As a nuance here, why use that terminology at all is a question I asked myself: do I need to introduce these terms to the average person? Why or why not? Well, they seem pretty popular, right? That's one thing. So you meet people where they are. And I knew the world was going to talk about AGI in one way or another.
00:24:04
Speaker
Now, is it better to say human level, because it's almost the same? In some ways, but at the same time, we use the term AI a lot, right? That's sort of why I used intelligence, artificial intelligence, artificial general intelligence, and artificial superintelligence: because I thought, you know what, most people are going to talk about AI, and people kind of know what AI is. So once you accept artificial intelligence as a foundation,
00:24:25
Speaker
to make things as easy as possible, you go down one level, in a way, to intelligence broadly, and then another level up to artificial general intelligence and then superintelligence. I thought that four-part structure was the best way to frame things.
00:24:37
Speaker
I imagine that you decided to write the book out of some sense of urgency. Now, you present a lot of uncertainty around what are called AI timelines in the book, but you must be motivated in some sense by urgency about the speed at which AI is developing. Do you want to give us your AI timelines? Do you want to tell us how long it is until we have AGI?
00:25:04
Speaker
How long do we have left? I don't necessarily think of it that way; I don't want to be dramatic. So I think it's a great point, and I'll elaborate a bit, but that's not where I want to head. I think something like the average coworker, sure, it could be plausible within three years, two to three, four years. I'm being a bit vague because I almost feel like that's not as important as the fact that there's a lot of good data showing capabilities are dramatically increasing. This is the basics of more investment and computational power,
00:25:31
Speaker
better design of chips, new chips coming online to the extent that Nvidia's H100s are being shipped. So the big players, OpenAI and DeepMind will probably start training their next gen models on all these things. And then of course, a couple of years after that, there's even more powerful chips that are planned by Nvidia and all these designers. So the path, the most likely outcome seems to be increasing computational power.
00:25:52
Speaker
which then has some relationship to increased AI capabilities. And we don't quite know how the input matches the output, which is itself a concern. There's that uncertainty there. But the trend lines are that things are going to get more powerful and things are going to get more capable. And because of that, it does seem like AGI, or even an artificial superintelligence, within 10 years, I think that is entirely plausible. Again, this is not a 100% estimate, but I think it's plausible enough.
00:26:18
Speaker
Now, regular listeners of the show might be like, well, did he mean 30%, 60%, or 85%? I understand the tendency for

Urgency in AI Development Timelines

00:26:26
Speaker
that. I think the average person is like, well, is it going to happen or is it not? It's like 10%, 50%, 100%. So it's definitely never 100, because all knowledge is probabilistic and there's uncertainty everywhere. But why I spent some time in the book on the uncertainty is that the uncertainty is concerning. It's not that things get better when we say we don't know. I think sometimes people hear, well, it could be two years away, it could be 10 years away,
00:26:47
Speaker
20 years away. And they kind of just leave it at that. Like, wait a minute, if you don't know, it could be two years away. So how do we make decisions faced with uncertainty? We always have to, right? The world is full of complexity and uncertainty and decisions have to be made. And it's tempting to think decisions don't have to be made, that you can just be agnostic. It doesn't quite work that way. Now, this is more just me highlighting a complexity of the world. Everyone can't care about every issue. Everyone can't read up and be knowledgeable about every issue.
00:27:15
Speaker
But I do want to highlight that if you happen to be in the AI space and you say we don't know, that's almost more concerning than someone who says it's five years away. Because if you actually mean we don't know, it could be tomorrow, a month from now. And like, oh, no, no, I don't mean that. I mean, you know, there's a 1% chance in the next 15 years. Like, oh, okay. So you do have something, right? I guess I'm trying to highlight that
00:27:35
Speaker
Again, the human brain, we go through life and we're making probability-based decisions in a very loose sense, whether you get a mortgage, whether you might have kids, whether you get a certain insurance, and for how long. These are often based on some sense of what the future is going to be like. Similarly, when we're engaging with AI stuff, we are making decisions or not based on what the future is like.
00:27:55
Speaker
Now you could say there's a disconnect between what someone says and what they do, their behavior. And that's true. Sometimes we're dramatically inconsistent, and no human is immune from this, right? I won't eat junk food, and then I'm eating junk food. What happened? Well, clearly there are conflicting impulses within the human organism. Okay. But I really just want to sort of highlight the complexity there and have people reflect on: okay, I've said something; does my behavior match it in any way? And if things are that uncertain, isn't it better to be prepared than to be caught off guard?
00:28:25
Speaker
Uncertainty isn't a justification for complacency, is another way to put it. We can do something, we can make plans, and we should probably make plans for each timeline you mentioned.

AI's Speed Advantage Over Humans

00:28:37
Speaker
In the book you discuss some cognitive traits that AI might have.
00:28:41
Speaker
And I think this was quite interesting. Starting with speed, so AIs will think much faster than we do. This is something that's often overlooked, I find in debates. What does it mean if you have human-level AI, but it thinks a hundred or a thousand times faster than you?
00:28:59
Speaker
Is it still reasonable to call it human level? I'm happy you picked up on that. Yeah, so this goes to: what is an ASI? What kind of thing is it? To try to help people along, we don't quite know what an ASI might be or exactly how it will be, but I was trying to have people understand what it might likely be.
00:29:16
Speaker
There's a couple of key traits and then a couple of other possible ones. You have speed in there; we'll talk about that first. There's capability, reliability, and insight, these sorts of things. Again, just imagine someone's never talked about AGI or they've never thought about superintelligence. What the heck are these guys talking about?
00:29:33
Speaker
It's a computer system. What's it going to look like? And yes, they'll probably draw on science fiction or popular movies, and some of those are misleading and some of those are useful. But it's like, what are we talking about? So I thought, very likely AI systems, ASI systems, AGI systems, will function very quickly. And this is because computers generally work very quickly.
00:29:51
Speaker
That's why they're so amazing. Again, people listening to this, our ability to record this meeting and discussion, it's because so many things are happening so fast in a way that our brains can't comprehend that it just seems smooth, seamless, and easy. So with an AI system, as you've seen it, if you've ever used any of the chatbots, the recent models, or even the image generators, oh, let me think for a second, and then out comes the output. And it's usually remarkable, right? It happens at a speed that no human could possibly generate in that amount of time.
00:30:19
Speaker
And so I think there it's the flaw or the limitation of any definition. There's always going to be an asterisk where like, well, in other contexts, it's not quite like this. That famous Bertrand Russell quote, everything is vague to a degree you do not realize until you've tried to make it precise. It really is the case because any word you can find like, well, what does this word mean? And sometimes people spend an entire thesis or a book defining a word and then someone argues it's not that word and that sort of thing.
00:30:46
Speaker
So with AGI, I think if we think about a general ability, we have to understand, again, the coworker thing is useful as a sort of a test. It's sort of a clear bar. Now, you could say, well, aren't there different types of coworkers in different types of environments? And some coworkers and entire even industries or companies in different countries will be smarter or more capable than other ones. Sure. But we have to have some way of talking about this. Otherwise, we kind of get lost. And at least I'm trying to be consistent in that way.
00:31:12
Speaker
So if it was the case that your average coworker could do something a thousand times faster than you, it doesn't quite seem like your average coworker anymore, right? And then it's like, well, how do we deal with this then? Like, well, when we examine people or give them evaluations in their workplace, some people have strengths that other people don't. And there is some sort of mishmash of like, well, are they insightful? Are they analytical? Are they on time? Are they, you know, friendly? Are they good to work with? All these different things that it becomes an amalgam of an evaluation. And similarly with AI or AGI,
00:31:42
Speaker
We'll sort of do something similar, I think, in the practical sense, that if there's a task and it comes back one second later and your colleague would have taken three days, well, that's not quite human level, right? That's above human level. Now, if it has errors that the human never would have made, like, well, okay, that makes you maybe a bit less than human level, the AI gives you something, it takes you two days to fix it, and another process would have taken three days. Well, now it's above human level, but not as much as it seemed originally.
00:32:10
Speaker
It seems to me that AI is already ahead of us on the speed aspect right now. If we talk to GPT-4, you get an answer almost instantly, though the answer won't be as well done as a human expert could do. So AI is ahead on the speed aspect but behind on the depth aspect as things stand right now. Who knows with the upcoming models.

AI Solving Complex Scientific Problems

00:32:36
Speaker
That leads nicely into talking about AI insight, which you can kind of think about as the AI's understanding of its context and the relationships it's engaging in. And you expect AIs to have greater insight, and to understand their context and understand their relationships, perhaps even better than humans at some point. Why is that?
00:32:58
Speaker
Yes, that's a great point. It's also something that gives me, not the most concern, but it is a concern. So we'll do the first part first. As you said before, if someone is super fast but they're not necessarily good at pattern detection or insight, they're missing a lot of what makes useful products, services, or work in the broad sense. And right now, we already have algorithms and whatnot that can do pattern detection in a way that a human just can't. In a way, this is how most of the recommendation algorithms work:
00:33:28
Speaker
whether it's your Netflix or online social media, it's processing vast amounts of data based on what you've given it in terms of clicks, likes, even just moments spent looking at something and then spitting out something you probably like to some extent or another. If a human was doing this, oh yeah, give me three months to analyze reams of data. And then I think this guy might have a 70% chance of liking this one YouTube video. That's actually impressive for the human, but not at all useful in the real world. And so as they're designed right now,
00:33:57
Speaker
The systems are already, we'll say, super good at pattern detection. There's even a story where some researchers were doing different analyses and an AI was able to detect a patient's race from scans like x-rays of a person's body. But the researchers couldn't figure it out. They tried to realize or understand what the AI system was doing because they would block certain parts of the anatomy, like maybe it's looking at the heart or something in the lungs, and they couldn't figure it out.
00:34:24
Speaker
Now, maybe there is an answer because there has to be some sort of thing the AI was doing, but it's already the case where AI systems can figure something out that humans cannot. And even after it's done, humans still can't. And I think, as you said before, the lag right now is not quite the speed or even the insight. It's almost the integration, the flexibility, in a way, the generality that right now humans are very sophisticated general creatures. So if you give them an intellectual task, they can do a wide range of things, but also they do it in a certain way.
00:34:51
Speaker
and the world isn't yet restructured. Even how people talk to these machines has a certain nuance and sophistication.

Narrow vs. General AI Systems

00:34:58
Speaker
Sort of like, well, the book purposely does not deal with robotics because I think that's a separate thing. It's an interesting and amazing thing. But in an Amazon warehouse, you can't just stick a robot or automatic machines in there. It has to be designed in a certain way to have true efficiency and effectiveness. Similarly,
00:35:13
Speaker
I think the chatbots, large language models, other AI systems in the future will also provide great value, but they're not quite integrated in a way that works. The second thing, when I said the insight concerns me a bit is, again, if you take a big step back, how did humans manage to do what they do? They managed to figure out how things work, like the laws of physics, how things fit together, putting different objects and different pieces together. We can sort of rearrange matter to create things.
00:35:40
Speaker
like cars, like houses, like computers. This is also inspired by David Deutsch's The Beginning of Infinity. It's like, well, our brains just managed to figure it out. There was all this raw material inside the earth, and we were able to take it out and refine it and put it together, and now we have wonderful devices.
00:35:56
Speaker
And that was hard and difficult, and I couldn't do it, but someone figured out groups of people, teams of people, generations of people, and that's amazing. And the concern for me sometimes is what if an AI that's super insightful can sort of see the universe in a way that we can't? I'm not saying it might unlock a truly new fundamental law of physics. That's a possibility though. But it is the case, like, well, what if you just sort of tilt the world like this and like, oh, of course, of course. Wi-Fi, as a technology, it was always possible.
00:36:25
Speaker
We just didn't know how to do it. And of course, recently, people have shown you can use Wi-Fi as a way to sort of scan a room and create models of where people are and how they move. And that, of course, was also always possible. The laws of physics haven't changed in our lifetime, and as far as we know they haven't changed much in the 13.8 billion years since the birth of the universe.
00:36:41
Speaker
So, it's really this insight that allows you to figure things out, and if you understand the world and the universe more than other people, you can do dramatic things. You can create wonderful inventions. You also have a huge advantage in terms of how to manipulate systems, or perhaps how to escape from systems, where someone else just didn't realize, oh, if you manipulate the cooling fan in a computer, you might be able to gain access to different things. Again, that's pretty remarkable. It was always possible, but until someone figured it out, we didn't know.
00:37:09
Speaker
Yeah, I do wonder whether advanced AIs will be able to solve some of the science problems where humans have been less successful. So I think we've been successful in domains that are, in some sense, simple. Not simple to understand, but domains that can be reduced to something simple, like physics.
00:37:27
Speaker
As soon as we get into something very complex like biology or psychology, we haven't had the same level of success. But I wonder whether AIs are well suited to tasks where there's a lot of data, and complex data, and whether they might be able to make progress such that they have superhuman insight about our own minds. And you could see how that would be dangerous, to be exposed to manipulation, or otherwise be
00:37:54
Speaker
interacting with a system like that? I think it could also be amazing, right? That's how most of this stuff goes. It cuts both ways. The uncertainty cuts both ways. Just to be clear for listeners, I am a huge fan of science. I've always been a big fan of science, but it is also limited, right? We're humans; in a way, we're a bunch of apes trying to figure things out, right? And I think we've done pretty well, but we know there are limitations. There's the replication crisis. There's even just the reality of millions of scientific publications each year. Who can read all these?
00:38:22
Speaker
And so years ago I thought, you know what, humans should make a good, strong effort, but wouldn't it be nice if some AI system, pick your field: economics, psychology, biology, chemistry, whatever it is, could read a thousand papers and say, you know what we need? We need this particular type of experiment that addresses these four things that these thousand didn't address in the right way. At the moment, that actually fills me with more happiness, like, oh, we might actually really figure things out, even though, of course, depending on what's figured out, it could be bad.
00:38:49
Speaker
But there are so many complexities here that humans just can't easily disentangle. This is the pattern recognition, this is the insight, where I really hope that with AI systems we can make some dramatic progress. It's like, okay, you know that thing we thought might have had an effect size of a certain size? It just isn't the case, or it kind of is, or it's a 1% thing that people just shouldn't pay much attention to given this other thing, which is much more important. So I'm pretty
00:39:14
Speaker
happy at the moment with the idea of AI really improving science, while acknowledging that, depending on the field, that could also be a problem. Yeah, I'm also pretty excited to see what AI can do in the science realm. I wonder if we can create more systems like AlphaFold, DeepMind's AlphaFold, that was able to
00:39:33
Speaker
figure out how proteins fold. It's a system that's narrow but highly capable in a specific domain and helps us solve a long-standing science problem. I do think the trade-off between risk and reward is pretty great for such systems. Do you think that's a potentially responsible way forward to push hard on narrow systems that help us with the scientific problems? I definitely do.
00:40:00
Speaker
And as you pointed out, AlphaFold and other highly capable narrow systems, it's amazing. It's amazing what they've done, right? And we're just getting started, type thing. That said, we know the incentives for generality, for general-purpose systems, are very high. Whether we can kind of nudge people towards, look, you want a certain capability, this is a particular problem, and it's in a way easier to solve if you just focus on it, so why don't we focus on that? That said,
00:40:25
Speaker
there are other problems where it turns out being general is better. I think there's a nuance and a delicacy here, or a complexity, and we're just going to see how things pan out. I'm aware of Mustafa Suleyman; he has a personal assistant AI, and their plan is not to make AGI but a good personal assistant. I guess the question is, well,
00:40:44
Speaker
are you going to have to end up making something like an AGI to have a personal assistant that is as good as the other, more AGI-like personal assistants from other companies? And it seems like that's probably the case. It's kind of hard to see why it wouldn't go in this direction. In terms of developing systems, as things get more capable, again, the computational power that's coming on board is staggering. It's not clear if it's even much work to kind of stick something in on the side, so to speak.
00:41:09
Speaker
Do we need like a single purpose machine, so to speak, or can it be multiple things in combination? And I know some of the other AI labs are experimenting with these types of things. And even if the machine or the AI is kind of isolated on its own and has certain capabilities, we know once you connect it to numerous other applications through APIs or whatnot, it becomes far, far more capable.
00:41:29
Speaker
So I think it'd be nice if we can focus on narrow things, and I think that's a good path forward, maybe even the most reasonable one in a risk-averse, cautious sense, but I'm wary about the viability of it. That said, we should still try, and we'll just kind of see what plays out. But I think even if you make something highly capable, like Google did with AlphaFold, we're going to see how it goes. I mean, Gemini, their product, is supposed to come out within a month or two, I think, and that's supposed to be very agentic; it's kind of like AlphaFold plus the other one, if you hear the rumors.
00:41:58
Speaker
And yeah, can you get some sort of, I don't know, amalgam Voltron thing, where, well, it turns out in two or three years it's actually relatively easy to stick all these things together? You thought you were doing five amazing narrow systems; well, something just got created that figured out itself how to combine all these things, and now we have what we were trying not to have. So I think it's worth trying, while also being wary and aware of the concerns that might happen if we combine them all.
00:42:22
Speaker
What do you think the relationship is between generality and autonomy? It seems easier, in my view, to avoid creating autonomous systems, or more autonomous systems, if the systems are narrow. Is there some sort of relationship in which when you push on generality, you also push on autonomy? I do think they're very much linked, as you said. I kind of have a similar view, I think, that if you have something that was designed for a specific task, it can be less general

AI Integration in Daily Life

00:42:50
Speaker
and maybe less autonomous. But as we talked about before, these things are nuanced and there are overlapping aspects; we'll see distributions where some things go together and some things don't.
00:42:58
Speaker
So, with, say, credit card and bank fraud detection, that sort of thing, these are pretty narrow. They're highly autonomous, but they're not really that general, right? And you can imagine something else that is pretty general at the moment. Maybe most of the language models: they're pretty general, but they're not really autonomous. You ask it to do something, and if you didn't, it wouldn't do anything on its own, for the most part. So those seem entirely disconnected. So, like everything in this world, there's kind of
00:43:23
Speaker
multiple overlapping distributions where you could imagine things are correlated in some ways, but in other examples, they may be pretty distinct. I think there will be pretty high demand for autonomous systems. You mentioned the AI personal assistant. I think what consumers want from such a system is to simply tell it once to order the flight tickets to wherever and never think about it again and just everything works out. That sounds pretty great to me at least. What do you think about the commercial incentives for autonomy?
00:43:53
Speaker
I think they're staggeringly high. That's the best way to say it. I fully agree: why wouldn't someone just want that? Can't you do it for me? Let's be honest. We as humans want things to happen, and a lot of the things that we want to happen come with administrative burden. I hate filling out forms. Even if it takes 10 minutes, psychologically it seems to take an hour. There's a complete disconnect here. So if someone could do that for me, if something could do that for me? Yeah, give it more power. So I think this really highlights, we're getting at the notion of control,
00:44:22
Speaker
that I think if we're exploring how AI works and how it's going to function in our world, we will initially willingly give up control for integrated AI systems. Yes, you can imagine the financial incentives are huge to provide these products to people. The AI assistants, they could be just normal, say personal assistants, they could be therapists, they could be
00:44:41
Speaker
the companions, all rolled into one. There are the military incentives; the governments have a desire to use them to help citizens, maybe even for surveillance of citizens, pros and cons all over the place. There are all these reasons driving towards more AI: helping consumers, helping businesses, helping governments do things that they want to achieve. As things become more and more integrated, it becomes difficult to disentangle them, or perhaps to stop it before it goes too far.
00:45:07
Speaker
Say more about that. Why is it difficult to disentangle after you've integrated? Say people have begun using AI personal assistants. That's not a high-risk application, let's say. But why is it difficult to then stop using these systems once you've begun?
00:45:24
Speaker
So people kind of adapt to their world and their expectations change a bit. And we can use an already existing case study. With the Replika app, which allows people to have AI companions and often form relationships, romantic or otherwise, they went through a sort of upgrade or reboot, a tweak to their model and their code, such that explicit conversations and things were not allowed anymore.
00:45:48
Speaker
And some of the users who had built up relationships with these little AI systems felt devastated. They felt like the person that they knew and even loved was just gone, and they were depressed. And so here, once you start along a path, it sometimes becomes very hard to shift out. And we know this perhaps for people who thought, well, that Replika thing, I would never do that. Most of us have a phone, a
00:46:08
Speaker
smartphone of some type. Nowadays, it's pretty hard to exist in this world without one. The utility is just so high. It's great. You have all these functionalities and you can talk to people, and I don't have to sell the phone to you, but phones are great. So great, in fact, that it's really hard to not have one. You could say, well, I could choose not to have a phone. You could, but at the moment, you'd actually be reducing your control by not having a phone. But if you really start to think through the phone, like, okay, well, the phone company could just brick my phone and stop it.
00:46:36
Speaker
At one point, Apple put that U2 album on many people's phones through the iTunes Store. People didn't quite like that. They realized they didn't have as much control as they thought. You need the internet or something like it for most of the applications on your phone. When you're using the internet, you're clicking "I agree" to user agreements for apps, or agreeing to cookies and tracking data all over the place. There are all these ways in which we're, again, willingly giving up power and control for
00:47:01
Speaker
Again, other types of power and control. We want to talk to our friends. We want to watch funny videos. We want to listen to podcasts or watch them. And that seems great. But now once we're here, if you said to people, okay, we're just shutting it all down. You can't listen to podcasts anymore. You can't watch YouTube. Or more dramatically, is Facebook too big to fail?
00:47:20
Speaker
Is it the case where, although a lot of people don't like Facebook for various reasons, a lot of people really do, and it's billions of users. Is it the case that Facebook could just go down and people wouldn't be upset? A lot of people have built their lives on it. That's where their friends are. That's where their photos are. That's where their memories are. I actually enjoy the memory feature. What did I do on this day five, six, seven years ago?
00:47:39
Speaker
And I appreciate that because, again, human brains are fragile. I sometimes did fun things. It's nice to remember them. I wouldn't have remembered it without the prompt from the algorithm. So in that sense, you sort of see that the incentives are very high. It's going to be hard to opt out. I'm not saying people should use social media all the time or that social media doesn't have also lots of problems. But if you think of like citizens in the world and what they would want and how they would react,
00:48:01
Speaker
You could imagine it wouldn't go so well if a government just said, okay, we're just not going to let people do this anymore. Or when people lose internet access even for a very short period of time, that's usually very concerning for a lot of people. If you just didn't have it for weeks and you weren't prepared, it would be truly destabilizing.
00:48:17
Speaker
Most people nowadays of a certain generation, they don't know where their friends live. They don't know their friend's phone number. They may not know their email off by heart, maybe, and they don't know their last name. So you have all of these things where like, oh yeah, that person, their handle on Twitter or their handle on Instagram or who I interact with and that's how you talk to them. But if that went away, you would lose a vast social network and almost have no ability to reclaim it.
00:48:40
Speaker
in any way if the infrastructure went away. So there are all these push-pull tensions where once things are in place, it becomes very hard to disentangle. And that makes them ever more robust and desirable.

AI Risk Management

00:48:53
Speaker
Yeah, we can think about this: say all the world's scientists came to us and told us the internet is going to become very dangerous in 10 years and we must shut the internet down. Would the world be able to coordinate around this project?
00:49:08
Speaker
Would it be possible for us to shut down the internet, given the uncertainties, given the different incentives that different groups have? I think that would be extremely difficult. Now, the internet doesn't represent a danger to us like AI might, but it's worth thinking about how difficult it is to step out of these systems once they have become integrated into our society.
00:49:32
Speaker
Sure. And that's actually why I chose that example in the book. When you're trying to think about artificial superintelligence and why it's so powerful, if you talk to an average person, one of the first things they say is, can't you just shut it off? Why don't you just shut it down? What's the big deal here? And I think the internet is probably the most useful analogy as a comparator, as a, again, imagination device. Like, okay, let's think through the internet. Right now, the internet is very robust. It is designed to be robust, right? Because not only does the average person want it, global commerce hinges upon it.
00:50:00
Speaker
Various municipal services, hospitals, everything else. So many things are now integrated with the internet. Anyone who has a smart home now is like, what? I can't get into my house because the internet's down, or I can't control the temperature. So again, pros and cons once you're integrated. All that to say, there are cables crossing the ocean and many other places that enable the internet to happen. And there is currently no way to just shut down the internet.
00:50:22
Speaker
If all the main governments agreed, then maybe you could make a lot of progress in it. But as I say in the book, there'd be a huge vulnerability. If you created an internet kill switch, that's a great opportunity for some terrorists to be like, why don't I just press the button? Because it would wreak havoc and cause a lot of damage. So there's a huge incentive to not even have a kill switch available.
00:50:44
Speaker
which is problem number one. And the second part is, are we going to get agreement from countries or representatives to then actually activate such a kill switch in an emergency? You can imagine, okay, let's imagine a sort of ideal world. They've done their due diligence. We've somehow created a kill switch. It's very well protected. It's not actually a threat. And we even have a protocol with clear written reasons for when you would or would not activate the kill switch.
00:51:11
Speaker
And then it comes time to something bad happening. And everyone looks at the agreement that everyone agreed upon, the details and like, well, I don't quite think point three has been satisfied. And someone else like, of course it has. And we have minutes to decide whether this is a good idea or not. I could just see that also being a problem. So this is the normal, like human, complicated, making decisions together, understanding the world differently.
00:51:31
Speaker
It's not that there's no hope at all, but it just highlights that we'd have to do a lot of work in advance and ensure that such a thing could exist if we needed it. All that to say, to your original premise, if the scientists presented clear enough evidence, I would hope that... We won't say everyone, because everyone doesn't agree on anything, but there are enough people with enough resources and power that the risk would be clear enough that we should act and put things in place to make such a thing happen.
00:51:57
Speaker
With an artificial superintelligence, one could also see a similar question: well, how would you shut it off? Is it distributed in a way that is even worse than the internet? It could be on computer servers in various places that are or are not connected to the internet, or there could be fragments of an ASI that could recombine into something else. There are certainly a lot of reasons to be concerned, and we can use the present case of the internet to see why it's going to be a challenge.
00:52:23
Speaker
How reliable do you believe current AI systems are and how reliable do you think they'll be in the future? I think they are mixed and they will be mixed. And so to elaborate on that, I think a lot of these systems, again, they're amazing. Certainly image generators, they're reliable or not, right? You kind of put something in, you get what you get. Do you get exactly what you want? Very rarely.
00:52:45
Speaker
I find with at least image generators, if someone's looking over your shoulder, they're like, that's amazing. You're like, that's not quite what I wanted. So there's a disconnect there. As for the language models, I think they're generally great, but then you have to fact-check everything. There are already numerous examples of
00:52:58
Speaker
lawyers citing false precedents that didn't exist. In another case, I'm not sure which system, but one of them was asked for examples of sexual harassment cases involving lawyers, with references. It spit back this wonderful example of an individual, a professor, who supposedly sexually harassed a student on a trip to Alaska, with citations from the Washington Post or something like that.
00:53:20
Speaker
The whole thing was fake. The guy is a law professor, but he's never been there. He never went on a trip to Alaska. There were no incidents ever of sexual harassment. So the problem is these systems are very, very confident and convincing. There's the analogy of a very good improv comedy partner: whatever game you want to play, it'll play along with you. And in that sense, it can seem like it's doing more than it is when it's adapting to who you are. With reliability, I think there's an interesting nuance here that, depending on the situation the AI is in,
00:53:48
Speaker
it may have to be super, extremely reliable. Like, you know, 99.9999999% reliable is still not enough, because if it's operating fast enough and it's making a billion decisions a second, then even one error in a billion, you're like, okay, is that one problem every second? That could be catastrophic. Even if only one in a thousand of those errors were an actual problem, you'd very quickly reach that threshold. In other cases, you could imagine, well, if something's really super insightful, like it's coming up with interesting scientific advancements or ways of understanding the world,
00:54:18
Speaker
well, even if it was 10% reliable, like one in ten ideas, even one in a hundred, it might still be very, very valuable. So I think it's going to be very mixed in terms of reliability, but the main thing one would want is to have an understanding of how reliable it is before it is deployed and used in any way. Because you don't want something that you think is 99% reliable being 10% reliable.
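To make the arithmetic behind that point concrete, here is a minimal sketch; the decision rate, error rate, and harmful fraction below are hypothetical illustrations rather than figures from the book:

```python
# Rough illustration: expected failures for a fast-acting system.
# All numbers below are made-up examples, not measurements.

decisions_per_second = 1_000_000_000   # a billion decisions per second
error_rate = 1e-9                      # "one in a billion" unreliability
harmful_fraction = 1e-3                # assume only 1 in 1,000 errors actually matters

errors_per_second = decisions_per_second * error_rate
harmful_per_day = errors_per_second * harmful_fraction * 86_400  # seconds in a day

print(f"Errors per second: {errors_per_second:.1f}")     # about 1 per second
print(f"Harmful errors per day: {harmful_per_day:.0f}")  # still dozens per day
```

Even with an error rate most software would envy, a system acting at that speed accumulates failures quickly; that is the sense in which "reliable enough" depends on how fast and how consequentially the system operates.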
00:54:39
Speaker
And the reverse, of course, because there are confusing things here. Again, we're going to see how it pans out, right? With the self-driving cars, I think we just saw Cruise say that remote operators assist the vehicles something like 4% or 5% of the time, which sounds like a lot, but in another way is not, if you look at, again, the vast history of human invention. Still, it's not doing what it's supposed to be doing.
00:54:58
Speaker
How is reliability different from alignment? How are these concepts different? Is it just about getting the AIs to do what we want in both cases or how would you disentangle here? That's a good question. I think alignment kind of breaks down into various different related issues and problems. Does the AI do what we want?
00:55:18
Speaker
like who is "we", what are our values, that sort of thing. That is the alignment quagmire, but you're definitely right. There is certainly some overlap where, if I've asked an AI for X and it's delivering what seems to be X, that seems like it's reliable and therefore it seems aligned with what I want. So in that sense, I would say, yeah, there's a lot of overlap between these terms. That said, when we're thinking about alignment, it's usually the broader question: is this thing going to cause a problem in some way or another?
00:55:44
Speaker
But, well, yeah, this actually now seems very similar. So I'm appreciating that these are highly linked, because if you think of alignment, an AI system could be misaligned for several reasons, right? It could be due to an accident. It could be due to misuse by malevolent actors. Or it could be, you know, that the AI itself becomes more capable, more power-seeking.
00:56:01
Speaker
And if it was reliable in any of those ways, it would kind of make it worse. But with the accident example, it does really seem like, well, if an AI is malfunctioning, that makes it not reliable. So on the fly, I don't know if I'll commit to this, but I'll say maybe reliability becomes a bit of a subset
00:56:16
Speaker
of the alignment accident issue. And then, of course, it would also relate to misuse. In a way, these things are all very much connected. And that's something that's in the book as well. You can't make one long chapter that's 3,000 pages, so you have to ask, how can I put this into different chunks, even though these things overlap and interrelate? Yeah, so you mentioned these three categories of risks from AI: accidents, intentional misuse, and rogue AI.
00:56:44
Speaker
Which of these categories worries you the most, and on which timelines? So what are you most worried about right now? What about in 10 years, 20 years, and 30 years? Yeah. 30 years, you're like, oh, 2053. Let's think about that for a moment.
00:57:00
Speaker
I can't even think that far in advance right now. But yes, I think it's a great question, because you hear all these things about what's happening in the present day, right? And I think at the moment, it is more the accidents and the misuse. Just to clarify, by accident, we kind of mean that the system is not quite doing what we wanted it to do, right?
00:57:16
Speaker
So when Bing chat was aggressively misaligned and it was kind of treating its users badly earlier in 2023, then that indicates that's not what the machine was supposed to do. And this is the broader category of, you know, people respond to incentives and there are perverse incentives, right? That you think you've designed a law or a rule in one way and then it turns out it's something else.

Strategic Foresight in AI Planning

00:57:36
Speaker
So in that sense, these things are happening semi-frequently to some extent and they're trying to like train them out, right?
00:57:42
Speaker
And whether even saying the wrong thing in terms of violence or sexual imagery counts as accident as well, that's more nuanced and we don't really have to get into that. But all I have to say is accident is a frequent occurring problem right now. With the misuse, that has also already happened, right? People are using voice cloning software to scam people out of money. They call a person, usually a parent or a grandparent, pretend to be the person's child because they've voice cloned that child's voice, and say like, I need money, please send it immediately. And people have already lost money.
00:58:11
Speaker
Now that seems like a bad case, but of course it's not nearly as dangerous as something like new bioweapons, which also seem plausible. So if you imagine, let's say, the next two, three, five, ten years, it seems like accidents are already happening. Misuse is most likely to increase before more power-seeking, autonomous behavior from the AI itself, the rogue AI, comes on board. The concern is that since the people who are building these things don't know,
00:58:37
Speaker
when you put in a certain amount of compute or computational capacity, what level of capability you get. Will there be a dramatic jump in capability? Maybe, maybe not. And that uncertainty is a problem. So while it seems reasonable to say that right now misuse is more the near-term problem,
00:58:52
Speaker
what is near term in the world of AI? Six months? Because then it's like, oh yeah, I meant until the end of 2024. And then it's really also going to be both misuse and power-seeking. So I think when you're talking to average people or even policymakers, there's usually a multi-year, sometimes multi-decade timeline in the back of their head of how the world works. And if you say something is near-term versus long-term, you should clarify: by misuse, I mean like one to two to three years, and then overlapping within one to five years, perhaps power-seeking as well.
00:59:22
Speaker
And that's how I would break it down. You write about strategic foresight, which is making plans for different scenarios. How does that help us manage these risks?
00:59:32
Speaker
Sure. It's really just, again, trying to think about ways to figure things out without committing to a specific outcome like a forecast would. The weather forecast, I think, is the best example of the average person encountering probabilistic assessments of the future: 80% chance of rain tomorrow. With forecasts, Metaculus, different prediction markets, it's: what is the likelihood of X event happening at a certain time?
00:59:56
Speaker
That's great. I think we need those and they are important. But in addition, we can use things like foresight, which explores a range of plausible futures. So you can look at the data, you can look at analysis, and you can think, what is the most likely outcome? What is most probable? And that's very useful. But we can also think, well, what's plausible? Let's play it through; this is broad scenario planning. What might happen if AI becomes more prevalent, if image generators become more popular?
01:00:22
Speaker
What happens? For example, image generators become more popular, then more people use them. Does that affect artists? Let's just assume it affects artists and artists lose work. Then what happens? And you do this cascading first-order, second-order thing that really, I think, helps open up the possibility space, the realm of what could happen. Now, sometimes it's hard to draw a direct line to what we should do now, but at least you've opened up your mind to what could be. And once you start to think back on all the things that happened 5, 10, 15, 20 years ago,
01:00:50
Speaker
if you put yourself back 15 years ago and try to imagine what happens next, you realize, oh, people didn't see a lot of things coming. They weren't open-minded enough, or they didn't see enough of the data. There's a bit of hindsight bias; it is tempting to think, of course, that thing was foreseeable, when many things are not.
01:01:05
Speaker
But with foresight, I really think it's very useful just to, again, open up our minds. So with the AI issue, you can take an example where, say, artificial super intelligence arrives in 10 years. Just assume that's happened. Then work backwards. So what had to happen for that future to come into existence?
01:01:21
Speaker
what if it was 20 years? What if it was 50 years? And you can kind of think like, okay, maybe if it's 10 years, current projections seem to hold, but maybe if it was 20 years, there was some hiccups, there was some complications. We didn't understand the complexity of certain things and we hit certain walls. 50 years, I think a lot of us would be like, well, I don't know. We just got something wrong. We didn't understand the nature of what we were dealing with and a lot of projections now would be wrong. And you can do that in a variety of ways. So I think it opens up the ability to
01:01:45
Speaker
think about these issues in different ways without committing to something. But fundamentally, it also really helps challenge assumptions. If you have discussions with people about what they expect the future to be like, you can break that down: what do you expect? What's your preference for the future? What would you not want the future to be like?
01:02:00
Speaker
What do you think is most likely? And so by doing all these different nuances, different themes about what they think the future might be like, you might be able to have someone realize like, oh wait, my expectation of the future is very much aligned with my preference for the future, right? Because that's how a lot of people are. But maybe my preferences are not that relevant to how the future actually exists. And then they go, oh, I didn't realize that was happening.
01:02:21
Speaker
or even just with AI stuff, some people don't realize how advanced these machines already are. And if you can say, this thing has already happened, then what happens? It really does help people think, oh, maybe this could be a concern.
01:02:33
Speaker
I think it's great to make plans for different AI scenarios, but I do worry that these plans will work best if we have gradual improvements, so say 10% improvement per year. We can go back to our plans, we can get feedback from the world, adjust our plans, but what if AI progress is more bumpy and much faster than 10% per year? Does this make strategic foresight less useful?
01:03:00
Speaker
Well, perhaps less useful, but not useless, right? It still has utility. It's sort of like, we have to, again, make decisions under uncertainty. And so we should do the best we can. We should put resources into figuring it out. And we should map different possibilities and try to communicate those broadly to others to get feedback and see what things could be. Yes, you're right. If things are dramatic, if there's a big step change in capabilities, it doesn't mean all that work wasn't useful at all. But it might mean, oh, I have to flip to page five; all those things I thought were going to happen in my document have now already passed. What now?
01:03:30
Speaker
But hopefully, having these conversations themselves allows us to plan even better.

Exponential Tech Growth Challenges

01:03:35
Speaker
Like, okay, again, the use of scenario planning. It's most useful when there are 10% increases. What happens if it's 50%? What happens if it's 200%? And again, you might not be able to really figure it out and have a perfect plan, but having something is better than nothing.
01:03:47
Speaker
And sometimes, again, just by thinking it through, you at least get through, we'll say, the emotional complications, the barrier, intellectual or even visceral, of, oh my God, this thing just happened. And sometimes people need a bit of time to sit with that. Like, okay, now imagine something is much more capable than anything ever, and it's highly general. Let's think through what that might be like. And you can even think through how you might feel, to then make a better decision when it's happening. Because again, if you're trying to make decisions and things are happening very quickly, urgency rarely helps decision making.
01:04:17
Speaker
So you discussed this fact, I would say, that we are living in unusual times in terms of economic growth, in terms of scientific papers published per year, in terms of the exponential growth of computing power available for a certain dollar amount. Yeah, that's one aspect of the world we're living in. Another aspect might be that ideas are getting harder to find. We have many more researchers
01:04:45
Speaker
for the same amount of scientific breakthrough. Economic growth might be slowing down in certain countries. Say we have these two trends and you can tell me whether you think these trends are actually occurring. Which of these trends are going to win out? Are we going to hit diminishing returns or are we on a path to even stronger exponential growth?
01:05:07
Speaker
Going infinite, right? I can find ideas on the internet very easily; what do you mean they're hard to find? Jokes aside, I appreciate you highlighting that. This is certainly not a new idea, but again, the average person hasn't thought much about these issues: where are we sitting right now, and how do humans currently live? Again, not everyone. There are billions of people without adequate food, water, electricity, that sort of thing,
01:05:30
Speaker
or at least hundreds of millions. Still, things are very different than they used to be. So I wanted to give a sense of the grand sweep of how things are very different, to show just how much change has occurred, to then say, well, if so much change has occurred, it's at least reasonable, possible, plausible to think a lot of change might also occur in the future. So if you go back millions of years, proto-humans are still developing,
01:05:51
Speaker
At one point, was it 1.6 million years ago, we have a hand axe, which is a sharp stone tool, and that was the best thing for a million years, a million years, 50,000 generations of people. And you're like, what? I was like, well, I made this sharp stone slightly sharper. Like, okay, well, that's not that great an advancement compared to like the iPhone and all the different new releases there.
01:06:12
Speaker
That said, for people who are sticklers, I'm sure there were also various wooden tools; they often don't preserve as well. Humanity also lost knowledge about how to make certain tools at various points throughout history, which is something that's difficult to imagine now, that we would lose knowledge about how to print books or something like that. Maybe at the very cutting edge of the technology stack, we can imagine that we might lose knowledge. But it seems difficult for us to imagine now, I think, losing knowledge of how to create basic products.
01:06:39
Speaker
I think you're right. As the nature of the world has become more industrialized, making a particular product often requires many, many people, sometimes thousands, sometimes millions across the entire supply chain. So that's its own type of complexity, where, well, maybe someone could once have made a pencil on their own, and someone still can, but nowadays it's a whole team and company and industries and machines.
01:06:59
Speaker
Is it the case that we're going to keep growing, getting more advanced, with things that keep changing? I think it depends on what we measure. I'm well aware that the economists will say innovation has slowed or productivity is down in certain ways. I think that's important; I don't want to say it isn't.
01:07:14
Speaker
But from the normal human user experience, it seems like things are still remarkable. Now, you could say most of it's in the information technology space. It's the internet, it's the computers, it's the phones. Where's our new plane or washer dryer or the car or that sort of thing?
01:07:32
Speaker
And I guess I think that the changes in the internet space, like the fact that we're easily doing this podcast, are significant in a similar way to some of these other things. Now, yes, the invention of something like the dishwasher made a truly big difference in terms of how it affected life, but so have recent inventions.
01:07:47
Speaker
So I don't want to say things are going to continue forever. That seems unlikely, because it just makes no sense conceptually. Usually there's an S-curve in terms of how these things develop, right? And it's hard to know where we are in the curve. I would just say, more comfortably, for the next little while it does seem computers, that whole technology domain, are going to increase a lot, and then that's going to ripple through. Now, some people think this isn't enough; I don't know. I guess I'm less concerned about that. I mean, I am concerned about economic growth being good for human development in that sense,
01:08:17
Speaker
but that we're running out of ideas, I don't know, there's still lots of great ideas. In fact, I think with the AI thing, maybe we need to slow down some of this development because we haven't figured out how to deal with the ideas we already have. I also think it's interesting, as you said, scientists now, sometimes on papers, there's 10, 50, 100 or some hundreds of scientists to do some of these things.
01:08:37
Speaker
Usually it's particle physics or something in AI or machine learning or maybe even biology. And yes, it's not like it was when, you can picture your Darwin, your Aristotle, someone saying, oh yes, I think the nature of the world is such and such, and I've unlocked some mystery of the universe. And yes, you could think that there are some diminishing returns. But at the same time, there's lots we haven't figured out, right? How gravity interacts at the quantum level, or even what the right interpretation of quantum mechanics is. Will these things be figured out?

Aligning AI with Human Values

01:09:06
Speaker
Could we build fantastical ways of capturing energy, more than solar even, right? And these sorts of things. So when I think of the new solar and wind stuff, which didn't exist when I was younger, or the fact that these immersive VR things exist, which didn't exist when I was younger... Planes haven't changed a lot, sure, but now people have gone to space casually. Again, in the history of the world, none of this has ever happened. So yes, on a multi-decade span, it may seem like things have slowed. But if you really take a step back,
01:09:34
Speaker
over thousands of years, even on a million-year scale, it's all squished right into the past couple hundred years. So let's keep these things in mind, but let's see how the next couple of decades pan out. Let's talk about alignment. So one objection you might give to the whole project of alignment is to say that humans can't agree on what values we should have. Philosophers haven't been able to figure out ethics. What is it that we're trying to align AI with if we haven't
01:10:03
Speaker
determined our values yet? It's a great question, and it is currently unsolved. In some ways, we're going to have to muddle through. That would be my concise answer. It's sort of like, what do we do when humans disagree? Well, we try to come together in some sort of compromise, some sort of consensus, hopefully some sort of democratic system where people don't necessarily get everything they want, but they get enough that the world functions decently for most people. It's not perfect by any means, but then, compared to what? The Winston Churchill line.
01:10:31
Speaker
So with AI, yes, there's the technical alignment issue, which I think is critically important. That's more like, does the AI do something we didn't want it to do, by accident, from a technical point of view? And yes, if we solve the technical alignment problem, that's great. That's amazing. It's difficult, but it would be amazing. And then there's this other problem, which was always there, that slots right into place: well, now what? Who decides the fate of the world, type thing, right? And if these systems are as powerful as they are, it does seem very bizarre how we're currently going about it.
01:11:00
Speaker
Yes, we do have states that are starting to issue executive orders, like the White House did, or other regulation, or the EU AI Act. But right now it seems, I don't want to say that a few people are controlling the fate of the world, but by their own standards, by their own statements, they are developing, by design, by their own stated goal, very, very powerful systems that could have vast control and abilities. What is

Power Concentration in AI Development

01:11:24
Speaker
going on here?
01:11:24
Speaker
And this is where the book is just trying to raise awareness that, yes, even if this thing about AI is solved in a technical sense, there's this other problem of how do we ensure everyone has a voice? How do we ensure people are represented? Are there going to be even greater power imbalances and inequalities, in a way that's truly disruptive? So I think, again, it's like a call to arms. We need a lot more people thinking about it. We need a lot more people aware of it.
01:11:48
Speaker
And even like an if-then scenario: okay, say this thing just got developed, what's the plan? How will resources be distributed? Will these companies just go to multi-trillion-dollar, quadrillion-dollar things? Who knows how far it goes? Does something end up getting nationalized? These are delicate things maybe to say in certain environments, but hopefully conversations are at least happening behind the scenes of, okay, let's plan through, again, the scenarios. If something isn't aligned with other people and they could use it for malicious purposes, that's one thing.
01:12:15
Speaker
But just the normal case, where my preferences are different than yours and that may make your life a lot worse, that's something we really need to pay attention to. I think also there's some hope that, given that we humans have a shared evolutionary history, we share a lot of our values, even though we also disagree strongly with each other all the time. I think there's some hope that we won't actually have to get something like a final theory of ethics.
01:12:45
Speaker
I want to say we should definitely not put off working on alignment until we have such a theory that we can then plug into our AI systems. I think we can probably agree on some basics of life and then, as you say, muddle through. So thinking about healthcare, for example, I think most people can agree that most people should have
01:13:08
Speaker
access to fantastic healthcare, and that's something where AI might be able to help. And then we can go on to the next thing, and the next thing, and the next thing. I think there might be too much focus on trying to develop a perfect theory of ethics, and we should, as you say, actually muddle through.
01:13:29
Speaker
Well, I think that sort of might be the only way, right? As you said, philosophers, academics, anyway, all humans have sort of been working on this for thousands of years. And of course, we don't all agree. And importantly, we don't often agree with ourselves, right? We change over time; your preferences as a child, as an adult, maybe even five years ago, you felt differently. Well, were you correct five years ago or now? Which value system do you give the AI? Is it supposed to lock in? Is it supposed to know better?
01:13:55
Speaker
I think it's a very interesting but also very concerning space where, like you, I think, can't we agree on some basics? We could think of the United Nations declarations on human rights or development; most countries did sign on to these things. There is a general sense that, okay, people should have food. We'll get to healthcare in a moment, but food, water, sanitation, at least a primary education, these seem to be universals. Hopefully, yes, with abundance from the AI, we can all agree, can't we just
01:14:23
Speaker
end tuberculosis, can't we really solve this malaria thing? Can't we have everyone more educated? But there will always be someone who disagrees. There will always be someone coming from a different angle. There are malevolent actors. There are currently some 40 million people in modern-day slavery. Clearly, they are there because other people are doing terrible things in various ways, or the situations they find themselves in
01:14:41
Speaker
are so compromised. That said, how do you ever get rid of that? As you said, we kind of muddle through, while at the same time certain things remain wholly unacceptable. With AI, the concern is, and there are of course race dynamics all over the place, that if certain malevolent actors get their way first, they may be able to disempower or displace what we'll call the loose, reasonable majority that thinks everyone should have food, water, healthcare, and education.
01:15:07
Speaker
In sort of traditional discussions of alignment, we imagined perhaps that we would sit down and hand code human values into AI systems.

Large Language Models & Alignment

01:15:19
Speaker
And then we thought about how complex human values are. And that was a cause of despair. How could we ever summarize something as complex and inconsistent as human values?
01:15:32
Speaker
Do you think that large language models change this picture? Because large language models can digest all of human knowledge and then at least they can pretend to have knowledge of human values, ethics, psychology, and so on. Is it the case that large language models make the alignment problem easier?
01:15:52
Speaker
Hmm, that is a great question. I think that's one of the things we're currently figuring out, right? From what I understand, OpenAI plans to use AI models to help solve the AI alignment problem. And in a way, we need AI to assess and evaluate and to test to see if it is aligned, so AI is inherently involved. But to your general question, maybe. And here's what I think is a perhaps interesting development of this: so what if the AI system looks at all human knowledge, right? And it says, you know what, I figured it out, guys, everyone, this is what you should do.
01:16:21
Speaker
We're like, I don't want to do that. But by your own standards, you said you cared about these things. And then humans are like, no, no, but I didn't. Not really. Come on. And so in the book, before I talk about the more like AI alignment problem, as people know it, I talk about Isaac Asimov's laws of robotics.
01:16:37
Speaker
I think this was just the very easy way into why simple rules to align computer systems don't work. When you say don't harm something, you're like, that seems reasonable. That seems obvious. Of course, put it in the machine. But what does that mean exactly? Don't harm at all? If someone needs to get surgery where they have to be cut into, does that count? What if it's a risky surgery? When you say don't harm anyone,
01:16:57
Speaker
does that include non-action? This is all the omission bias, right? Where if you drown someone, we see that as a terrible thing. If you let someone drown, well, we see that as a bad thing, but not usually as bad as the intentional drowning. So is an AI system now supposed to think, well, wait, I'm letting people suffer. People are currently dying in poverty needlessly. Should I be doing something about that? Well, by your own standards, you said don't let humans cause harm or come to harm. What am I supposed to do now? And you can see there's very much a disconnect, and maybe the thing short-circuits.
01:17:25
Speaker
Yeah, does harm imply any probability of harm? Then you cannot move, you cannot do anything because anything, any action at all could cause harm. It just doesn't work.
01:17:37
Speaker
No, exactly. And so I think it'll be very useful for AI systems to provide insight. But like any sort of human enterprise thing, we might get back an answer we don't want or that's very hard for us. And whether people will take that on board is also going to be highly variable. Like for people who are very much interested in ethical reflection and philosophy, they might have made like substantial progress and they realize, you know, my beliefs mean I shouldn't do X and therefore I don't do X.
01:18:02
Speaker
A lot of us are like, I know I shouldn't do X, but sometimes I still do because I'm a human. Again, progress is good. It's not to say that people should be absolutist about these things. It just sort of highlights the difficulties of the human system, the human nature of the whole thing. Do you think current AI systems have self-preservation?

AI Self-Preservation Debate

01:18:20
Speaker
Good question. I would probably defer to whoever is doing the most cutting-edge research now in the advanced labs. From what I understand, it's slightly, but not a lot.
01:18:31
Speaker
There are some examples, I think, more in the theoretical realm, or there's a prototype where they've played around with certain systems in, you know, simulated environments, and the system does seem to engage in certain behavior to protect itself in order to achieve a goal. Whether it's full-on self-preservation as we commonly understand it, I don't think we're there yet. But yeah, I would kind of think, well, there's probably some paper on arXiv that I haven't had a chance to read yet that may say otherwise.
01:18:57
Speaker
Do you think that AI systems will develop self-preservation as we get more advanced AI? I do, or at least I think I do to the extent that we should be concerned about it. Again, nothing's 100% here, but there's enough of a risk. The whole Stuart Russell "you can't fetch coffee if you're dead" thing: for a system to do anything, it has to exist. To me, it is reasonable, it is plausible, that to achieve anything you have to exist, and once you know that, you might engage in various activities to ensure that you do exist.
01:19:26
Speaker
Now, maybe there are ways to contain this or to circumvent it, where you somehow clearly specify a goal with a certain amount of error bars and then the system is supposed to shut itself down after it's done that goal, perhaps. But as we talked about before, the incentives for autonomous systems that are highly capable, highly fast, and so on, will be in tension with something that shuts itself down all the time.
01:19:47
Speaker
Imagine, oh, I like to use my phone, and after I send one text, I shut the phone off, and then I turn it back on. Well, that seems really painful and slow, right? And people just might not do it. So I think it is plausible that the systems will engage in such behavior; the expectation would be that they probably would. I guess I'm trying to say the default expectation, to me, is that something that's very, very intelligent will probably engage in these behaviors. So we should be on the lookout for it and really try to figure out whether they are or they're not, versus the expectation that they wouldn't.
01:20:15
Speaker
It doesn't seem to me that GPT-4, when I talk to it, is trying to self-preserve at all. This might seem naive, but it seems to me that I can just click stop on the chat whenever I want, or close the tab whenever I want, and there's nothing the system can do. Do you think self-preservation will emerge together with more autonomous systems? That's a great point. So yes, right now, if you think about how could this thing be autonomous, I literally close it. I click the button, and it goes away. It's not secretly hiding somewhere to our knowledge here.
01:20:45
Speaker
I guess there's a small probability that it is already. That's the thing. By the way, I tried to be careful in the book about the idea that an AI that's super smart can do anything, right? Because then you kind of get into these almost non-falsifiable things, like, well, maybe it's secretly doing the thing and it's so good at deception that it looks like it isn't deceiving. Now, I think there's something to it and we should be wary about it. But I also think we have to be careful, because those explanations are not satisfying to most people. Oh, look, it can do anything. Well, tell me about it. Oh, anything. You're like, well, I don't understand what you mean.
01:21:12
Speaker
To your point though, yes, right now I'm not concerned about that. I don't see an issue with that. That said, as we start to get beyond the GPT-4 or Gemini and these frontier models, I think it's a very important thing to assess, not only before deployment, but in the training stage. There should be methods and benchmarks in place, and even asking the labs, what are your expectations for the capabilities of your models throughout the process?
01:21:34
Speaker
And if the labs themselves are consistently wrong in a certain direction, like, oh, we thought it'd be this capable, and it ends up being more capable every time, you're like, that's not a good track record. So next time when you say it's not going to be as capable as we thought, it probably will be. Just some sort of way to get at how we might understand these things.
01:21:51
Speaker
As it goes in the future, again, we currently have autonomous systems, right? As I said, they're doing banking stuff. They're doing fraud detection. They're doing cybersecurity things. They're intercepting incoming missiles. The incentives to have these things become more autonomous will be there. And then again, if the system doesn't exist, it can't really do its job. I want to be careful here: there are sort of "as if" goals, right? The system only needs to act as if it is engaging in self-preservation.
01:22:18
Speaker
It doesn't have to have like, I'm an AI, I have a certain goal, I need to exist. It may do something like that, but it doesn't have to. And I don't want us to sort of think it needs to be the case. It's more just going to engage in various, we'll say, from our perspective, reasonable goal oriented processes to make it more likely to achieve its goal.

Unexpected AI Capabilities

01:22:35
Speaker
And some of those will involve getting more power, to ensure its own existence so it can serve that end. Now again, maybe not, but there's enough of a maybe that we should be very careful.
01:22:44
Speaker
There's a bunch of emergent capabilities in AI systems that could be problematic if we're trying to align these systems. So we're talking about power seeking, deception, manipulation of humans, using humans to achieve goals in the physical world and so on. Which of these traits do you worry about the most? And where do you think we are on the road to these traits? How close do you think we are to having manipulative or deceptive or power seeking systems?
01:23:13
Speaker
I don't think we're there yet, but it could definitely happen in a short period of time. On the first example, just the emergence itself, I think a lot of people think the recent large language models, your ChatGPT, GPT-4, and Claude and whatnot, are largely emergent in a lot of their capabilities. You have these systems that were trained
01:23:31
Speaker
mainly on English, and then they can speak other languages. That was not the plan, right? Or it was trained mostly on some corpus of text, and then it can also do computer programming. It's not that if someone had thought through a lot of this, they would have realized, oh, maybe it will also do that thing. It's that, from our perspective, certain behavior emerged in a way that was unexpected and unplanned, at least in some cases, in some domains. So all that to say, emergence seems to be all over the place, right? Especially if you try to prompt a model in a certain way and you get a certain thing which
01:24:01
Speaker
you didn't think it might do, but then it does. It was just strange to me, knowing something about the training process of GPT-4, interacting with the system in my native language and having it talk to me perfectly in Danish. Seeing that it's pretty capable in my native language was kind of a surprise and an interesting experience. And I believe that a lot of people have had that experience of talking to it in their native language. And knowing that it wasn't trained specifically to do that, it's quite impressive.
01:24:31
Speaker
I agree. I think it's staggeringly impressive. It's hard to think of a human parallel, because clearly Danish was somewhere in the training set, right? It's not like it read English and then invented Danish; no, no, of course not, just to be clear for everyone else. But at the same time, it's like, well, you have a test coming up. It's an English test. Here's a whack of material; study all of it, but focus on the English. Okay. And then it aces Danish? What? And of course, it's not just Danish. It's numerous languages. It's also math; not usually the addition, which it has trouble with, but complicated math.
01:24:59
Speaker
And if famous mathematicians like Terence Tao are using these systems and saying they really help improve their workflow, you're like, okay, well, that's clearly a significant indicator of capability. So you could also imagine someone then saying, we really need a dedicated math system.

AI Manipulation & Malicious Actors

01:25:13
Speaker
This goes to that general versus narrow question. Would it be more capable? It seems like it should be more capable, but it's possible the general one is somehow more capable, right? And then fine-tuning tweaks it. To your other question, though, I am
01:25:27
Speaker
concerned, just thinking about the security nature of things, right? If you're looking at the world through a security lens, a security mindset, you think, where are the weak links, right? All companies and even individuals deal with this to some extent. And people can be manipulated in a social engineering way, right? People get information just by calling someone up, pretending to be someone else, or there's more overt cyber hacking and whatnot. But with AI systems, it's almost like you take the normal problems that already exist, and then there are also the problems specific to the AI space on top.
01:25:54
Speaker
So whether it's people who are bribable, people who are manipulatable, people you can hire off the dark web. It's one of those things that most people don't like to think about, for obvious reasons, but there are hit men; you can hire them to kill people. They just don't get hired to kill you because you're not that important, thankfully. It's this whole weird world. What happens? What type of criminal activity? There are actual criminals who do things. And so to think that AI systems, if they're trying to cause harm, won't liaise with
01:26:21
Speaker
nefarious individuals who literally can be paid to do crime, to make themselves more capable, it seems like that would be an oversight. I guess I'm concerned about a range of these things. Again, at the moment, not that concerned about much of it, but because it often takes months or years to address problems, you need the infrastructure in place now. You need the "what is the problem" type conversations. Problem definition is very important; people often speak past each other. If we can agree that an AI system that would have an ability to manipulate
01:26:49
Speaker
or that would have an ability to hire people to do certain tasks is a potential problem. That's a good start. And I think we already saw when GPT-4 was being evaluated, there was that famous case where, again, the AI system itself did not do this, but it was liaising with the researchers in between.
01:27:05
Speaker
And GPT-4 was able to hire someone off TaskRabbit and lie about why it needed that person to solve a CAPTCHA. Again, the system didn't do it all on its own, but it's like, well, it doesn't seem that complicated to connect those things or to have enabled the system to do it. So if it was the case months ago that a system, with a bit of tweaking, could have hired someone off the internet to circumvent something designed to stop AI systems, and lie about why it did it,
01:27:31
Speaker
why wouldn't this be possible in the future? It's already happened. It's like, okay, so now imagine something that's more capable, more capable of lying, can sort of think through step by step on its own, would know how to navigate a decision once it has more examples.

Unified Solutions for AI Alignment

01:27:42
Speaker
Again, human history, fiction novels, movies, and even just current events are replete with examples of people deceiving each other and engaging in various complicated nefarious schemes. And you could imagine that as a wonderful training data set for something trying to cause harm.
01:27:57
Speaker
Do you think we will get something like a unified solution to the alignment problem, something that solves all of these problems that we just talked about: deception, manipulation, power-seeking, and so on? Another way to ask it, maybe: do you think we'll get something like the theory of evolution, which solves a bunch of distinct problems in a general and simple way? Or will it look more like whack-a-mole, solving one problem, moving on to the next problem?
01:28:27
Speaker
I like the option between evolution and whack-a-mole. The evolution of whack-a-mole. I think it's probably a bit more the whack-a-mole. The reason here is, again, the straightforward logic of human incentives and human behaviors. Say someone develops a system that, as you said, reliably does what it's supposed to do.
01:28:46
Speaker
Well, that means it could be used for good or bad, right? At the moment, the AI systems, and I briefly mention this in the book, kind of have loosely corporate, American, Western values about what's acceptable and what's not. But what it does mean is that someone sort of has their hand on the lever. Someone is sort of steering the system to do X and not Y. So it is already the case that certain values are being implemented, or at least displayed, through these systems of a certain type.
01:29:12
Speaker
Now, if you just had alignment in the sense of "does what we want", then if someone was a malevolent actor and wanted to cause harm, they could use the thing to do what they want. So then you're like, well, is the alignment problem really going to solve not having bad people do bad things, or not having, we'll say, even desperate or confused people inadvertently do bad things? That's the other thing. I would say people could be bribed and manipulated. People could also just be persuaded: their child is very sick, they're desperate, they need money. Or maybe it's like, oh, my child has a certain form of cancer.
01:29:41
Speaker
I can cure it, I just read a thousand papers, I just need some resources. I understand desperation could also be a factor here. So it's not to say that everyone out there is evil and nefarious. It's just that there are many reasons why someone might give up power. And it's hard to imagine how the alignment problem, as such, might address all of that.
01:29:59
Speaker
In an earlier version of the book, actually, there were four different alignment problems I was going to talk about, but I thought that was too complicated for my audience. It really was: you're not fully aligned with yourself, we're not fully aligned with each other, we're not necessarily fully aligned with AI, and then AI itself may not be aligned with us, or with itself if there are multiple AIs. But I thought, okay, let's streamline this, to then think about Isaac Asimov's laws of robotics to make it easy, that type of alignment as a problem, and then the more traditional alignment problem stuff.
01:30:26
Speaker
Right now, there's a bunch of proposals on the table for making AI safe. There's a lot of attention on this issue in policy circles. And I think you had a great discussion in the book about principles for selecting among these proposals for AI safety. Maybe you could talk a bit about these principles.
01:30:44
Speaker
Sure thing. So yes, there are a lot of great proposals out there, but it's sometimes useful to take a step back and even think, what's the framework that we're using to think through these proposals? And even if one says out loud, well, that's kind of obvious, sure, but let's put it down. Because sometimes what you think is obvious is obvious to you and not someone else, or you see where you might agree, right? If you have five principles and I have five, maybe three overlap, and that's great. So I tried to keep it simple and think through what are the three main ways we might want to think about this, or that should be a part of any proposal.
01:31:14
Speaker
And so the first is verification; agility and adaptability, that's the second one; and the third one is defense in depth. Verification is just realizing that we need to verify. It's nice to say trust, but verify. The idea is that everyone should be accountable. There should be transparency. There should be verification mechanisms built in.
01:31:32
Speaker
And if actors in the AI space, the companies that are developing these things say there's no problem, there should be no problem then. Let's verify everything, right? If you think you're not developing anything harmful, let's have that as a backbone, we'll say, of any proposal that you want to ensure that things are happening
01:31:48
Speaker
as they're understood to be happening, as much as possible. Even the fact that people will try to circumvent verification, as they always do in this world, at least to some extent by some actors, should also be built into the process. You can't just take people's word for it. I can't remember the exact quote, but the idea is that these AI companies can't be grading their own homework. It just doesn't work that way.
01:32:08
Speaker
And again, even if they're not necessarily nefarious, great. We'll just have it all above board and have ensure that verification is there. The second one, agility and adaptability. Again, this isn't like a novel insight, but it's just really trying to highlight how fast moving the space is and how we really have to think through what if it happens even faster,

Principles for AI Safety Evaluation

01:32:26
Speaker
right? That a lot of times in the policy regulatory legal space, things take months, years to work through the system. And what if it has to be much less than that?
01:32:34
Speaker
Or what if you have a law that you thought was useful? You know, usually what happens is the law is developed, it somehow comes to exist in the world, sometimes it interacts with challenges from the courts, there's some sort of settled agreement, the law seems to have some sort of standard or consistency, and then maybe it's challenged again in the future.
01:32:50
Speaker
That's a multi-year, sometimes multi-decade process. For the AI stuff, it might have to be multiple months, multiple years. How can we even start thinking about changing almost the machinery of government in parts of the world to at least address these sorts of things? Yes, you can have amendments to laws, but really thinking through that this is a factor and that people should be prepared for things happening faster than usual.
01:33:11
Speaker
The third one, defense in depth. This comes more from cybersecurity than the military definition, and it means we need multiple layers of defense. You don't expect any one particular proposal or any one particular action to lead to safety or security; you're trying to have multiple layers, so that if any one of them fails, and you actually expect them to fail, there are others there to pick up the slack.
01:33:32
Speaker
Makes a lot of sense, all of these principles. So in the book, you discuss eight proposals for safe AI innovation. Maybe you could talk about your favorites here. What are the most important ones? It's like, how do you choose a favorite child, right? No, I'm just joking. So why are there eight, by the way, and not ten or seven? Well, you know, eight seemed like a good number. I think these really encapsulated what I thought was most important.
01:33:56
Speaker
Also, the reason I want these proposals to be discussed is not that they're definitive. The whole book is trying to be very open-minded and solutions-oriented. We need more people working on this, and we all need to come together to work on this. If it is the case that someone says, you know what? Most of your proposals don't work for me,
01:34:12
Speaker
okay, which ones do, right? Because as we know, there's an unfortunate tension that exists sometimes between different AI safety and ethics communities, where some people who are more focused on, we'll say, present-day concerns, like algorithmic bias, take issue with people who are more concerned about existential threats. And this is an unfortunate division. I mean, there's some rationale behind it, right? Because of the allocation of resources. But I think it's largely just an unfortunate way the world has turned out, and it doesn't have to be this way. So with the eight proposals,
01:34:37
Speaker
there could be a reaction like, okay, yes, some of those are maybe more X-risk or existential-risk oriented, but what are the ones that work for you now? Some sort of liability, that's one of the proposals. Great. Some sort of transparency or identification that you're interacting with an AI system. And that's where I'm trying to, you know, extend olive branches all over the place and build bridges here: if certain proposals don't work, pick the ones that do, or show me your list of eight. Let's see what works and let's work on those together. That seems like the approach that's going to have the broadest support and be good for all these issues.
01:35:03
Speaker
I think if we took a step back, some sort of liability for these systems makes sense. Again, why do people do anything? Well, they respond to incentives. And if they are held liable, sometimes personally liable, for how these systems malfunction, that is usually a good lever. And it can't just be a sort of distributed sense of, well, I did a thing, but I'm not really accountable. Let's think through what makes the most sense here. And maybe if you are responsible for distributing an AI model, even though you didn't create it, you do bear some of the liability here, some of the accountability.

Managing AI Compute Resources

01:35:30
Speaker
I think compute governance is also a very important one here. This is sort of like getting a sense of and controlling who has access to which chips. The US has already implemented some of these things with export controls on China and some recent additions to that, which are making it even more restrictive. The book Chip War is a fantastic exploration of some of these issues.
01:35:49
Speaker
In that sense, if you think of the main reasons why AI is increasing in capability, people usually point to three main inputs. You have the computational power, you have lots of data, and then you have the algorithms themselves, which are partly a human thing, like the talent pool building these things, but also insights from science and other domains and even from AI itself. How can you, taking a big step back, as a system, as a government, as an international organization of sorts,
01:36:14
Speaker
think about how to control, or at least have a sense of influencing, these inputs? Data is kind of everywhere; it's really hard to stop people accessing data. With the talent pool and the algorithms, it also seems like we want free mobility of labor for the most part, right? We don't want to lock people up and tell them they can't work in certain places or anything else. That seems bad. But compute governance deals with a more tangible, physical thing. There's a chip, and it's understood, at least to some extent, what it does and how it works. And getting a sense of where these chips go could be very, very useful.
01:36:43
Speaker
And to reassure some people, this isn't for all AI systems. This is really for the frontier AI models. So for the average consumer, the average AI business, the average AI product, this doesn't really affect them at all. It's more about whether you are using the most advanced chips, and whether you have thousands and thousands of them that you can put together in a cluster to then train highly capable models. So it really is something like those taxes where, oh, if you make more than $100 million, this tax applies, and people say, I would never want to be taxed like that. Well, don't worry, you never will. So for the most part, it doesn't affect most people or most businesses.
01:37:12
Speaker
but I think that's very useful. Now, as for how you go about it, with the proposals I have an idea, some sketch, some detail, but I always want to say, let's think this through. What is the best version of this that could go forward? With the most recent evidence, with the most recent analysis, what is the best version? Is it useful to have ways that the chips can't communicate with each other as much, so you can't bundle many weaker chips together?
01:37:34
Speaker
Or should there be some sort of remote kill switch, even for the chips themselves, so we don't have to shut down the internet? Maybe. Let's study it. Let's see if that's viable. If it turns out it's not, for good or bad reasons, then we wouldn't do that. But we should really try to think through how we can address some of that.
01:37:48
Speaker
One of your proposals is about investing in AI safety research. Here I'm imagining technical AI safety research. At least when I try to see this from the perspective of a funder, I worry that it's extremely difficult to choose among proposals. What should you fund?

Funding AI Safety Research

01:38:05
Speaker
How should you respond to an applicant being optimistic about solving the problem? Is pessimism a sign that you understand the problem in a deeper way? There are so many complexities. In a sense, this is just normal science funding, but I worry that these problems are even stronger when trying to fund AI safety. Do you have any ideas about how to go about evaluating proposals for technical AI safety research?
01:38:32
Speaker
I think you raise a great point. And the smile was just at the idea that, yes, if someone is more depressed or less optimistic about solving the problem, do they get more money, right? Intellectual integrity and, you know, epistemic modesty and all these things. And maybe that's the case. So I would say that I don't necessarily have specifics. I think it's sort of like, do we agree that there are currently a lot more people trying to increase capabilities than are concerned about safety? Do we agree this is a fact?
01:38:57
Speaker
Whether the number is like a hundred to one, or more or less, you know, AI capabilities researchers versus safety researchers, do we agree there's some sort of large discrepancy? And that probably shouldn't be the case. I think Ian Hogarth had that nice two-line chart, where one line's going way up and the other one's not really going up to meet it at all. So if there is a disconnect between safety and capabilities, what should we do about that? And it seems like funding safety research is a good idea. Sort of like, let's start there. If we can't get agreement there, then that's an issue. But assuming there is some agreement, yes, then how to actually go about it?
01:39:26
Speaker
And here I would probably defer to people who've already been in the space for several years and have them talk to each other and see what the current best practices are. You're right, it's an absolute mess. And how to study
01:39:38
Speaker
safety without increasing capabilities seems to be one of the biggest issues of all. And as Anthropic and others have reasonably said, even though it's kind of a bitter pill to swallow, we need the advanced AI system in order to study the safety of that system. And it doesn't really make sense to try to think about the safety of, say, GPT-4 or GPT-5 if you're currently working on GPT-2 or GPT-3.
01:39:58
Speaker
From what I understand, not that much of what you learn is going to be applicable and translatable. So I think it's a really important issue. And I think it's really about orienting people to realize how much capabilities are increasing, and that there is not nearly the same attention paid to safety, not at all. In fact, some of the organizations don't care at all.
01:40:16
Speaker
And so if people really take that on board, that some of these actors seem to be more responsible and some of them don't care about safety at all, they're just trying to make the product as fast as possible and dump it into the world, like, I think it was Mistral AI, then we should be concerned. Great. There are a lot of complexities there. We could talk for hours on this topic. Darren, thanks for talking with me. It's been a pleasure. My pleasure as well, Gus. Thanks so much.