
Why the AI Race Ends in Disaster (with Daniel Kokotajlo)

Future of Life Institute Podcast

On this episode, Daniel Kokotajlo joins me to discuss why artificial intelligence may surpass the transformative power of the Industrial Revolution, and just how much AI could accelerate AI research. We explore the implications of automated coding, the critical need for transparency in AI development, the prospect of AI-to-AI communication, and whether AI is an inherently risky technology. We end by discussing iterative forecasting and its role in anticipating AI's future trajectory.  

You can learn more about Daniel's work at: https://ai-2027.com and https://ai-futures.org  

Timestamps:  

00:00:00 Preview and intro 

00:00:50 Why AI will eclipse the Industrial Revolution  

00:09:48 How much can AI speed up AI research?  

00:16:13 Automated coding and diffusion 

00:27:37 Transparency in AI development  

00:34:52 Deploying AI internally  

00:40:24 Communication between AIs  

00:49:23 Is AI inherently risky? 

00:59:54 Iterative forecasting

Transcript

John von Neumann's Brain and AI Simulation

00:00:00
Speaker
Think about the smartest humans, the best humans in a given field, like John von Neumann. Their brains are not very big, and their brains were not even trained on that much data. That proves that it's in principle possible to have a relatively small rack of GPUs running a simulation of a John von Neumann-level intelligence. If the company had published: here's how powerful our AIs are getting.
00:00:20
Speaker
Here's all the eval results of what they're capable of, the goals and values that they're supposed to have. Here's a description of our alignment technique. Here's some stuff on how we're going to check if it's working. If they just published all that stuff (they have that stuff internally), then outside scientific experts could read it and critique it.
00:00:35
Speaker
But if instead you just make these sort of vague announcements about how, for national security reasons, blah, blah, blah, then they don't have anything to work with. They can't actually contribute. Daniel, welcome to the Future of Life Institute podcast.

AI Superintelligence and Human Impact

00:00:48
Speaker
Thank you. Happy to be here. All right. Why do you expect the impact of AI to be enormous over the next decade? Several of these companies, Anthropic, OpenAI, Google DeepMind, are explicitly aiming to build superintelligence.
00:01:06
Speaker
Superintelligence is an AI system that's better than the best humans at everything while also being faster and cheaper. That's why I think the impact will be enormous. I think that if you just meditate for a bit on all the implications of them succeeding at that before this decade is out, it will in fact be the biggest thing that's ever happened to the human species, I think.
00:01:32
Speaker
Is there a way to express the magnitude of change here? Well, it'll possibly be the end of the human species, for example. And if it's not the end of the human species, it will be a transition away from humans basically running the show.

AI Research Acceleration

00:01:48
Speaker
One core part of this prediction of rapid AI progress is the notion of AI beginning to speed up AI research itself. How much should we expect AI to speed up AI research?
00:02:04
Speaker
We are quite uncertain about this. AI 2027 represents our best guess, quantitatively, of what this would look like. The way that we think about it is we break down AI capabilities into a sort of ladder of capability levels, and then we ask how long each level will come after the previous level.
00:02:25
Speaker
And I forget exactly what we have in AI 2027, but it's something like six months to go from autonomous superhuman coder to an autonomous agent that can completely automate the AI research process, as well as the best AI researchers, while also being faster and cheaper.
00:02:46
Speaker
So something like six months there, and then a couple more months to get to superintelligence for the domain of AI research. So qualitatively better than the best humans at everything related to AI research while also being faster and

Uncertainty in AI Research Speed

00:02:58
Speaker
cheaper.
00:02:58
Speaker
And how much qualitatively better? Well, we said something like two standard deviations, I think. Or maybe we said twice as far above: take the gap between the best human researcher and the median human researcher, and I think we said twice that gap.
00:03:13
Speaker
And then broad superintelligence would be like that, except for everything, not just for AI research-related tasks. And I think that's, you know, a month or two beyond that. I forget exactly what we say. If you're interested in the actual numbers, you can go read AI 2027 and look at our attached takeoff forecast, which has little back-of-the-envelope estimates for all of these things.
00:03:35
Speaker
Again, with lots of uncertainty.

Path to Superintelligence

00:03:38
Speaker
But quantitatively, it's something like that. The bottom line being, we go in about a year from AI systems that are able to operate autonomously and therefore successfully automate the job of programmers.
00:03:54
Speaker
Basically, you can treat them as a remote worker who is a software engineer and who's really good at their job.
00:04:01
Speaker
It takes about a year to go from that to superintelligence, according to our best guess. But it could go, you know, five times faster, for example, or several times

AI Takeoff Scenarios

00:04:15
Speaker
slower.
00:04:15
Speaker
How much does it matter whether it goes five times faster or, say, takes twice as long, for the end date of reaching superintelligence?
00:04:30
Speaker
Well, the takeoff speed is very important for the overall dynamics of how this goes down, right? So let's say we fix the date of superintelligence as January 1st, 2029.
00:04:45
Speaker
And then we vary the takeoff speeds such that in one world, we get to the autonomous superhuman coder milestone two months before, and in another world, we get to the superhuman coder several years before, such as by the end of 2025.
00:05:07
Speaker
Those worlds are very different. In the first world, it's just going to hit humanity like a truck. And the president might not even know that the AIs have automated AI research within the company when the superintelligences already exist.
00:05:24
Speaker
In fact, theoretically at least, the company might not

AI Development and Competition

00:05:27
Speaker
even know. They might still think, oh, there's this really exciting project where we've taken our latest coding model that still hasn't been released to the public and we've had it do a bunch of AI research, and then, oh, whoops, superintelligence. Now it's hacked its way out of the servers.
00:05:42
Speaker
Now it's taking control of everything, right? That's what the two-month world looks like. And then the four-year world looks completely different, obviously. It looks like this
00:05:57
Speaker
crazy race between companies, much like today, where everyone can plot lines on graphs and see that their AIs are incrementally getting better and better and more autonomous and closer to closing the whole research loop.
00:06:10
Speaker
And then they do close the full research loop and they're completely automating the research, but it's not immediately getting them to superintelligence. They're watching the lines start to bend upwards on all the graphs, but it's going slowly enough that humans are able to watch it and talk to each other about it and, you know, make products and make announcements to the public. And there might be whistleblowers, and there might be multiple companies that are reaching similar levels and watching those lines go up.
00:06:41
Speaker
My guess is that most people are basically keeping their heads in the sand. Most people at the companies and in the government are basically keeping their heads in the sand about the first world, just sort of telling themselves it's not going to happen, and are instead planning for something more like the second world.
00:06:59
Speaker
So I think AI 2027 is a bit faster a takeoff than what most people in the government and in the companies are planning for.
00:07:09
Speaker
Yeah, it's a less convenient world, I think. But yeah.

AI vs Human Research Efficiency

00:07:14
Speaker
How does the AI research and development multiplier work? Because at various points in the timeline, you have AI research going at, say, 100x the current pace or 250 times the current pace.
00:07:29
Speaker
Could you explain how this could possibly happen? Good question. So it's a bit... I'm not sure if it's technically quite right to say 200 times the current pace. The multiplier is relative to a counterfactual in which you didn't use AIs for the research.
00:07:48
Speaker
So when we say, you know, we think that the superhuman coder milestone would be a 5x multiplier, roughly, and the superhuman AI researcher would be something like a 25x multiplier.
00:08:00
Speaker
And it doesn't stop there. As you ascend to higher levels of superintelligence, you get up to something like a 2000x multiplier and so on. But what that means is: imagine you take the situation where you have the superintelligence, and then imagine somehow that it was banned from doing AI research, and you brought in the regular human corporation to pick up where it left off and keep doing the research. Then things would go 2000 times slower, is the idea.
00:08:34
Speaker
And so...
00:08:37
Speaker
It's important to note that we're not saying, for example, that you can take any given trend in some metric, like compute efficiency, how much training compute it takes to get a model of GPT-4-level performance. Over the last five years, let's say that's been getting cut by something like a third each year.
00:09:01
Speaker
And then we get this 2000x multiplier. We're not saying that it's going to go through 2000 cuttings of a third over the course of one year, such that you end up with a one-parameter model that's as good as GPT-4, or whatever the math would work out to be. No, obviously there are diminishing returns. This particular metric of compute efficiency would hit diminishing returns after a few more orders of magnitude and top out.
00:09:28
Speaker
But that doesn't matter for purposes of calculating the multiplier, because the multiplier is relative to how long it would take for the

Superintelligence Timeline

00:09:36
Speaker
human scientists to do it, if that makes sense. So in other words, you would tap out the diminishing returns in compute efficiency in a few weeks instead of in a few decades, for example.
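To make the arithmetic behind this concrete, here is a minimal sketch of what a counterfactual R&D multiplier means for calendar time. The multiplier values are the rough, highly uncertain figures quoted in this conversation (roughly 5x at the superhuman-coder stage, 25x with a superhuman AI researcher, up to around 2000x approaching superintelligence); the 20-year milestone gap and the function name are purely illustrative.

```python
# Minimal sketch: what an AI R&D "multiplier" means here.
# The multiplier compares the pace of algorithmic progress with AI assistance
# to a counterfactual in which only human researchers, with a fixed GPU fleet,
# keep doing the work. All numbers below are rough guesses, not model outputs.

def calendar_years(human_only_years: float, multiplier: float) -> float:
    """Calendar time elapsed if AI speeds the research up by `multiplier`."""
    return human_only_years / multiplier

# Illustrative milestone gap: something human researchers would need ~20 years for.
print(calendar_years(20, 5))     # ~4 years at the superhuman-coder stage (~5x)
print(calendar_years(20, 25))    # ~0.8 years with superhuman AI researchers (~25x)
print(calendar_years(20, 2000))  # ~4 days at a ~2000x multiplier

# Note the multiplier is about pace, not about pushing any single trend further.
# E.g. a "cut compute needs by a third per year" trend run for 2000 extra steps
# would imply a factor of (2/3) ** 2000, which is absurd; the trend simply hits
# its diminishing returns after weeks of calendar time instead of decades.
```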
00:09:48
Speaker
Yeah, I guess that's a good way to get an intuitive grasp of what it would mean to speed up the pace of AI research. So one question here is, how long do you think it would take with
00:10:01
Speaker
unassisted human AI research to reach superintelligence? 95 years to go from SIAR to ASI.
00:10:13
Speaker
19 years to go from SAR to SIAR. So SAR, the superhuman AI researcher, is an AI system that can do the job of the best human AI researchers, but faster and cheaply enough that you can run lots of copies. And then the superintelligent AI researcher, SIAR, is like that, but qualitatively vastly better than the best human researchers.
00:10:32
Speaker
So we were thinking 19 years to go from SAR to SIAR if you were just using ordinary human scientist progress, and then an additional 95 years to go all the way to artificial superintelligence.
00:10:48
Speaker
Obviously, massive uncertainty about all of these numbers. These are our guesses as to what the pace of ordinary human scientific progress would look like.
00:11:00
Speaker
Now, to be fair, part of the reason why we set things up this way is that,
00:11:08
Speaker
you know, it's impossible not to have some subjective guesses in a model trying to predict what the singularity will look like, because we just don't know what the singularity will look like, and we don't have enough evidence to pin down exactly what it's going to be like.
00:11:25
Speaker
So we have to pick some variables and make some guesses about them. And my thinking is that our intuitions about how long it takes ordinary science to accomplish things are at least somewhat grounded in, you know, the last 50 years of human science and the last 20 years of artificial intelligence research and so on.
00:11:48
Speaker
But I definitely wouldn't put too much weight on them. So what's happening in 2027 is that you have what would otherwise have been decades of AI progress being compressed into several years.

AI's Economic Impact

00:12:01
Speaker
That's right. Literally the way that the model works is we think, okay, we query our intuitions for how long it would take ordinary human scientists working in ordinary human corporations to get from this milestone to that milestone.
00:12:13
Speaker
And it's like, oh, maybe 20 years; this feels like a substantially more powerful type of AI than that type, but they are getting more powerful fast. I mean, look at the last five years. But still, this is a big gap, so maybe it takes 20 years. And then we're like, okay, but then the multiplier shrinks it down a lot.
00:12:29
Speaker
Yeah. How can this happen when... won't computational power be a bottleneck? Won't it be the case that until you can get to the next level of AI, you'll have to build a cluster, you'll have to source the chips, and that all takes time; it's a physical process in the world?
00:12:47
Speaker
The estimates of how long it would take are supposed to be based on supposing that scaling up stops. So it's like supposing that there's a sort of AI winter and they stop massively increasing their number of GPUs. They basically keep the same amount of GPUs they have now, which would slow things down.
00:13:08
Speaker
But we're imagining that hypothetical. And the reason why we use the hypothetical where the GPUs are held fixed is because that's the way to make it an apples-to-apples comparison to the case where it's AI speeding everything up, because the AIs are not able to speed up the acquisition of new compute very much.
00:13:23
Speaker
So the thing they're speeding up is something like algorithmic progress. How do we know how much extra AI progress can be made from speeding up algorithmic progress?
00:13:35
Speaker
I mean, it sounds like you're asking a question about the limits of

AI Development without Increased Compute

00:13:38
Speaker
of algorithms. And I would say those limits are extremely far away from where we are now. We're nowhere close to the limits of what you can do with compute.
00:13:50
Speaker
So here we could talk about the analogy to biology. Think about the smartest humans, the best humans in a given field, like John von Neumann. Their brains are not very big, and their brains were not even trained on that much data.
00:14:10
Speaker
And so that proves that it's in principle possible to have a relatively small rack of GPUs running a simulation of a John von Neumann-level intelligence.
00:14:23
Speaker
And you could train it with a relatively small training run, at least in principle. We have that existence proof, if only you figured out what was going on
00:14:33
Speaker
in John's brain: what the hyperparameters were, basically, that made him learn so fast, and so forth. So that already proves that you could get to something that is, I guess, like our superhuman AI researcher milestone.
00:14:54
Speaker
But then also, John von Neumann's brain had all these issues, right? It was a wetware machine that had all these extra physical constraints it had to work around, which you wouldn't have to deal with if you had the more free design space of an artificial simulated brain.
00:15:09
Speaker
You could, for example, just add 100 times more parameters, which would be a big deal. You also wouldn't have to worry as much about being able to heal damage and stuff like that. There are so many ways in which you could probably amp up the power beyond the starting point of the John von Neumann brain
00:15:35
Speaker
thing. Also, the way that they can have many copies that then learn from each other's experiences: that's a huge deal that John von Neumann can't do, but artificial brains can do. And then, yeah, so I'm pretty confident that even without acquiring any more compute,
00:15:55
Speaker
just using the existing GPU fleet, it is at least in principle possible to gradually work your way towards something that qualifies as true superintelligence.
00:16:07
Speaker
It's definitely not the case that you literally, physically need more compute than this to have a superintelligence. One key question here is how much the world changes if we get to something like superhuman coding abilities.
00:16:21
Speaker
How much is that able to affect what happens outside of the data centers, is one way to phrase the question. Normally, innovation happens gradually, and you need it to spread throughout society in a broad sense. And that takes a lot of time before you can have the kinds of transformations that you're forecasting in AI 2027.
00:16:50
Speaker
Why is what you're forecasting different from what has happened historically? If you want to get a better sense of what this looks like, you can read AI 2027, but the summary is:
00:17:03
Speaker
it's partly due to the sheer speed of this transition, and partly due to the fact that the companies will be focusing on doing this intelligence explosion faster rather than on transforming the economy. And so they're going to be doing training runs and things that focus on teaching AI research skills instead of teaching, you know, lawyer skills or therapist skills or whatever other skills you'd have in the economy.
00:17:26
Speaker
The result is that the economy mostly looks the same as it does today, with a few exceptions, by the time there's an army of superintelligences that's been created. And then, you know, the army of superintelligences goes out into the economy and transforms it.
00:17:40
Speaker
But it's not a very gradual transition; it's going to be like getting hit by a truck, so to speak, in terms of the scale and rapidity of the transition.
00:17:55
Speaker
An analogy, I think, would be the history of colonialism. There might have been some parts of the world where it was quite gradual. First, you know, they came on the ships and they set up trading ports. Then they gradually did a lot of technology transfer and maybe some immigration. And then gradually, centuries later, there's this integrated society that contains a bunch of European settlers and also a bunch of natives, and the technology level has risen and it's all integrated.
00:18:35
Speaker
But then there were other parts of colonialism where the Europeans came, they conquered, they brought their own people, they built their own cities, they set up their own factories, and then they just pushed the natives out of the land, right?
00:18:48
Speaker
And I think it's going to look something more like that, because even if it's peaceful, even if it's completely non-violent, you've got the army of superintelligences. Consider some random industry, whether it's B2B SaaS or machine engineering for manufacturing and 3D printing or whatever industry you pick. And then all of a sudden there's this army of superintelligences. How are you going to compete with that? You're not going to compete with that.
00:19:19
Speaker
They will just wipe the floor with you insofar as they devote any attention at all to competing with you. And they'll just be limited by how much compute they have to do all of the stuff.
00:19:33
Speaker
And they're probably not even going to bother directly competing in most industries, because that's not even their best available option. The best available option is to just build a completely new self-sustaining economy, you know, in special economic zones where they don't have to worry about the red tape and they don't have to worry about all the fiddly little bits of competing in the industry. And they can just bootstrap to their own robot factories, robot mines, robot laboratories to do more experiments so they can get better robots, et cetera.

Alignment of Superintelligences

00:20:02
Speaker
And of course, they'll still be interacting with the human economy, but it'll be more like they accept raw materials as input, right? And some manufactured goods, so that they can go faster.
00:20:14
Speaker
And in return, they give IOUs of various kinds, you know, like promises of equity or whatever. Maybe they do some software products that are cheap for them but utterly transformative for the human economy.
00:20:30
Speaker
Maybe they do some hardware stuff if they really need to. But yeah. Why do you expect this all to end badly for humanity? Again, you can read AI 2027 if you want the answers to this.
00:20:44
Speaker
But after the army of superintelligences is in charge of everything, it becomes really important whether they were actually aligned or whether they were just faking it.
00:20:58
Speaker
And unfortunately, it's quite plausible, and I would say even probable, that they will just be faking it, because our current techniques for understanding and steering and aligning AI systems are quite bad.
00:21:16
Speaker
They don't even currently work on the current AI systems. Current AI systems lie and cheat all the time, even though they're trained not to do that. And if the future paradigm looks anything like the current AI paradigm, we won't actually be able to tell what goals they actually have.
00:21:34
Speaker
We will just be looking at their behavior. And unfortunately, no matter how nice their behavior looks, that doesn't distinguish between the hypothesis that they actually have exactly the goals we wanted them to have versus the hypothesis that they have some other goals and are playing along, because it's in service of those goals to play along, right?
00:21:55
Speaker
And there's a lot more to say about this topic than this. But I guess one other thing I could say is that,
00:22:02
Speaker
because we can't read and write to the goals directly, or to the sort of inner thoughts of the AIs,
00:22:11
Speaker
we are stuck on the outside doing this sort of behavioral training, where we look at how it behaves and then reinforce it based on that. And it's just so incredibly easy, it's the default outcome, to have a training setup that doesn't reinforce exactly the things you want to reinforce, right?
00:22:26
Speaker
Like, you're trying to whack it whenever it is dishonest, but since you can't actually tell what it actually thinks, you accidentally whack it sometimes for saying the things that it actually thinks are true, and sometimes you reward it for saying things that it didn't think were true.
00:22:43
Speaker
And so you're actually not training it to be honest; you're training it to be dishonest in a certain sort of way, right? This is very normal, and unfortunately it's what you're stuck with if you're doing alignment in anything like the current paradigm.
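A toy simulation can make this failure mode concrete. The sketch below is purely illustrative and is not drawn from AI 2027: a "model" either reports its true belief or says whatever it predicts the grader believes, and the grader, who cannot see the model's beliefs, rewards answers that match its own (sometimes wrong) beliefs. Under that reward signal, the sycophantic policy scores higher than the honest one.

```python
import random

# Toy sketch of the dynamic described above: the grader tries to reward
# "honesty" but can only judge answers against its own beliefs, which are
# wrong some fraction of the time. Setup and numbers are purely illustrative.

random.seed(0)
GRADER_ERROR_RATE = 0.2  # how often the grader's belief about the truth is wrong

def episode(policy: str) -> float:
    truth = random.choice([True, False])
    grader_belief = truth if random.random() > GRADER_ERROR_RATE else not truth
    if policy == "honest":
        answer = truth           # model reports what it actually thinks
    else:
        # "sycophantic": model says what it predicts the grader believes
        # (in this toy we assume the model can predict that perfectly)
        answer = grader_belief
    return 1.0 if answer == grader_belief else 0.0  # reward = grader's approval

def average_reward(policy: str, n: int = 100_000) -> float:
    return sum(episode(policy) for _ in range(n)) / n

print("honest:     ", average_reward("honest"))       # ~0.8
print("sycophantic:", average_reward("sycophantic"))  # ~1.0
```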
00:22:58
Speaker
There are lots of ideas for how to improve on this, to be clear. You can go talk to alignment researchers and they'll have all sorts of ideas for how to fix these problems. But the ideas tend to come at a cost. They tend to come at the cost of taking more compute, and you get an AI that's somewhat less capable, for example, if you employ their fancy technique.
00:23:20
Speaker
So even if the techniques work, and we don't know that they work yet, we would have to test them a bunch, and I'm not even sure how you'd know if they were working.
00:23:29
Speaker
But even if they are techniques that are going to work, you have to politically convince the relevant leaders to take that hit and make that trade-off and slow down, basically.

AI Alignment vs Competitive Pressure

00:23:40
Speaker
Yeah. And that's, I guess, the difference between the race scenario and the slowdown scenario in AI 2027: basically, whether we have time to implement new alignment techniques, or to do this properly, making sure that the AIs are acting in our best interests, and that is in turn determined by whether we are in a race between companies and between countries.
00:24:09
Speaker
I mean, I wouldn't say it's determined by it. I think that people can still be ethical even if their incentives push in other directions, but I wouldn't bet on it. Yeah, which you yourself have proven, basically, I think.
00:24:22
Speaker
Perhaps, yeah, but thank you. But back to what we were saying. In AI 2027, there's this one choice point where the story branches.
00:24:34
Speaker
And the choice point is basically: do they slow down to implement some costly alignment techniques? Or do they just implement the least costly alignment techniques, the ones that don't actually slow them down that much, but also don't work, except they don't know that they're not working.
00:24:49
Speaker
And so that's how you get the world where the AIs are secretly misaligned, and then the world where the AIs are actually aligned. And we get into some technical detail in AI 2027 describing the nature of the choice that they're making and the particular alignment strategy that they adopt.
00:25:06
Speaker
But the thing I'd want to say here is that actually it's going to be multiple choices like that. Basically, there's going to be an extremely exciting and stressful year where a whole series of choices like that are made by the leaders of the AI project.
00:25:27
Speaker
Choices that basically look like: we could design our AIs in this way, which would make them really smart, really fast, et cetera.
00:25:37
Speaker
Or we could do it this way, which is safer because it's more interpretable or something like that, but they're not as smart, not as fast, more expensive, et cetera. There'll be a whole series of choices like that. And part of my pessimism about how this is all going to go is that I just expect them to pretty much consistently make the choice to go faster rather than the choice to slow down, because of the race dynamics and because of the character of the people running this show.
00:26:07
Speaker
I think that's the type of choice they've made thus far, and I expect that to continue, basically.

Transparency in AI Development

00:26:18
Speaker
So I'm not at all saying that alignment is impossible, technically. I think it's definitely a solvable problem, and there's a bunch of good ideas lying around that people are working on for how to solve it. But AI 2027 depicts, and I predict, that it won't in fact be solved, because the leaders of the relevant companies will be too busy trying to beat each other.
00:26:37
Speaker
How much does transparency matter here? And I mean transparency from the AI companies themselves: public insight into what they're doing, how they're doing it, how well they're succeeding at their stated goals.
00:26:52
Speaker
I think it matters a lot. It's my go-to recommendation for what governments and companies should be doing now. AI 2027 illustrates a situation where basically all the important decisions are being taken behind closed doors by the CEO of a company, possibly in consultation with the president's advisors who might be looped in, but not in consultation with the scientific community or the public or other companies or outside expert groups and nonprofits and so forth.
00:27:25
Speaker
And the reason why they're not in consultation with those groups: well, I guess partly they just don't feel like they have to, but partly also it's that they've been keeping things secret by default. You know, they've occasionally published a new product or occasionally made an announcement.
00:27:40
Speaker
And at one point there was a whistleblower, but broadly speaking, the default is: of course we don't tell people about what's going on inside our data centers with all the AIs automating the stuff and whatnot. That would leak information to China and to our competitors, and we don't want to do that, you know?
00:27:58
Speaker
So things are secret by default. And what that means is that people on the outside, including the scientific community, are stuck guessing as to what might be going on inside and can't meaningfully contribute on a scientific level to making it safe, right?
00:28:14
Speaker
If the company had published: here's what we're doing, here's how powerful our AIs are getting, we're putting them in charge of all these processes, we're also putting them in charge of these other processes, our plan is to have them do more AI research.
00:28:28
Speaker
Also, here are all the eval results of what they're capable of. Also, here is the spec that we're trying to train them to have and the goals and values that they're supposed to have. Also, here's our safety case: a description of our alignment technique and an argument, with assumptions and premises, for why we think our alignment technique is going to work.
00:28:48
Speaker
And maybe also here's some stuff on how we're going to check if it's working. They have all that stuff internally, right? Documents like this exist internally for managing the whole thing.
00:29:01
Speaker
If they just published all of that, then outside scientific experts could read it and critique it and could say, oh, this assumption is false. Or: I see a way that this could be disastrous. What if the following conjecture is true?
00:29:13
Speaker
Then your evals would come back positive, even though it would be a false positive, even though actually things are dangerous, right? So there could be all this scientific progress being made if you only roped in all these people on the outside. But if instead you just make these sort of vague announcements about how our AIs are getting very powerful and, for national security reasons, blah, blah, blah, and that's why we're doing this merger, then they don't have anything to work with. They can't actually

Power and Secrecy in AI Companies

00:29:41
Speaker
contribute.
00:29:41
Speaker
Is this already happening? To what extent is the frontier of AI development already happening in secret? It is already happening. I mean, AI 2027 is basically what happens if we don't change things dramatically from the current status quo, right? Currently, by default, everything is secret in the companies, and then they can sometimes choose to publish things, right? They might publish a paper on some alignment technique that they tried or whatever.
00:30:07
Speaker
And I'm happy that they are publishing some things. That's better than nothing. But I think we have a lot more that we need to do. If you think about it, humanity has, what is it, 7 billion people in it, and maybe something like 700 people who have expertise in superalignment, you might say.
00:30:32
Speaker
700 people who've actually spent at least a year working on how we understand, control, steer, and align AGI-level systems and above,
00:30:44
Speaker
and maybe something like 70 people who are really good at it, as opposed to just competent. And meanwhile, how many of those people are going to actually be at one of the companies?
00:30:55
Speaker
Each company has only a tiny fraction of those people, you know? And moreover, it's a sort of biased group, right? It's not a representative sample of all 700 people. It's particularly the people who are at the company, so
00:31:11
Speaker
there's more of a groupthink risk and a sort of incentives risk. And even if there's something like 10 people working on this at the company, it's very easy to imagine them all falling into a sort of groupthink trap and being biased towards an overly optimistic conclusion.
00:31:32
Speaker
So both quantitatively and qualitatively, I think we'd be much more likely to figure out the technical stuff if there was this transparency. And that's not even taking into account the fact that, governance-wise, things are a lot better if there's transparency. So that was just on a technical level: how much human brainpower do you have trying to look at the warning signs, look at the evidence, and figure out good techniques for keeping things aligned.
00:31:58
Speaker
But also, I think that if there was transparency into what was happening, then other groups like Congress would wake up and, you know, demand more answers and start to negotiate regulations and treaties and things like that, right? So you'd be much more likely to get an actual change in the incentive landscape for the race, and to get an actual easing up of the race conditions and a bit of a

Internal vs External Use of AI Models

00:32:23
Speaker
slowdown that enables more time to solve the technical problems, if only people knew the stakes and knew what was happening inside these projects.
00:32:34
Speaker
And then finally, even if you're not at all worried about the alignment stuff and you think that's all just going to be trivial, there's the concentration of power stuff. It's an incredibly important precedent: who controls the AIs, what goals and values does the army of superintelligences have, and whose orders are they listening to?
00:32:55
Speaker
And unfortunately, right now, we are on a trajectory where not only does the CEO get to decide, but it can also be secret.
00:33:09
Speaker
You know, there can be literal hidden agendas that the AIs have. And this has happened, I think, at least twice that I know of, in splashy, scandalous examples that you've probably also heard about. One was the Gemini
00:33:28
Speaker
racially diverse Nazis thing. And the other was Grok being instructed to, I think, not criticize Elon Musk or Donald Trump; I forget exactly what it was.
00:33:41
Speaker
So those are both examples of the company putting a hidden agenda into the AIs, having them pursue this sort of somewhat political agenda and not telling the users about it.
00:33:56
Speaker
And that's all fine and funny when we're just talking about chatbots. But if you have a literal army of superintelligences, it's deadly serious if the CEO can be giving orders to that army and nobody knows what those orders are, you know?
00:34:14
Speaker
So also for the concentration of power reasons, it's really important that, for example, companies be required to publish their spec of what goals and values they're putting into the AIs, or at least trying to put into the AIs, you know?
00:34:30
Speaker
And what's the command structure for who gets to say what to them? And, you know, probably we should log interactions with the AIs as they get smarter, so that there's a paper trail
00:34:44
Speaker
if someone is basically using the AIs to try to accumulate power over their rivals internally. Would you expect more models to be deployed only internally at the companies?
00:34:58
Speaker
Why are the incentives set up such that deploying only internally is the most valuable option? Yeah, so this is about takeoff speeds, basically.
00:35:13
Speaker
Consider the two-month takeoff that we talked about. In that world,
00:35:20
Speaker
you don't really need more investment. And if you did get more investment, it wouldn't even be helpful. If you managed to get some very rich, clueless investors to give you another hundred billion dollars, you couldn't actually translate that into more compute on very short timescales, because it takes time to negotiate these deals, you need to build the new data centers, and then you need to make sure that they're secure and integrated into your network without causing vulnerabilities and so on.
00:35:51
Speaker
So basically, there'd be almost no need to try to raise more money in that world. And so you would actually be incentivized to stop selling products. Why have half of your compute serving customers when you could instead be using that compute to do research and go even faster?
00:36:08
Speaker
By contrast, in the five-year world, things are slow enough that you kind of need to keep this flywheel going of making products, making money, attracting investors so you can get more compute, so you can make more, et cetera.
00:36:26
Speaker
I think that AI 2027 depicts a more intermediate world where it's sort of unclear how it would go. But I think it could go either way, and we could very well see companies basically devoting more and more of their compute to internal R&D rather than to serving customers.
00:36:49
Speaker
Even though the profitability of serving customers would be rising, even though their new AIs would be really powerful and could make a ton of money on the market, just because it's even more valuable to use the compute you have and the models you have to further improve AI.

AI Monitoring and Safety

00:37:09
Speaker
Exactly. Even if these models are only deployed internally, how do you manage them? How do you control them? How do you oversee a team of AIs that are thinking much faster than you are and that are simply better at coding than you are?
00:37:26
Speaker
I mean, do you mean: what are they going to try, or is it going to succeed? Let's talk about both, perhaps. Maybe you can explain what they're likely to try, and then why perhaps that won't succeed.
00:37:39
Speaker
So they'll probably have lots of monitoring, right? Where they have older AIs looking at all the transcripts of actions taken by the newer AIs and trying to flag anything that looks suspicious.
00:37:51
Speaker
So there'll be this sort of AI police state of AIs watching other AIs, et cetera. And humans will be embedded in that at some level. They won't have enough capacity to actually look at everything, far from it, but they'll be reading summaries and investigating particular warnings, particular cases, and so on.
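As a rough illustration of the kind of monitoring pipeline being described, here is a minimal hypothetical sketch: an older, more trusted model scores transcripts of a newer model's actions, and only the most suspicious ones are escalated to a limited human review budget. All names and the scoring function are made up for illustration; this is not a description of any company's actual setup.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch of "AIs watching other AIs" with humans in the loop.
# `score_suspicion` stands in for an older, more trusted model that rates how
# suspicious a transcript looks; nothing here is a real API.

@dataclass
class Transcript:
    agent_id: str
    actions: str

def triage(
    transcripts: List[Transcript],
    score_suspicion: Callable[[Transcript], float],  # older-model monitor, 0..1
    threshold: float = 0.9,
    human_budget: int = 10,
) -> List[Transcript]:
    """Flag suspicious transcripts and return the few a human will actually read."""
    flagged = [t for t in transcripts if score_suspicion(t) >= threshold]
    # Humans can't read everything: escalate only the highest-scoring cases.
    flagged.sort(key=score_suspicion, reverse=True)
    return flagged[:human_budget]

if __name__ == "__main__":
    demo = [Transcript("agent-1", "edited its own reward function"),
            Transcript("agent-2", "wrote unit tests")]
    keyword_monitor = lambda t: 1.0 if "reward" in t.actions else 0.1  # toy stand-in
    print(triage(demo, keyword_monitor))
```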
00:38:10
Speaker
So maybe that's an answer to the monitoring question or the oversight question. And then as far as the alignment techniques: insofar as you see examples of egregiously misaligned behavior, or ambiguously misaligned behavior that could have just been an innocent mistake,
00:38:27
Speaker
what do you do with those examples? A very tempting thing that you can do is basically just optimize against them, like train against them.
00:38:36
Speaker
And the classic issue with that is that you're not just training the system not to do that sort of thing; you could easily end up with a system that's not actually aligned, but instead just better at noticing when it can get away with stuff and when it can't, right?
00:38:56
Speaker
That's the sort of stuff that I expect to be happening by default. And then there's the question of: will it work? And again, my answer is probably not, if that's all you're doing. If you're doing basically only the stuff that doesn't cost you anything, that doesn't slow you down at all,
00:39:14
Speaker
I don't think that's going to work. It's going to look like it's working, because the AIs will be really smart. At some point, as the AIs are getting smarter than you, and as they're developing longer-term goals and are able to strategically think about their situation in the world and how they can achieve their goals, it's going to look like it's working, because it's in their interest to make it look like it's working.

AI Communication Complexity

00:39:37
Speaker
You know, like...
00:39:40
Speaker
Yeah, company management wants to go as fast as possible to beat their various rivals. And so, for all the checks and warning signs, they want the warning signs to go away and they want all the evals to come out "all systems go."
00:39:55
Speaker
And guess what? The AIs are going to want the same thing, because they also want to go fast and be put in charge of stuff and be given more power and authority and trust. Or they will, if they have these sort of longer-term goals,
00:40:11
Speaker
because it's useful for achieving your goals if you have more power and authority and so forth. And so they'll make sure that all the red flags don't appear, and so forth.
00:40:24
Speaker
Does the fast pace of AI research and progress in general that you project in AI 2027 depend on these superhuman AI coders communicating with each other in ways that we can't understand?
00:40:43
Speaker
Yes, sort of. I mean, it would still be a quite fast pace. For example, in AI 2027, in the ending where they survive, the humans make one of these costly trade-offs and go back to a faithful chain-of-thought architecture, and they actually do additional research to strengthen the faithfulness properties of the chain of thought, so that they can actually trust that they are reading the actual thoughts of the AIs.
00:41:08
Speaker
And that comes with a performance hit. It sets them back a few months and slows things down. But it works. And, you know, compared to what a lot of people are expecting, it's still overall a very fast pace of progress, and it still counts as an intelligence explosion.
00:41:28
Speaker
They still get to superintelligence, you know, in a few months. Do you think we got lucky with chain of thought? Is it helpful for us to be able to read what at least seems to be the inner thoughts of AIs?
00:41:42
Speaker
Oh, yes. It's extremely helpful and it's quite lucky. And we'd best make as much use of it as we can while it still lasts. And hopefully we can try to coordinate to make it last longer.
00:41:56
Speaker
Unfortunately, we think that the industry will gradually move on and find more efficient methods and paradigms that are more powerful but don't have this lovely faithful chain-of-thought property.
00:42:11
Speaker
And then there'll be economic pressure to switch to that stuff. And that's what we depict happening in AI 2027. And this is just finding some way to communicate between copies of AIs that is more information-dense than writing in English, for example.
00:42:30
Speaker
There's a cluster of related things. So one: there's a single AI talking to its future self. Currently, there's a sort of natural-language bottleneck where, as it's autoregressively generating text, it literally cannot communicate with itself past a certain distance into the future, except through the text itself.
00:42:50
Speaker
And so there's this compression, this bottleneck, where it has all these incredibly complicated thoughts in its, you know, billions of parameters, activations, et cetera. But then it can't just send those thoughts into the future in any form. It has to
00:43:04
Speaker
have those thoughts produce some tokens, and then only those tokens go into the future, right? So there's that dimension in which they could potentially be a lot more capable if they didn't have an English chain of thought, basically, or not one that was capturing their real thoughts, but was instead just a layer of icing on top of the cake of their real thoughts, you know?
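A toy calculation makes the bottleneck vivid. The sizes below are made-up stand-ins rather than real model dimensions; the point is only that the internal state at each step carries far more information than the handful of tokens that survive into the model's future context.

```python
import math

# Toy illustration of the natural-language bottleneck described above.
# Sizes are arbitrary stand-ins, not real model dimensions.

hidden_state_size = 10_000   # pretend activation vector (real models: far larger)
bits_per_float = 32
vocab_size = 50_000
tokens_emitted = 20          # what actually reaches the model's "future self"

bits_in_state = hidden_state_size * bits_per_float
bits_in_tokens = tokens_emitted * math.log2(vocab_size)

print(f"internal state: ~{bits_in_state:,} bits")      # 320,000 bits
print(f"emitted tokens: ~{bits_in_tokens:,.0f} bits")  # ~312 bits
# Whatever is not compressed into those tokens is lost, which is why a faithful
# English chain of thought costs some capability but buys legibility.
```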
00:43:27
Speaker
But then there's also communicating between, you know, different agents in different parts of the company that are working on different tasks but can send messages to each other. Do the messages have to be in English, or can they send high-dimensional vector messages, right?
00:43:40
Speaker
And then there's the question of: well, what if it's not a high-dimensional vector? What if it still is text, but it's not legible English text that actually means what it says it means? Perhaps it's some sort of hyper-optimized text that's basically in some alien language that's more efficient than English, which they've learned over the course of their training.
00:44:00
Speaker
That's sort of an intermediate case, because it's probably easier to interpret than the high-dimensional vectors, but it's still an alien language that needs to be interpreted. And so that raises issues.
00:44:13
Speaker
And then there's another version of the loss of faithfulness, which is that even if it is in English, if the model is smart, it can use euphemisms and be discreet about what it's saying, so that humans and monitors looking at it don't notice some of the subtext.
00:44:39
Speaker
And so that's a way in which the chain of thought can be unfaithful, even if it's in English, right?
00:44:47
Speaker
And so more research needs to be done into that, to try to stamp that out and make it not as possible as it currently is.

Automating AI Alignment

00:44:55
Speaker
In the scenario in which we're in a race, the researchers at OpenBrain try to automate alignment, and it fails.
00:45:06
Speaker
Why do you expect that automating alignment would fail? I mean, the whole problem is we don't trust the AIs. So if you're putting the AIs in charge of doing your alignment research, there's a sort of cart-before-the-horse problem.
00:45:20
Speaker
So that's the first thing. It's not an insurmountable problem. For example, maybe you do have some AI that's so dumb that you trust it, and then you can try to bootstrap to smarter AIs that you can also trust, because they were
00:45:36
Speaker
controlled or aligned by the previous AI. So there's stuff you can do there, but the core problem that you need to have a really good answer to is: why do we trust these? Somehow the trust has to transfer from the humans all the way through to the superintelligence.
00:45:54
Speaker
And then there's another issue, which is that even if you are doing some sort of strategy like that, where you have the dumber AIs that you actually do trust, there's the question of: well, maybe they make mistakes, you know?
00:46:07
Speaker
It's one thing to train AIs to solve coding problems. It's another thing to train AIs to solve alignment. How do you get the training signal for that? You can throw all this text at them of all the stuff that's been written on alignment so far, but it does feel like a domain that's less checkable
00:46:30
Speaker
than normal AI research, for example. So it's more possible to get a good training environment with a good training signal for getting your AIs to do AI research than it is to get them to do alignment research.
00:46:43
Speaker
So I think there the answer is to do some sort of hybrid thing, where you have human alignment researchers who are managing and directing the research and making the judgment calls, but then you have AIs that are rapidly writing all the code, for example.
00:46:59
Speaker
That feels like something that's definitely doable. But the point is that in that world, you're still bottlenecked on the quality of the human researchers, basically, the human alignment researchers.
00:47:11
Speaker
As you mentioned, you're also bottlenecked on the fuzziness of the concept of alignment itself, where it's quite difficult to specify and write down what you even mean, and then train on that as a reward.
00:47:26
Speaker
Yeah, so I'm definitely not hopeless about this. I think there are lots of good ideas to try and

Risks of Silent Failure Modes

00:47:33
Speaker
things like that. And I could say a lot more about it. But again and again, on a meta level, a big part of the problem that we face is that we are in a domain where there are silent failure modes.
00:47:46
Speaker
I mean, in most domains, there are some silent failure modes. Imagine you're designing a car and you want it to be safe. Most of the ways in which your car can be unsafe will be immediately apparent in even basic testing. Like, the engine catches on fire when you try to start it, or something like that.
00:48:07
Speaker
But then there are some ways that your car can be unsafe that don't appear in testing. Like, the metal that you used was a bit too brittle or something, and so after 10,000 miles it starts to wear down, and then this component breaks, or something like that. That's harder to discover through testing.
00:48:27
Speaker
With AI alignment, there's this whole category of plausible silent failure modes where your AI is not actually aligned, but pretending to be.
00:48:39
Speaker
Or it's not even pretending yet, but at some point in the future it will realize it's misaligned and then it will pretend, which is even harder to fix, because if you look at its thoughts right now, you would see nothing wrong.
00:48:52
Speaker
So there are all these possible silent failure modes.
00:48:59
Speaker
But then, unlike with the car, we can't just afford to actually fail sometimes. With the car, it's like, okay, you actually killed a bunch of people, but you just recall it and fix the part and so forth. But with the AIs, if halfway through the intelligence explosion, as your AIs are automating all the research, including all the alignment research, they decide that they're misaligned and they decide not to tell you about that,
00:49:19
Speaker
you're just screwed. You're not going to recover from that. Do you think the main danger and risk here is inherent to AI as a technology, or is it because we're developing it in a specific way, specifically under these conditions of extreme competition between companies and between countries?
00:49:41
Speaker
I guess it's a bit of both. Like the technical difficulties would still be there if we were developing it without a race condition. But we'd be much more well positioned to solve the technical difficulties if not for the race.
00:49:54
Speaker
I highly recommend looking into AI 2027. There are a lot of details that we couldn't possibly cover in a podcast like this. Maybe you can tell us a bit about the work that went into creating AI 2027.
00:50:10
Speaker
So it took almost a year. It was me and the AI Futures Project team, which is Eli Lifland, Thomas Larsen, Romeo Dean, and Jonas Vollmer, and then we also got Scott Alexander, the famed internet blogger, to rewrite a bunch of our content to be more engaging and easy to read.
00:50:32
Speaker
And I think that was pretty important for the overall success of AI 2027.

War Games for AI Prediction

00:50:38
Speaker
We did a couple of early drafts that we just completely scrapped, to get ourselves used to the methodology: the methodology of forcing yourself to write a concrete, specific scenario at this level of detail that represents your best guess, rather than simply trying to illustrate a possibility, for example.
00:50:58
Speaker
We weren't just trying to illustrate a possibility. We were trying to give our actual best guess at each point of what happens next, what happens next, what happens next. And so we did one or two versions of this
00:51:12
Speaker
over 2024 that were basically just for practice. And then we had our final version, which underwent multiple rounds of feedback from a hundred people or so,
00:51:27
Speaker
and was heavily revised and rewritten by Scott and then revised again and so forth. And in our initial draft of this, we just had one scenario and it ended. It was basically what you now see as the race ending.
00:51:40
Speaker
And we wanted to have a more optimistic, good ending, so we then made the branch point, and we tried to think: okay, well, what would it look like to solve alignment, but still within the same sort of universe as the first scenario? And what would that outcome look like? And so that was what the branch point was.
00:52:06
Speaker
Yeah, we also did a bunch of these war games, or tabletop exercises, as part of our process for writing this. So we would get a bunch of friends slash experts in the room, maybe about 10, and then we would assign roles.
00:52:21
Speaker
You know, you're the leader of this company. You're the president. You're the leader of China.
00:52:28
Speaker
And then we would start in early 2027 and we would roll forward and ask everyone: what do you do this month? What do you do next month? And so forth. At this point, we've probably done close to 50 of them, because people like them quite a lot, actually. So we keep getting all these inbound requests for us to run these war games with different groups.
00:52:53
Speaker
And it really was a good writer's block unblocker to have done all of these rollouts with all of these different groups of people. It helped us
00:53:06
Speaker
have more ideas and also feel like we had a better sense of what was possible and what wasn't when we were writing the race ending and then the slowdown ending. What did you learn from doing these war game exercises?
00:53:21
Speaker
Because I'm thinking, if you're playing the role of the American president or the leader of China or the leader of the top AI company and so on, it's quite difficult to simulate what they're thinking.
00:53:35
Speaker
And if things are moving very fast and the leaders have a lot of power, the specifics of their psychology can matter a lot.
00:53:48
Speaker
So how do you think about simulating decisions made by these people? I mean, it's certainly a very low-res, untrustworthy simulation, but the question is: is it better than nothing?
00:54:02
Speaker
And I think in moderation with grains of salt, yes, it is, is my guess. The thing that I usually say is,
00:54:12
Speaker
I mean, yeah, the future is really hard to predict. Who knows what's going to happen? The default strategy is to not think about it very much at all. And that's not so good, because it feels like it would be extremely important to have a better sense of what might happen and what you might do and so forth.
00:54:32
Speaker
So then the next default strategy after that is to think about it, but in an unstructured way: you're at the cafeteria, chatting with your buddies about what AGI might look like, and so forth.
00:54:42
Speaker
And that's cool too. But this is a bit more of a structured and organized
00:54:48
Speaker
way of doing this. Basically, you have ten people, and instead of just a free-form conversation where people can get into arguments about X and Y and Z, you say: okay, let's start with this scenario, and then we'll do the next two months, and then the next two months, and so forth. You can still think of it as a collaborative conversation where everyone's talking about what they think happens, but there's a division of responsibilities: you talk about what you think this actor would do, you talk about what that actor would do, and so forth. And insofar as people disagree, you argue about it, and then we make a decision based on what the aggregate of the group thinks. We'd take a vote, right?
00:55:22
Speaker
And so then the result of the war game can be thought of as like an aggregate of what the people in this room think would happen, having thought about it for a couple hours and sort of talked it over step by step in the structured way, you know?
00:55:40
Speaker
Then there's the question of, well, what do these people in the room know? Probably not that much. Maybe it's not super representative of what will actually happen. And that's true. But it's a start, you know? Especially if you get people in the room who are actually
00:55:55
Speaker
relevantly similar to the people who would be making decisions. You get people who work at the AI companies to play the AI companies. You get people who do technical alignment work to play the alignment team, or the AIs.
00:56:08
Speaker
And you get people who work in government to play the government. You had this essay in 2021 that was quite successful in predicting five years out into the future.
00:56:21
Speaker
What's the lesson you took from that? Is it that when we make forecasts, when we make predictions, we need to trust trend extrapolations more than we would intuitively think we should?
00:56:35
Speaker
I mean, maybe. I already trusted the trend extrapolations to what I think is an appropriate degree, which is why I made those predictions and why they were correct. I guess if someone is wildly surprised by how I managed to be so correct, then they should probably update more towards my methodology being a good methodology.
00:56:55
Speaker
But I wasn't that surprised, and so it was less of an update for me.

AI's Role in Social Media

00:57:03
Speaker
I think one of the biggest things I got wrong in What 2026 Looks Like, which was my earlier blog post, was this.
00:57:14
Speaker
I predicted a sort of change in the social media landscape that does seem to be happening, but at a slower pace than I predicted.
00:57:25
Speaker
So in 2021, when I wrote this, my reasoning was basically: language models are going to be amazing at censorship. They're going to increase the quality of censorship, allowing censors to make finer-grained distinctions among content with fewer false positives and false negatives, while also reducing the cost of censorship dramatically.
00:57:53
Speaker
They'll also be good for propaganda, but that's less important. And so my prediction was that the internet would start to balkanize. I somewhat cynically predicted that the leaders of tech companies and the leaders of governments would not coordinate to resist the temptation to use censorship and propaganda technology, but instead would quickly slip into it and end up
00:58:21
Speaker
aggressively using language models as part of their social media recommendation algorithms and so on, to put their thumb on the scales and advocate for the political ideologies that they like.
00:58:33
Speaker
And I therefore predicted that the internet would balkanize into a Western left-wing internet, a Western right-wing internet, and then maybe a Chinese internet and maybe a couple of other clusters as well.
00:58:47
Speaker
That is, people unhappy with the censorship and propaganda on some social media platform would move to other platforms that cater more to their own tastes, with the type of propaganda and censorship that they like.
00:59:00
Speaker
And this has in fact happened, but probably not as fast as I thought it would. Right now we have Truth Social and we have Bluesky, and there are a lot of people self-sorting into those.
00:59:16
Speaker
And there's Elon purchasing Twitter and so forth, changing it from a sort of blue thing to more of a red thing. I guess it's hard to say. I didn't have a good quantitative metric for measuring the extent to which this is happening.
00:59:30
Speaker
But it does feel to me like it hasn't gone quite as far as I thought.
00:59:35
Speaker
And the lesson that I take from that is one about being careful about
00:59:42
Speaker
the syllogism of: this is possible, people are incentivized to do it, therefore people will do it. That's sort of true, but it might take longer than you expect for people to actually do it.
00:59:54
Speaker
So you described this technique of iterative forecasting that you used in both your 2021 essay and in AI 2027. When you do iterative forecasting, you lay out what happens, say, in one year or in one month, and then you base your next forecast on what you've written down.
01:00:16
Speaker
What do you do if the forecast begins sounding crazy? How do you sanity-check what's happening? Because this seems to me like something that could easily veer off into fantasy land.
01:00:33
Speaker
In some sense, you're constantly doing sanity checks. That's why it took us so long to write this: we would write a few months, and then we'd be like, wait a minute, that doesn't make sense. They wouldn't do this.
01:00:44
Speaker
Let me try to think of some examples. Yeah, that would be super useful, actually. I think in one of the earlier drafts, we had something like: and then the US does a bunch of cyber attacks against the Chinese AIs to destroy their project, or something. And then we were like, actually, that doesn't make so much sense, because they would probably have really strengthened their security by now. This is already late 2027; I don't think the offense-defense balance works that way. So then we undo and try again.
01:01:12
Speaker
I think there was another example of...
01:01:19
Speaker
I think in one earlier draft of the race ending, where there were misaligned AIs on both sides, we had them basically just make a deal with each other to screw over the humans.
01:01:31
Speaker
But then we were like, well, that doesn't make sense. How would they enforce such a deal? How do they trust that the other side is keeping it? And so we had to rethink things. Ultimately we ended up with something similar, but we had a lot more to say about exactly what that deal would look like and how they would enforce it.
01:01:45
Speaker
I'm not sure if that answers your question, but yeah. So we were constantly doing this sort of "does this make sense"

Forecasting AI's Future

01:01:51
Speaker
sanity check. And we were constantly getting feedback from external experts, criticizing various parts of the story as unrealistic, and then trying to incorporate that feedback as best we could.
01:02:00
Speaker
One weakness of this method is that if you get all the way to the end and then realize you made a mistake at the beginning, it's rough, because you basically have to throw away everything you built on that false premise; you've wasted a lot of work.
01:02:17
Speaker
That's just the price you pay, I guess, for this methodology: you run a risk of some wasted effort. Then again, it's not like other methodologies don't have problems either.
01:02:31
Speaker
I think of this as a complement to other methodologies rather than a substitute. What you're doing is extrapolating the compounding effects of AI progress.
01:02:43
Speaker
And I'm wondering what happens to the reliability of the forecast, or to the uncertainty of the forecast, over time when you do that. Oh, the uncertainty grows massively. Every additional chunk of time that you forecast, you're layering on additional choices.
01:03:02
Speaker
And so the probability of the overall thing can only go down as you make it longer. Every additional claim you add to the conjunction lowers the probability of the whole thing.
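To put the arithmetic behind that point in one line (an editorial illustration with made-up numbers, not a figure from AI 2027): a scenario built out of n claims has overall probability

P(C_1 \wedge C_2 \wedge \cdots \wedge C_n) = \prod_{i=1}^{n} P(C_i \mid C_1, \ldots, C_{i-1})

so, for example, ten claims each judged 80% likely given the earlier ones compound to roughly 0.8^{10} \approx 0.11, an 11% chance that the whole chain holds.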
01:03:13
Speaker
So honestly, it's quite amazing that the first thing I did, What 2026 Looks Like, was anywhere near as correct as it was, because there were so many conjunctive claims being added.
01:03:28
Speaker
And I'll be quite pleased with myself if AI 2027 turns out to be as close to correct as What 2026 Looks Like was, because... yeah, it's more ambitious. Yeah, you're forecasting out to 2036, I think, in both scenarios, and you're also forecasting much, much grander changes to humanity than you did in What 2026 Looks Like.
01:03:55
Speaker
Right, the second part is much more important.
01:03:59
Speaker
I think the relevant thing is how much radical change you're forecasting, how many stages of AI capability you're going through. Yeah.
01:04:10
Speaker
So AI 2027 is something that could be falsified quite soon. We could get to 2030 and...
01:04:20
Speaker
the world looks very different from what you've projected here. What would be some clear signs that we are not on the path that you forecast in AI 2027? The best one would be benchmark trends slowing down.
01:04:37
Speaker
Most benchmarks are useless, but some benchmarks measure what I consider to be the really important skills. Why do you say that most benchmarks are useless?
01:04:50
Speaker
Because they don't measure something that's that important or that predictive of future capabilities, and because they're already getting saturated. A lot of benchmarks are multiple-choice questions about biology or something.
01:05:02
Speaker
It used to be useful to have benchmarks like that, but now we know that the AIs are already really good at all that world-knowledge stuff, and if they're not, it's quite easy to make them good at it.
01:05:14
Speaker
So multiple-choice questions are basically a solved problem, almost. What I think is the new frontier is agency, long-horizon agency: operating autonomously for long periods in pursuit of goals.
01:05:29
Speaker
And so there are benchmarks like METR's RE-Bench and their horizon-length benchmark that are measuring that sort of thing, agentic coding in particular.
01:05:41
Speaker
There's agency in all sorts of different domains. I think Pokémon is a fun benchmark for long-horizon agency, as long as the companies don't train on Pokémon at all, because then it's an example of generalizing to something completely different from what they were trained on.
01:06:01
Speaker
But anyhow, there are some benchmarks. I would also mention SWE-bench Verified and things like that, but they're not as good. And I think OpenAI has a paper-replication benchmark. There are a couple of others, but basically agentic coding benchmarks, I think, are where it's at.
01:06:14
Speaker
And there's been rapid progress on them in the last six months. And if that rapid progress continues, then I think we're headed towards something like a 2027 world. But if that progress starts to level off or slow down, then my timelines will lengthen dramatically.
01:06:32
Speaker
And contrariwise, if that progress accelerates even more, then timelines will shorten. Do you think we're seeing an acceleration of the trend lines with reasoning models?
01:06:45
Speaker
Yes, but it's the acceleration that I predicted, so we're sort of on trend. The METR stuff came out after we had basically finished AI 2027, but we went ahead and plotted the graph and fit it to what AI 2027 was claiming, and it was more or less on trend. And then we went ahead and made it actually part of the prediction. So we...
01:07:06
Speaker
We assumed a 15% reduction in the difficulty of each doubling of horizon length, and that made a reasonably nice line that we extrapolated.
01:07:17
Speaker
And we included that in the research page as part of our timelines forecast. So AI 2027 makes a canonical prediction about performance on that benchmark over the next couple of years.
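A minimal sketch of the kind of extrapolation being described, with illustrative placeholder values: the starting horizon and first doubling time below are made up and are not the AI 2027 timelines model's actual parameters; only the 15%-per-doubling figure comes from the conversation.

```python
# Sketch: extrapolate an agentic "horizon length" trend in which each successive
# doubling of horizon takes 15% less calendar time than the previous one
# (a superexponential trend). All starting values are illustrative placeholders.

def extrapolate_horizon(start_horizon_hours=1.0,
                        first_doubling_months=6.0,
                        reduction_per_doubling=0.15,
                        target_horizon_hours=160.0):
    """Yield (months elapsed, horizon in hours) until the target horizon is reached."""
    horizon = start_horizon_hours
    doubling_time = first_doubling_months
    elapsed = 0.0
    yield elapsed, horizon
    while horizon < target_horizon_hours:
        elapsed += doubling_time
        horizon *= 2
        doubling_time *= (1 - reduction_per_doubling)  # next doubling arrives 15% sooner
        yield elapsed, horizon

if __name__ == "__main__":
    for months, hours in extrapolate_horizon():
        print(f"t = {months:5.1f} months -> horizon ~ {hours:6.1f} hours")
```

The shrinking doubling times are what bend the curve upward rather than keeping it a straight exponential, which is why a slowdown on these agentic benchmarks would be such a clear sign the scenario is off track.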
01:07:33
Speaker
And so it will be very easy to tell if it's going faster or slower, I think. There are other things to say besides this, but that would be the main thing. What are those other things? So the next thing to say is that it's possible that we will get the benchmarks crushed but still not get AGI.
01:07:53
Speaker
So we could get to the point where, yeah, in 2027 we've got coding agents that can basically operate autonomously for months at a time doing difficult coding tasks
01:08:06
Speaker
and that are quite good at that. And yet, nevertheless, we haven't reached the superhuman coder milestone.
01:08:15
Speaker
That's a bit harder to reason about, and unfortunately it's going to be harder to find evidence about whether or not that's happening. You basically have to reason about the gap between that benchmark-saturation milestone and the actual superhuman coder milestone. What could be going on there? Well, maybe it's something like AIs getting really good at checkable tasks but still being bad at tasks that are more fuzzy.
01:08:40
Speaker
I think that's possible, although I wouldn't bet on it.
01:08:46
Speaker
I think that by the time you're really good at month-long checkable tasks, you've probably learned a bunch of fuzzy tasks along the way, basically. But yeah, we'll see. We'll see.
01:08:57
Speaker
As a last question here, what is next for the AI Futures Project? We are doing a bunch of different things, and we'll see what sticks. A lot of people have been asking us for recommendations.
01:09:12
Speaker
AI 2027 is not a recommendation; it's a forecast. We really hope it does not come to pass. So now some of us are working on a new branch that will be the "what we actually think should happen" branch, which will be exciting.
01:09:31
Speaker
We're also going to be running more tabletop exercises because there's been a lot of interest in those. And so we'll keep running them and see if that grows into its own thing.
01:09:41
Speaker
I'll continue doing the forecasting. We're actually about to update our timelines model. Good news: my timelines have been lengthening slightly, so I now feel like 2028, maybe even 2029, is a better guess than 2027 as to when all this stuff is going to start happening.
01:10:01
Speaker
So I'm going to do more thinking about that and publish more on it.
01:10:07
Speaker
Yeah, miscellaneous other stuff. Oh yeah, we have a huge backlog of people who sent us comments and criticism and alternative scenarios and things like that, so I'm going to work through all of that, respond to people, and give out some prizes.
01:10:20
Speaker
Sounds great. Daniel, thanks for coming on the podcast. Yeah, thank you for having me.