Become a Creator today!Start creating today - Share your story with the world!
Start for free
00:00:00
00:00:01
How AI Could Help Overthrow Governments (with Tom Davidson) image

How AI Could Help Overthrow Governments (with Tom Davidson)

Future of Life Institute Podcast
Avatar
0 Playsin 21 hours

On this episode, Tom Davidson joins me to discuss the emerging threat of AI-enabled coups, where advanced artificial intelligence could empower covert actors to seize power. We explore scenarios including secret loyalties within companies, rapid military automation, and how AI-driven democratic backsliding could differ significantly from historical precedents. Tom also outlines key mitigation strategies, risk indicators, and opportunities for individuals to help prevent these threats.  

Learn more about Tom's work here: https://www.forethought.org  

Timestamps:  

00:00:00  Preview: why preventing AI-enabled coups matters 

00:01:24  What do we mean by an “AI-enabled coup”? 

00:01:59  Capabilities AIs would need (persuasion, strategy, productivity) 

00:02:36  Cyber-offense and the road to robotized militaries 

00:05:32  Step-by-step example of an AI-enabled military coup 

00:08:35  How AI-enabled coups would differ from historical coups 

00:09:24  Democratic backsliding (Venezuela, Hungary, U.S. parallels) 

00:12:38  Singular loyalties, secret loyalties, exclusive access 

00:14:01  Secret-loyalty scenario: CEO with hidden control 

00:18:10  From sleeper agents to sophisticated covert AIs 

00:22:22  Exclusive-access threat: one project races ahead 

00:29:03  Could one country outgrow the rest of the world? 

00:40:00  Could a single company dominate global GDP? 

00:47:01  Autocracies vs democracies 

00:54:43  Mitigations for singular and secret loyalties 

01:06:25  Guardrails, monitoring, and controlled-use APIs 

01:12:38  Using AI itself to preserve checks-and-balances 

01:24:53  Risk indicators to watch for AI-enabled coups 

01:33:05  Tom’s risk estimates for the next 5 and 30 years 

01:46:50  How you can help – research, policy, and careers

Recommended
Transcript

Introduction to AI Governance Risks

00:00:00
Speaker
It's in everyone's interest to prevent a coup. Currently, no one small group has complete control. If everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction, then we can all kind of keep each other in check.
00:00:17
Speaker
So I do think in principle, the problem is solvable. You should always have at least a classifier on top of the system, which is you know looking for harmful activities and then kind of shutting down the interaction if something harmful is detected. We could program those AIs to maintain a balance of power.
00:00:34
Speaker
So rather than handing off to AIs that just follow the CEO's commands or AIs that follow president's commands, we hand off to AIs that follow the law, follow the company rules, report any suspicious activity to various powerful human stakeholders.
00:00:48
Speaker
And then by the time things are going really fast, we've already kind of got this whole layer of AI that is is maintaining balance of power.

AI-Enabled Coups: Possibilities and Threats

00:00:57
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Ducker, and I'm here with Tom Davidson, who's a senior research fellow at Forethought.
00:01:06
Speaker
Tom, welcome to the podcast. It's pleasure to be here, Gus. We're going to talk about AI coups and the possibility of future AI systems basically taking over governments or states.
00:01:20
Speaker
Which features would future AI systems need to have in order for them to accomplish this? What should we be looking out for? Great question. one One thing I'll flag up front is that what what I've been focused on recently is not the kind of traditional idea that AIs themselves will kind of rise up against humanity and and take over the government, but that like a few, know, very powerful individuals will use AI to seize a legitimate power for themselves.
00:01:47
Speaker
So the kind of phrase phrase that we're we're're we're often using is AI enabled coups, but where they're kind of the main instigators are actually good people. In terms of capabilities, um Yeah, I think there's there's there's a few different domains which, in my analysis, are particularly important for seizing political power.
00:02:09
Speaker
So there's the kind of skills that politicians and kind of business leaders use today. So things like persuasion, business strategy, political strategy, just kind of pure kind of productivity at a wide variety of tasks.
00:02:27
Speaker
and
00:02:31
Speaker
Then there's kind of more kind of almost hard power skills. So in particular, cyber offense, which is already somewhat useful in military warfare and has been becoming more useful.
00:02:43
Speaker
And then, you know, I expect that as as AI increasingly automates different parts of
00:02:51
Speaker
the military and as AI is embedded in more and more important high stakes processes that will raise the importance of cyber offenses now, you know, whereas you can't hack a human mind as as we hand off more important tasks to digital systems, they will be able to be hacked much more easily. So i expect cyber to come more important for hard power.
00:03:10
Speaker
And then, you know, the the ultimate kind of most scary capability that, you know, I think ultimately will drive a lot of risk is when we get to the point that AI systems and robots are able to fully replace human military personnel. That's fully replace human soldiers on the ground, boots on the ground, fully replace the kind of commanders and strategists.
00:03:30
Speaker
And, you know, that that that might seem like a long way off today, and but actually, you know, even just the last few years, we've seen a lot more importance of kind of AI controlled drones in warfare and i expect that trend to continue. And what we're already seeing is that, you know, as soon as the technology is there kind of reliably automate military capabilities, there's, you know, geopolitical competition drives that adoption.
00:03:55
Speaker
And so, you know, I think it's going to be surprisingly soon that we that we do get AIs controlling kind of surprising amounts of, of of you know yeah know, real hard military power.

AI's Impact on Military and Political Power

00:04:06
Speaker
And then one kind of wrapper for all of all all of these things is the the automation of AI research itself. So today there's a few hundred, um few thousand you know top top human experts that drive forward AI algorithmic progress.
00:04:25
Speaker
And my expectation is that you know there's a good chance the next few years that AI systems are able to match you even the top human experts and their capabilities. And that would mean we go from and maybe a thousand top researchers to know millions of automated AI researchers.
00:04:41
Speaker
And that could mean that you know all of these different capabilities, all of these different domains that I've been talking about, they all progress much more quickly than than we might have expected, um just by naively extrapolating the recent pace of progress.
00:04:52
Speaker
And you know in my view and in the view of many, the recent pace of progress is already quite alarming in that you know Five years ago, we just had really very basic language models that could string together a few sentences, a few paragraphs, and then went off topic. And now already we're getting kind of ah very impressive reasoning systems that are doing tough math problems and and helping a lot with difficult coding tasks.
00:05:14
Speaker
So, you know, bring that all together. I think there's a lot of kind of soft skills, a lot of hard power skills that are relevant here. But like probably the most important thing to be watching is how good AI is at AI research itself, as that could kind of bring, but make more happen um quite suddenly.
00:05:28
Speaker
yeah Could you describe in more concrete terms what an AI-enabled military coup would look like? Some example to kind of make this concrete for us. Yeah, absolutely.
00:05:41
Speaker
So you can you can and draw an analogy to historical coups where there's you know often a minority of the military members launches a coup and then kind of presents it as a fair complete and you know is able to prevent ah so chaos or discord or threaten individuals to prevent anyone from kind of actively opposing them. And then in the absence of active opposition, it just seems like, well, they've done it. and this is This is the new state of affairs.
00:06:06
Speaker
So you know that that that's a good starting point. Then you know the AI-enabled part is is where we deviate. So historically, you know needed at least you know a decently sized contingent of humans to go along with the coup.
00:06:19
Speaker
um And you needed to persuade you know quite senior military officials not to oppose it. I think that will change as we automate more and more of the military. And so the most simple way that this happens is just that the head of state, you know it could be the president of the United States, just says, yeah, we've got the technology now to make a robot army.
00:06:36
Speaker
And I want the ah army to be loyal to me. I mean, I'm the commander in chief. Obviously, that's how it should be. then going to follow my instructions. No need to worry about you know whether I'm going to order them to anything illegal. like we can put in maybe some kind of nominal legal safeguards. but Let's not worry too much about that. The main thing is that they're loyal to me.
00:06:53
Speaker
and then To my knowledge, you know that would be highly controversial or would definitely be against the principles of the constitution. But it's unclear to me that it would be literally illegal. We just haven't had this kind of technology. We haven't legislated for it.
00:07:06
Speaker
and The constitution is not robust to this kind of really powerful military technology. And so it's not surprising if, you know at best, this is just a very kind of unclear legal territory, but you've got the head of state pushing really hard for that robot army to follow their instructions. And the head of state in the United States has a lot of political power.
00:07:30
Speaker
And so, you know, the most simple way is that he just pushes hard for that. He gets what he wants. Maybe he's using, you know, kind of emergencies at home or, you know, tense geopolitical tensions to kind of push it through and say that it's necessary.
00:07:44
Speaker
Maybe he's firing, you know, senior military officials that disagree. Maybe he's already got Congress to be very, very kind of fervently supporting and and and loyal to him and not you know being that kind of careful and open-minded when assessing like but the the the opposition that people will be making as this has happened.
00:08:01
Speaker
So that that's the kind of the first you know really just plain and simple way that that that that we could get this. Robot army is built. It's made loyal to the head of state. Head state just instructs it, stage a coup, and it does it.
00:08:13
Speaker
you know robots around the White House and brutally suppress human protesters. And then you know even if people go on strike and stop working, then you know you can have then ai systems and robots replace people in the economy. So humans have kind of really lost their bargaining power that they normally have that would kind of strongly disincentivize military coups in most countries.
00:08:34
Speaker
Yeah, this is really a change from the ah from the normal coups of history where you would have to have buy-in from at least some segment of the population that are regular humans and you would need to kind of continually support that buy-in and make alliances and and uphold those alliances.
00:08:56
Speaker
But this is this has changed now that you're talking about technology and AIs and robots that can but basically be made loyal to a company or or a head of state in in a way that's more durable.
00:09:12
Speaker
Do you think we have other kind of historical precedents for thinking about how the dynamics of what it's like to attend a coup, how those dynamics play out?
00:09:26
Speaker
Yeah, just one quick thing on that last point. I want to emphasize how there is a bit of a phase shift at the point in which AI can fully replace other humans, you know, in the government, in the military. When AI is augmenting other humans, you don't you don't have this effect because a leader must still rely on those other humans to kind of work with the AIs to do the work.
00:09:46
Speaker
But there really is this phase shift when AIs and robots can fully replace the humans because then, yeah, a leader doesn't need to rely on anyone else. So I think that that's an important one to recognize.
00:09:57
Speaker
In terms of historical precedence, the other big one I'd point to is recent trends in political backsliding, often called democratic backsliding.

AI's Role in Democratic Backsliding

00:10:06
Speaker
So the most kind of end-to-end clear and cut case is Venezuela, where you had in in in the 70s a fairly healthy democracy that been there for decades, and then increasing backsliding, increasing polarization, kind of like what we're seeing in the US recently.
00:10:24
Speaker
And then, you know, an increasing, you know, explicit commitment by by kind of the leader that, you know, he wanted to remove checks and balances on his power and that the will of the people, you know, was being obstructed by various democratic processes and institutions.
00:10:44
Speaker
And then, you know, over the coming decades, you know, it has transformed into an authoritarian state. And, you you know, many commentators have pointed out these trends in the US recently over the part past 10 years.
00:10:58
Speaker
and And it even goes back before the past 10 years, to be honest, in terms of the broad kind of political climate. um And then there's, you know, kind of the example of Hungary, where where where again, kind of elected elected leaders are just kind of but removing the checks and balances in their power kind of buying off the media or kind of threatening media outlets to to be more pro-government not providing contracts or kind of litigating them if they criticize the government all these kind of standard tools where it's now like a lot harder to
00:11:30
Speaker
to point at one thing that's clearly egregious, but when you add up the kind of hundreds of little cuts, hundreds little paper cuts to democracy that are being systematically administered, you're you're seeing a real kind of loss of democratic control and concentration of power.
00:11:47
Speaker
And so again, you know, AI could you know exacerbate and enable that dynamic. And again, the most straightforward way is you're just replacing human powerful institutions. You're replacing the humans there with AIs that are very, very, very loyal and obedient to the head of state.
00:12:02
Speaker
So you know think about Doge. And you know they tried to fire people. There was pushback. you know The state needs to function. Imagine if you could just have AI systems that could fully replace all of those employees and could be made fully loyal to the president. How much easier would it be they kind of push through push push through some of those layoffs or even just create entirely new government bodies that essentially just take on the tasks that were previously done by our bodies and that those old bodies kind of rot away or kind of slows slowly kind of prevent them from making decisions.
00:12:35
Speaker
So, and and then the other big way is if if the kind of head of state is able to get access to much more powerful AI capabilities than their political opponents, maybe because state is very involved in AI development,
00:12:48
Speaker
then that's another way that they could get a head up, you know making more persuasive propaganda and more compelling political strategy to like you know embed their power more.

AI Loyalties and Power Structures

00:13:00
Speaker
You segment man the ways in which AI can enable coups into three categories where you can talk about singular loyalties, secret loyalties, and exclusive access.
00:13:12
Speaker
Perhaps we can run through those and talk about where those would play out. starting with singular loyalties, for example? Yeah, so singular loyalties is what we've we've just been talking about. That is deploying AI systems that are kind of overtly, obviously very loyal to just you know existing um powerful people.
00:13:31
Speaker
So in know in particular, I am thinking about the head of state here as the main the the main threat. And so i think I think we basically already covered it. The two the two main angles in my mind are i' deploying loyal AI's powerful government institutions and in the military.
00:13:47
Speaker
Secret loyalties is a very different threat model. um It's much more, as you would expect, secretive. So yeah the the main threat model I have in mind to make it concrete is that an AI company CEO has you know automated all of ai research. So they could fire their staff at that point because the AIs can just do the work.
00:14:09
Speaker
Instead, maybe they put their staff onto some product work, but you know the core work of driving AI progress ever further forward, making you increasingly intelligent AI is pretty much just done by AI systems.
00:14:22
Speaker
at that point, they realized, well, they're in a bit of precarious position that they' they're controlling this hugely powerful technology. Their power is pretty much unconstrained. not literally under the constraint, but there's very few checks and balances currently currently on on these CEOs.
00:14:36
Speaker
And they might anticipate that the government's going to realize how big a deal this is. They're going to kind of lose their influence. Maybe they worry the government will do something unethical with the AI technology. Maybe they worry that they'll be used for for a war or something.
00:14:48
Speaker
And there's all kinds of justifications they could come up with thinking, I don't want someone else taking in control of this really powerful technology that I currently control. And obviously I'll use it for good. And so they might you know speak to some maybe some AI advisors about this and say, well, what ah What they do here, it seems I'm in a little bit of precarious position.
00:15:06
Speaker
And then a solution that they might think of or an very smart AI system that's helping them might think of is, well, why don't you just create a backdoor in the the super smart systems that we'll be building next month, where like if if the government tries to order those systems to do something that that that you definitely approve ah disapprove of, then the AI systems will just refuse to do it those yeah Those goes robots, those powerful robots that are being deployed the army will just refuse to follow orders from the government if you know they're they're really awful and unethical and things you wouldn't like.
00:15:37
Speaker
And so then ah you can have all the future AIs now that are developed by by this organization are going to have this backdoor or what I call a secret loyalty, where though they they appear to follow the law, that they appear to kind of you you know up uphold democracy. In fact, they are secretly still kind of loyal to the CEO.
00:15:55
Speaker
And that could lead to a coup and and again the most direct route is a military coup where if ultimately this this company's AIs are building military robots, military drones, military communication systems, then because they themselves because the AIs building all this military stuff are secretly loyal, they can kind of pass their secret loyalties on to to the military infrastructure, or they can just insert vulnerabilities in the military infrastructure that only they know about.
00:16:26
Speaker
And in your here, we are we are you know I am primarily thinking about you know very superhumanly intelligent AIs, are able to kind of design complex machinery that a human can't kind of check and check if there's a backdoor.
00:16:42
Speaker
And that then means that we've now got a military infrastructure that is also ultimately can be controlled by the the CEO. And so at this point, if someone tries to remove the CEO, if something's really bad, having the CEO doesn't like, they could just be like, okay, I'm now in charge.
00:16:57
Speaker
I've instructed my... this military infrastructure to do the coup. Again, the robots around the White House. This time it's it's it's the company CEO that that that they're loyal to. And then you know the rest of the story can play out just as in our previous discussion.
00:17:13
Speaker
How do these secret loyalties relate to sleeper agents? And maybe you could ah you could explain what a sleeper agent is. Yes, great. So sleeper agent is the kind of standard term used in and in the technical literature.
00:17:28
Speaker
And today's sleeper agents are not very sophisticated. So an example of a sleeper agent is that if this if the AI sees that it's the kind of the any most yeah most of the time the AI will just write kind of but reliable code as instructed. But if the AI sees that the year is 2026, then the AI will write code with vulnerabilities. So there's this kind of trigger that that triggers this malicious behavior.
00:17:56
Speaker
And you know normally if you were interacting with the AI, you really wouldn't be able to tell because it would just always be you know acting in as you'd intended. um But yeah because it's just one trigger, it's very hard to find, but then you do get this malicious behavior.
00:18:11
Speaker
Now, I'm not worried about sleeper agents today because to to cause real harm, we would need a sleeper agent to be very sophisticated. it you know It could never accidentally trigger or very, very rarely accidentally trigger. And it would have to be able to do you know very intense, complicated tasks like build Building a military robot and ensuring that that robot was actually had a backdoor, you know, that's very, very, you know, far beyond what AI today can do. So, know, I think that sleeper agents provide ah basic proof of concept that it's possible for a malicious actor to kind of gain illicit control over a system and then have that system be deployed in the rest of the economy.
00:18:49
Speaker
um potentially without people noticing, um but they're not yet scary. And then the kind of secret loyalties is just what I call the kind of scary situation where you now have a very sophisticated AI system that doesn't just have any old, it's not any old sleeper agent, it's a sleeper agent, which is specifically loyal to one person trying to help them seize power.
00:19:08
Speaker
Yeah. So what we're imagining here could be, for example, US-based AI company integrated into the yeah into the US military. The CEO of the company,
00:19:19
Speaker
wishes to to ultimately be in control of what happens. And so he engineers or he instructs perhaps AIs or or human engineers to create a sleeper agent in these systems that can be activated as at his command, such that the U.S. military officials think they're in control of the systems,
00:19:41
Speaker
the systems behave in in ways that they approve of throughout perhaps a quite a long period until there's some way until the sleeper agent is activated in some way.
00:19:54
Speaker
And perhaps that will be more sophisticated than changing the date or giving it some phrase. But you you can imagine advanced versions of of sleeper agents that could actually behave in this way.
00:20:04
Speaker
Do you think that's realistic? Do you think sleeper agents can become that advanced? can become that advantage Yes, I do. I think, you know, we can, one analogy is to human spies, you know, human spies, know,
00:20:18
Speaker
they They're basically most of the time they're they they're kind of doing their assigned job as expected. And it's not like one simple trigger phrase or one year makes the spy do something covert or malicious. They are just fully aware of their surroundings. They're kind of constantly processing their inputs and they choose strategically exactly what the kind of ideal time is to steal some information, send an elicit email.
00:20:44
Speaker
And so, you know, in my mind, that's by far the most scary sleeper agent, not not not one that's triggered by a password, but one that is kind of holistically making ah decision about how when to act out. I mean, the password ones are actually quite fragile because, you know, if you were the military and you're deploying your AI system and you're worried there's a password, what you can do is you can scramble all the inputs so you can kind of ah paraphrase all of the instructions it gets.
00:21:12
Speaker
And that might just mean that the password, if ever ever someone tried to insert it, would be kind of re-scrambled and would just never actually come up. um So I'm not actually worried about the kind of simple password triggered, steeper agents.
00:21:26
Speaker
But again, they're are basic proof of concept. And I think that as AIs become as smart and smarter than humans, that there's reason there's a strong reason to think that it'll be possible to build much more sophisticated ones. one One thing i will briefly say is that you know people often talk about misaligned AI scheming.
00:21:44
Speaker
And you know this is just the same idea where, you know in fact, the argument for secret loyalties being worrying is much stronger, where you misalignment, there is evidence misalignment. We don't yet have you know strong evidence of you know really sophisticated scheming emerging accidentally.
00:22:01
Speaker
But if if humans and a human team of engineers or an AI team of engineers were specifically trying to build a system that that was kind of covertly thinking about when when when when when to kind of act out, then it's much more plausible that that that it could happen.
00:22:19
Speaker
And then you have exclusive access, which is different from singular loyalties or or secret loyalties. Why is that ah its own category? Yeah. So in my mind, the kind of singular or overt loyalties and the secret loyalties, both of those third models go through deploying AI systems in really important parts of the economy.
00:22:43
Speaker
So, you know, in particular government and military, what I focused on. But you know for those threat models, don you actually need the rest of society to choose to deploy those AI systems and hand off a lot of power to them.
00:22:55
Speaker
um And so I kind of have this third threat model of exclusive access to think about another possibility, which is that maybe even without people choosing to deploy AI systems and give them a lot of power, even without that, maybe systems can be powerful enough to help a small group seize power.
00:23:12
Speaker
So the the prototypical situation I'm imagining here is you know there's a kind of one ai project, which is you know somewhat ahead of the others, and maybe it it goes through intelligence explosion, whereas which by which I mean kind of AI can automate AI research and then AI quickly becomes super, super intelligent compared to humans.
00:23:35
Speaker
And then, you know, that project maybe has a few senior kind of um executives or senior political figures that are kind of very, very very involved and have a lot of control.
00:23:46
Speaker
And they might just be able to, you know, siphon off, you know, 1% of the project's compute and say, okay, We're now running these these these right super intelligent AI systems and saying, how how can we best seize power?
00:24:01
Speaker
um And then there's kind of millions of them. They're doing you know every single day, and they're they're doing a month of research. Every single week, they're doing a year's worth of research into, okay, how can we you know how can we gain this political system? How can we hack into these systems?
00:24:16
Speaker
How can we you know ensure that we end up controlling the military robots when they are deployed? by hook or by crook. um And I think that that that that fatt model could start to rep apply earlier in the game.
00:24:29
Speaker
That could start apply before anyone even realizes there's a risk because you know this is just essentially all happening on a server somewhere. But actually it's possible that the game can be won and lost by the massive advantage that small group get by by being able to kind of co-opt this huge, huge intellectual force.
00:24:47
Speaker
um And so I think it's worth tracking that threat vector independently. But it does you know it does definitely interact with these other ah with the singular loyalties and the secret loyalties, because and one strategy that your kind of army of super intelligent AIs may come up with is, why don't you like use the fact that you're head of state to like push for the robots to be loyal to you, and like here's how you could buy off the opposition and sow confusion.
00:25:08
Speaker
Another strategy might be, why don't I just help you put back doors and all this military equipment so that then you could use it to stage a coup. But there might also be other ways. you know maybe Maybe it's possible to very quickly create and entirely new...
00:25:22
Speaker
weapons which you can use to overpower the military without anyone knowing or maybe it's possible to you know gain power and and in other ways.
00:25:34
Speaker
Yeah, yeah. I mean, one thing that would make this kind of future hypothetical situation different from today is that today it seems that there are leading AI companies, but over time, capabilities kind of emerge in and and second tier companies and in open source.
00:25:51
Speaker
And so there's not that much of a gap between the leading companies and what is broadly available and perhaps what is publicly available. That's something that would change in in the scenarios you imagine. So perhaps explain why the gap in capabilities between the the one leading project and all of the others is so important.

The Capabilities Gap and Power Concentration

00:26:14
Speaker
A few factors there. So in terms of why it's important, it's just what you've said. mean, a lot of these threat models kind of exacerbated if there's one one one one group of people that has access to much more powerful AI than other groups.
00:26:32
Speaker
if you know If open source is pretty much on par with with with the cutting edge, then everyone will have access to similarly powerful for AI. I will say that even if open source is is kind of on par, that doesn't mean we're fine because we could still choose to deploy systems in the military and the government and still choose to make them loyal to the head of state.
00:26:51
Speaker
when we're when we're choosing to hand off control to AIs, it doesn't matter if there's 100 AI companies. and We're only handing off control to some AIs and maybe you know the government will ensure that they do have particular loyalties.
00:27:02
Speaker
So I will say you know this risk doesn't go away if if we have you know lots of different AI companies and open source close to each other. But it does it does become lower because the kind of exclusive access point where one group has access to super intelligence and the other group doesn't have access to much, that goes away.
00:27:20
Speaker
And I think it's a lot harder to pull off secret loyalties if everyone's kind of roughly equal to each other because it becomes a bit more confusing why your systems in particular end up controlling so much of of the military or what was so widely deployed. And it becomes confusing how no one else was able to realize you were doing the secret loyalties when they were kind of equally able to to to do it or equally technologically sophisticated and potentially detect your secret loyalties.
00:27:43
Speaker
So I do think it makes a big difference. In terms of why I think it's plausible that that there's a much bigger gap between the lead project and other projects, that there's a few different factors. The most plain and simple one is that the cost of AI development is going up very quickly. We're kind of spending about three times as much every year on developing AI.
00:28:03
Speaker
um And that's just going to get too expensive for many players. yeah if and when we're talking about trillion dollar development projects, which I do expect, then very few can afford that.
00:28:14
Speaker
And also there's just only so many computer chips in the world. If you want to have, that that you know, that currently the number of kind of computer chips produced each year is less than a trillion dollars worth. So if if we get to a world where,
00:28:27
Speaker
you know the the way to go to the next level of AI is to spend a trillion dollars, then you know only one company will be able to do that. and you maybe Maybe we stopped with earlier. Maybe we stopped with, you know there's two companies both doing half a trillion.
00:28:39
Speaker
But you know we would be really kind of kneecapping the level of progress if if we stopped long long long before that. and there would just be strong incentives for companies to merge or one company to outbid others in order to like you know really raise the amount of money that's being spent on AI development.
00:28:54
Speaker
you know This is all assuming that where we can build really powerful AI and it is e economically profitable, which but for me isn't all in the background of the scenario. um So that that that's the first kind of straightforward reason why i think we'll we'll see a kind of a smaller number of projects and we'll see kind of big gaps because when when you're spending 100 times less on development, then that's that's going to be a bigger gap.
00:29:16
Speaker
That's the first reason. The other reason I've already talked about the the idea idea of an intelligence explosion um when we automate our research, even if companies are fairly close, maybe that one is a few months behind, the company that's a few months ahead automates our research.
00:29:29
Speaker
In that next three months, they make massive progress. So then there's actually like a really big capabilities gap, even though it's still just a three month lead. So there's the question whether they can use that kind of temporary speed to kind of yeah get a more permanent advantage.
00:29:43
Speaker
And then the last big reason is just kind of government-led centralization. It's already been talk of Manhattan Project and CERN for ai I know i think there's that there' there's reasons to

Government-Led AI Projects and Risks

00:29:54
Speaker
do those projects. They can help with safety in some significant ways, but they would you know exacerbate this risk.
00:30:00
Speaker
Because, yeah, if you pool all the US or the United States computing resources into one big project, this can be way ahead of any other project. And you pool all of its talent, and all of its data, then, yeah, you'll you'll see a really big gap. And that that would definitely um make make it a lot easier for a small group to do an AI-enabled coup.
00:30:20
Speaker
Yeah, you're you're kind of putting a big ah big prize out there for someone who who's interested or who's considering a coup, right? If you're concentrating all of the power, all of the resources, all of the talent into one project, then, well, that's where you've got to go if if you are if you're coup planner.
00:30:40
Speaker
Yeah. And just to be, I don't particularly expect that anyone is planning any coups. In fact, I'd be very surprised. I'd more think it's, you know, you want to be powerful. You want to be a big deal. You want to be changing the world.
00:30:52
Speaker
So yeah, obviously you want to lean the main, lead the main project. And then you don't want anyone else to come in and mess you up, mess it up. So obviously you want to protect the fact you're leading that project. don't want anyone else to, you know, misuse AI. I think it's kind of step by step.
00:31:03
Speaker
you you just kind of head down that road of more and more power. then, yeah, you know, often in history, that that road does end in just consolidating power, you know, to a complete extent. And I mean, it it it can be... so what we're imagining here are times in which AI is moving in at incredible

Global Power Shifts and AI Dominance

00:31:20
Speaker
speed, right? the the The pace of progress is insane. There's a bunch of confusing information.
00:31:25
Speaker
People are acting under... radical uncertainty. And perhaps in those situations, it's tempting to think that you are the person that can that can lead this project. And perhaps you're doing this out of supposedly kind of altruistic reasons, you're thinking that I need to do this in order to prevent other people that would perform worse than me at at ah at this project.
00:31:49
Speaker
And so you're kind of slowly convincing yourself that it it will be the right thing for you to do to to take over in perhaps a forceful way.
00:32:00
Speaker
Yeah, you know, i don't think Xi Jinping or Putin think that they are the bad guys. you know, I think that they have, you know, probably of sophisticated justifications for what they're doing Perhaps here is a good point to talk about the possibility of one ah state or company outgrowing the entire world.
00:32:23
Speaker
this this This relates to the to the problem of exclusive access, because if you have ah one company or or a one government outgrow and outgrowing the entire world, then you have that company or government with exclusive access so advanced AI. So how could this happen?
00:32:41
Speaker
How likely do you think it is that growth could be so incredibly fast that one company would outgrow all of the others? Yeah, so there's two possibilities we could focus on.
00:32:54
Speaker
The one I think is is pretty plausible is that one country could outgrow all of the other countries in the world. So what that would mean is, you know, today the 25% of world GDP.
00:33:06
Speaker
But this would be a scenario where it is leading on AI. it is This is already the case, but you know it maintains its lead, it maintains its control over compute. um And then when it develops really powerful AI, it prevents other nations from doing the same.
00:33:25
Speaker
you know It's already beginning with x export controls in China.
00:33:30
Speaker
And that kind of you know embeds its lead. And then it yeah uses the AI to develop powerful for new technologies. And you know it's in control of that of of those technologies. It uses AI to kind of automate cognitive labor throughout the US and maybe worldwide.
00:33:48
Speaker
And you know countries that don't use its AI systems will will be really hard hit economically. And so we're kind of massively centralizing power in in the US. And if the US is able to
00:34:01
Speaker
maintain exclusive control over over you know smarter than human AI, then it seems pretty plausible to me. you know Very likely that the US would be able to rise to you know a strong majority, ah more than 90% of world GDP.
00:34:20
Speaker
um And there's you know there's a few different you know dynamics that are driving that. First is that labor currently, human labor receives you know about half of world GDP.
00:34:31
Speaker
you know, just half of GDP is paid on wages. AI will ultimately, and robots will ultimately be better than humans at kind of all economic tasks. And so if, if, if, if the U S controls all the AI companies that are replacing human labor, then, you know, that, that half of that, that kind of 50% of GDP, which is currently going to human workers will ultimately be reallocated to paying to the, whoever controls and owns those, those AI systems, IE U S companies.
00:34:59
Speaker
Um, yeah know There's a wrinkle there because some some some of some of that is is physical labor and the US doesn't currently have a lead there. yeah Physical robots, in fact, China is quite far ahead.
00:35:11
Speaker
But in terms of at least the cognitive aspects of of of our jobs, and so we're talking you know significant fraction of GDP that would just now be reallocated to US companies that control AI. So that that already gets them from 25% to above 50%. Then we've got this further dynamic, which
00:35:28
Speaker
is the dynamic of super exponential growth. So this relates to kind of previous work I've done on how AI might affect the dynamics of economic growth, but you know, kind of very potted summary is that it's often quoted that over the last 150 years, economic growth has been roughly exponential.
00:35:51
Speaker
um And what that means is that if two countries are growing exponentially, and one country starts off you know maybe twice as big as the other country, then at a later time, still one country is twice as big as the other country. So let's say you know the US economy is 10 times as big as the UK economy.
00:36:08
Speaker
Then if they're both growing exponentially at the same pace, then 10 years later, again, the US will still be 10 times as big as the UK. So that's exponential growth. That's what we've seen over the last 150 years.
00:36:19
Speaker
If you look back further in history, we see super exponential growth. That means that the growth rate itself gets faster over time. So, you know, an example would be that 100,000 years ago, you know, the economy wasn't really growing at all. If you think of what's growing, it was maybe, you know, doubling every 10,000 years or something in size, know, very extremely slow economic growth, know.
00:36:44
Speaker
Then going from about 10,000 years ago, it seems more like ballpark. there's ah There's a doubling of the economy every thousand years, still incredibly slow economic growth.
00:36:55
Speaker
You zoom back in and kind of 1400, you can begin to detect, you know okay, more like you know every 300 years or so, the economy is doubling. And then in recent times, we've seen that the economy is doubling every 30 years.
00:37:08
Speaker
So essentially, that you know the growth rate is getting faster, the doubling times are getting shorter. That's super exponential growth. And there's there's various reasons, economic reasons, theoretical reasons, empirical reasons to think that AI...
00:37:20
Speaker
and robotics when it can replace humans entirely will go back to that super exponential regime that that has been play throughout history. What that means is that the growth is getting faster and faster over time. And the reason the reason i'm i'm saying all this, the reason this is irrelevant is that, yeah go back to the example of the US s and the UK. The US is currently 10 times bigger than the UK.
00:37:40
Speaker
If the US is on a super exponential growth trajectory, its growth is getting faster and faster over time. And that means that even if the UK is on that same super exponential growth trajectory, as they both go super exponentially, the US will pull further and further ahead of the UK because maybe the US is doubling in 10 years because it's already bigger, it's already further along the curve, whereas the UK is still doubling only every 20 years. And so that means that the US is now, rather than just 10 times bigger the UK, the US is now gonna be 20 times, 30 times bigger size the UK.
00:38:17
Speaker
So if the US is able to to kind of, if if there is super exponential growth and the US is able to kind of be be bigger to begin with and therefore be further progressed on that super exponential growth trajectory, then...
00:38:36
Speaker
then that's another way that they could just you know continue to increase the their size of the economic pie and ultimately you know and come to completely dominate world GDP. So you know just just to sum up everything I've said, today, US is 25% of world GDP.
00:38:53
Speaker
If it controls and develops ai that that could easily boost it above 50%. I'd be very surprised if it didn't. And then from that point, you know it's already bigger than the rest of the world combined.
00:39:05
Speaker
If it's able to then go on the super exponential growth path, then it will go faster and faster over time and pull further and further ahead of the rest of the world that may be able to grow super exponentially if they can also develop ai But you know, we'll we'll still be falling further and further behind because because the nature of super-experimental growth.
00:39:24
Speaker
Yeah, this ah this actually seems quite plausible to me and not very sci-fi. the The thing that seems quite sci-fi is the notion that perhaps even one company could kur grow at such a speed that it would outgrow the rest of the world.
00:39:39
Speaker
How likely is that? Yeah, great question. I think it's a lot harder, but it is surprisingly plausible. So that first part of the argument I gave about how 50% of, you know, the world GDP is paid to human workers.
00:39:56
Speaker
yeah know, if that went to AI, that would be a big big chunk. It is possible that one company could get a monopoly on on kind of really advanced AI. So i we already discussed some of the dynamics there.
00:40:10
Speaker
where again, the simplest one is just a combination of an intelligence explosion, giving ah a company a big advantage, and then they're kind of buying up all the computer chips that the world is able to produce and outbidding everyone. if If a company does that, and already you know is you know seems to be outbidding other companies on on compute, although Google also also also has a lot of
00:40:34
Speaker
If a company is able to do that, they could end up just one company in control of literally all of the world's cognitive labor, know, because human cognitive labor will somewhat be kind of dwarfed by AI cognitive labor.
00:40:46
Speaker
So ah that point, that one company could be getting, you know, all of their or all of GDP, which is currently paid to kind of cognitive labor, which a large part of the economy. As I said, you know, maybe as high as 50%, but, you know, certainly as high as 30% world GDP.
00:41:01
Speaker
of while gdp If all that would then you know seemingly be going to this one company that that controls the world's supply of cognitive labor. So though I think that would take time, and obviously it's going to take a long time to um automate all the different parts of the economy, um there is just a basic dynamic by which one company can now be controlling you know double-digit percentages of of world GDP. And there's obviously questions, would a government allow that? Would they step in?
00:41:32
Speaker
And that's where we get into the, you know, the these dynamics of like, well, this company has all these super intelligent AIs on its side. Maybe it's able to lobby. Maybe it's able to do political capture to avoid the state setting in. Maybe it's able to be like, look, we're providing like economic abundance for everyone.
00:41:48
Speaker
you step in, like, you know, that, that, that, that might not happen. You know, we're, we're underpinning your nation's, you know, economic and geopolitical strength. And if you try and, you know, remove, um you know, step in and nationalize, then, you know, that's not going to happen. We're going to move to another country, you know, so you can, you can imagine, and maybe, maybe they convinced the the head of state to kind of support, support them. And there's some kind of alliance there, but you know, it, it's not completely obvious that, that the company would be shut down.
00:42:19
Speaker
it would It would have certain types of serious bargaining power. So if a company was able to maintain this position as kind of sole provider cognitive to labor, it would be able to get ah significant fraction of world GDP.
00:42:32
Speaker
And then it's then possible that from there, it could it could bootstrap. And this is where it gets a bit harder, but that the tactic it would it would need to pursue is it already controls most of the cognitive

AI Development in Different Political Systems

00:42:44
Speaker
activity.
00:42:44
Speaker
labor, pretty much all of it. The thing it doesn't control is all the kind of physical machinery and all the raw materials that are also needed to create economic output. But it can pursue a tactic of kind of hoarding its cognitive labor so that no one else can ever have access to that.
00:43:01
Speaker
and then kind of selling it at kind of really kind of mono monopolistic rents to rest the world because there's there's no one that can match it. it's you know It's offering everyone by far the best deal they can get, but just ah skimming off 90% of the value add from from companies using it its AI systems.
00:43:18
Speaker
So if it's able to do that, then it can kind of it can kind of reap and by far the the majority of the benefits of trade, and then maybe you can kind of by increasingly buy up physical machinery and raw materials from the rest of the world, design its own robots, buy its own land.
00:43:35
Speaker
you know, imagine like a kind of big special economic zone in Texas or something where, you know, this company is kind of unconstrained by kind of bureaucracy. And then it's also now, you know, got a big arm somewhere in Siberia and in Canada. It's kind of creating these big special economic zones by doing deals with specific governments.
00:43:57
Speaker
And, I do think it's a bit of a stretch that this all goes ahead without you know various other powerful political and economic actors pushing back.
00:44:10
Speaker
But like the kind of basic economic growth dynamics are surprisingly compatible with with with a company you know ultimately coming to control most of the cognitive labor and most of the kind of physical infrastructure that its AIs have designed using all the all the parts that it's bought from the rest of the economy.
00:44:28
Speaker
Yeah. And do you think this is a risk factor for AI enabled tools then just because you're concentrating all of the power and all of the resources into either perhaps one country or one company even?
00:44:40
Speaker
Yes, I definitely do. the The more realistic path is that a company kind of starts down this path of outgoing the world gets kind of huge economic power.
00:44:54
Speaker
Increasing controls the country's industrial base, its kind of physical infrastructure, manufacturing capabilities. And then from there, It's in a much stronger position to seize political control because it's got massive economic leverage and then it can also increasingly gain military leverage because as it, you know, as it increasingly controls the country's broad industry and manufacturing that will feed in to military power.
00:45:22
Speaker
So, you know, some of the you know possibilities I discussed earlier where, you know, you could potentially have your AIs be secretly loyal, that ultimately design the military systems. or you could just instruct your AI systems to start making, you know, a military that is is not legally sanctioned, but, you know, because the government doesn't have much to threaten you with, it kind of, you get away with it. i mean, it gets a little bit tough. You probably need to do that in secret. Otherwise the existing military could um could, could prevent it. But yes, I do think that,
00:45:59
Speaker
you know Being very rich helps with lobbying, it helps with all kinds of ways of seeking power, and then controlling yeah controlling a lot of industry can can potentially give you military power. You mentioned these special economic zones.
00:46:12
Speaker
that That's one way in which companies could kind of bargain with states in order to ah have favorable regulation policies. and to and should be able to carry out their projects without intervention, basically.
00:46:27
Speaker
Another way for them would be to collaborate with non-democracies that are perhaps controlled by a single, a small group or or perhaps even a single person.
00:46:39
Speaker
And in that way, it seems like perhaps it's easier to get something done in a non-democracy. And and that is a way to to grow fast. and And so perhaps there are incentives for companies to place more resources in non-democracies.
00:46:56
Speaker
What do you think about the the prospect of non-democracies outcompeting democracies when it comes to AI? I think it's a really great question and it's tricky because I think I agree. like Democracies have lots of checks and balances.
00:47:09
Speaker
They have a lot of bureaucracy, a lot of red tape, and that will disincentivize AI companies from investing. And then additionally, there are people really trying to seek illegitimate power, that will be easier to do in non-democracies because they're less politically robust.
00:47:26
Speaker
So that there there are these various forces pushing towards you know, this new supercharged economic technology being disproportionately deployed in non-democracies. um And I think that is scary.
00:47:40
Speaker
my my My own view is that probably we should, democracies should kind of do everything they can to to to avoid that situation, make it much easier for and robotics companies to to set up shop in in democracies, remove the red tape,
00:48:09
Speaker
trying to use export controls like are already happening to prevent Technology is being deployed in non-democratic countries, and that that goes beyond China. There's obviously lots of countries that are not allied with China, but are also non-democratic here.
00:48:24
Speaker
And the us you the US is in a strong position because it does have the stranglehold on AI technology at the moment. So I do think it can be done.
00:48:35
Speaker
But yeah, in my view, like it it will be really important to to to kind of work very hard to to find a kind of a non-restrictive regulatory regime.
00:48:46
Speaker
And it will also be very important to really try and pursue innovative innovations within the democratic process itself, where yeah democracy is great in many ways. It really distributes power and it and it has been very good at ensuring good outcomes for its citizens, but it's very slow.
00:49:05
Speaker
and often you know kind of nonsensical because you have competing interests that are kind of stepping on each other's toes and the resultant legislation is just a garbled mess. And so AI can potentially solve those problems.
00:49:18
Speaker
You can have AIs negotiating and thinking much more quickly on behalf of the the the kind of human stakeholders. You can have AIs nailing out agreements that aren't a garbled mess, but they're like really gave everyone what they truly wanted of the legislation.
00:49:33
Speaker
And you can still do all of that really quickly so that you're not falling far behind the autocracies that have just got one person immediately saying what to do. And I think if we did that, we you know democracies could outcompete autocracies because you know the big thing that often screws over autocracies is that one person is flawed, often makes big mistakes, people are afraid to kind of stand up to them.
00:49:54
Speaker
Yeah, that that would be more of my assumption. that would I would assume here that perhaps democracies with market-based economies have an advantage just because you can you can do a kind of bottom-up knowledge discovery. You can try different things out. You can see what works. You can have competition between companies and so on. but And perhaps in in non-democracies, so well, I mean, therere you can you can have one person or small group stake out of direction for what the country should do. But if that direction is what is wrong, it's probably difficult to change course.
00:50:24
Speaker
Yes, I i think could be right. I should have i should have you know given more weight to that that that advantage of of kind of democracies in terms of the free market but being you know in many ways much much smarter. But in terms autocracies that are good at harnessing free market dynamics, my worry would be that the AI helps them more than it helps democracies because
00:50:52
Speaker
AI will... well be able to kind of replace, ah currently you know, one person just can't think that hard, can't really figure out a good plan. But if, if, if that one all powerful leader has access to ah loads of AI systems that can kind of think things through and investigate lots of different angles, then, know, that if they're, if they're following its advice and they could get advice, which, you know, lacks the flaws that, that today, today systems had, and, and they could potentially move much faster.
00:51:21
Speaker
um But I think you're right that kind of economic liberalism is is still going to be important even after we get powerful AI systems and that could give democracy an advantage.
00:51:34
Speaker
This is but a bit of a tangent perhaps, but I'm thinking whether, so if you have a leader of a country that has a lot of power, perhaps complete power over that country, and that leader is equipped with AI advisors, advising him and and and kind of laying out kind of the landscape of options for him to choose from.
00:51:55
Speaker
Wouldn't his decision making still be, in a sense, bottlenecked by the fact that he's a human, by the fact that he has these flaws that we all have, the biases that we all have? So even with fantastic advice, I think it's it's quite plausible that that he would still make the same mistakes that we see leaders make today.
00:52:15
Speaker
I think that's true. i think it's also true in democracies, unfortunately, that if there's 10 negotiators and they each kind of still have biases and still refuse to listen to the wise advice they're getting from their AIs, that could still gum up the system.
00:52:29
Speaker
And it yeah, it does depend on how much humans come to trust and defer to their AI advisors. There's a possible future where the AIs are just always nailing it. They're always explaining their reasoning really clearly. And we are just like and increasingly convinced and happy to trust their judgment.
00:52:46
Speaker
And if is aligned, I think that would great future because I do think humans have all these very big limitations and biases, which if we can solve the alignment problem, AIs don't need to have. But there's also another future where humans just you know want to be the ones making the decisions, have these kind of pathetic,
00:53:03
Speaker
kind of motivations that they're still kind of influencing their decisions and that, yeah, that the kind of, that that continues to to to to to limit the quality of decision-making.
00:53:17
Speaker
Seeing things from above, right? From kind of like 10,000

Mitigating AI Coup Risks

00:53:20
Speaker
feet. thousand feet How should we think about mitigating the risk of coups here? is it Is it about removing people that would use AI to commit coups? Is it about kind of finding those people in the militaries, in the governments, in the companies perhaps?
00:53:39
Speaker
Or do we have ways to reduce the returns to to seizing so se in power? Yeah, i mean, from real 10,000 feet up,
00:53:52
Speaker
The way would characterize it is create a common understanding of the risks, build coalitions around preventing them,
00:54:06
Speaker
And then the existing balance of power can self-propagate forward. It's in everyone's interest to prevent a coup. Currently, no one small group has complete control or close to it.
00:54:18
Speaker
And so if if everyone can be aware of these risks and aware of the steps towards them and kind of collectively ensuring that no one is going in that direction, then we can all kind of keep each other in check.
00:54:31
Speaker
So I do think, you know in principle, the problem is solvable. And it doesn't require, you know, solving the risk of misalignment does require solving some tough technical problems. This doesn't in the same way.
00:54:43
Speaker
Yeah, you have a bunch of recommendations for mitigating the risks, both for ai development ah AI developers and governments. And perhaps, you know, we don't have to run through all of them, but you can talk about the most important ones for AI developers.
00:54:59
Speaker
I might characterize this, I might kind of talk about it by going back to those three threat models we discussed earlier. So the first one was singular loyalties or overtly loyal AI systems, where again, the you know the main risk there is AI deployed by the head of state and the military and the government that's loyal to the head of state.
00:55:17
Speaker
And so the main countermeasure that currently appeals to me is for us to yeah figure out rules of the road for these deployments.
00:55:29
Speaker
You know, obvious things like AI should follow the law. is deployed by the government shouldn't advance particular people's partisan interests, but should only do like, you know, official state functions.
00:55:42
Speaker
AI is in the military shouldn't be law to one person. You know, they should different groups of robots should be controlled by different people. And, you know, head head of the chain of command can still be head of the chain of command via instructing other people that instruct those robots, but they shouldn't all go directly to of the chain of command because that centralizes military power too much.
00:56:00
Speaker
So, you know, fleshing out basic rules of the road of that kind and then building consensus around them because, know, you know, companies might want to say to governments, yeah, we don't want you to deploy our systems, you know, if, if, if if they're willing to break the law, but if the government, if the government will have a lot of bargaining power, the executive in the United States can, can, you know,
00:56:25
Speaker
it's hard for companies to stand up to them. So what we want to do is you establish these rules of the road and then get brought by them from Congress, from the judiciary, from you know other branches of the military, from many parts of the executive.
00:56:38
Speaker
So then it's very then hard for say the president say, yes, let's like make this robot army loyal to me. And everyone's like, obviously not. We've all like agreed. That makes no sense. you know And then the president doesn't even bother trying because it's just clear that it would be a no-go that you know their mind doesn't even go there.
00:56:54
Speaker
In some sense, this is about kind of implementing the procedures and the transparency rules that we know from democracies today into how we use AI, both in governments and and in companies, I think.
00:57:08
Speaker
Exactly, and yeah. do Do you worry here that when, it so the government is seeing is looking at these companies from the outside, and they don't have full insight into what's going on. do So there are kind of protections for private companies that that mean that they can they can do things in secret without the government knowing, at least ah as as things stand now.
00:57:32
Speaker
Is that something that would evade these mitigations you're you're thinking of? So, I mean, for this first bucket, the singular loyalties bucket, it's it's mostly the the kind of heads of state that I'd be worried about.
00:57:46
Speaker
So, you know, it actually is probably good for the government or at least for the, you know, the head of state themselves not to have full... insight into literally everything the company is doing because that would give them too much power.
00:58:00
Speaker
But you know actually having different parts of the government having insight and into what the lab's doing, i think is very good. I'm a big big fan of transparency. um And you know we do have a good set of ah government checks and balances from different government bodies that we can deploy to kind of keep the lab in check using these other bodies, but also not allow like you know the executive branch and the and the president to get ah excessively powerful.
00:58:22
Speaker
So that that's the mitigations, the kind of singular overt loyalties. In terms of secret loyalties,
00:58:31
Speaker
the key mitigation is is what I'm increasingly calling system integrity. That is, you know, using established cybersecurity practices and machine learning security practices, i.e. kind of preventing sleep agents and backdoors in machine learning models.
00:58:48
Speaker
using all of that to ensure that your development process for AIs secure and robust, and that no malicious actor, be they and employee in the post-training team at a lab, or be they the CEO of the lab that is either malicious or is being threatened by the Chinese government to tamper with model development, that no person or no small group is able to significantly tamper with the behavior of AI models And no group is able to get illegitimate access to AIs that would help them seize power. So that's that's this idea of system integrity, which is yeah know essentially a technical project, which does just draw on existing practices, but is not yet implemented in any of the top labs.
00:59:39
Speaker
um I will quickly shout out for non-lab experts. for kind of people listening that aren't working at labs, I think there's a lot of really good technical research that can be done on kind of and investigating the conditions under which you can insert a steeper agent without a defense team knowing.
00:59:58
Speaker
And there's just loads of research that can be done in terms of the different settings there for attackers and defenders, which could then inform what what para parameters we need to be in place to achieve system integrity.
01:00:09
Speaker
know, if it turns out that it's, you know, It's very hard to make a so sleeper agent accept in the final stage of training. That's really useful to know because then we can focus our efforts within labs at yeah know that final stage, just just as a hypothetical ex example.
01:00:26
Speaker
So thats that's the kind of key mitigation in my mind for the secret loyalties, and then I'll quickly do for exclusive access. That one seems more difficult. I don't know, just just from from from me reading and preparing for this interview, that that one seems like a difficult one to handle where this is, this is this is in some sense, ah a deep trend in history and in in the kind of history of modern economics that you do see faster growth rates and you do see concentration into bigger and bigger
01:00:59
Speaker
economies, both and in countries and in in companies. So are you are you in some sense pushing against underlying trends if you're trying to mitigate exclusive access to advanced AI from one actor?
01:01:14
Speaker
I think you can you can do this in other ways. So you can have The law required that AI share, that AI labs share their powerful capabilities with kind of other organizations to act as a check and balance.
01:01:31
Speaker
So that, you know, labs should share their R&D capability, AI R&D capabilities with evals organizations. Here, you're thinking about giving insight into what they're capable of, not not actually sharing those capabilities.
01:01:45
Speaker
That would be too big of an ask, I think. I mean, I do mean the API access. So... you know if if a lot of the work in developing and evaluating systems is now done by AIs, then we want an and an evaluation organization like Apollo or Meta to also be uplifted.
01:02:04
Speaker
And so we want them to have access to a really powerful AI that can similarly you stress test how dangerous the frontier systems are. If they're only using human workers then that's going to be a big disadvantage.
01:02:15
Speaker
So no, I do want API access to to powerful capabilities for for other actors. You know, for example, cybersecurity teams in the government and in the military should have access to the lab's best cyber capabilities.
01:02:28
Speaker
And again, that that that should be a requirement by law.
01:02:33
Speaker
So you know generally, like even if there's a natural tendency towards centralization of power in one one organization, you can still require that that organization share its systems with the checks and balances.
01:02:46
Speaker
That's one thing. And the other thing is kind of preventing anyone at this organization from misusing the powerful AI systems.
01:02:59
Speaker
So... but the The biggest thing on my mind here is that today we still have helpful only AI systems where you can kind of get access to the system and then it will just do whatever you want. No holds barred.
01:03:10
Speaker
I don't think there should be any AI systems like that. I think you should always have you know at least a classifier on top of the system, which is you know looking for harmful activities and then kind of shutting down the interaction.
01:03:24
Speaker
if something harmful is detected. And if you have a special reason to use cyber offense you know for your job, or you have a special reason to do potentially dangerous biology research, you would have that classifier allow certain types of activity.
01:03:37
Speaker
But you should never have anyone accessing a system where you know anything is allowed. you know no one No one has legitimate reason to access an AI that will literally do anything. So what I want to aim for is a world where, yes, if there's a specific reason why you need to use ah a dangerous capability, absolutely, you can you can use that system.
01:03:55
Speaker
But that system will just do that one dangerous domain. It won't kind of do anything you wanted. Because that you know that that's ah that's a very scary situation where there's 100 reasons why the CEO could ask for access to a helpful only system. you know Maybe the guardrails are annoying.
01:04:12
Speaker
Maybe he wants to kind of you know did do do something which which the model is reluctant to do. But today, when you asked to remove some guardrails, you're removing all of the guardrails and now there's no holds barred. So and instead, you know we should we should be flexibly adjusting what guardrails are there you know by the use case and just you know never have a have a situation where where where there's no guardrails.
01:04:36
Speaker
i think that I think that could go a long way towards helping if if that was but was robustly implemented. With all of these mitigations for both secret loyalties and exclusive access and singular loyalties, you would worry that they would be disabled by the group planning a coup, right? Say say that, for example,
01:05:00
Speaker
you are ah the CEO of an AI company and you're giving api access so to evaluations, organizations, testing your model, trying to see what they're capable of. Maybe you just cut off access before you get to the really powerful model that could actually be the model that that helps you conduct a coup.
01:05:19
Speaker
Do we have ways of making sure these mitigations are entrenched before in in ah in such a way that they can't be removed by the group planning a coup?
01:05:31
Speaker
This is a great question. It is it is pretty tricky. CEOs, by default, i have a lot of control over their organizations. And similarly, heads of state, you know including the US president, has a lot of control over the military and over the government.
01:05:46
Speaker
So yes, there's a risk that one of these powerful individuals can realizes that maybe they want more influence by gaining control over AI and notices that there's these kind of pesky little processes that prevent that. And it's like, okay, well, let's remove them.
01:06:03
Speaker
I can give you know easy say, you know productivity reasons to prevent them, red tape reasons.
01:06:11
Speaker
And you know if they can make a plausible argument, then it could be hard to oppose them. So i do think it's a big issue.
01:06:21
Speaker
But by I'd say a few things.
01:06:24
Speaker
Firstly, something I mentioned earlier, i don't think that anyone is today planning to do an AI-enabled coup. The way I think this works is that people are faced with their kind of immediate local situation, something they want to do over the next month, and the blockers that they're facing to doing that specific thing.
01:06:45
Speaker
And what tends to happen is people you know tend to want more influence that that helps them get get stuff done. And so people will kind of bit by bit kind of move in the direction of getting more control over AI.
01:06:56
Speaker
But they won't be kind of thinking, yes, I need to make sure that I remove this whole process because that will allow me to do an AI-enabled code. That's kind of unrealistically galaxy-brained. And so what we could do is we just set up a very efficiently implemented and very reasonable set of mitigations that doesn't really prevent CEOs from doing what that that that they're trying to do.
01:07:17
Speaker
um And so the CEO doesn't find in their day-to-day that they're wanting to kind of like remove these things that are holding them back. But because these mitigations are here, the CEO never gets to a place where they're anywhere close to being able to do a coup or where there's any kind of pathway in in their mind to be able to doing a coup because they're they're constantly prevented from getting access to kind of really powerful AI advice that might point out ways in which they could do this because they're like surrounded by colleagues that like, you know, strongly believe that these mitigations are sensible and reasonable. and in fact, they are well implemented and, you know, there aren't many downsides.
01:07:54
Speaker
maybe an environment where they kind of get kudos for the fact that they've said, yep, obviously I'm not going to get access to helpful learning systems. That's crazy. And then that's kind of like something something that that that makes them seem good.
01:08:11
Speaker
So that that that's one thing to say. um Another thing is, again, going back to this point that there are currently checks and balances and there is not currently a situation where one person has power.
01:08:22
Speaker
you know if the if If the entire board of a company and other senior engineers recognize the importance of the mitigations, know about this threat model, then they will notice if the CEO is is moving that direction.
01:08:35
Speaker
um And you know similarly similarly within the government, there are checks and balances and and they they they could be activated if people are looking out for it.
01:08:46
Speaker
do Do you think these traditional oversight mechanisms like a board being in in control of of the CEO, being able to fire the CEO, or the possibility of ah Congress or the Supreme Court kind of overruling or constraining the US president, do you think those will persist in environments where ai is moving very fast and it is AI capabilities are are growing at a rapid pace?
01:09:16
Speaker
It's a great question. here's Here's one story for optimism. Today, things are moving fairly fast, but those checks and balances are somewhat adequate, at least to preventing really egregious situations.
01:09:32
Speaker
By the time that AI is moving really quickly, we'll have handed off a lot of you know the implementation of government, the implementation of things in the AI companies, the research process, we'll have handed it off to AI systems.
01:09:47
Speaker
And when we do that handoff, we could program those AIs to maintain a balance of power. So rather than handing off to AIs that just follow the CEO's commands or AIs that follow president's commands, we hand off to AIs that follow the law, follow the company rules, report any suspicious activity ah you know various powerful human stakeholders.
01:10:07
Speaker
And then by the time things are going really fast, we've already kind of got this whole layer of AI that is is maintaining balance of power. Like the whole AI government bureaucracy, the whole AI kind of company workforce, they are like better than humans today at standing up to misuse potentially.
01:10:27
Speaker
They are like less easily cowed and and and intimidated and they they they could actually make it harder for someone in a in a position of formal power to to kind of get excessive influence. So this is like the flip side of, you know, the the singular loyalties where you potentially deploy these AIs that are like explicitly loyal.
01:10:47
Speaker
You can actually kind of instead get kind of singular law following a balance of power maintaining AIs that you deploy. And so the hope is that by the time we ah really things are beginning to go kind of crazy and we're really seeing speed ups from may i We've already kind of to set ourselves up in an amazing way to maintain balance of power. And there's this this critical juncture where we are handing off to AIs.
01:11:10
Speaker
and And it's just, you know, what what what are those AIs... you know what are their loyalties? What are their goals? And you know I think we can gain a lot by making sure that those AI systems are maintaining balance of power, reporting you know illegitimate suspicious activities, um and are not kind of overly loyal to any one person.
01:11:30
Speaker
How do you think the risk of AI-enabled coups interface with kind of more traditional notions of of AI takeover? So just a ah misaligned highly capable or advanced AI system taken over in kind of contrary to the wishes of the developers or the other governments?
01:11:51
Speaker
Yeah. I mean, there's, there's, there's some close analogies. The, you perhaps the most analogous case is the case of secret loyalties where, you know, you've got, got these AIs that have been told by the CEO to have the secret goal of seizing control and then handing control to the CEO.
01:12:14
Speaker
That's just very similar to AIs that wanted to seize power from themselves secretly. And you know all the same stories could apply where the AIs make military systems and then they control the military systems and the robot army and then they seize power.
01:12:25
Speaker
And the only difference is, were they seeking power because it just kind of accidentally emerged from the training process, which is the misalignment warrior, or were they seeking power because the CEO program them in that way.
01:12:37
Speaker
You know, but that's the kind of seed of the power seeking. But then with the secret loyalties to that model, the the rest of the story is, you know, pretty similar. I mean, there's still differences, you know, in the secret loyalties case, the CEO might be doing more to help the AIs along with their plan.
01:12:53
Speaker
You know, maybe even in the misalignment case, the AIs have managed to kind of manipulate the CEO into into doing similar things. So that's the case where it's,
01:13:04
Speaker
Most analogous. I think the yeah and and another difference that's salient to me is that if there are lots of different AI projects, then AI-enabled coup seems a lot harder because you'd need like lots of different humans to kind of coordinate, to kind of seize power together.
01:13:25
Speaker
which seems, yeah while while I can totally believe that um one one one person might try and cease power, does seem less likely to me that there'd loads and loads of humans that would want to to do that from lots of different labs.
01:13:39
Speaker
um Whereas for the misalignment story, it It is you know more likely the case that if one of these labs has misaligned AI, then maybe you know lots of them have misaligned And so then it's it's more likely that you would have you know maybe 10 different AIs colluding and then seizing power and taking over.
01:14:02
Speaker
And so that kind of collusion between multiple different AIs is is more likely in the case of misalignment than in the case of an AI-enabled coup. Just because if there's one misaligned AI, then there's something about the training process for AI systems that that are causing and misalignment. And then it will be a common feature among among many companies.
01:14:22
Speaker
Exactly. Whereas just the fact that one CEO instructed ah a secret loyalty would not, to the same extent, make you expect that other CEOs have done the same.
01:14:33
Speaker
So you mentioned this possibility, but but you think yeah what do you think of the prospect of a president or a CEO of a company being duped by a misaligned AI into conducting a coup on its

AI's Influence on Leadership and Governance

01:14:46
Speaker
behalf?
01:14:46
Speaker
So you can imagine a president or a CEO kind of thinking that that he's conducting a coup to remain in control, but he's actually acting on on behalf of a misaligned AI.
01:14:59
Speaker
Yeah, I think it's an interesting threat model. And yeah some people who think about takeover threat models take it pretty seriously. And it's just you know it's just a case where we're just completely mixing these two threat models together.
01:15:12
Speaker
you know People who are worried about AI takeover for this reason should be very supportive of the kind of anti-queue mitigations I'm suggesting. Because if we implement checks and balances that prevent any one person from getting loads of power, then that AI will not be able to convince them to try because they just won't be able to succeed.
01:15:32
Speaker
So you know i see i see this as like you know and an additional reason to worry about AI-enabled human coups and to try and prevent them is that, yes, even if no human wanted to do this normally, misaligned AI might make them try.
01:15:51
Speaker
in terms of how plausible I find the threat model, and honestly, i think that if a human tries to seize power, The main reason is that that human wanted power. like This is just something we know about people. We know it about you know heads of state today.
01:16:10
Speaker
you know It's very clear that but you know many heads of state in the most powerful countries in the world are very power-seeking. We know it about CEOs of you know big tech companies. We know about you know ah but about about some of those leading leading AI companies that we we do know that they're very power-seeking, those CEOs. and so I don't think we need to to theorize that like they were massively manipulated by the AI and convinced to become power seeking.
01:16:37
Speaker
like i think I think it's more likely that if they seek power, they just they just did it for the normal human reason. i I do think AI will ultimately get good at persuasion.
01:16:48
Speaker
um I don't particularly expect it to be hypnotic level persuasion, though though you know obviously there's massive uncertainty here. And yeah, I do i do think that a very smart AI, where there's a human that's already kind of kind of interested in seizing power and already kind of makes sense for them to maybe do it, yeah and a miscellaneous AI could totally nudge them in in that direction and then could implement that in ah in a way that actually allows the AI to seize power later. I think that is that is very plausible.
01:17:20
Speaker
When we're thinking about distributing power and and kind of having this balance of power, we can imagine the models being set up via post-training, via the model spec, via various but mechanisms to have obeyed the user's unless unless ah the what the user is instructed to do is is in conflict with what the company is interested in, and perhaps obey the company unless what the company it's using the model for is contrary to what the government permits.
01:17:55
Speaker
But when we set it up in those levels, you ultimately end up with the government in in control in some sense. And I guess that exposes you to so risk of of a government coup then.
01:18:09
Speaker
If you have at the ultimate top layer of the stack, here's what the models can and cannot do according to the government. Well, I'd say a couple of things. First is that the government isn't...
01:18:24
Speaker
a monolithic entity. And so that government decision of what the balance should be could be informed by multiple different stakeholder groups. And then ideally, you know, it's ultimately democratically accountable. I do think that democratic accountability becomes more complicated in a world was where there's massive change in a four-year period.
01:18:41
Speaker
Just for for the simple reason that there there's no election during a period where where a massive change is happening. so So the feedback loop is too slow. Exactly. You know, I think the risks of AI-enabled coups will probably emerge and then you know be decided within a four year period as in like it it will be resolved whether or not it happens or doesn't, all without any like intermediate election feedback.
01:19:06
Speaker
you know That doesn't mean that democracy can't have an effect because politicians anticipate you know what what future elections will we will find and want to maintain favor throughout their terms, but it does it does pose a challenge.
01:19:20
Speaker
but But sorry, i was I was kind of saying, even absent that, there's there's many different stakeholders and in the government. And so, you know, they would it would have to be a large group of government employees that were kind of trying to do a coup.
01:19:33
Speaker
And then they would have kind of, the companies would know that they were setting these odd restrictions um on on the on on the kind of behavior. And so the companies would know and they have leverage and power.
01:19:44
Speaker
And then, you know, it could go public. So i do, I don't, I don't think it would be that easy for the government do coup. First, there's a difference also between allowing the government to set restrictions on what the models can do and then allowing the government some kind of access to to commanding future AI systems in certain directions.
01:20:04
Speaker
So it's kind of setting setting limits versuss versus steering the systems. Yeah, exactly. i mean, the distinction I was going to highlight was between specifically making AI systems loyal to, for example, the head of state and the setting very broad limits where there's just like, you you can pretty much do whatever you want, except for these are obviously bad things, where that second option doesn't really enable anyone to do a coup.
01:20:29
Speaker
It just enables everyone to do whatever they want. And then you kind of blocked out all of the kind of coup enabling possibilities through through those limits. you know as as long as you haven't made those systems loyal to a small group. So given that there's this obvious option to just put in these limits that block coups but don't enable coups, and given that there's you know wide range of stakeholders that could potentially feed into what the AI's limitations and instructions are, I think it's very, very feasible
01:21:02
Speaker
to get to a world and that that where there's robustly not central centralization of power. There's obviously a big uncertainty over whether we will actually get our act together and and get those limits put in place in the right way.
01:21:14
Speaker
When do you think these threat um the threat of AI-enabled queues materialize? Is it at some specific point in AI capabilities, or does it simply scale with the systems getting more advanced?
01:21:29
Speaker
When do you think the threat is ah is at its peak? It's a good question. For the threat models that I've primarily focused on, they but require pretty intense capabilities.
01:21:41
Speaker
So that, for example, the secret loyalties threat model more or less requires AIs to do the majority of AI research. So we're talking about you know fully replacing the the world's smartest people in a very wide range of research tasks and coding that's that's pretty intense and then a lot of the threat models that i focus on route through military automation that is ai and robots that that can kind of match you know human boots on the ground and that's you know that's that's pretty advanced again that said
01:22:16
Speaker
i I think you know you can probably do it with less advanced capabilities than that. So like we yeah drones today are already pretty good, already providing, you know making a big difference in in some military situations.
01:22:31
Speaker
So it's you know not out of the question that you know more limited forms of AI and robot military technology could could be enough to facilitate a coup.
01:22:44
Speaker
It's a bit harder because if if they're limited, then there's a question of why the existing military doesn't just kind of seize back control after a bit of time. And so probably that scenario also has to involve things like and maybe the current president you know supporting the coup and therefore pressuring the military not to intervene or some other source of legitimacy for for the coup beyond the AI-controlled drones.
01:23:09
Speaker
controlled drones
01:23:14
Speaker
And then there's also kind of more kind of typical types of backsliding, like has already been happening in the US that I think you know could be exacerbated through enabled surveillance and you know AI kind of increasing state capacity in other ways.
01:23:32
Speaker
And again, you know that backsliding doesn't require you know super powerful AI. you know You could probably do a lot of monitoring, a lot of kind of content moderation on the internet, a lot of surveillance with today's systems.
01:23:47
Speaker
um it doesn't get you all the way to one person having complete control where they can just quash any resistance with a robot army and replace everyone in their job with a with an AI. And so no one has any leverage.
01:24:01
Speaker
So I think, you know, to get to that real intense, this is like the most intense form of concentration of power via ai that requires really powerful AI, but to just kind of significantly exacerbate existing trends in in political backsliding and, you know, to make it easier to do a military crew, I think, you know, more limited systems would suffice.
01:24:25
Speaker
yeah we We discussed earlier the the possibility of of one country or one company outgrowing the rest of the world and kind of concentrating power into into those entities. i Now you mentioned one person. do Do you think that's actually ah plausible scenario in which you have, say, one CEO of one company being being the person in in control of the world via a concentration of power and then a coup?
01:24:52
Speaker
ah hundred 100%. Yeah. percent yeah I mean... yeah The story I told earlier about secret loyalties, yeah know meaning that now we backdoored wide range of military systems, meaning that you can seize power, that's that's one route.
01:25:05
Speaker
And then, you know again, there's this other with the company and masses amounts of economic power by kind of having a monopoly on AI cognitive labor. And then you're leveraging that leveraging that to get more economic power, more political influence.
01:25:21
Speaker
Yeah, i mean i do I do think it's possible You know, again, there's this big shift once air can fully replace humans, where today, no, one person can never have absolute power. They have to rely and others implement their will.
01:25:35
Speaker
And this is what makes kind of current exist currently existing dictatorships unstable, where there's always a threat of kind of internal revolt or outside factors threatening the dictatorship.
01:25:47
Speaker
But this this could potentially change, yeah. Yeah, there's always a threat of revolt. And then to to guard against that threat, the dictator needs to share their power to some extent, has to compromise.
01:25:59
Speaker
But yeah, you could get it all concentration one person with sufficiently powerful AI. Do you think we we move through a period of increased threat of AI-enabled coups and then reach some kind of stable state? Or do you imagine that there's ah a constant kind of risk of AI-enabled coups in the future?
01:26:20
Speaker
I think we move through it.
01:26:23
Speaker
Yeah, it's it's it's this point about once we have deployed AI across the whole economy, the government, the military, if those AIs are maintaining balance of power, then we could fully eliminate the risk of AI-enabled coups.
01:26:38
Speaker
no It would just be as if our whole population was just you know so committed to democracy, would never seek power, and you know never help anyone else who who who wanted to undermine any democratic institution.
01:26:51
Speaker
um you you know you could we already have strong norms and you know favoring democracy, but you know they're far from perfect and they have been eroded over recent decades. But you could you could just get rock-solid norms. they're They're programmed in.
01:27:03
Speaker
you know They cannot be removed. except by will of the people. i mean, there's a there is a bit of a question because you still want to give the human population the ability to change the AI's behavior and its rules.
01:27:16
Speaker
So the human population could always choose to move to an autocracy. um so I suppose I shouldn't say that we could fully eliminate the risk because you know we will always have that you know democracy.
01:27:30
Speaker
There's always the this point that democracy could you know vote to stop being a democracy. But I do think yeah we could get to a point where it absolutely cannot happen without most people wanting it to happen.
01:27:43
Speaker
And you so so you would get to a point in which kind of future AI-enhanced societies, you could say, are more stable than than current democracies. And they're more they less at risk of coups or democratic backsliding than than current democracies.
01:28:01
Speaker
Much, much more. Yeah, you you could get much more.

Predicting AI-Enabled Coup Risks

01:28:05
Speaker
but robustness there. I mean, just that there's this constant dynamic in today's societies where people care about democracy, but they also care about a host of other things, you know their own achievements, various other ideological commitments.
01:28:17
Speaker
And so you know depending on how dynamics play out, depending on how technology evolves and what people's incentives are, sometimes people push against democracy. you know that That's what the Republican Party has been doing. In some ways, that's what Democratic Party has done as it's kind of increasingly put you know pretty ideological people in in powerful institutions.
01:28:37
Speaker
So with AI, you you can you get much more control over those dynamics because you can just you know you can just make it much more the case that that that democracy is not being compromised.
01:28:51
Speaker
Are there any ways for us to, are there any kind of risk factors we can look at if we're interested in in predicting coups? Do you think there's something we can we can measure or something we can track to see whether we are at risk of ah ah an AI-enabled coup?
01:29:08
Speaker
It's a great question. I don't think I have an amazing answer, but some things that come to mind
01:29:14
Speaker
the the the capabilities gap between top AI labs and then the gap again with open source. The degree to which AI companies are sharing their capabilities with the public and if not with the public, then with you know multiple other trusted institutions, you know like sharing their strategy capabilities with kind of US s political parties and parts of government.
01:29:46
Speaker
um
01:29:49
Speaker
the The extent of economic concentration, you know, how much how much, what are the revenues and net worth of particular companies, or particular AI companies?
01:30:06
Speaker
It's another one. What is the extent of government automation and military automation by ai systems? And when that automation is happening, how robust are the guardrails against breaking the law and guardrails against other forms of illegitimate power seeking?
01:30:28
Speaker
um How much transparency does the public or the judiciary or the Congress have into how
01:30:37
Speaker
dangerous AI capabilities are being used by by AI companies and by the executive branch. So, you know, take the example of military R&D capabilities. That is, yeah really smart AIs that can design super powerful weapons.
01:30:52
Speaker
It's scary if companies can just use those military R&D capabilities without anyone knowing. It's also scary if a small group of people from the executive branch can use those capabilities without anyone else knowing how they're using them.
01:31:07
Speaker
because they could be designing powerful weapons and making them loyal to a small group, you know, so transparency and into these like high stakes capabilities and how they're being used by a, by a broad group.
01:31:18
Speaker
It doesn't have to be public. um Probably shouldn't be public, but you know, we have checks and balances already. So, you know, another, another kind of question is as, as, as these high stakes use cases start occurring or they become possible, do do we know that there's transparency requirements in place? you know as we As we increasingly see ai companies contracting with Palantir and other military contractors, we can kind of begin see they're making increasingly powerful um weapons.
01:31:50
Speaker
is the Is there a process of oversight? Do we know that if someone was trying to you know make AI military systems loyal to them, that they would be spotted? that that That's another indicator we we can look at.
01:32:05
Speaker
You can look at you know all the kind of standard democratic resilience indicators that and the social scientists have come up with. There's various things about and kind of free and fair elections, about civil society, about freedom of press, that have been getting worse recently in the US.
01:32:24
Speaker
um But there's there's various indicators here. You can look at the degree of government censorship over freedom of speech or what's on the internet.
01:32:36
Speaker
and the degree of surveillance that the government's doing. but if you If you take all of these ah things into account, yeah how do you think about the the risk of an AI-enabled coup in the next 30 years, say?
01:32:52
Speaker
The next 30 years?
01:32:56
Speaker
i think it's high. think the risk is high. would guess it's 10% or something.
01:33:04
Speaker
And that, you know to be clear, you know if if it was just existing political trends, ignoring AI, i' be you know maybe a few percentage, maybe on like 2% or something. you know though There's definitely a risk of that. and And I'm thinking about the US
01:33:21
Speaker
here. A big part of my my current worries are not about the indicators. but it's about my expectation that AI capabilities will keep well increasing quickly and even more quickly. And then the kind of absolute lack of interest in regulating AI companies right now in the US and the the difficulty that we will have of constraining the executive under the current situation where, you know, the president is using, you know, sophisticated legal strategies to increase their own power and is, you know, succeeding on many fronts. You know, the U.S. is not doing a great job at constraining the executive. So, you know companies are unconstrained. The executive is poorly constrained.
01:34:10
Speaker
Those are the key threat actors here. so you know, with fast air capabilities progress plus that lack of constraint, lack of transparency, you know, the default is that, know, A lot those indicators I said get worse and none of the indicators get better, like transparency.
01:34:25
Speaker
And so that but makes me think this is very plausible. Yeah, I mentioned 30 years, but but what about five years? Five years, that's tough, isn't it?
01:34:36
Speaker
It's really tough. I mean, yeah, I think there's a risk. I... I wouldn't think there was a risk if it wasn't for the AI research causing an intelligence explosion angle, but AIs are a lot better at coding and cognitive kind of research related tasks than they are at, know for example, yeah controlling robots and stuff.
01:35:02
Speaker
And so even if the third model ultimately comes through robots or comes through crazy levels of persuasion,
01:35:10
Speaker
It's just you really can't rule out a scenario where yeah AI research is is is automated in three years' time. Then in four years' time, we've got super-intelligent AI controlled by a few few people. Maybe it's got secret loyalties.
01:35:23
Speaker
Maybe it's being deployed in the government and being made overtly loyal to the president. And then you know a year later, you know it's it' backsliding or it's political capture or it's robot soldiers.
01:35:36
Speaker
Yeah. how How do you think about the badness of the outcomes here? how How much does the badness depend on the ideologies of the people who are, who are conducting the coup? Or and what should we look out for?
01:35:52
Speaker
if Because, i mean, i guess we can rank queues by badness, and which is not an exercise I think we should actually attempt, but we can we can kind of talk about the factors involved, about what would be the worst kind of coup and what would be a a slightly better kind of coup, slightly less bad kind of coup.
01:36:11
Speaker
Yeah. So we could, you know, we lets let's imagine it's one person that sees power. Actually, no, that's the first distinction to draw. If there's a group, then even 10 people is better than one person.
01:36:24
Speaker
and and why And why is that? Yeah. So 10 people, you get a diversity of perspectives. So more kind of moral views represented. And there's more kind of room for compromise between those perspectives.
01:36:37
Speaker
There's more room for kind of reasonable positions to win out as there's kind of some some deliberation as as actions are decided upon.
01:36:46
Speaker
There's slightly less intense selection for psychopaths than if it was just one person.
01:36:54
Speaker
So yeah, if it's just one person, that's bad. That's particularly bad. 10 people are still very bad. 100 people still pretty bad, but you know it's that there's big differences there, big differences.
01:37:07
Speaker
if if If we're now just thinking about you know one person or or the average person in the group, then we could think about how competent they are. And then we could say something about kind of how how kind of virtuous their motivations are.
01:37:21
Speaker
um i I do think competency is important. like I think it's probably underrated in most political discussions how important it is to just be ready but really really competent. youre Thinking about something like responding to COVID,
01:37:36
Speaker
or thinking about something like, you know, trying to deescalate a conflict, you Russia, Ukraine, or trying to deescalate Israel conflict, like actually just being very competent and but very good at getting things done is important.
01:37:50
Speaker
And,
01:37:53
Speaker
as we' As we mentioned, if you're just willing to rely on AIs and you you know align those AIs in the right way, anyone could be really competent, but yeah that's not guaranteed.
01:38:04
Speaker
ah People you know may may really want to to cling to their current views without without changing their minds. Let's take the example of Donald Trump. If a really smart AI system told him, look,
01:38:17
Speaker
tariffs are definitely bad for the US economy. They're definitely bad and won't give you what you want. but Would he change his mind? i would i would guess no. So you know lots of lots of smart people have already been saying that and and his supporters.
01:38:32
Speaker
yeah I don't actually know like the economic details here, but like my understanding is that most people think that they're pretty bad. And it'll still be the case that you know Trump will be able to find people telling him that what he thinks is is good. And he'll be out to program his AIs to keep telling him that if he wants to.
01:38:48
Speaker
So there's no guarantee that that that he will become super competent or that whoever sees power becomes super competent. So there's there's this kind of like, there's ah there's a form of loyalty that actually undermines competence just because you're you're not, you're loyal to such an extent that you're not providing feedback that's that's useful because, you know, negative feedback feedback feels bad to receive.
01:39:09
Speaker
And so there's does that kind of loyalty. I mean, maybe this is a bit contrived, but do you think there's a sense in which in the singular loyalty scenarios, the AIs could be so loyal that

Ensuring Competence and Loyalty in AI Systems

01:39:23
Speaker
they are,
01:39:25
Speaker
they're kind of undermining the competence of of the person that is that that they're singularly loyal to. Yeah, it's a really great question. I haven't thought about this, but yeah, in a way, the most extreme version of singular loyalties will just agree with whatever the most recent thing that you know the the the dictator has said.
01:39:44
Speaker
it's like going you know It's a version of sycophancy, which we already see. without questioning. and And we'll do that even when it's not in that person's interests because that's the kind of type of loyalty that's demanded, where there's a more kind of sophisticated type of loyalty where you're still completely loyal, but you're also willing to challenge them when when you think it's in their and in their best interests.
01:40:10
Speaker
So that's a really nice distinction. And yeah, I suppose one way of yeah thinking about competence is thinking about what kinds of loyalties
01:40:20
Speaker
the dictator would demand from their AI systems. um another Another way of thinking about it is how much they would listen to their AI advisor. Even if the AI has the kind of sophisticated type of loyalty and is trying to tell the dictator what to do, the dictator could just ignore them.
01:40:37
Speaker
And you know you see that again. you know AI is a fairly sycophantic. They will also challenge you sometimes. And then it you know it's up to you whether you listen. So that's sort of the confidence bucket, which I think is really important.
01:40:49
Speaker
And I do think there are differences between potential coup instigators on that on that front, which which could be significant. Well, yeah, I guess my my expectation would be that lab CEO coups would be more competent than heads of state. But you know even even within lab CEOs, there are some that are more dogmatic than others.
01:41:13
Speaker
And I think that dogma would get in the way of competence.
01:41:19
Speaker
That's competence. And the other thing I mentioned was kind of kind of broadly, what are your goals, what are your values or or kind of more character? And here, you know, but i one thing I think is really important is being open-minded, being willing to bring in lots of different, you know, diverse perspectives into the discussion and empower them.
01:41:40
Speaker
to you know really represent themselves and grow and flourish. um So I think a very bad thing would be yeah ah particular person who becomes a dictator. They implement their vision for society.
01:41:51
Speaker
and of Much better to be and empower all the different kind of ideologies and ideas to kind of you know become the best versions of themselves.
01:42:03
Speaker
And then you know we can kind of collectively grow and improve our understanding of how to how to run society. um So yeah know sometimes people focus when they're thinking about values on like, okay, are you ah you a this type of utilitarian or, oh no, I hope you're not a deontologist or, you know it can get very kind of specific and finger pointing.
01:42:24
Speaker
You know, my view is more that, you know, we don't really know what the right answer is. And the most important thing is is is being pluralistic and, you know, letting a thousand flowers bloom.
01:42:36
Speaker
So we discussed the possibility of getting to a stable state in which we've avoided an AI enabled coup. And now we have say we have aligned superintelligence, kind of that where the risk of coup is is very low.
01:42:51
Speaker
Do you think this is something that happens for one country and then that one country is is is in control of the world to such certain an extent that it's that this is not a process that other countries are undergoing?
01:43:04
Speaker
To be more concrete here, for example, if the US goes through a risk of AI-enabled coups, but managed to kind of stay... so to remain a stable democracy.
01:43:17
Speaker
Is it the case that Russia or China will go through a similar ah period of risk of queues? It's a great question. And it will depend on the US's kind of posture towards the rest of the world geopolitically.
01:43:34
Speaker
And it will also depend on you know whether the US has gained a huge military and economic advantage, you know like outgrowing the world or just developing powerful military technology, as we were discussing previously.
01:43:47
Speaker
But you know you can imagine one scenario where
01:43:51
Speaker
and the US isn't that that much more powerful than the rest of the world yet. And isn't that like kind of inclined to to intervene, which has been kind of the recent trend.
01:44:05
Speaker
And then you China developed similarly powerful AI a few years later. And Jinping uses it to cement his control over China.
01:44:17
Speaker
So then you now have one kind of AI-enabled dictatorship that is extremely robust. And then you have the kind of US, which you know has has avoided that risk. And now they're kind of, you know maybe they're competing against each other and kind of you know the Cold War III, I'm trying to kind of, Cold War II, sorry, trying to outgrow the world.
01:44:37
Speaker
Or maybe maybe they're striking deals because you know they recognize it's not good to compete. And you know China just kind of indefinitely remains a dictatorship and you know, that, that's just a permanent loss for, for the world.
01:45:00
Speaker
But you, but you could also imagine a different scenario where the U S is very far ahead and maybe, you know, it just wants to really secure its position geopolitically. And so it, you know, it and instigates and enable crews and other nations where it's really putting kind of U S representatives
01:45:17
Speaker
up on top of those those nations. That could be through secret loyalties, it could sell AI systems, let's say to to India that are secretly loyal to US s interests, or it could give give give some particular politicians in India access, exclusive have access to super intelligent AI to help them gain power.
01:45:39
Speaker
So you could you know you could apply those same threat models we've discussed, but with the kind of us as playing the strings,
01:45:46
Speaker
or Or you could have the US s just kind of taking control of other nations in more traditional ways, um you know, just military conquest and kind of really leaning heavily on kind of extracting economic value out of other countries as they outgrow the world.
01:46:05
Speaker
So, you know, yeah, kind of a wide range of options. here really. Yeah, yeah. as ah As a final topic here, perhaps we can talk about what listeners can do if they want to help try to prevent AI enabled coups and specifically where to position themselves.

Preventing AI-Enabled Coups

01:46:22
Speaker
should Should they be in AI companies? Should they be in governments? Should they be in perhaps eval organizations? where where Where is the position of most leverage?
01:46:35
Speaker
Great question. I think being a lab is a great place to be.
01:46:42
Speaker
I talked about system integrity, kind of robustly ensuring that AI's don't secret loyalties and behaviors intended. That's something that companies need to implement. So if you have interest or expertise in sleeper agents or backdoors to AI models or cybersecurity, then I think being part of a lab and helping them Achieve system integrity is an amazing way reduce this risk.
01:47:09
Speaker
um Another thing you can do at labs, if if you're you know if you're interested in, if you're if you're worried about kind of the the risk of you know government yeah heads of state deploying loyal AIs and seizing power, is you can help labs develop terms of service.
01:47:27
Speaker
where when they sell their AI systems to governments, they have certain the mitigations against misuse. um that Maybe you know one way to frame this is, look, with with you're using really powerful AIs and we can't guarantee the safety of those AI systems unless we have some degree of monitoring to ensure that the AI systems aren't doing anything unintended.
01:47:50
Speaker
That monitoring could then be sufficient to allow the for the prevention of... of of coups because you're being monitoring not only for kind of accidental misaligned AI behavior, but you know that will also thereby mean you're monitoring for you know a bad human actor giving them illegal instructions.
01:48:10
Speaker
So that that you know labs will be drawing up contracts with governments, times of service. They will be thinking about you know, the guardrails, if any, that go on a place on the systems that they that they sell to governments.
01:48:24
Speaker
But I think, you know, there's very careful work to be done thinking through, okay, how can we structure those guardrails? How can we explain them in a way which is, you know, but very unarguable and that doesn't seem like we're kind of trying to like, you know, constrain the government, you know, private companies don't, you know, it's not really legitimate for them to kind of constrain the government. But, you know, I do think there's an important an important thing to be done here in preventing air-enabled coups. So kind of threading that needle. There's another thing that you could do in a government, but you could, in a lab, but you could also do that kind of work for ah think tank or for a research organization that's kind of interlinked with government like, like RAND, I think potentially do some of this kind of work thinking about, you know, what should, you know, be, be in the terms of services between, between labs and governments.
01:49:17
Speaker
Let me think. I think another big thing is that for for for this system of integrity, yes, we want labs to implement it, but we also want there to be some external organization that can certify that.
01:49:29
Speaker
Currently, no external organization is working on this. Mita's not working on it. Apollo's not working on I don't think any evaluations organization is. But there will be, at one point in time, a very strong demand for for for guarantees that the systems that these companies have developed don't don't have sleeper agents.
01:49:50
Speaker
Because if you're deploying a system in critical infrastructure, in important government roles in the military, you want to know that it does what it says on the tin.
01:50:01
Speaker
um And so I think that you know a company can make a wise investment by developing that capability, the ability to understand how to insert secret loyalties, how to detect them, being able to assess a company's process to see whether it's sufficient.
01:50:17
Speaker
but how How close would that work be to kind of interpretability work? isn't Isn't there overlap there kind of like looking into what is the system actually doing?
01:50:28
Speaker
You know, fine, kind of reverse engineering some some process to see how things ah how things work. Is detecting sleeper agent like a specific constraint problem?
01:50:42
Speaker
Yeah, it's a great question. It's a specific type of problem so you could use internals work like interpretability to detect steeper agents that's not the only method you could use you could use behavioral tests you could look at the training data i think a mixture of those is useful it's also it's a specific adversarial setup so and in interpretability we're often interested in all kinds of kind of facts about the model's behavior and its cognition
01:51:13
Speaker
with With secret loyalties, we're specifically concerned with one red team trying to produce a secret loyalty and then a blue team trying to detect it. And so it's it's adversarial in that you yeah the red the the the blue team has a tough task.
01:51:29
Speaker
The red team will be purposely trying to hide the sleeper agent. And so that, you know, that will raise a whole host of new considerations and important things to dig into.
01:51:40
Speaker
So you you can imagine being like, allowing the red team you full control over all aspects of the development process. Can they create a secret loyalty? Maybe they can, maybe it's pretty hard to test. Okay, but now you only allow them to intervene at one part of the development process.
01:51:54
Speaker
Or you allow the the blue team to look at all the data that they use for training, or you allow the blue team to look at all of the code that they wrote. And you could get potentially quite a sophisticated understanding of the conditions under which the blue team win, conditions under which the red team win.
01:52:09
Speaker
This research doesn't need to be done in a lab. it It could be done by an external organization. i think it's, you know, big, big missing miss missing focus of of of today's technical work.
01:52:20
Speaker
um And ultimately that that would inform the assessments of of the lab's attempts to do system integrity. So for technical researchers out there, i' I'd really highlight that possibility.
01:52:35
Speaker
another another Another kind of piece of work for for the right person would be beginning to understand the kind of existing military thinking around autonomous thomas of systems.
01:52:49
Speaker
This is already obviously a live issue for militaries. They are increasingly deploying AI. would be nice to marry up that existing expertise with these kind of risks about more powerful systems, enabling coups, and kind of get to a consensus within that military community of basic principles like law following, like distributed control over military systems.
01:53:16
Speaker
and you know figure out a kind of a military procurement process, which is both practical, but also robustly prevents this kind of stuff. um So you know if there's anyone listening that has a way in, I think that's that's potentially pretty pretty valuable.
01:53:34
Speaker
Although that there's also a risk of poisoning the well if it's done badly. So proceed with some care. Yeah, yeah. Perfect. Thanks for chatting with me, Tom. It's been great.
01:53:46
Speaker
Yeah, real pleasure. Thanks so much, Gus.