
Andrea Miotti on a Narrow Path to Safe, Transformative AI

Future of Life Institute Podcast

Andrea Miotti joins the podcast to discuss "A Narrow Path" — a roadmap to safe, transformative AI. We talk about our current inability to precisely predict future AI capabilities, the dangers of self-improving and unbounded AI systems, how humanity might coordinate globally to ensure safe AI development, and what a mature science of intelligence would look like.   

Here's the document we discuss in the episode:   

https://www.narrowpath.co  

Timestamps: 

00:00 A Narrow Path 

06:10 Can we predict future AI capabilities? 

11:10 Risks from current AI development 

17:56 The benefits of narrow AI  

22:30 Against self-improving AI  

28:00 Cybersecurity at AI companies  

33:55 Unbounded AI  

39:31 Global coordination on AI safety 

49:43 Monitoring training runs  

01:00:20 Benefits of cooperation  

01:04:58 A science of intelligence  

01:25:36 How you can help

Transcript

Introduction to AI Risk Discussion

00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker, and I'm here with Andrea Miotti, who is the director and founder of Control.ai, which is a nonprofit working to reduce catastrophic risks from artificial intelligence. Andrea, welcome to the podcast. Thank you very much. Thank you for having me. Fantastic. You have a new report called A Narrow Path, which is about exactly this topic, how to prevent catastrophic risk. What is the narrow path?
00:00:28
Speaker
So what we lay out in this report is a plan for humanity to survive superintelligence and thrive and flourish beyond that. you know There is one key threat that we deal with. It's the development of artificial superintelligence, AI that is smarter than humans across a general range of tasks. In many cases, it could be smarter than all humans combined. And this is something that you know for some people,
00:00:53
Speaker
could be far away, but increasingly there's a concern that this might be coming quite soon. There are definitely a handful of companies investing tens of billions of dollars, and soon hundreds of billions, to make this happen.

Existential Risks and Timelines for AI

00:01:06
Speaker
And the thing is, if we develop superintelligence right now, as many experts have warned us (now even newly minted Nobel laureates, but also CEOs of the very companies developing this technology, and world leaders like the previous UK Prime Minister, Rishi Sunak), we face an extinction risk, and we need to chart a different path. This is what
00:01:30
Speaker
we tried to do here: to chart a path where humanity keeps the benefits of AI as a tool, as a tool to empower humans, as a tool to provide economic growth in applications that help us, but not as a successor species, not as an entity more powerful than us that overpowers us.
00:01:51
Speaker
I guess the recommendations in this report depend on when we think we will have something like AGI or even more advanced systems like superintelligence. And you give an estimate of within three to 15 years. What is the ground for that estimate?
00:02:10
Speaker
Yeah. So the ground for that estimate is a combination of the general estimates from people in the field. We've recently had Sam Altman, CEO of OpenAI, estimating something that is roughly three-plus years in his view for superintelligence in a recent blog post. And 15 years is the kind of timescale where, in case the current scaling paradigm doesn't work or slows down, there will still be a lot of algorithmic improvements and other advances that will still give us gains during that period. But
00:02:42
Speaker
Ultimately, I don't think that the exact timeframe matters that much. Like we kind of expect that this technology might be developed relatively soon, especially soon for a government's timescales.

Challenges in AI Understanding and Control

00:02:55
Speaker
We know governments are not that quick at many things; even a five to 10 year timeframe is a challenging deadline when you're facing extinction. So the reality is that this is coming. We don't know exactly when it's coming, predicting the future is hard, but we need to plan, and we need to plan now. If it comes in three years, we're going to be in real trouble. And even if it comes in 15 years, we're still going to be in real trouble unless we plan. The timing matters, but it doesn't matter in a deep sense. What matters most is that we have an unsolved problem of how to align and control these systems, how to understand these systems. And if that problem isn't solved before we develop these advanced systems, then we might be in trouble.
00:03:38
Speaker
So maybe you could tell us how much we understand about AI systems currently, and how is the understanding that you would like to see before we develop superintelligence different from the understanding we have? Yeah. So as you said, a big issue with AI, especially these advanced, powerful AI systems, is that we understand them in many ways very little. And we are approaching this without some of the scientific foundations that we did have for other large-scale transformative technologies.
00:04:07
Speaker
For intelligence, we have neither a science of intelligence to let us make predictions about, you know, if you put in this amount of computing power and this amount of data, what will you get, nor a metrology, a measurement science of intelligence. We don't even have a yardstick to say, okay, well, clearly GPT-4 is as smart as two human beings or as smart as 10 mice. We do have a deep intuition as humans that,
00:04:35
Speaker
we look at a rock and we think that's probably around zero intelligence, or very little. A flower is clearly more than a rock; it can have some limited amount of interaction with its environment. A mouse is more than a flower. And we are seeing AI systems that can do very complex things, but are they, in quotes, cat level? Are they dog level?
00:04:59
Speaker
Are they human level? Without a measurement science, we're just making guesses, unanchored guesses. And the reality is intelligence is not magical. Intelligence is just the ability to accomplish goals. It's a physical phenomenon like any other phenomenon in our physical universe. And we have been in this situation with other things in science before: a clear measure of temperature wasn't developed until just a few centuries ago. It took people doing empirical experimentation and drawing up scales to actually measure this clearly tangible phenomenon that we didn't have real measures for. Once we had those measures, then we could make clear predictions, draw clear lines, understand different levels and so on. We will need the same for intelligence to be able to actually predict in advance, not just after deployment or sometimes never, but
00:05:55
Speaker
to predict, before we build it, how powerful it is going to be, what it's going to be capable of doing, and what it is not going to be capable of doing. These are the foundations needed to be able to control such a powerful technology: to be able to predict what it's going to do.
00:06:10
Speaker
So regular listeners to this show will know about scaling laws. And these laws seem to be something that we can use to predict AI capabilities in advance. In fact, you just mentioned them as something that, you know, when we scale up compute, we expect to see certain capabilities arise. Why are these scaling laws not enough for us to have a good understanding of how AI capabilities emerge?
00:06:35
Speaker
Yeah, so scaling laws are definitely helpful, but they're ultimately empirical observations. They're not fundamental laws grounded in theory like we would have in physics or other more mature disciplines. And the proof is in the fact that people aware of scaling laws still cannot predict in advance what their models will be able to do. They can roughly see that given one more order of magnitude of computing power, you will get a more powerful system, but they're not able to predict exactly: at this level of compute, the AI is going to be able to code at this level of human performance; at this level, the AI will start to persuade humans. There are no such precise predictions being made. And this is actually the kind of thing
00:07:21
Speaker
that we could get if we had a deeper theory of intelligence and an actual measurement of intelligence. You can see how much compute people put into it, that's correct. Compute is a very good proxy. It's kind of the best we have, given that we don't have the actual measure, the actual yardstick for intelligence. We use compute as one of the main proxies to approximate it, but it's still a proxy and we can't always rely on it.
00:07:50
Speaker
And there are many other ways capabilities can improve even while keeping compute constant. Things like algorithmic improvements: just switching to the Chinchilla scaling laws from the previous scaling laws led to massive improvements in performance, even given the same amount of computing power.
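To make concrete what these empirical fits look like (and what they leave out), here is a minimal sketch of a Chinchilla-style loss curve. The functional form follows the published Chinchilla paper, but the coefficient values should be read as illustrative placeholders, and the limitation is the one discussed above: the curve predicts loss, not which capabilities appear at that loss.

```python
# Minimal sketch of a Chinchilla-style scaling law: loss modeled as an
# irreducible term plus power-law terms in parameter count N and tokens D.
# Coefficients below are illustrative placeholders, not authoritative values.

def predicted_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, b: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """L(N, D) = E + A / N^alpha + B / D^beta (Chinchilla-style form)."""
    return e + a / n_params**alpha + b / n_tokens**beta

# The curve predicts a loss number, not which capabilities emerge at that loss:
for n, d in [(7e10, 1.4e12), (7e11, 1.4e13)]:  # e.g. 70B params / 1.4T tokens, then 10x
    print(f"N={n:.0e}, D={d:.0e} -> predicted training loss {predicted_loss(n, d):.3f}")
```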
00:08:09
Speaker
you know, things like OpenAI's recent model o1 that uses a lot more inference compute rather than training compute to essentially post-process its outputs and make them better. That's not really well captured by just a training-compute proxy. So we need to go beyond it. It's a very useful tool, but we need to go beyond that. We need to have a defense-in-depth approach to AI policy that doesn't just rely on compute thresholds.
00:08:36
Speaker
And that is actually what you lay out in A Narrow Path.

Phases of AI Development: Safety to Flourishing

00:08:39
Speaker
I think before we get too deep into this interview, I would love for you to lay out the three phases of A Narrow Path and what you would like the world to do in each phase.
00:08:48
Speaker
Yeah, absolutely. So as we've covered so far, we focus on one specific threat model: superintelligence, AI that is smarter than any human, or smarter than most humans combined. This is the explicit goal of some companies. It's the explicit goal of people developing this technology. And if we don't put safeguards in place, the default outcome of that development is human extinction. So what do we do about that? Well, we chart a path of three phases. Phase zero: faced with this threat, the first thing we need to do is to stop dying, stop losing. If we develop this right now,
00:09:29
Speaker
it's game over. Until we have the safeguards in place, until we know how to control it, we just cannot afford to develop superintelligence. Everybody loses if we do. It's not going to be the victory of one nation over another, one company over another. Humanity loses. The only winner is superintelligence. So in that phase, the goal is safety: preventing the development of superintelligence for at least 20 years. Then beyond that, if we do succeed at building our defenses and building beneficial AI tools while preventing superintelligence, then we will face another issue, which is international competition, intercompany competition,
00:10:08
Speaker
rogue actors trying to develop this technology. We need to make an international system that is stable, where if we get these measures implemented, they don't just collapse under competition and pressure. And this is why the goal of phase one is stability: building an international AI oversight system that is stable and beneficial for all participants. And finally, once we are safe, we're not dead,
00:10:33
Speaker
And we're not going to die anytime soon. We have a stable international system. So we have a way to prevent rogue actors. We also have a way to keep major AI players together rather than escalating into an arms race on AI. Then we can look into building a flourishing future. So the goal of phase two is flourishing, develop transformative AI, not super intelligence, but transformative AI.
00:10:59
Speaker
as a tool to ah benefit humanity and to achieve all of the economic growth and scientific potential that we do want from AI and we can get from AI if we get all of these things right.
00:11:11
Speaker
This is an alternative plan compared to the plan of the leaders of the AGI corporations. Why isn't what they're doing going to work? So for example, why won't reinforcement learning from human feedback, which is currently a dominant technique, why isn't that enough to keep us safe and to keep AI models aligned to our values?
00:11:35
Speaker
For example, when I use ChatGPT, it seems relatively helpful and aligned to me. Why won't that just continue to be the case? I'll answer in two parts, starting specifically with reinforcement learning from human feedback. That's a plan that I think even the companies themselves do not expect will scale to superintelligence. They're quite upfront about it. I, too, when I use ChatGPT, am pretty happy with the answers I get, but what these companies are building is not more chatbots. If we only had chatbots, we would have some things to worry about, but not at this scale. The reality is that what the development is going towards, and is already being built for, is to have
00:12:12
Speaker
generalist agents, AIs that take actions in the real world autonomously, on their own, already connected to tools, connected to the internet and so on. And what RLHF ultimately is, is essentially training the model with a stick and with some reward. It's like having an animal that you don't really understand: whenever it does something nice, you give it a candy, and whenever it does something that you don't like, you hit it with a stick.
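To make the candy-and-stick picture concrete, here is a toy sketch of that outer loop. It is only an analogy: the response strings and the hand-written reward table are made up, standing in for a learned reward model and a real policy-gradient method such as PPO.

```python
# Toy analogy for the "candy or stick" dynamic described above: responses the
# rater scores well become more likely, poorly scored ones less likely. This is
# a caricature of RLHF; it only illustrates the outer loop, not the real method.
import random

responses = ["helpful answer", "evasive answer", "harmful answer"]
weights = {r: 1.0 for r in responses}           # the "policy": sampling preferences
human_reward = {"helpful answer": 1.0,           # candy
                "evasive answer": 0.0,
                "harmful answer": -1.0}          # stick

def sample() -> str:
    return random.choices(responses, weights=[weights[r] for r in responses])[0]

for _ in range(200):
    r = sample()
    # Nudge the sampling weight up or down in proportion to the rater's score.
    weights[r] = max(0.01, weights[r] + 0.1 * human_reward[r])

print(weights)  # "helpful answer" dominates, yet we learn nothing about *why*
```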
00:12:39
Speaker
Not very nice, but that's how RLHF works. It doesn't tell you much about how the system is internally representing this information and this learning. It doesn't tell you whether the system will just go and do something else when it's in a different environment in the real world that you couldn't predict. And ultimately it doesn't even address a lot of the fundamental issues that come with scaling to superintelligence: things like using AIs to improve other AIs, or having these AIs act autonomously in the real world, which will lead to the AI itself changing, to the next generation of AI changing. This is a very, very limited way to deal with that. But I would say, at a bigger-picture level,
00:13:27
Speaker
I am not sure I have seen even a plan from the companies. I've seen maybe the concept of a plan, things like RLHF, but I don't think the companies are quite public with their plans, if they have them at all. The plan seems to be: let's continue building superintelligence and let's see what happens, and we will dedicate some of our budgets to doing some safety research, but deployment comes first and release comes first anyway. Even then, even if one of them figures out the way to keep these systems controlled, how are they going to make sure that the other companies don't just deploy the dangerous system? That's not a single-company issue to solve. It's not even a single-country issue to solve. Maybe you can get
00:14:12
Speaker
all of the companies in one country to follow the solution to this problem that has been found, not found yet, but found in the future. What about another country where they're just developing the same thing? How do you make them do that? How do you make sure? And this is why we need a plan that also goes beyond just the technical solution. The technical solution is part of it and we need it urgently, especially on this tight deadline, but there are a lot of governance and policy questions at the national and global level that are just left undone and we need to do them fast.
00:14:47
Speaker
We do seem to be in this tragic situation where there is, say, an AGI corporation developing more and more advanced systems. And in response, another corporation arises whose claim is: we will do this more responsibly, we won't race ahead. And then you suddenly get this proliferation of companies competing with each other, where each is created in response to the other ones not acting responsibly.
00:15:15
Speaker
Yeah, it's quite ironic. It's a big irony, it seems. You even have the ones that start having, you know, safe in the name, and they come to tell you, well, if it has safe in the name, it's gonna be fine, right? You're referring to Ilya Sutskever's Safe Superintelligence Inc., yeah.
00:15:31
Speaker
Exactly. All right. So phase zero is about safety. And there the plan is to prevent the development of superintelligence for 20 years. Where does that number come from? Why 20 years? Why not five years? Why not 50 years? This number is essentially a forcing function. Very often in policy, one ends up with vague ideas or ideas that are difficult to operationalize. First of all, putting a number forces you to check. This is why we put a number: to check, okay, if I actually assess all of these measures and they are all in place, do I expect this actually stops it for 20 years or not? It lets you do the exercise more like solving a mathematical proof rather than having a general feeling of whether things are going to help on the margin or not. Then,

Timeframe and Control Measures for AI

00:16:26
Speaker
you know, why 20 specifically? It takes a long time for governments to do things.
00:16:30
Speaker
So some people might think that 20 is too little. I definitely know a few. Some people might think that 20 is too much. I think people that know how long it takes to build institutions and build new foundations for science are generally on the side of it might even be too little. But if you can somehow speed-run the whole process and set up new international institutions, very robust safety policies at a national level in all major AI players,
00:17:00
Speaker
achieve global coordination on this, and build the scientific foundations for building safe transformative AI, then we can do this in less than 20 years. But I think we should account, number one, for the fact that these things are hard and take time, and we cannot afford to miss the deadline.
00:17:19
Speaker
And second of all, planning fallacy: it takes a long time and it's quite complicated to get agreement on a lot of things. It's possible, we have done it before, but it's quite complicated. I think 20 years gives us a good window. Also, this doesn't mean stopping other AI development. In these 20 years, we will have massive transformation. A society with no superintelligence is still a society where the vast majority of labor will be automated at current rates, a society where we will have a completely different education environment, a completely different work environment in the next five to 10 years, even without superintelligence.
00:17:56
Speaker
It's a very important point. I think it's worth stressing that what Control AI, your organization, is against is not all AI development. It is specifically the development of superintelligence. So this is not about shutting down all AI development. It is about this specific issue. Exactly.
00:18:16
Speaker
And it seems, at least in principle, that we should not develop superintelligence for as long as it takes to solve the problems we need to solve, so solving the problem of understanding these systems and aligning these systems. But I can see the point in having this 20 years where you're not developing such a system as something for people to coalesce around. And I take it that this is the reason for 20 years specifically.
00:18:43
Speaker
Yes, exactly. The first objection that comes up when you propose something like this is that setting up international institutions that will control what everyone on earth can do, specifically that they can't develop superintelligence, is a level of authoritarian control that we simply can't accept.
00:19:03
Speaker
And so weighing that risk, the risk of authoritarian control over technological development, against the risk of superintelligence, some people might come down on the side of: we need to prevent this control from being implemented.
00:19:20
Speaker
Well, I think sometimes people get a bit ideological on these issues, also because they forget that we live in a world where you cannot just build a nuke in your backyard, right? We already have these things, and for good reason, and they have nothing to do with international totalitarianism or anything similar. You cannot build biological weapons, you cannot build a nuke in your backyard, and we're very happy that you cannot do that, because otherwise it would be a very unstable world. The world is made of trade-offs. Obviously, I'm not going to lie and say there are no trade-offs. There is a trade-off between speed and safety. Sometimes there is a trade-off between security and proliferation, and we need to
00:20:01
Speaker
make mature decisions as countries, as individuals, and as civilizations to take these trade-offs in a way that maximizes our collective gain, our collective benefit. The reality is that if we just rush ahead right now and anybody on the planet can just develop superintelligence,
00:20:20
Speaker
we will die, or we're quite likely to die. I believe we will die. Some people are skeptical, but the fundamental reality is that if we have the system set up like that, it's utterly unstable. It's not going to be possible to prevent all risks that come from it.
00:20:36
Speaker
So we need to find a different path. We have found one with nuclear weapons. Again, it's thanks to the efforts that countries like the US and scientists on both sides of the Iron Curtain made that we have a stable international nuclear system. It has a lot of flaws, but we're still alive today. There could be a different world where there was a massive nuclear war after World War II, and me and you and everybody else listening wouldn't be around here today to talk about this stuff. So we did manage to do it with other extinction-level technologies. As a civilization, if everything goes well and we succeed, we'll keep building bigger and bigger technologies. That's great, but that also comes with more and more responsibility. The bigger the technology you build, the more the
00:21:25
Speaker
downside risks and the blast radius are going to increase. And if every time we build a technology, we just distribute it everywhere, with no safeguards, to everybody, all the time, this is a recipe for a civilization that doesn't make it at some point. You just need one or two or very few actors to screw it up for everybody else.
00:21:45
Speaker
And if we are to become a great civilization, even greater than what we are right now, and explore the stars and live healthy and long lives at a much larger scale than we do right now, we need to grapple with these fundamental questions and we need to be able to deal with technologies that affect everybody on the planet.
00:22:06
Speaker
Right now, it's difficult for me to imagine a technology that could be as transformative as superintelligence. But I can imagine a period, say, after we developed superintelligence, where we suddenly are faced with a bunch of new technologies, and some of these technologies could be destructive. So it seems like a very good idea to have set up a system for handling such technologies at an international level. You sketch out what the conditions for achieving safety would look like.
00:22:36
Speaker
And I would like us to go through each of them, because these are actions you could take while developing AI that would be especially dangerous. And the first one of these is that we do not want AIs to be improving AIs. Do you think this is to some extent already happening, and how do we prevent it from advancing further?
00:22:58
Speaker
Yeah, so this one is a crucial one, to give the full picture. The reason for this one is that whatever safeguards you put in place, whatever limits you put in place on AI development, if AIs can improve themselves or can be used to improve AIs, it's very easy for them to break out of these limits, of these bounds that you put in place, and that's going to nullify any defenses that you've put in place. So we put it as a necessary condition
00:23:26
Speaker
to have a safety regime that works, because this just trivializes any defense you have in place if it happens. Some amount of this is happening now. In our policy, we worked a lot to try to find a measure that is really surgical and affects the most dangerous forms of AIs improving AIs while also leaving a lot of beneficial normal software untouched.
00:23:54
Speaker
Sometimes you need to make trade-offs, but when you can find a precise way, it's even better. For doing this, we coined the idea of found systems. In common parlance, we use AI to talk about a lot of different things, but there are two kinds of AI that are intuitively quite different. There's normal software that humans like me or you write down, and then there's this in some ways quite strange form of AI, which covers things like ChatGPT or modern large language models, that are not really designed or written by humans, but are kind of grown. They're found via mathematical optimization. This is why we call them found systems. And
00:24:37
Speaker
this is one of the reasons why they are so inscrutable to us. It's because we don't just design them, we don't draw them out and have a plan for them; we just find them via mathematical optimization. And these are also the ones that are most concerning and most dangerous, because we don't have good ways to understand them and bound them, to understand in advance what the limits are on what they will be able to do, what's going on inside them, and how we can limit the directions in which they grow. And so, for no AIs improving AIs, we focus on found systems: no grown
00:25:15
Speaker
AIs improving other grown AIs. Why? If we have AIs improving AIs at machine speed, there is very little way for humans to actually keep up with what's going on. They're much faster than us. We don't even understand them right now. Imagine an intelligence explosion where they keep improving themselves or improving other AIs; that's a recipe for disaster and for having an uncontrolled intelligence explosion that we just cannot follow.
00:25:44
Speaker
Meanwhile, we want to leave untouched a lot of normal applications, like humans improving AIs or humans using handwritten code to optimize other handwritten code. With the way that we spell out the policy, we find a way to leave that untouched. So what would be excluded here would be, for example, a researcher at OpenAI using a language model to help them generate code and then using that code to improve their systems?
00:26:12
Speaker
And would you also exclude, say, NVIDIA using AI for chip design? In this case, yes, we don't cover using AI for chip design. For the case of the OpenAI employee, that's quite interesting. Obviously, laws are an exercise in trying to bring precision, but you can go infinitely deep with legal definitions and there are always going to be edge cases. The clear case that we want to prevent is having GPT-4 considerably help with the development of GPT-5. That's not what we want to see happen.
00:26:47
Speaker
What we are clearly not covering is things like using other AI to record this meeting and transcribe it and then review it, or to record an AI development meeting and transcribe it and review it. Of course, the reality is that things are always on a spectrum. Right now, we already see companies, despite the fact that I think Sam Altman said at some point that recursive self-improvement is something so dangerous that OpenAI would never do it, yet in practice we do see that companies right now, like Anthropic and like OpenAI,
00:27:17
Speaker
do say that they use their own current systems to speed up their machine learning engineers for the development of future systems. Again, this is a spectrum. It's always difficult to draw a line. Obviously, the more this is done and the more this is delegated to machines that we don't understand, the more dangerous it is. And so we need a policy that says, no,
00:27:41
Speaker
full recursive self-improvement with machines is not allowed. You should have protocols in place to prevent this. You should teach your employees that this is something very dangerous that you shouldn't do. And then for the edge cases, we have courts and we have discussions to find exactly how they work. Yeah, makes sense. Another development that could be dangerous is if you allow systems to break out of their environments. Maybe you could describe how this could happen. And isn't this a case where this isn't even in the interest of the AGI corporations themselves? So perhaps there's some room for agreement here. Yeah. So this is another one of these conditions that is really foundational, because similarly to AIs improving AIs, whatever limits you put in place to limit the power and the danger of AI systems, if AIs are capable of just breaking out of their virtual or physical environment,
00:28:39
Speaker
they're going to nullify all of your countermeasures. So again, if the AI can just open the door and leave the room and do something else, then your room is useless. If your prisoner can just open the jail cell, the jail cell is useless.
00:28:55
Speaker
In this case, ironically, while I was developing this policy I thought it could happen quite soon, but I did not expect that in September something like it would essentially be reported in the o1 model card.
00:29:10
Speaker
During OpenAI's testing of o1, o1 did something similar to this. There's always contention about whether it's exactly breaking out or not. This is why we need clear rules set by third parties, and third-party inspectors to decide and adjudicate, not just the companies themselves. But in this case, the model was essentially given a computer security challenge, a capture-the-flag challenge, to solve. The challenge was broken in a certain way.
00:29:37
Speaker
And the model realized, well, I still want to solve it. So it found a way to gain root access to the virtual machine where it was running, start a new environment, and then solve the challenge in that new environment. It essentially broke out of one level of boundary, not all of them.
00:29:55
Speaker
It didn't leave the server, but it broke out of one level of boundary that was set up for it, to solve a challenge that was impossible otherwise. So this is very concerning. If you have a system that is this capable and you put safeguards in place, the system can just find a way to break out of them and leave and nullify all of your safeguards. And indeed, it shouldn't be in the interest of the companies to have this happen. But again, there are trade-offs. The more raw power and the more raw access you give to your AI systems, the more in the short term you might feel like you have an economic incentive: it's just doing whatever it thinks is best, it has access to everything, it can solve all of your problems.
00:30:44
Speaker
The other side of that is that with completely free access and no restrictions, the risks are really severe and you nullify all of the other safety measures. So we need the common-sense approach of no AIs that can break out of their environment; make sure that they cannot.
00:31:02
Speaker
Build your environment in a way that doesn't make this possible, and the burden of proof should be on the company to show that they're definitely not developing these types of models intentionally. If they find this, they should find a way to remove this ability or constrain it, and if they cannot, well, this shouldn't be around.
00:31:20
Speaker
The case you mentioned from the o1 model card is like a miniature version of what we might expect with more powerful models. You can imagine something like a hedge fund deploying a model that's trading and giving it a bunch of access to their systems, and the model beginning to do something that is not within the bounds of what they expected.
00:31:42
Speaker
Trading more money or breaking laws or something like that. That's a much larger scale, and it doesn't seem out of the question. And I also think both professionals and the broad public will begin to give these systems more and more access.
00:31:59
Speaker
You will probably be tempted to give these systems access to your email, to your calendar. And just in those fairly limited circumstances, with that access, what could a model be capable of doing? Could a model buy more compute power for itself on Amazon or something like that?
00:32:18
Speaker
But do you worry less about this because, again, it might be in the interest of the companies themselves to prevent this? I mean, this isn't in the interest of me as a consumer, for example, or even the people who run the hedge fund, which would face legal trouble because their model escaped from the boundaries that it was believed to be operating within.
00:32:42
Speaker
But I think this is precisely why we need to have these as rules, not just as general incentives. I mean, in theory, everybody has an incentive not to go extinct from AI, and this risk is even recognized by the companies developing some of the most powerful models, yet this is still happening. So we can't just rely on incentives, though we could also make incentives stronger. There's a strong case for liability, because liability aligns the incentives of the developer with having to prevent these kinds of things from happening in the first place; otherwise they will face penalties. That is exactly the approach here: to say, here are some conditions that we actually need to enforce. We can't just rely on the goodwill of companies. Even if all companies are well intentioned,
00:33:33
Speaker
competitive pressures, rogue actors, and accidents will happen unless there are protocols in place, rules in place, and ways in place to mitigate this. We just need to make it actually compulsory. And as has happened with many other industries, this will quickly improve the situation and make sure that these things actually don't happen. What is an unbounded AI, and why is that dangerous?
00:34:01
Speaker
Another one of our conditions is no unbounded AIs. I know the name of this one is a little bit arcane, but I will try to make it clear. The idea here is that in almost any high-risk engineering sector, we actually know things in advance, before even developing or building a system. Think of a nuclear power plant, or a plane that needs to fly and not kill a bunch of people.
00:34:27
Speaker
The developers and the builders are going to sketch out a blueprint. It's going to have a lot of assumptions and a lot of calculations on exactly how this thing will fail, what the safety margins are, what amount of pressure it can withstand. In the case of a bridge, you need to know in advance, and you can know in advance if you make your calculations, how many cars it can withstand before it's going to collapse. In many ways, we expect that this is happening with everything that we deal with. But AI is very strange. It's an industry that's developed without all of these basic, normal, common-sense
00:35:01
Speaker
engineering practices, where developers don't even know before finishing the training run what their AIs can do, and in many ways don't even design them, right? We've talked about how they're kind of grown rather than designed, and capabilities are discovered way after the fact, even after the model has been tested and evaluated
00:35:24
Speaker
for a week; maybe one year later, you're still discovering new capabilities and new ways it can do stuff. And that's essentially an untenable way to deal with safety. If this were the approach to safety for bridges or for nuclear power plants, bridges wouldn't stand and power plants would explode all the time. What we need to have instead is AIs where, before deployment, a developer is able to say: given these conditions that I have in mind, I expect that this AI system will be able to do this, but not that. It's quite easy for some AI systems. For example, if you have a CNN, a convolutional neural network,
00:36:06
Speaker
to scan for cancer, for example, it's quite easy to say as a developer: look, this model is only trained on cancer pictures, it's only trained on images, and it can only output text.
00:36:19
Speaker
I am pretty confident it's not going to be able to hack other systems. It's not going to be able to generate or execute code. I'm not giving it access to anything that can execute code, and it's not going to be able to self-improve. Let's go for a bigger system like AlphaFold. It's a very powerful system.
00:36:41
Speaker
It's a marvel of science and engineering, but it just folds proteins. You can be quite, maybe not 100% certain, that's always hard, but you can give a very confident estimate that the system is bounded just to the protein-folding domain. It's going to output protein structures. It's not going to output images. It's not going to output other stuff, especially if we don't give it access to other tools that could later do that.
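As a rough illustration of what "bounded" could mean operationally, here is a hypothetical sketch of a deployment wrapper that enforces declared bounds (allowed output types, no tool or code-execution access). The class and function names are illustrative, not taken from the report.

```python
# Hypothetical sketch of enforcing declared bounds around a narrow model such as
# a protein-structure predictor: only the declared output type is allowed out,
# and no tool or code-execution channel is ever wired up. Names are illustrative.
from dataclasses import dataclass

@dataclass
class DeclaredBounds:
    allowed_output_types: tuple          # e.g. ("protein_structure",)
    allows_tool_use: bool = False
    allows_code_execution: bool = False

class BoundsViolation(Exception):
    pass

def run_bounded(model_fn, inputs, bounds: DeclaredBounds):
    if bounds.allows_tool_use or bounds.allows_code_execution:
        raise BoundsViolation("this deployment declares no tool or code access")
    output_type, output = model_fn(inputs)            # model returns (type, payload)
    if output_type not in bounds.allowed_output_types:
        raise BoundsViolation(f"undeclared output type: {output_type}")
    return output

# Usage with a stand-in model:
fold = lambda seq: ("protein_structure", {"sequence": seq, "coords": []})
print(run_bounded(fold, "MKTAYIAKQR", DeclaredBounds(("protein_structure",))))
```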
00:37:06
Speaker
Yeah, so another way of stating this difference might be saying that unbounded systems are general systems, as opposed to narrow systems like AlphaFold, where you get some form of safety built in, because yes, AlphaFold is superhuman in the domain of protein folding, but it's not superhuman in other domains.
00:37:26
Speaker
So I actually do think that it is possible to bound general systems. It just hasn't been done. In principle, it's definitely possible. Maybe not in the full limit; again, we go back to full superintelligence. With an actual god-like AI,
00:37:43
Speaker
how are you going to predict those bounds? But with better interpretability, or with better understanding, or with new ways to design and train fully general systems, there is no reason why you could not still give certain bounds. In practice, what we ask for is that for any capability of concern, any capability that is considered dangerous or is illegal in a certain jurisdiction (a classic example: increasingly, most jurisdictions will be concerned about AIs enabling the development of biological weapons or assisting with classified information about nuclear weapons), if you as a developer have a way to demonstrate that your fully general system
00:38:29
Speaker
will not generate this information, that's great. We don't need to be prescriptive. Companies and developers should find ways to innovate to match this; we can leave it to them. If you have a full interpretability technique that lets you do this, great. Or a simple way could be: look, I can guarantee you my model has not been trained on any chemical data. There's just no way for it to talk about chemistry; there's nothing about it in the training data, it's only going to talk about other stuff, and it hasn't derived this from learning about the English language and so on. That's great. That's enough proof. But we do need to start having these things. Otherwise, if we don't,
00:39:18
Speaker
developers also cannot bound the ways in which these systems can fail or act. Again, that breaks all of the other assumptions that we can make about them, because we don't even know what the full extent of what they will be capable of doing is. So I think here we arrive at phase one, which is about stability.
00:39:37
Speaker
And this is about how to implement a system that actually upholds the prohibitions on systems that are dangerous and that we would like to avoid. So you have a number of conditions for stability also. Perhaps we could walk through those in a similar fashion.

Global Stability and AI Regulation

00:39:52
Speaker
Yeah, and the first one is nonproliferation. I take it that this is just about not wanting AI systems spreading across the globe in a kind of unregulated way. How would you explain nonproliferation?
00:40:06
Speaker
Yeah, absolutely. So non-proliferation is a common idea in arms control and with other powerful technologies like nuclear, where ultimately, if you have a technology that can be dangerous, the more people and the more countries that have access to it, the more something can go wrong. You're multiplying the downside risk by the number of actors that can use it. So you need to find a way for the most fundamentally dangerous systems not to proliferate across them. The more of them there are, the harder it is to govern them,
00:40:44
Speaker
the harder it is to monitor them. And also, if they proliferate, it makes it really difficult for agreements to stay in place, because countries will start to feel like, well, I finally decided to implement safety measures and limit the development of AI in a certain way, but all of these other countries are just rushing ahead and they're getting access to it and they are developing beyond these safeguards. Why am I limiting myself?
00:41:13
Speaker
Competitive pressure builds up over time and leads to skirting the safety measures rather than following them. Would nonproliferation set bounds on open-sourcing AI?
00:41:26
Speaker
So, well, I think the open-sourcing AI question is quite an interesting one, first of all, because it's very often a misused term. There are very few, especially of the most powerful models, very few that actually have an actual open-source license or that actually follow the principles of open source. They generally just release the weights. So I usually like to call it open weights, which is not the same as having the source code or being able to replicate it at home; good luck finding the compute that a Meta has, and so on.
00:41:56
Speaker
But ultimately, the reality is that for sufficiently powerful AI systems, we cannot have them spread across the planet and ungoverned. If you accept the premise that some AI systems will lead to catastrophic risks, and you have 8 billion people on the planet, those 8 billion people include terrorists, they include sociopaths, they include people that just want to hurt other people, they include rogue states. The moment they have access to those systems, you need to find a way to remove that access from them, otherwise you face a threat. So at a certain level of capability and at a certain level of danger,
00:42:39
Speaker
it is untenable to just allow proliferation. One sad thing is that I actually do really understand the approach of many people that are in good faith concerned about overreach of international measures and global surveillance. But the sad thing is, the more we proliferate powerful AI, the more the only way to have a stable future system is with more and more surveillance. If you have nuclear-level technology in every house,
00:43:10
Speaker
the world either has much more surveillance than we have right now, or we die in a nuclear war. So we should rather seek not to proliferate now, and have measures on AI roughly similar in power to what we have on nukes, measures that don't invade our private lives, rather than proliferating and then finding that the only way out is much more invasive measures. Those might still be worth it, because not dying is a pretty nice deal, but I would just rather have a future where we keep a lot of our privacy, we don't have dangerous technology proliferated, and we can easily govern it in a few areas. Which type of international structure would we need in order to prevent proliferation?
00:43:56
Speaker
Yeah, so the structure that we propose is quite similar to what we have in place for nuclear, because there are a lot of similarities between AI and nuclear, some differences, but a lot of similarities. A few of them: they're both extremely powerful technologies that are inherently dual-use. You can use AI in a lot of narrow, civilian, beneficial applications; you can also build AI to create technologies that are as powerful as nukes, if not more, same as with nuclear weapons and civilian nuclear power. And
00:44:32
Speaker
AI, despite being software, is still quite reliant on a physical resource, a bit like uranium for nukes; for AI, this is compute. There are various inputs to AI, but the most important one, definitely the most governable one, is compute, computing power. This is not just digital; these are physical GPUs, big machines, big supercomputers that sit in large-scale data centers. They're fairly easy to monitor, they're physical, they're large, they're expensive, and there is a very thin supply chain that produces them globally, so they're quite easy to track,
00:45:07
Speaker
and this maps quite similarly to uranium and plutonium for nukes. A few differences are that nukes are just a weapon. Ultimately, even if they are extremely powerful, they're just sitting there, and the way they can go wrong is by human misuse or accident. With AI, especially the AI that many companies are trying to build,
00:45:31
Speaker
we are dealing with agents, with entities that are much easier to model as adversaries, or like a fleet of human operators on a computer, a fleet of analysts on a computer, rather than just a static weapon.
00:45:50
Speaker
They take actions on their own, they're built to take actions on their own, and they can model you like an adversary would; they're much closer to an adversary or a foreign force than to just a weapon. This is a key difference. But yeah, in terms of international structure, what we propose is very similar to what exists for nuclear. We propose an international agency akin to the IAEA, the International Atomic Energy Agency, that would at the same time have monitoring and verification mechanisms to monitor the stockpiles of compute, monitor that the safety conditions are being enforced across countries, make sure the countries have licensing regimes themselves to monitor these safety conditions, and let countries also monitor each other.
00:46:41
Speaker
You know, always trust but verify; it's very important that countries can verify each other are not violating these commitments. And I think this is absolutely possible, as we've done it with nukes. Wouldn't this run into the classical problem of international law, which is that we don't have strong enforcement mechanisms and it's difficult for countries to agree? How, for example, would the US and China agree on these kinds of international structures?
00:47:08
Speaker
Yeah, so it's worth noting that the IAEA does exist and it works pretty well. So we have succeeded in some pretty dicey areas. It's not easy. This is why it's a narrow path. The reality is that there are a lot of paths in front of us. This has also been what Geoffrey Hinton said this week in his Nobel Prize response: that we are at a bifurcation point in history where we are facing this extinction-level threat coming up very soon, in the next few years, and we will need to figure out whether we have plans to deal with it or not. Most of the paths, the ones where we don't try to cooperate, or we try to cooperate and fail, or something goes wrong, are going to lead to a bad outcome, but we need to try
00:47:57
Speaker
to go for the good outcome, and we have achieved it before. Again, the IAEA exists for nuclear weapons. The Biological Weapons Convention has prohibited biological weapons worldwide, and sometimes there are defections, but all in all, we have a very limited number of biological weapons actually being used. Human cloning was a case where there was a very strong and immediate ban on a technology that would have been very economically, militarily, and strategically viable for countries, and this ban has been adopted in the US, in China, in Russia, and so on. So we do need to try. And the important thing to understand, and this is why I think
00:48:41
Speaker
some other approaches proposed in this field are either naive or disingenuous, is that the alternative to cooperation is war, and war is nuclear war. We are in a world with powerful nuclear states, and we either find a way to cooperate and to prevent the development of superintelligence together, or the alternative is that one country will need to force the others to prevent the development of superintelligence. And forcing a nuclear power is quite tricky. So we should test the cooperation route. And again, always trust but verify, or sometimes not even trust, just verify, but we should test the cooperation route before going for the
00:49:27
Speaker
war route. I do not believe there is a route where somehow one player just takes over all of the other ones but there is no conflict. Trying to take over other countries leads to conflict. That's the definition of takeover, and that's how international geopolitics works.
00:49:43
Speaker
How do you trust but verify? Specifically, how do you verify? Do we have interesting technical measures or solutions for how to look at which training runs are underway in different countries and so on?
00:49:58
Speaker
That's a very good question. The good news is that we increasingly have more of those technical solutions. Before going to the technical solutions, I do want to stress that the first step is that we need the processes and institutions in place. We shouldn't feel bottlenecked by the technology. The technology helps, but the foundation for using the technology and deploying these better and better monitoring technologies is actual political agreements and deciding that the policy is worth pursuing. Generally, despite all things, if major governments, especially governments like the US, decide to do something, it gets done. The main thing is to actually decide to do something, and then we will find a way
00:50:45
Speaker
to solve it, going from old-timey in-person monitoring via inspectors coming with a suit and a bag, to very sophisticated new technologies. The good news is that we also have more and more sophisticated new technologies. So there are a few ways. Some ways are, again, more legacy, but can still be used. Data centers are large. They consume a lot of energy. They produce a variety of signatures that are verifiable.
00:51:14
Speaker
But we also propose in our plan a specific approach of limiting the total amount of FLOPS, the total amount of computing power, per data center, a pretty high limit. Still, this is to ensure that if somebody wants to illegally break out of the limits, we have the equivalent of a nuclear breakout time that we can calculate in advance:
00:51:37
Speaker
they will need to smuggle more GPUs in, and it will still take them a certain number of years before being able to train such a larger model. And we can just do this right now. We don't need new technologies, just an enforced limit on the total size of a data center, not just the total size of a single training run.
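As a back-of-the-envelope illustration of the breakout-time idea, here is a small sketch; the per-data-center limit, the prohibited training-compute level, and the utilization figure are all made-up placeholder numbers, not values from the plan.

```python
# Back-of-the-envelope "breakout time" sketch: given a per-data-center cap on
# installed computing power, how long would a defector need to run it flat out
# to accumulate a prohibited amount of training compute? Numbers are illustrative.

SECONDS_PER_YEAR = 365 * 24 * 3600

def breakout_years(datacenter_flops_per_s: float,
                   prohibited_training_flops: float,
                   utilization: float = 0.4) -> float:
    effective_flops_per_s = datacenter_flops_per_s * utilization
    return prohibited_training_flops / effective_flops_per_s / SECONDS_PER_YEAR

# e.g. a hypothetical 1e19 FLOP/s cap versus a hypothetical 1e27 FLOP threshold
print(f"{breakout_years(1e19, 1e27):.1f} years at 40% utilization")  # ~7.9 years
```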
00:51:59
Speaker
In the near future, there are proposals being worked on right now; there are prototypes being worked on right now. One of the various proposals, guaranteeable chips, which people like Yoshua Bengio and others are working on, is to have on-chip monitoring mechanisms. This would be the gold standard for mutual verification, where you have, on the chip, a way to verify whether the training run is authorized, whether the entity doing the training run still has a valid license. You can verify the amount of computing power being used for the training run, and a variety of other specifications that you might want to verify,
00:52:40
Speaker
in this kind of mechanism. And that in many ways would be much, much more visibility than we even have on nuclear. So it's an easier job, once you have this kind of mechanism, than what we do with nuclear.
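Since these on-chip mechanisms are still proposals and prototypes, the following is only a schematic sketch of the verification flow being described: the chip signs a usage report, and a verifier checks the signature, the license, and the compute used. It uses an HMAC with a shared secret purely as a stand-in for real hardware attestation with a proper root of trust; all names and values are hypothetical.

```python
# Schematic sketch of license verification of the kind an on-chip mechanism
# could support. Real proposals rely on hardware roots of trust and public-key
# attestation; this stand-in uses an HMAC over a report, purely to show the flow.
import hmac, hashlib, json, time

CHIP_SECRET = b"provisioned-at-manufacture"     # placeholder for a hardware key

def chip_sign_report(training_run_id: str, flops_used: float, license_id: str) -> dict:
    report = {"run": training_run_id, "flops": flops_used,
              "license": license_id, "ts": time.time()}
    payload = json.dumps(report, sort_keys=True).encode()
    report["sig"] = hmac.new(CHIP_SECRET, payload, hashlib.sha256).hexdigest()
    return report

def verify(report: dict, valid_licenses: set, flop_cap: float) -> bool:
    sig = report.pop("sig")
    payload = json.dumps(report, sort_keys=True).encode()
    expected = hmac.new(CHIP_SECRET, payload, hashlib.sha256).hexdigest()
    return (hmac.compare_digest(sig, expected)
            and report["license"] in valid_licenses
            and report["flops"] <= flop_cap)

r = chip_sign_report("run-42", 3e24, "LIC-001")
print(verify(r, valid_licenses={"LIC-001"}, flop_cap=1e25))  # True under these assumptions
```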
00:52:52
Speaker
Yeah. And it might be possible to get these hardware mechanisms because, as you mentioned, the supply chain is extremely concentrated, with ASML and TSMC and NVIDIA as the main players. And therefore, it seems like if those players could be convinced that this would be a good thing, it might be possible. Do you worry about distributed training runs? Say, instead of having one training run running in a data center with a million GPUs,
00:53:19
Speaker
you spread those out over a hundred different data centers and thereby conceal that you're actually training an enormous model. Yeah, so that's obviously going to be one threat model from actors that are trying to defect. This does lead to a lot of inefficiencies at the moment. A big factor in training runs at the moment is interconnect speed, so how quickly information flows between nearby GPUs. Here you're doing it over the internet, spread out potentially globally
00:53:53
Speaker
across jurisdictions, so you will have challenges there. But this is why, eventually, we need to have an international oversight system where some countries get together, decide to implement this, and then make sure to monitor and sanction countries that do not follow the system. We need to have a carrot; we need to find ways to incentivize joining the system. But also, obviously, if there are defections, like in any other area of international law, they should be prevented, sanctioned, and limited. A country that is trying to circumvent these measures should be sanctioned, and there should be incentives against doing these kinds of things. But this is the kind of thing that, for example, with a robust monitoring regime, even without on-chip mechanisms, is strongly solvable. It's solved by these approaches.
00:54:51
Speaker
What if it turns out that using inference-time compute is a really efficient way to get capability gains or performance gains? You mentioned this earlier in the episode, but when you use ChatGPT and it's thinking about something, that's inference-time compute. It's not entirely public how that works, but it's somehow reflecting on its own output and getting to a better result.
00:55:16
Speaker
What if it turns out that techniques like those are perhaps a better use of compute than using compute for training bigger models? Does that kind of invalidate the whole regime of compute governance with these hardware mechanisms?
00:55:31
Speaker
I think it challenges it, but it doesn't invalidate it. And this is precisely the reason why we draw out those safety conditions in phase zero that go beyond just compute thresholds. As we discussed earlier, compute is a very good proxy, but it's just one proxy. Ideally, we would have these,
00:55:51
Speaker
and we should build them: clear metrics of intelligence and the ability to predict, over time, what happens whether you put in more inference-time compute or less. Also, the safety conditions and all of the various phases were drawn out in a paradigm-agnostic way. So while obviously now we all have in our mind the current scaling paradigm, LLMs plus RL plus other things, as the dominant paradigm, we should have systems and institutions that are robust enough to deal with paradigm shifts too. If scaling stops, there are going to be other breakthroughs.
00:56:29
Speaker
This is a problem we need to solve at some point: if it's not by scaling, there might be breakthroughs in RL that push things forward, and other things like that. And so this is why we take the defense-in-depth approach of having some measures that limit the general intelligence of AI systems via compute (this is why we have a licensing regime), and some measures that are just fundamentally necessary, like no AIs improving AIs, and no AIs that can break out of their environment.
00:57:01
Speaker
And we need to complement the compute approach by starting to build this understanding of intelligence at a theoretical and empirical level, and we should, for example, match them. One proposal that we have in our licensing regime is to have a measure where we test AIs and see how they perform against remote workers. And if they exceed the performance of remote workers, that should be treated as equivalent to crossing a compute threshold. This is to capture a breakthrough in algorithmic efficiency that makes them much more generally intelligent while still being below the compute threshold. We should have these other ways, these fail-safes, to capture that.
00:57:43
Speaker
That sounds to me almost like a natural benchmark, something that will emerge in the wild if you have these services online where you can make contracts with remote workers. And if it turns out that those remote workers begin being outcompeted by AI remote workers, well, then you have a definite increase in capabilities, whether that comes from another enormous model or from better use of inference-time compute or wherever it comes from. Would you formalize that into a benchmark, or would you collect the data after the fact from these platforms online, or from whatever source? I think governments should start working on formalizing these benchmarks. This is exactly how we built other sciences starting from zero, things like temperature: we started with people putting their left hand into a pot of cold water and their right hand
00:58:41
Speaker
into a pot of hot water, and then noting down the difference in feeling. You need to start somewhere and then bootstrap to a robust measurement science, a robust metrology. We should just start doing this with AI. Governments are luckily building up their capacity with things like the AI safety and evaluation institutes. This is a great place to start developing this. We should involve many more economists to do rigorous empirical studies. We see a lot of these things like "AI passed an IQ test" or "AI passed an exam." That's great, but we need to do more. We need to do it in a formalized way. We need a big investment in this. We need to understand exactly at which levels
00:59:25
Speaker
AIs are, and that's going to be a good fail-safe to combine with the compute-based measures, to catch exactly this natural metric. Because ultimately, we know that we humans, as general intelligences, can develop AI. Roughly, this is the big intuition behind AGI and superintelligence.
00:59:45
Speaker
If you have an AI that is as competent as a human, and it can do AI research and do activities as well as a human, you are facing a very rapid acceleration into more and more powerful AI systems. We know we're capable of doing this. We know that machines are much faster than us. Once we cross the threshold of AI generally as capable as a human,
01:00:09
Speaker
we are in dangerous territory. So we should start actually measuring this yardstick. Yes, it's hard. Yes, the first tries are not going to be perfect, but this is how to start.
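As a toy illustration of this kind of grounded, remote-worker comparison, here is a minimal sketch. The task scores, the parity threshold, and the trigger rule are all hypothetical assumptions for the example, not part of the licensing proposal itself.

```python
# Minimal sketch of a "grounded benchmark": score humans and an AI on the same
# real contract tasks and flag when the AI reaches parity with the human average.
# All scores and the threshold are made up for illustration.

from statistics import mean

def parity_flag(ai_scores: list[float], human_scores: list[float],
                threshold: float = 1.0) -> bool:
    """True if the AI's mean task score reaches `threshold` times the human mean,
    a hypothetical trigger that would be treated like crossing a compute threshold."""
    if not ai_scores or not human_scores:
        raise ValueError("need scores for both AI and human workers")
    return mean(ai_scores) >= threshold * mean(human_scores)

# Completion-quality scores (0 to 1) on ten identical tasks, invented for the example.
human = [0.82, 0.75, 0.90, 0.66, 0.71, 0.88, 0.79, 0.93, 0.70, 0.85]
ai    = [0.80, 0.78, 0.92, 0.70, 0.69, 0.90, 0.81, 0.95, 0.74, 0.88]
print(parity_flag(ai, human))  # True: the AI matches the human average on this set
```

A real version would need careful task sampling, blinding, and cost accounting, but the core comparison (same tasks, a human baseline, an explicit trigger) is what distinguishes a grounded benchmark from a multiple-choice eval.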
01:00:20
Speaker
If we return to the issue of international cooperation: from the perspective of a country deciding whether to sign up for some international agreement, how are they making that decision? Well, they're thinking about the benefits and costs, right? And you write about how there should be benefits to cooperating with these international agreements and structures. Say you are Nigeria, for example, and you're deciding whether to sign up. You might think, well, maybe my economy can grow faster if I don't sign up, because these models can be very powerful. What can be offered on the other side? Which types of benefits can be offered for cooperating with these international structures?
01:01:02
Speaker
Yes. A fundamental and important point of an international structure is to also have these benefits for cooperation. On one hand, again, the incentive should be made clear that if we don't cooperate, we are facing an existential-level threat. Global security is national security; you are at risk as well. But it's good to add to this essentially negative incentive a positive incentive: we should have some of the most risky, but also potentially most beneficial, AI development done in an international agency modeled on the idea of a CERN for AI. In this plan, we call it GUARD, Global Unit for AI Research and Development, in which there will be national-security-level protocols put in place
01:01:57
Speaker
that enable more advanced AI research than what is allowed to be done in private companies. And the benefits of this, once they're proven safe, are shared with signatory countries. So you have, on one hand, the non-proliferation and monitoring regime via the IAEA for AI, the International AI Safety Commission. And on the other hand, rather than devolving into competitive pressures, as you say, with each country moving ahead on its own, trying to develop its own AI and risking crossing the superintelligence line accidentally or willingly, you have an outlet: a joint effort to build powerful transformative AI technology that will then be given, once proven safe, of course, to signatory countries to address challenges like fundamental scientific challenges, automating certain amounts of labor, and so on.
01:02:53
Speaker
Wouldn't you expect many of these advances happening in this GUARD or CERN-for-AI international research lab to be dual-use, in the sense that if you gain some knowledge, if you find out how to do something in a fundamentally better way, that can often be used for both good and bad? And so I would worry, if I'm Nigeria thinking about whether to sign up: would I actually receive these benefits? Would I be provided with the benefits of this CERN for AI?
01:03:23
Speaker
Well, this is one of the things that is still going to be an issue even if we do solve AI control: AI misuse will not go away. In a similar fashion, a controllable transformative AI is going to look in many ways like AlphaFold. You can imagine AlphaFold, or a more powerful AlphaFold for most of chemistry and biology, an AlphaFold 5.
01:03:48
Speaker
While it's not going to be an adversary on its own, and it's not going to be a generalist agent, in the wrong hands it could enable the development of novel pathogens, enable the development of synthetic biological agents that could kill millions of people on the planet. So you will still face the issue of non-proliferation and misuse.
01:04:11
Speaker
This is just the nature of a powerful technology, but at least we will have solved the problem of losing control to the system itself. This is why we do need to have an international agency that has very strong monitoring mechanisms, very strong safety protocols, and national-security-level
01:04:28
Speaker
security, where only research that has been proven safe, and safe by design, can be released outside, or can be released via more specialized channels, for example only to the governments of the participating countries, not just for open use. Because indeed, this is going to be very powerful stuff, and it can still be weaponized even if it is not itself a threat on its own.
01:04:58
Speaker
Let's talk about phase two, which is about flourishing. This sounded a little more cheery, perhaps, than talking about safety and security and so on. The main, or at least the first, point that you argue for in this phase is that we need to develop a science of intelligence, what you might call a mature science of intelligence. What does that entail?
01:05:21
Speaker
To build things that are reliable and controlled, we need to understand them, predict them, and measure them. This is how humanity has mastered a lot of domains. This is how we have safe passenger planes that take us into the air every day. This is how we extract awesome amounts of energy from nuclear power plants, and this is how we're going to master AI, if we do succeed.
01:05:46
Speaker
The situation at the moment is that, as we've talked about before, a simple solution to the superintelligence problem would be if we could just know the line at which too much intelligence leads to superintelligence. We could just draw a line there, apply a safety factor, as is common in safety engineering,
01:06:06
Speaker
and stop before that: get all of the intelligence from AI systems that we need and can harness safely, without crossing the line into systems that can overpower us. We have a really hard time drawing this line right now because we don't have a science of intelligence, and we don't have a measurement science of intelligence. We cannot directly measure this. We should build it. That's going to be the key to understanding many more things about the universe, but also just fundamentally to building AI systems for which we can predict in advance how powerful they will be, exactly what they will be able to do, exactly what they will not be able to do, and how they can fail, so we can make them fail gracefully. Think of when you build a nuclear power plant: you are
01:06:54
Speaker
required by law to show that it is not going to just collapse and explode if there is too much rainfall in your area. We can do this because we understand physics. We understand nuclear physics. We have strong foundations in safety engineering, and we can do this. We can do it for a bridge
01:07:13
Speaker
carrying cars. We should get to the same level with intelligence, and thus with AI. And so the first recommendation is: let's build the science of

Science of Intelligence and Alignment Challenges

01:07:22
Speaker
intelligence. This is going to unlock a lot of benefits, but we need to do the hard work of building it, starting from empirical measurements across AIs and then building the general theory of how all of this works.
01:07:35
Speaker
And how is that different from what people are trying to do today in computer science, in psychology, in cognitive science, and so on? This seems like something that existing academics would be very interested in; perhaps they would see this as a real breakthrough. But we are not there yet. So how is this different from what people are already trying to do?
01:08:01
Speaker
I think especially in machine learning there's been a bit of an inversion of priorities, where there's been a continued chase of brittle benchmarks just to get capabilities to go up, without actually trying to understand the deeper principles behind this. There have been some good approaches recently, like the book The Principles of Deep Learning Theory, which tries to find the physical foundations of how deep learning works, but we essentially just need much more of that: much more of this rigorous scientific approach and of rigorous empirical testing that we can bootstrap on to build the science, aimed at understanding how intelligence works and how to measure it, rather than just making the line go up on the next brittle eval that gets gamed very quickly. The modern history of
01:08:57
Speaker
machine learning in the past 10 years has been benchmarks being made in quite a crude and simple fashion and being gamed and smashed over and over and over. There's talk right now of saturation of benchmarks like MMLU and so on. Saturation meaning AIs are sometimes training even on the content of the benchmark; they're being directly optimized to beat the benchmark, and these benchmarks don't tell us much. This is why, as we talked about before, we should have some grounded measures. By grounded, I mean anchored in reality: things that we can just test against, real phenomena that we see. Things like: let's actually measure the performance of remote workers, and let's compare AIs on exactly the same tasks.
01:09:44
Speaker
Not a toy benchmark, not a list of multiple-choice questions, which is the common standard now, but let's run the AI on the same tasks. Let's see how it does. Let's see exactly where it fails. Let's bootstrap on that and build from that. And I think that's totally possible. This would be a very exciting scientific renaissance that could come from investing a lot more in these grounded approaches.
01:10:10
Speaker
Yeah, Dan Hendrycks, who is a previous guest on this podcast and the creator of MMLU and MATH and some of these benchmarks, is right now trying to create the most difficult kind of test for AI possible. He calls it Humanity's Last Exam, or something like that,
01:10:29
Speaker
just to give listeners a sense of how models have broken through these benchmarks, and how we need new ones that are much more difficult in order to keep up with AI development.
01:10:41
Speaker
And I think in this case part of the issue is this focus on always making them harder for AIs, while ultimately what we're interested in is not how hard something is for AIs. That tells us very little; it can only tell us how one AI compares relative to another AI.
01:11:00
Speaker
What we really care about, and this is the approach presented here, is how do AIs compare to us? Or how do AIs compare to other intelligences? Because we want to draw these kinds of lines. So this is why, well, we obviously need more benchmarks of everything, but what we especially lack is these grounded benchmarks that don't just tell us that Claude is two points above ChatGPT, where we don't really know what that actually means in the real world,
01:11:28
Speaker
but more like: can Claude automate a salesperson? Can Claude automate a salesperson 80%, 90%? Those are ultimately the questions that policymakers need answers to, that we need answers to in order to plan our society around and to understand when we're crossing that line.
01:11:48
Speaker
It's a very ambitious project, trying to build a fully general science of intelligence. I guess such a science would be able to compare the intelligence of a squirrel with a human, with a current AI model, with a future AI model, and so on. So it's very ambitious. You mentioned that this is how we've made progress in other domains, and I'm somewhat skeptical here. I think if you look at the history of innovation, it seems that perhaps you get kind of tinkering and engineering first.
01:12:18
Speaker
Then you get some product brought to market, and then it kind of fails or it's somewhat unsafe, and then you iterate on that product and you get something better. So you get product and engineering before you get a grand theory. Here I'm thinking perhaps of the steam engine or similar examples. It seems like you don't have the grand theory before you have the product. So why would that be the case with AI?
01:12:47
Speaker
So in the case of AI, it's possible to build the product before having the grand theory. I mean, we're doing it right now. The issue with that is that we have no way to make it safe. With other technologies, it's a tragedy, but it doesn't end human history to have one factory explode and then learn from it, or one plane crash and then learn from it. But the reality is that when you're dealing with a technology whose blast radius is all of humanity, you're not going to have retries. You're going to have very few, and you will have an enormous amount of damage before those, so we cannot afford to do that. But we have a lot of ways to
01:13:30
Speaker
test, in particular empirically. I don't think this is a dichotomy; indeed, it's a false dichotomy. As we've just discussed, we can do a lot of this empirical testing, bootstrapping from simpler problems to more complex problems. Let's just test right now how systems compare to humans across tasks.
01:13:48
Speaker
Without having a grand theory, we used to have simpler measures, like the Turing test. The Turing test has been the gold standard; it's been a commonly accepted metric of when AI crosses into "real AI," comparable to humans.
01:14:06
Speaker
It was the benchmark for the better part of a century, essentially, and we've broken past it. The first step to start to understand the phenomenon is to take goalposts seriously and not just keep shifting them. We should have a big moment of realization: yes, we had a perhaps crude, but better-than-nothing, measure in the Turing test, and we've broken past it. We have things like exams that we put humans through, and AIs break those. Let's try to figure out what's going on. Let's get some signal from this, let's make better tests, and let's bootstrap our theory from that.
01:14:43
Speaker
Yeah, it's interesting. The conclusion that some people have taken away from AIs, in some sense, passing the Turing test and passing very difficult exams and so on is that these were never good tests and exams to begin with, right? And so it can sometimes be difficult to update, and to think about how we should update our beliefs in the face of this new evidence. But if we had a science of intelligence like you describe, I think we would have much more clarity. If we had a science of intelligence like we have a science of physics, for example, there would be much less room for disagreement. Yeah, and going back to the iteration point: with nuclear, if we did not have an understanding of nuclear physics when we started building nuclear weapons,
01:15:31
Speaker
imagine what would have happened. That's a technology with a much bigger blast radius. We still made mistakes; we still made quite dangerous miscalculations. There was the famous episode of the makers of the first atomic bombs trying to calculate whether it would ignite the atmosphere, and not being completely sure. Luckily, it didn't. They could do the calculations because they had a theory.
01:15:54
Speaker
Without a theory, you cannot even do those calculations. You're just in the dark. We've had examples of detonations that ended up having a blast radius much larger than what the makers expected, almost killing the pilots and spreading massive fallout around. And this is even with a theory. In AI, we are working with a technology that will have a bigger blast radius than that, without a theory. That would have been untenable with nuclear weapons.
01:16:25
Speaker
Now, what we could get if we had a mature science of intelligence is some specification of what these systems can do, and guarantees for what these systems cannot do. Maybe you could explain the advantages there, and how realistic it is to get a science so precise that you can guarantee things about AI systems.
01:16:45
Speaker
Yeah, so this is another key component of making transformative AI safe, controlled, and safe by design: having a way to specify what we actually want this AI to do and what we want this AI not to do.
01:17:01
Speaker
This approach is actually quite similar to things we already have in the realm of formal specifications and formal methods, which are used in other areas of computer science, in nuclear engineering, and so on,
01:17:17
Speaker
where the challenge is: how do you tell the AI that this is a no-go and this is not a no-go, for all possible scenarios? This is not impossible. We do it; it's costly, it takes time, but we've made massive progress in this area in the past decades on things that would previously have taken decades of work. Formally proving properties of systems is famously hard, but it's not impossible, and we've made massive progress on that. But what examples do we have of systems where we have formal proof that they aren't buggy, for example? What would be examples of software like that?
01:18:00
Speaker
This approach has been used in some areas, especially military applications. There is some military helicopter software that is approached like this. There is some nuclear software that works like this. We also don't need to fully formally prove everything. The first step is to approximate:
01:18:21
Speaker
formally proving some core components of it, some core parts of it, or formally proving it within certain assumptions. We go back to the bounds point: if we only give the AI this amount of compute, and we put it in an environment where it doesn't actually access the internet, then given this, we can show that it's not going to be able to do crazy things. Maybe it's going to explode a little bit, but it's not going to take over the world. And that's not impossible, and we're making massive progress on this, partly also thanks to AI. Without having to develop AGI or superintelligence, there have been very impressive breakthroughs by DeepMind with AlphaProof recently, where they've been using a combination of systems to accelerate theorem proving, and similar things that would also help with this. There's been the work by
01:19:14
Speaker
David "davidad" Dalrymple and Yoshua Bengio at ARIA, the UK's innovation agency, who are building a specification language for AI systems. We can already do this kind of stuff without waiting for a full general theory of intelligence. It's just a big effort that requires some of the best minds on the planet to be put onto this. We need mathematicians. This is a call to action: great mathematicians, great theoretical computer scientists, this is a great challenge to work on. Very important for humanity.
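To give a flavor of what "formally proving it within certain assumptions" means, here is a toy, machine-checkable example in Lean 4. It has nothing to do with real AI systems or with ARIA's specification language; the names and the property are invented, and it only shows the pattern of writing a rule and proving a bound about it that holds in every case.

```lean
-- Toy example: a resource-update rule that provably never exceeds a cap.
-- The names and the property are invented for illustration.

def boundedUpdate (used request cap : Nat) : Nat :=
  if used + request ≤ cap then used + request else cap

-- Machine-checked proof: whatever is requested, the result never exceeds the cap.
theorem boundedUpdate_le_cap (used request cap : Nat) :
    boundedUpdate used request cap ≤ cap := by
  unfold boundedUpdate
  split
  · assumption
  · exact Nat.le_refl cap
```

Specifying and proving properties of an actual AI system is vastly harder, but the workflow is the same: state what must never happen, and have a proof checker confirm it, rather than relying on testing alone.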
01:19:44
Speaker
And perhaps it's something we should devote more computing power and more AIs to, to help with automated theorem proving and automated verification of software. If you could get a system like AlphaFold, but specifically for proving statements about software, that would be a great advance, I think.
01:20:07
Speaker
Yes, and that's possible. And that would also stay in the realm of narrow or bounded AI systems, where we know it's only doing that and not doing something else. We can have it be very powerful, but helping us solve these problems.
01:20:23
Speaker
So if we get the science of intelligence, if we go through the phases that you've described, of safety, of stability, and of flourishing, then we are faced with a situation in which we can begin to automate cognitive labor and physical labor. This might be somewhat beyond the scope of A Narrow Path, but what do you foresee then? I mean, it's not as if all challenges are solved, right? Automating physical and cognitive labor would completely change society. So do you think that even if we have all of these steps secured, we would still face great challenges?
01:21:05
Speaker
Yeah, we will face great challenges, but we would be out of the extinction threat that comes from superintelligence. We obviously don't have all the answers; humanity doesn't have all the answers, and we should think really hard about those questions. As you said, if we do succeed in implementing these phases, and we succeed at not having superintelligence but building controlled, transformative AI, AI that is a tool for us to automate anything that we want, there will also be areas where we should decide as a society that we do not want to automate, that we don't want to delegate to machines. That's an unsolved societal question: what are those areas, and
01:21:49
Speaker
how much do we want politics automated? Do we still want our leaders to be human or not? I think we should, but the real answer is that we should develop the processes where we make collective decisions together that are justified and deliberated together. This is the spirit of democracy: to get these answers.
01:22:12
Speaker
And there will still be a lot of challenges that are unsolved. One big one is concentration of resources. By default, this is a technology that concentrates power: whoever controls the development of this transformative technology, and can essentially, at some point, automate large amounts of labor, will have enormous power, enormous economic power and military power, at their disposal. This is why we set up GUARD in phase one: to make sure there is a credible commitment to distributing these benefits to others as well, and that there is not a single pole running away with it. So we will still have to solve the question of how do we distribute this wealth? How do we allocate it?
01:23:02
Speaker
We will keep facing a lot of the challenges that we face right now, just amplified, in a radically different future, a radically positive future, but still filled with challenges. Sometimes proponents of AI and of superintelligence talk about how, once we achieve superintelligence and it's aligned, we achieve post-scarcity. In some ways, I think this is a bit of a misnomer. In some ways, we are post-scarcity in a lot of areas right now, especially compared to our ancestors.
01:23:31
Speaker
In other ways, scarcity will never go away. We will always have to make trade-offs. There are only a finite number of atoms in the universe. In a very deep sense, we will never escape scarcity, because we live in a physical universe. But as you say, we have already kind of approached post-scarcity in some domains, and we will approach post-scarcity in other domains.
01:23:57
Speaker
Exactly. And we will still face trade-offs between things like explore versus exploit. It's not as if we will have a technology that means we will never need to make decisions. Do we invest a lot of this surplus into exploring the stars, settling on new planets, and setting up new things, or should it be invested to make everybody immediately extremely wealthy, with access to any material goods they want? Those things are not going to go away. Some things are never going away. And this is another reason why superintelligence as a dream is
01:24:32
Speaker
sometimes quite utopian in an ungrounded sense, because there are things that are just logical impossibilities, or moral impossibilities. Some people want other people to suffer. Is alignment satisfying that desire for others to suffer? It will conflict with other people's desires. Some people want positional goods. They want to be the best at something.
01:24:56
Speaker
Only one person can be the best at something at any given point. How do you satisfy that? Human values are complex. We don't even know if there are universal, full human values. If there are, we should discover them, but we're not there yet. A lot of these questions are not going to be answered by more technology. They're going to have to be answered by humans tackling these questions and trying to find trade-offs and compromises. We will never get rid of compromises.
01:25:23
Speaker
We will probably not satisfy all preferences fully. And in fact, if I'm not mistaken, I think we have some impossibility theorems stating that we cannot satisfy all preferences at the same time.

Involvement and Collaboration in AI Safety

01:25:37
Speaker
As a final question, I would like us to explore how people can get involved. So if you're in politics, or if you're a technical person in computer science, or a mathematician,
01:25:48
Speaker
perhaps lay out for us how you can get involved and what you see as the most fruitful areas to get involved in. In A Narrow Path, we set out one path that we think is going to help humanity make it across to the other side, survive superintelligence, and thrive. A lot of questions are unanswered. A lot of things will need to be strengthened, tested, and adapted to specific contexts. A lot of technical challenges will need to be solved, which are really exciting. One of them
01:26:24
Speaker
is that we will need much more work and many more bright minds on things like on-chip verification mechanisms. If you are interested in that, look into the current projects, or think about setting up your own project. This is going to be a growing area, and an area that we very much need for robust governance. For people that work on benchmarks:
01:26:49
Speaker
do work on the grounded benchmarks we want. We would love to see governments, economists, and computer scientists working together to make these grounded benchmarks a reality, comparing the actual economic performance of humans with the performance of AIs, to have clear metrics of what they can do.
01:27:08
Speaker
And for people in policy: if you are concerned about these risks and you're thinking about measures like this in your country, please get in touch. Also get in touch if you think there are ways to strengthen these measures. This is a very complex problem where nobody has a solution, and our goal with A Narrow Path was to go from zero to one. We didn't find a plan to deal with extinction risk from AI globally, so we made one. Now we need all of you to make it
01:27:40
Speaker
better, and to find the ways that you can implement it, adapt it, and transform it into real laws that can pass in your country, in your jurisdiction, and make this into a reality. So if you have any ideas on that, please do get in touch at hello@narrowpath.co. That's the email we use for all the feedback, so share ideas, share criticism, share feedback; it helps a lot. Fantastic. Thanks for talking with me. Thank you so much.