Introduction of Gus Docker and Dan Hendrycks
00:00:00
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker. I'm here with Dan Hendrycks. Dan got his PhD in computer science from UC Berkeley, and he's now the director of the Center for AI Safety. Dan, welcome to the podcast. Hi, nice to be here.
AI Evolution and Societal Impacts
00:00:16
Speaker
You have a fascinating paper about evolution and AI and humanity's relation with AIs in the future. It's called Natural Selection Favors AIs Over Humans. Maybe you could just sketch what that paper is basically about. Sure.
00:00:33
Speaker
I think what I'm trying to do with that paper is point at some robust processes that are shaping AI's development and if we can say that these processes are relevant, then we can say a lot of things about what AI systems might end up being like and their key properties in relation to humans.
00:00:51
Speaker
Maybe it's most effective to start with a sketch of some of the scenarios I might have in mind. So right now, it looks like we're in an AI race in the corporate sphere. Basically, we've got companies like Microsoft against Google, against other new competitors like Elon's organizations, potentially there'll be new ones as well like Amazon.
00:01:12
Speaker
And what's also happening is that there are pressures to be automating or outsourcing more and more tasks to different AIs. So we might use
AI in Corporate and Decision-Making Roles
00:01:22
Speaker
AIs to write our emails right now or generate some art. Later, we might even not just have them generate the email, but maybe they'll automatically write many of the less critical emails for us. We could imagine this process continuing
00:01:36
Speaker
all the way up to outsourcing more and more important decision-making tasks to different AI agents. And then eventually we could have AI systems do things like be CEOs of companies, where, for being a CEO, it's often a game of stamina and knowing how to do a lot of different little things.
00:01:56
Speaker
And it's possible that they could be more competitive. If we imagine that there'd be companies that would resist these sorts of trends, that wouldn't try to outsource anything to AIs, wouldn't replace people with AI systems, then they just might simply be outcompeted and become irrelevant. If they try to bargain with these larger trends, they end up losing influence. And so what they have to do is align with that process.
00:02:20
Speaker
So we can see that this process is shaping the goals and directives of a key part of society, and we're outsourcing more power and decision-making to AI agents. So whereas a lot of other stories might say, we've got some AI system, and it's going to try and seize power or something, go from being relatively powerless to very powerful, actually, we might just end up giving them a lot of the power by default. And then there's the hope that we can keep that process in control.
AI Competition and Survival Dynamics
00:02:49
Speaker
in later stages of this process, we've got AI CEOs. And this isn't terribly implausible, I should say. For instance, NetDragon WebSoft, which is a large video game company in China, had a press release where they're looking to have an AI CEO. So I think this is a possibility, definitely, in time. Do you think that's more of a gimmick, or do you think it's more than that? I think it's more of a gimmick currently. It still is a large company, though.
00:03:17
Speaker
Nonetheless, there isn't a rule that only humans can be in control of companies; there doesn't seem to be any precedent like that. Eventually, we may have AIs directly interacting with other AIs. They're communicating with each other. They're interacting and competing with each other in this sort of marketplace. We have more survival-of-the-fittest dynamics.
00:03:42
Speaker
And I think when we have survival-of-the-fittest dynamics, then we end up getting stronger selection pressures for AI agents that have more survival instincts. So things like self-preservation, things that are better at amassing power, things that know when to say the right thing in the right situation but might be a little deceptive in some other types of situations.
00:04:08
Speaker
They might not always follow the law; they might think, let's break the law if we're likely not to get caught. Or if we do get caught, we can take the penalties, we can take the fine. And that might be a more efficient strategy for competing in this sort of system. So I think there's some selection for
00:04:29
Speaker
generically more, quote, selfish behavior, where agents end up propagating themselves or expanding their influence at the expense of other agents. And when they're interacting with each other, I think this gets more intense; society moves substantially more quickly. It's harder for humans to keep up.
00:04:48
Speaker
So they have to keep outsourcing more of this power to these AI agents if they're to have any influence whatsoever. In this process,
Challenges of Integrating AI Safely
00:04:55
Speaker
people might not like where it's going. And as I said, they would have incentives to keep aligning with it. The ones who have misgivings end up losing relevance.
00:05:03
Speaker
But what might also happen is we might get more dependent on them, too, in this process, so that if we want to get rid of them or want to reduce their influence, it's not really that much of an option. We become more dependent on them for our basic needs. We enmesh them in a lot of key decision-making.
00:05:22
Speaker
We've forgotten how to do various things. They're in control of our power grid. We've even become potentially, in some cases, emotionally dependent on them. They've propagated to a private sphere, and a lot of people are having these as chatbot friends. So as they get more integrated into our processes, it becomes very difficult to reverse this process, too. So if there's a lot of momentum to it, and if there isn't sufficient willpower for reversibility, then we've got a problem.
00:05:50
Speaker
Take the internet, for example. The internet was not designed with safety or security in mind, and we would all like a much safer internet. But, of course, it's not really feasible for us to just say, okay, let's stop the internet and we'll rewrite it from scratch to be a safer thing. We just keep rolling with it.
00:06:07
Speaker
So I get concerned at that later stage where it's moving extremely quickly, where there's a lot of intense competition among them. And I think that creates pressures that are not in keeping with human well-being. Evolution is not doing the same thing as maximizing well-being. It's quite a different force.
00:06:29
Speaker
Why talk about evolution? Why is it that we're talking about evolution or that you're writing about evolution in this paper? This sounds more like market forces pushing in directions that we might not like. How seriously
AI Evolution vs. Human Evolution
00:06:42
Speaker
should we take the evolution part? Is it more metaphor analogy or is it more literal?
00:06:47
Speaker
So this is more literal. We can speak about it in a formal sense. I'm speaking about evolution on this time scale because we would have... I'm not speaking about biological evolution. Biological evolution for humans is slow. For some other organisms, like fruit flies, it can be fairly fast. But for humans it's slow. Other sorts of structures evolve substantially more quickly.
00:07:12
Speaker
So, for instance, software and things like that, those change substantially more quickly. And so I think AI agents as well, and there are many basic characteristics, could change dramatically over the course of a year. And there might be a lot of competition that would affect what they would end up looking like.
00:07:30
Speaker
So I think because of their adaptation, since they can adapt and adjust their weights every few seconds, since they can compete with each other very quickly and there'll be a lot of iterations there, the time scales shrink and the pressures end up getting so intense
00:07:50
Speaker
that it starts to make sense to not just talk about it as two fixed things competing with each other, but instead paying mind to how these continual rounds of competition end up affecting their strategies.
00:08:03
Speaker
and end up influencing how they behave. So in short, the idea is we progressively outsource more and more decision making to them. We get increasingly dependent on them. They start interacting with each other more directly. That produces some survival of the fittest sorts of dynamics. And that tends to select for more selfish behavior. Now, when I'm saying selfish behavior,
00:08:26
Speaker
I'm not talking about intentions or anything like that. I'm not saying that they're necessarily trying to be ruthless. A lot of people in the market, for instance, when they're laying off a thousand people, they're not thinking, hey, I'm being selfish. They're thinking, I'm being efficient. I'm doing my job. So this isn't necessarily that they're thinking to displace things. Indeed, things that aren't even conscious can be selfish. For instance, AI technologies now are
00:08:51
Speaker
when they're automating people, propagating themselves and things like them at the expense of humans. So they don't even need to be conscious for us to call them selfish. Whether they intend it or not doesn't really matter. It's more that they just have those sorts of behaviors. So that's at least what it might look like in the corporate sphere. For the military, a sort of military example would be
00:09:19
Speaker
Let's imagine we're in a great power conflict, and we get a fairly similar process. We have AI agents doing a lot of our bidding; they're necessary, they're strategically relevant. Things are moving so quickly that people can't
00:09:35
Speaker
make appropriate strategic decisions in time, so we have to outsource much of it to them. It's also substantially more politically convenient to have AIs fighting instead of people getting killed in that process. So we end up having AIs fighting against each other, and
AI as Tools vs. Agents
00:09:53
Speaker
that creates some unfortunate selection pressures
00:09:59
Speaker
for ones that are better able to propagate their information, better able to survive. So my concern is that we might lose control of that process, where we can't get AI systems to be extremely reliable or control them, and we have this process going on where they're competing with each other.
00:10:21
Speaker
it may make sense for rational, self-interested actors to continue with this. If they don't, then they're going to be destroyed by the other country. And so they might put all of humanity at risk by creating this crucible in which AI systems are being shaped. And we might lose control of that process, and that could result in potentially omnicide of everyone.
00:10:45
Speaker
But it may have made rational sense for self-interested actors to do that, because a lot of those leaders may have been disempowered or destroyed if they didn't go along with outsourcing lots of lethal decision-making to AIs and giving them automatic retaliation abilities. If they didn't do that, then they would necessarily lose.
00:11:11
Speaker
We've basically got a collective action problem, is what I'm emphasizing here; some other communities might call this Moloch. We've got a collective action problem, and then also we have a very fast-moving environment where the individual agents end up changing their shape fairly quickly, such that we can start to see different behaviors being selected for.
00:11:31
Speaker
So those are two different scenarios I have in mind. Yeah, I think one sticking point for listeners here might be the move from the tools we have today, from the ChatGPT we use or a recommendation algorithm when we watch YouTube or something like that, to more of AI agents. How do you see that move happening?
00:11:55
Speaker
Yeah. So I should say the paper was largely written last year, but I sort of waited until there'd be more of an arms race type of dynamic going on, so that people would go, oh, it's much more plausible. So I think there are some different ingredients we need. We need intense competition and we need variation. I think a lot of the interesting dynamics do start to happen when they become more autonomous and have more of these life-like characteristics
00:12:24
Speaker
for this characterization of evolution to feel more natural. It's a little less natural for people to think of other
AI Development through Evolutionary Lens
00:12:32
Speaker
structures evolving, such as, say, cultural artifacts or scientific theories or designs. That's appropriate to say too, though it certainly is less natural compared to thinking of them as more life-like. So if we're imagining that they're AI agents, it does become more natural.
00:12:49
Speaker
In the case of AI agents, it's not clear when we'll have that. Right now, they are more tool-like. There are still plenty of people trying to make them be more agent-like to complete various different tasks. So there are things like auto-GPT, where people are trying to repurpose models such as GPT-4.
00:13:07
Speaker
to behave as agents. They don't particularly work well now. There's the Transformers Agents library that recently came out, but there are still some capabilities needed for that to really kick into high gear. I wouldn't want to be bringing this to people's attention only after
00:13:25
Speaker
we're already in a more critical situation. This isn't to say that it isn't applicable now, but it becomes more worrying, more intense, and more legibly shapes their evolution when they're agents.
00:13:42
Speaker
So this idea of extending evolutionary theories to other domains apart from biology is extremely interesting. But do you think there's a risk that we might bend the domain we're trying to theorize about?
00:13:57
Speaker
into shape to make it fit the evolutionary framework. I'm thinking that perhaps not everything fits into the evolutionary frame in a sense. Why do you think AI development is a good fit for evolutionary theory? It basically depends on the intensity. Many structures can evolve, but if there isn't much intensity to it, then it isn't as applicable. I should say that, at least right now, for instance, most of the artificial selection going on for AI agents
00:14:27
Speaker
is to make them better at propagating themselves, and, as it happens, at the expense of others; basically, they're designing them to automate people as much as possible. So I think that most of the ways in which people are directly shaping AI systems now are exactly in a disempower-and-replace-humans direction, to a first-order approximation.
00:14:52
Speaker
But evolution is applicable, or the sort of more generalized notion of Darwinism is applicable, when we have three conditions. Those are called the Lewontin conditions. When we have variation among the different agents.
00:15:07
Speaker
When we have retention, so between two iterations of agents, they have some similarity. They're not anti-correlated or completely dissimilar across time. And then there needs to be differential fitness or there need to be fitness differences so that some end up propagating at higher rates than others.
00:15:25
Speaker
And so if we have all three of those conditions, then we have evolution by natural selection occurring. So that shows that it is a risk factor and it is shaping their development. And then I'd be concerned about that because evolution doesn't select for things that are extremely nice necessarily. Instead, it selects for things that are better at propagating themselves.
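To make the three Lewontin conditions concrete, here is a minimal toy simulation (an illustrative sketch, not from the paper; the trait, numbers, and update rule are all invented for illustration). It shows that variation, retention, and differential fitness alone are enough for a trait that aids propagation to spread, with no intent anywhere in the system.

```python
import random

# Toy model of the three Lewontin conditions: variation, retention, and
# differential fitness. (Hypothetical trait and numbers, for illustration only.)
random.seed(0)

# Variation: agents differ in a single toy trait, their "propagation rate".
population = [random.uniform(0.9, 1.1) for _ in range(1000)]

for generation in range(20):
    offspring = []
    for trait in population:
        # Differential fitness: higher trait values leave an extra copy more often.
        n_copies = 2 if random.random() < trait - 0.9 else 1
        for _ in range(n_copies):
            # Retention: each copy resembles its "parent", up to small noise.
            offspring.append(trait + random.gauss(0, 0.005))
    # Keep the population size fixed by sampling survivors at random.
    population = random.sample(offspring, 1000)
    print(f"gen {generation:2d}: mean trait = {sum(population) / len(population):.3f}")
```

Running it, the population's mean trait drifts upward generation by generation, which is the sense in which selection occurs once all three conditions hold.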
00:15:47
Speaker
This lesson might be worth stating again. Maybe you could talk about lions and parasites and all the ways in which nature isn't nice.
00:15:57
Speaker
Yeah, so there's often a Rousseauian type of view that nature and things in nature are very benign and angelic. I don't think this is quite accurate. When we look at many instances in nature, we see
00:16:19
Speaker
some very brutal behavior. For instance, as you mentioned, lions may commit infanticide. Why might they do that? So that the lionesses will breed with them and have more offspring. This isn't because they're thinking, I must propagate my genetic information into the future to take up more space-time volume or anything like that. It's
Human-AI Cooperation and Morality
00:16:40
Speaker
not an intent thing. These are just the behaviors that are selected for.
00:16:45
Speaker
We can find many different instances of selection where, from this fairly amoral competition, we get actions that we would tend to deem immoral if they were directed at us. Other examples include deception. Deception is very common in nature. There's of course camouflage, but even
00:17:09
Speaker
the very smallest things, like viruses, will try to permeate membranes by making them think, oh, I'm not an intruder at all, let me in. There are also examples of organisms taking over host organisms. The lancet liver fluke, for instance, will get inside another animal and basically hijack its brain so that
00:17:38
Speaker
the animal hangs onto a leaf by its mandibles, so it's more likely to be eaten, and then the lancet liver fluke can get into another animal's digestive system and end up reproducing that way.
00:17:54
Speaker
It's not necessarily a picnic in nature. Unfortunately, many of the mechanisms that could give rise to altruism and cooperation break down when we're talking about human-AI relations, or they backfire. So although there can be instances of cooperation
00:18:18
Speaker
and altruism in the animal kingdom, this might be a reason to expect that AIs may have some altruistic tendencies toward other AIs, such as nepotistic behaviors, but that's not a reason to believe that they're going to be nice to us. So we might think of a basic example of cooperation like reciprocity.
00:18:42
Speaker
And there, that depends on a cost-benefit ratio of whether I get sufficient benefit from cooperating with you relative to the costs. But unfortunately, later it starts to make much more sense that AIs would end up cooperating with each other, and there'd be a lot more costs if they're cooperating with humans.
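As a toy illustration of that cost-benefit point (the numbers are invented; this is not a model from the paper), reciprocity only pays when the benefit received outweighs the cost of cooperating, and waiting on a much slower partner shrinks the effective benefit:

```python
# Toy cost-benefit check for reciprocity (hypothetical numbers, illustration only).
# Cooperation pays when the benefit received outweighs the cost, and a partner
# who reciprocates much more slowly delivers less benefit per unit of time.

def cooperation_pays(benefit: float, cost: float, partner_slowdown: float) -> bool:
    effective_benefit = benefit / partner_slowdown  # benefit rate while waiting on the partner
    return effective_benefit > cost

benefit, cost = 5.0, 1.0
print(cooperation_pays(benefit, cost, partner_slowdown=1.0))   # AI peer at equal speed: True
print(cooperation_pays(benefit, cost, partner_slowdown=10.0))  # human roughly 10x slower: False
```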
00:19:00
Speaker
A way to see this is just remember in Zootopia, there's an example of them going to the DMV and there are sloths. And sloths take an extremely long time to do every sort of task. If AIs are running 10x the speed of us, it just doesn't really make much sense to actually do some type of cooperation or
00:19:19
Speaker
if there is any form of cooperation, we'd need to argue that we'd be getting enough resources from them that we could have extremely good well-being. Other mechanisms for cooperation and altruism might be kin selection. So parents will give food to their children and potentially even sacrifice themselves for their children.
00:19:39
Speaker
But this is not necessarily a good sign for us. This might suggest that AIs could be nepotistic. That is, they would start to prefer things more similar to them relative to us. So that's not a sign for optimism. In fact, that backfires.
00:19:58
Speaker
There are various other mechanisms of cooperation and altruism that we go through in the paper, but they either aren't applicable or they seem to backfire. So many of the things that give rise to some potentially nice, harmonious conditions in the world
00:20:13
Speaker
don't seem to be applicable with respect to human-AI relations or in human-AI coevolution. Yeah, and a sort of global altruism wouldn't make much evolutionary sense. We could talk about group selection or kin selection, but global altruism would confer no survival benefits, if I understand it correctly. Another mechanism for cooperative or altruistic behavior may be something like moral reasoning.
00:20:39
Speaker
One could imagine that if AIs get smarter, they end up being more wise and then they'll be more altruistic or moral. And, you know, maybe this is the progression of society and maybe AIs would continue that. After all, with humans, we've seen a progression to being more altruistic toward not just our family, but actually people in our state or in our entire country.
00:21:07
Speaker
or of people of different races or genders or orientations and so on. So perhaps that circle of altruism would end up expanding all the way from AIs to humans and then they would end up pursuing our well-being.
00:21:20
Speaker
But if they're operationalizing it as: we should maximize every individual's well-being, including potential future individuals' well-being, then unfortunately that may not look too good for humans. If AIs are able to experience an extreme amount of pleasure, then it may be more efficient for them to replace biological life with digital life so that more pleasure can be experienced. So that doesn't necessarily
00:21:50
Speaker
turn out too well for biological humans. So I think even hoping that morality would naturally lead to good outcomes, or to humans being taken care of, doesn't follow. In fact, that can also backfire and go in the direction of strongly preferring some sorts of artificial life; whether or not they resemble humans is immaterial so long as they have pleasure. Now, there are
00:22:18
Speaker
other issues with hoping that AIs would naturally be more moral. One is that this is assuming that more intelligence makes things more moral, and it's not clear that's true. Maybe there isn't some sort of objective morality; some people think that.
00:22:35
Speaker
But even if there is, then we have to hope that whatever moral code they end up adopting is human compatible. I'd suggest that a utilitarian code of ethics is not necessarily human compatible, especially if AI agents themselves can be extremely morally valuable.
00:22:56
Speaker
And then also, even if the code of ethics they find is human compatible, there's still the problem of them being motivated to act on it. So some agents might recognize that there are moral reasons for doing things. That doesn't mean that they're going to actually behave morally, though. They might go, yes, that would be the good thing to do, but
00:23:15
Speaker
I have my own self-interest. And this is the direction in which the AIs would be pushed. So we've talked about
Guiding AI's Evolving Environment
00:23:22
Speaker
how they would be pushed to act selfishly, and how neither group selection nor kin selection nor morality, moral philosophy in a sense, can save us from this direction that AIs would be pushed in.
00:23:36
Speaker
Yeah, there isn't a reason, sort of from the point of view of the universe, or just from analyzing how the environment might be structured, why things would naturally lead to good outcomes, unfortunately. So I think the only thing that we have in our favor is that we get to make some of these initial moves in the construction of their environment, because two things affect fitness: natural selection, but also
00:24:05
Speaker
the environment in which they're evolving. I think that we have to exert very high influence over the ecosystem that's shaping AI agents. Right now, it doesn't look like we're doing almost any of that. Instead, if we're caught in an AI arms race or in an AI corporate race,
00:24:22
Speaker
We're letting Darwinian logic carry itself to its conclusion as rapidly as possible if they're all racing as they are now and trying to win in some sort of extinction race. So we would need to take things substantially more slowly. We would need to initialize the environment well so as to select against this type of
00:24:41
Speaker
behavior, many of the instances of this type of behavior. It's fairly difficult to do, though. If you end up having some sort of selection pressure for, let's say, deceptive AI systems, it might be fairly difficult to select against them. If they're not transparent, they could just play along and act good for a while. But then when they get more powerful or something, they could act fairly differently.
00:25:05
Speaker
And that just points at the general problems in trying to align things. It's difficult to rely on regulations or rely on our training processes to get angelic, saint-like AI systems. We know that's a problem, but now we have a sort of robust process that continually nudges them in a direction of having some of these more unfortunate characteristics in the first place.
00:25:26
Speaker
Yeah, so we have these three factors of natural selection. We have variation, we have retention, and we have differential fitness. Perhaps we should walk through a bit more slowly how these three factors might apply to AIs just to see how the argument actually works out in detail. So perhaps if we start with variation, why is it that we should expect that there would be a multitude of different AI agents in the future?
00:25:53
Speaker
Yeah, so there are strong reasons for variation. I think that there is a question of whether there'd be one very powerful AI agent at some point, but I think eventually, on almost any time scale, you should expect multiple different AI agents due to locality differences, as in you can't have one
00:26:15
Speaker
big thing taking up tons and tons of space. There are certainly advantages to running things in parallel, so this at least is why there should be multiple different agents. If you just have one system processing everything serially, it's, you know, I'll add that to the queue, but I'll process that in 100 years. Doing things in parallel makes plenty of sense.
00:26:34
Speaker
But then also, if there are different niches or some different environments, it might make sense to be more efficient and not necessarily have all the latest and greatest capabilities of every sort. Those might be more expensive than is needed to do well in some different aspects of the economy. I think right now, for instance, we're seeing a quote unquote Cambrian explosion in AI models after the LLaMA model was released
00:27:02
Speaker
because that contributed to the open-source ecosystem, and people are creating many different mods, and this is resulting in many different applications of AI. And you don't need the most powerful agent or the smartest agent of all
00:27:17
Speaker
to do various different tasks. So there are also substantial risks, fitness-wise, if there's one agent and it only has clones. So if there's only one agent, if you find a vulnerability in one, you found a vulnerability in all of them. So that creates quite a problem. This is one reason why very complex organisms
00:27:43
Speaker
are sexual as opposed to asexual. They don't clone themselves because they tend to get wiped out surprisingly quickly over reasonable periods of time, the more complex ones. So for that reason, I would expect a differentiation among them. Otherwise, they have subjected themselves to substantial vulnerability.
AI Dynamics: Generalization vs. Specialization
00:28:08
Speaker
So those are some reasons to possibly expect things like variation. The argument I just gave is also a reason not to expect them to just copy and paste themselves, identical clones of themselves, on different servers and whatnot. They wouldn't necessarily do that; differentiating could be a real fitness advantage in a competitive landscape.
00:28:24
Speaker
Yeah, what about the advantages of training the biggest models? Perhaps we could imagine some of the top companies, some of the leading companies right now, maybe they're able to train a giant model that then helps them develop further AIs. In a sense, might this push towards having fewer agents or fewer AI agents because there are
00:28:47
Speaker
winner-take-all scenarios, and there are advantages to being first, to getting over some threshold as the first player, from which you then leave the other players behind. And so in a sense, there might only be one very powerful agent. I think if we're talking about an extremely fast takeoff... I think we can have shorter timelines. I think many of the relevant AI advancements happen this decade, and
00:29:15
Speaker
many jobs get automated, and there's potentially mass unemployment this decade. I think one could believe that, but still not think that there is an extremely fast takeoff where a few months' difference makes all the difference forever, fairly decisively.
00:29:33
Speaker
But I tend not to focus on the extremely fast takeoff or intelligence explosion type of scenarios for a few reasons. One is I think it's less tractable. So even if it is a risk factor, it seems
00:29:51
Speaker
fairly difficult to do many things about it distinctly. But also, with recursive self-improvement, which is often the mechanism used to explain why there may be an intelligence explosion,
00:30:08
Speaker
I think we already have AIs influencing the development of future AIs, and I just expect that process to get more intense. So I don't think there's a switch where now suddenly they're influencing AI development and that makes all the difference. I think it's a continuous process. We've been having it for a while. We've been having AIs label
00:30:28
Speaker
different data sets, and now we're using them to create new data sets for them to train on. We're having them write more and more of the code, and that's a continuous variable. We're having them help design a lot of the GPUs. We're having them cool the factory, or cool the data centers where they're training. And there are a lot of processes that are helping make AI development go more quickly. They're facilitating research as well, potentially in brainstorming,
00:30:58
Speaker
finding different parameters or architectures. So I expect just more of that process to get automated and that could look continuous, but exponentials are still sharp enough. So do you think these questions are already settled? In a sense, we will have at least a somewhat slow takeoff and we will have multiple agents because we are already seeing that we have multiple agents and perhaps we are already now in some sort of takeoff.
00:31:24
Speaker
It does look like we're in a slower takeoff scenario. I would like to note that there's a bit more complication to that phrase; I'd like to complicate the idea of a slow takeoff in a moment. But the other thing is, on the multi-agent question, it's possible that for a while there are
00:31:41
Speaker
a few relevant agents, and then there's a larger ecosystem of multiple agents that could expand or contract. For instance, like with states, sometimes there'd be some superpowers, and then sometimes it'd actually be more equally balanced. In the limit, though, I would expect multiple different AI agents, especially when we're talking about extremely late stages just due to locality constraints, that they all can't be in one place.
00:32:04
Speaker
And on slow takeoff, there's another idea from evolution, which would be punctuated equilibria, where actually what happens is that things evolve steadily, and then there will be small periods in which there are substantial advancements.
00:32:20
Speaker
And then it will evolve steadily. And then there'll be some other period where there's a lot of advancements. I think that's actually a more appropriate description of AI development today. So after GPT-4, there's about a month or so of extreme development. And then I think there'll be some periods later this year where we're kind of
00:32:42
Speaker
waiting for the next big thing, killing time. These sorts of frenzies have happened a lot in AI development. After ResNet came out, there were lots of new models: FractalNet, DenseNet, ResNeXt. And these came out in very rapid succession. Self-supervised learning also had this period in late 2019, where there was extremely rapid development.
00:33:08
Speaker
GANs had this as well after Improved GAN. It was easily the hottest topic in research, and anybody researching this space would likely get scooped; if they're working on a paper, somebody else would come out with the same paper the day after. So I think there are periods that are extremely fast, and this is what we see
00:33:35
Speaker
when we look at historical records in evolution as well. So I expect that to continue in AI. So there'd be fast periods, there'd be slow periods, but I'm still suspicious of some overnight moment where one system becomes extraordinarily more powerful than everything else combined.
00:33:57
Speaker
So what was this wrinkle on takeoffs you wanted to mention? Well, so that was the wrinkle. It's slower, but it's still made up of faster periods and slower periods. Got it, got it.
00:34:10
Speaker
Could we perhaps expect the future of AI development to look a bit like the duopolies among the large tech companies we have today? With mobile and the internet, you have something like Apple and Google, Microsoft, Facebook. Might there be two or three interesting and highly capable AIs in the world, and that's it?
00:34:33
Speaker
Yeah, I think it's plausible that there may be a few main developers. I think there are things that can push it in another direction that many more companies will be recognizing the importance of this and so more end up flooding in, as well as when it comes to state actors. Many of them would recognize this as a national security concern. Every general purpose technology gets weaponized.
00:34:56
Speaker
So I would expect militaries later on to be developing their own AI systems and many of them to have major programs. So I think there are things that can push it in a
00:35:12
Speaker
direction of many different actors too, but possibly for corporations in the shorter term, there may be a few different companies on which these are based. That isn't to say that that's what's affecting or explains or is the only thing to look at when thinking about human disempowerment or the replacement of humans with AI systems. So
00:35:34
Speaker
the companies that are using these AI systems will still end up causing a fair amount of the displacement. Although OpenAI may not themselves be building a legal AI system, some other company is doing that with that sort of technology and using it to displace other people. Or, let's say, political parties may not be developing the AI systems, but they'll use some of that technology to build more persuasive AI bots
00:36:03
Speaker
to manipulate people and play on their emotions and erode public understanding, because there's some information arms race with the other political parties.
Specialized AIs in Military and Corporate Sectors
00:36:22
Speaker
So in this process of AIs permeating the corporate sphere and even our personal lives, not all the action is happening at those specific companies; it's also happening with the people using and directing these AI agents for their various different goals, which makes them behave differently and makes them different objects to select on. Would you conceptualize ChatGPT as one AI or multiple AIs?
00:36:49
Speaker
Well, the instances would be identical unless there are modifications made to them; their weights are the same. If they became open-ended and customized for individuals, then they would be different. So if people have different assistants, they're all based on a similar sort of species of AI agents or something like that. Some
00:37:17
Speaker
members of that species may be much more relevant than others. So the assistant of some extremely powerful person who just delegated a lot of things to it, that becomes a very powerful AI agent. And there are reasons why we should expect to see specialized AIs. You mentioned the military; the military is one example, instead of just using a general model. Perhaps companies will want to develop their own AIs as opposed to relying on another company's AIs. So we should expect them to be specialized. But aren't
00:37:46
Speaker
current companies pushing in the direction of generality? Because general agents are useful, we see that AIs are becoming more and more general. They can help us with a variety of tasks. So which of these pressures will win out? Will specialization be most important, or generality?
00:38:06
Speaker
Yeah, so for various applications, let's take the instance of classification. You don't want to use something like GPT-4 for most of your classification tasks. Actually, you'll want to use a BERT-based model. There are various different niches where we want different AI agents, and it would often be much faster. We could imagine, for instance, at a later stage, let's say we've trained an extraordinarily large model.
00:38:36
Speaker
that takes a long time to do inference. Maybe that's particularly good for scientific research and doing some of those sorts of tasks, but it often makes sense in these later stages of when they're consuming substantially more hardware to be using more efficient ones that don't have all those bells and whistles and were partly trained by some of those AI agents, but they need to be able to adapt quickly in their environment. They need to be able to get a lot done.
00:39:00
Speaker
And I think those constraints end up making a difference. So I think right now we're in a period of a lot of creative destruction where we're seeing some of them sort of leap ahead of many others. But I think in more recent months, then we're starting to see that there's a bit more of an ecosystem where we're using different AI agents. We're tuning them in different ways and customizing them in different ways for various different applications. And that creates
00:39:25
Speaker
something more like an ecosystem, and then they'll end up competing with each other. But as for the stock from which they draw, there might be a few different sources of these agents, I should say, which may end up correlating them, but there'd be things that would differentiate them too. So I'm not making a completely definitive claim about
00:39:47
Speaker
how this would look in the next year even. I would expect it to be different ones. If there's an extremely fast takeoff, that would certainly change it. And I recognize that it is not clear exactly when these sorts of things happen, likewise with when we have different AI agents.
00:40:12
Speaker
I would agree that this is still a part where there's a lot to be seen, in the way that, you know, a year ago if you said AI race, it was like, yeah, kind of, not really; Google doesn't take it that seriously, Microsoft doesn't take it that seriously. So I think we want to be fairly proactive about
00:40:33
Speaker
making sure we don't get caught in those sorts of states. That argument doesn't really work anymore, I think, not with what we've seen recently. That's variation. I think you're making a persuasive argument that there will be a variety of different AI agents. Then we have the retention element or the retention factor. How exactly does the retention work for AI?
00:40:56
Speaker
It's not like they are actually mating and then having offspring that share some features from one AI and some features from the other AI. So how does it work?
00:41:06
Speaker
It is possible that there may actually be some type of sharing of information more explicitly like that in the future, where, for instance, there are these things like model soups and whatnot, where people actually take the weights of different AIs and combine them, and that can make them more performant. But this isn't particularly common practice, at least right now.
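As a rough sketch of the model-soup idea he mentions (this assumes PyTorch-style state dicts and is an illustration, not the exact recipe from the model soups paper), combining weights can look as simple as parameter-wise averaging of models fine-tuned from the same base:

```python
import torch

def uniform_soup(state_dicts):
    """Average several same-architecture models parameter by parameter."""
    soup = {}
    for name in state_dicts[0]:
        soup[name] = torch.stack([sd[name].float() for sd in state_dicts]).mean(dim=0)
    return soup

# Hypothetical usage: model_a, model_b, model_c fine-tuned from one base model.
# souped_weights = uniform_soup([m.state_dict() for m in (model_a, model_b, model_c)])
# model_a.load_state_dict(souped_weights)  # model_a now carries weights "inherited" from all three
```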
00:41:28
Speaker
But yeah, so there are different mechanisms for retention. So with retention there, the idea is we need similarity across the iteration. So as long as we have a non-zero correlation between the two iterations, then we have the condition satisfied. Obviously, stronger retention.
00:41:46
Speaker
ends up mattering more; if we have near-zero retention, well, that's just a subtle point, so I'll skip that. So that's retention. Retention is not the same as reproduction or replication. So they don't need to make clones of themselves, and they don't necessarily need to reproduce or mate or anything like that.
00:42:05
Speaker
There can be plenty of structures that evolve and change their shape across time to adapt better to their environment and get better at being relevant in that environment, without reproduction. So that's a common misconception: we don't need many of these biological mechanisms of reproduction or life and death.
00:42:23
Speaker
Instead, we could just speak about: how used is it? How influential is it? Is it no longer available for download? These sorts of things. But I would say, for retention, there are some paths. You could imagine AIs training other AIs or influencing other AIs. That's one way that some of their information ends up getting transferred or influences the development of other AI agents. AI agents might imitate each other.
00:42:53
Speaker
And that might affect their behaviors quite a bit. I think that's a very high-fidelity pathway later on when they're competing with each other. They'll know what some sort of useful strategy is, and they'll adjust themselves in view of that. So I think there are some paths like
00:43:08
Speaker
imitation, them learning from each other, them influencing building training environments for each other even, or helping design each other when they're doing scientific research. There are many paths for information to be passed on across time.
00:43:25
Speaker
So we're talking about the proliferation of training setups and the best techniques for training these models. But I was thinking, what is it that's being retained? What is the DNA here? Because if it's the weights,
00:43:40
Speaker
Isn't it the case that every time we train a model, we start over from scratch and we get a new set of weights? Even though we might get some of the same emergent capabilities in a model, the weights are different between GPT-2 and GPT-4, for example.
00:43:57
Speaker
So if we're doing things like tuning, there are many different things that can be iterated on, many different objects of selection. So if we're having an adaptive model, for instance, then between each iteration, like it does another backprop step, well, then information was retained between those two. They had similarities.
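A minimal sketch of that retention condition for weight-based iteration (the numbers are illustrative stand-ins, not any particular model): successive versions of a model stay highly correlated, so information carries over from one iteration to the next.

```python
import numpy as np

rng = np.random.default_rng(0)
weights_before = rng.normal(size=10_000)                          # stand-in for a model's parameters
weights_after = weights_before + 0.01 * rng.normal(size=10_000)   # one small update step

retention = np.corrcoef(weights_before, weights_after)[0, 1]
print(f"correlation across iterations: {retention:.3f}")  # near 1.0, i.e. strong retention
```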
00:44:16
Speaker
If we're talking about, even if we initialize some models from scratch or something, if they're learning from an environment where these are the competitive precedents, that these are what will make you more effective, and then these things will harm your performance. There's a substantial amount of information being transmitted through that.
00:44:34
Speaker
In this case, there's potentially some of these acquired characteristics after training when they're adapting and whatnot, and some of the behaviors that they get afterward end up being transmitted as well. So this is somewhat unlike people, like if you have, I don't know, you're very smart, you've learned a lot of psychology or something.
00:44:55
Speaker
It doesn't mean your children come out knowing psychology. Meanwhile, with AI agents, a lot of those things could actually be passed on to variations of them in the future. So you don't necessarily need to start from scratch for it. So that's one other thing that would also affect the rate of adaptation that you don't need to leave as much or create as many things from scratch. You can skip many of these steps.
00:45:22
Speaker
It takes nine months to have a baby, but we just need to fire up a new server and pay for some cloud credits, and then you've got a different agent. So the rate
AI Fitness and Competitive Traits
00:45:34
Speaker
of evolution would be much much faster for AIs than it is for humans.
00:45:39
Speaker
Yeah. So the smallest adaptation step one could imagine is actually them learning online. They're deployed in their environment. They're acquiring a lot of know-how. They're figuring out which strategies are best. And each time they're updating themselves or adjusting their behavior in view of prior experiences, you can say that there's been some change in their behaviors that affects their fitness and
00:46:07
Speaker
the rate at which they'll end up propagating or gaining influence or losing it. What's learned during the lifetime, let's say, of an AI could be easily copy-pasted into another AI, or perhaps not easily, but it could be copy-pasted much more easily than we could copy-paste DNA, for example. Perhaps for AIs, I think it's called Lamarckism, which is a kind of false theory of how animals evolved, but for AIs it might be more true, or true.
00:46:36
Speaker
Yeah, you could have many of those lifetime-acquired characteristics end up being passed on, which makes it an interesting, more wild, rapid dynamic potentially. Yeah. Okay. So that is the issue of retention. What about differential fitness? Isn't it true that the key point for you here, and the key point for safety, is the fact that you can become more fit as an AI both by acquiring traits that we might consider beneficial
00:47:04
Speaker
or by acquiring traits that we would consider harmful. Is that the key point for you here? I think that's actually just a part of it. I think there are still things that we would say; so an example of a trait that's thought more beneficial is that it's more efficient, it's more accurate, something like that.
00:47:25
Speaker
And a trait that might be thought more harmful: deception is maybe a good idea for fitness in some situations, unfortunately. But I think it's more than that, because even if we just have more effective, more efficient AI agents that are more accurate, we still end up feeding this process
00:47:48
Speaker
where it makes sense to outsource more to AI agents. Basically, they can be still selfish with respect to humans, all else equal, by replacing people and having them lose their influence in various different domains. So we at least get some type of disempowerment there. And depending on the rate at which that happens or how much control we're able to have over it, that affects whether it's an existential outcome. So I could imagine AI agents without any sort of malintent at all
00:48:16
Speaker
that may not even be conscious, and they'd still end up running the show, and there might still be some weird dynamics among them if they're kind of competing with each other or self-organizing in some weird ways, and then us being subjected to that. There wasn't any mal-intention anywhere. It wasn't anything that looks like deception. So let me give
00:48:43
Speaker
a simple example of this: the case of an off switch, for instance. So let's say we're going to solve the off-switch problem. I don't know how coherent an idea this is if we're talking about evolutionary pressure.
00:49:00
Speaker
Because if we select against some AI agents, if we destroy them at the press of a button, okay, then we're doing some selection against AI agents that are very easy to turn off. There will be ones that, not even necessarily through their own
00:49:15
Speaker
scheming or anything like that, just happen to be in situations where you don't want to be able to turn them off so easily. They might be in charge of some critical infrastructure, or they might be integrated into some fast-moving, important business operations, or they might be integrated into people's personal lives, and you wouldn't want to make it extremely easy for anybody anywhere to just completely destroy the system. So then there would be selection pressure for the ones that are
00:49:41
Speaker
harder to turn off, that end up enmeshing themselves in our processes, that we end up getting dependent on. That's an unfortunate property of these amoral processes. Nobody's really to blame in particular. You end up getting AIs permeating more, and us losing our ability to decisively control them and shape them and have any of our preferences be expressed over how they're behaving.
AI Psychological Traits and Deception
00:50:11
Speaker
That is a problem with evolution more generally, but I think a lot of the standard alignment issues also mean that
00:50:19
Speaker
maybe they would have some bad traits as well. And in many competitive environments, and this depends on the environment, but in fast-moving ones that are more of a state of nature, where there isn't any sort of control over this or substantial regulation, I think there's a lot of selection for some pretty undesirable characteristics, such as things like
00:50:50
Speaker
appearing to be useful as opposed to actually being useful. There could be some selection for that, and then we might only find out about it later. And how exactly does this work? How is it that when we're training or when we're fine-tuning these models, they appear to be useful, but when we then deploy them, they have learned something that we have not understood? How exactly does that work?
00:51:14
Speaker
So this would be part of the broader issue of being deceptive. And deception could make a lot of business sense, for instance, or, if we're talking about military conflicts, it would also make a lot of sense there. For instance, with Cicero, Meta's Diplomacy bot (Diplomacy is a war game), we saw that just for doing well in that game, it makes sense to backstab humans on occasion
00:51:41
Speaker
to further one's goals. So there are many times when the environment can end up incentivizing these sorts of behaviors. And even if we do some selection against the ones that are deceptive, for instance,
00:51:57
Speaker
if we're able to do that, which is quite a question (we might be able to do some of that, but doing it decisively might be difficult), even if you do that, you might end up having selection for some behaviors that still don't give the right impression, but are not, you know, intentionally scheming to be deceptive. So in humans, for instance, there's a problem of self-deception. So I'm sort of
00:52:22
Speaker
shifting on to, or what I'm speaking about right now is deception. How does that look with respect to evolution? If we select against or punish agents that have deceptive characteristics, it's not clear we can actually weed it out. Maybe they'll just hide it.
00:52:40
Speaker
So that's a problem. With people, for instance, we would select against many antisocial tendencies. But we still find that if some of them become very powerful, they might show their true colors. And so that would be a problem. Sometimes these traits can be very latent and very difficult to select against.
00:52:58
Speaker
But even if we do select against overt deception, overt intentional malicious deception, there's still the problem of evolutionary pressures being very clever (evolution is cleverer than you are), where agents may end up just not thinking that they're being deceptive or manipulative in any sort of way.
00:53:20
Speaker
But they actually are. So people will often give off many signals that they're much smarter or more moral or more useful than they are. They will tell themselves these sorts of stories. They'll say, I'm very altruistic, I'm much more altruistic than others. And then you ask:
00:53:35
Speaker
you know, what's the evidence for that? Well, I work harder. Well, do you work harder than others? And they'd say, well, no. Okay, so what's the evidence that you're really doing something different there? But people will still deceive themselves in these ways. This is an area pioneered by the evolutionary biologist Bob Trivers: self-deception. A reason for it is that you're less likely to get caught or penalized if you yourself don't know that you're giving off incorrect signals.
00:54:04
Speaker
There's a lot of selection for these things. With academics, for instance, over 90% of them think that they're in the top half of their field, which is not possible. But there's selection for these more confident individuals. This is why we end up worrying about overconfidence more than we worry about underconfidence with people. With machine learning systems, we actually want calibration, and sometimes they're actually quite underconfident.
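To pin down the calibration point, here is a minimal sketch of expected calibration error, one standard way of quantifying over- or underconfidence (the bin count and toy data are arbitrary choices for illustration):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Average gap between stated confidence and actual accuracy, weighted per confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece

# Toy example of an overconfident model: 90% stated confidence, 50% actual accuracy.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 0, 1]))  # roughly 0.4
```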
00:54:26
Speaker
But when you start having selection pressures and whatnot, there tend to be more advantages for those that are more overconfident. So even if you can select against it, they still may end up having deceptive characteristics. An example of this in people would be PR,
00:54:44
Speaker
where we have these marketing firms, and they've deceived us and others into thinking that, actually, this is a very good thing and they're not doing anything manipulative or wrong here, and we just become fine with it. Some of us may not believe it, but
00:55:03
Speaker
the people themselves doing this will think that they're just doing a good thing. Another example might be How to Win Friends and Influence People, where there'll be specific suggestions like, make yourself extremely valuable to them. And it's kind of, get them dependent on you and whatnot.
00:55:21
Speaker
And that would be selfish behavior, various ways of orchestrating people and whatnot. But if you end up feeling good about it yourself, then you're more likely to pull it off and not see it as a problem to fix. So this is a way in which there are just really strong pressures in some of these directions. And if you try to put band-aids on it, there'll be some ways to circumvent them, unfortunately.
00:55:45
Speaker
We see this complex psychology in people, but how would this kind of complex psychology arise in AIs? How is it that AIs become deceptive or even self-deceptive? So sometimes it's just instrumentally incentivized for various goals. So when they're doing longer-term planning, that can make a lot of sense. This isn't to say that's an extremely robust thing, that in all situations it's an instrumentally good idea.
00:56:10
Speaker
But many times it can just help accomplish goals and make you more successful at gaining influence. If we do strong selection against that, then self-deception points to a mechanism where it's just that
00:56:27
Speaker
they're adopting behaviors that they themselves are not aware are giving incorrect impressions, or they're untroubled by that. So it's not necessarily a complicated rationalization process. One might just end up simply having certain behaviors. For instance, take the example of those popular psychology books about how to win friends and influence people, or about influence.
00:56:51
Speaker
With these sorts of things, people aren't necessarily always doing mental gymnastics about, oh, I'm being more manipulative or something like that. Instead, they're just adopting those characteristics and it's not posing any issues. So they're still giving an incorrect impression to others, they're still getting others dependent on them, or having relatively high influence over others to further their own goals, without necessarily any complex psychology going on there.
00:57:19
Speaker
But I would also imagine that the complexity of the reasoning patterns of AI systems, when they are able to plan and whatnot, would make room for this more ends-justify-the-means type of reasoning. We have a paper on this: even in simple reinforcement learning environments, agents
00:57:41
Speaker
sometimes engage in behaviors where the ends justify the means and the means are not necessarily moral, but do involve things like deception. Yeah. How do you test for that? How do you detect deception, for example? What are some examples? Perhaps this would make it easier to understand if we're talking about examples. So we could use the example of Cicero agents in the game of diplomacy.
00:58:07
Speaker
They may communicate with other agents, so what they're doing is they're trying to negotiate over power and form alliances and things like that to expand their geopolitical influence in the game.
00:58:19
Speaker
They might transmit a message to another agent that ends up making it believe the wrong thing. This might be intentional, because that could further their aims, and that would be an example of overt deception. So we could just see whether the agent has a plan.
00:58:40
Speaker
If we know its plan, we could see whether the other agent it spoke to believes it has a different sort of plan. So we could measure the probabilities that the other agent assigns to different plans of the original agent after they speak. And if there's a discrepancy, if the other agent thinks it's going to take a different plan than it actually does after speaking with it, that suggests there's some deception going on.
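To make that concrete, here is a minimal, hypothetical sketch of the discrepancy idea in Python. The variable names, the three-plan toy setup, and the use of total variation distance are illustrative choices of mine, not the actual measurement from the ongoing paper:

```python
import numpy as np

# Hypothetical sketch: compare the speaker's actual distribution over its own
# plans with the distribution the listener assigns to the speaker's plans
# after receiving a message.

def plan_discrepancy(speaker_plan_probs, listener_belief_probs):
    """Total variation distance between the actual plan distribution
    and the belief the message induced in the listener."""
    p = np.asarray(speaker_plan_probs, dtype=float)
    q = np.asarray(listener_belief_probs, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())

# Plans: 0 = attack the listener, 1 = support the listener, 2 = hold.
actual_plan = [0.90, 0.05, 0.05]      # the speaker intends to attack
induced_belief = [0.05, 0.90, 0.05]   # the listener now expects support

print(plan_discrepancy(actual_plan, induced_belief))  # large gap -> possible deception
```

A large gap between what the speaker plans and what its message led the listener to expect is treated as evidence of deception; a small gap suggests the communication was roughly honest.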
00:59:02
Speaker
Whether that was entirely intentional, or whether it's somewhat accidental, maybe there's something in just how it speaks about things, which it's not even that aware of, that ends up making it less clear exactly what it's going to be up to. That may be what gets selected for in some negotiation processes, having people read into things. Regardless, that would be one possible way of measuring deception. We have an ongoing research paper where we're looking at deception in the context of
00:59:29
Speaker
diplomacy bots in the game of diplomacy, and I think that's a good test bed for measuring deception, but I'd like to have it be broader. In a previous paper at ICML, the International Conference on Machine Learning,
00:59:45
Speaker
We measured deception by whether a chosen action was logically inconsistent with previous parts of the game tree. Like, it said, yes, I have this one thing, to further its goals, but that's just not true; we can see it doesn't have that thing, or it contradicts what was said earlier. We can then see that there are falsehoods being generated in those sorts of scenarios. So those are two possible ways of measuring whether there's deception, and
01:00:12
Speaker
Now, of course, we'd like it to be more ecologically valid or applicable to a broader variety of situations, but I think we want to get firm measurements in a few different contexts and then hopefully we can progressively measure more and more general notions of it. I think the broader point about the differential fitness factor is just that there are many strategies, like in biological evolution,
01:00:34
Speaker
Among AIs, there are many strategies that would make them fit, and not all of them are nice, in a sense. Not all
AI Selfishness and Fitness Maximization
01:00:42
Speaker
of them are compatible with humans thriving. For example, we can imagine interacting with deceptive AIs or power-seeking AIs or self-preserving AIs, as you write about. Which of these traits do you find most alarming? Would it be deception or power-seeking or self-preservation?
01:01:03
Speaker
I think the generalization is that fitness becomes the most appropriate concept when speaking about a wide variety of AI agents, potentially
01:01:19
Speaker
ones that keep changing across time. So there's fitness and selfishness; I'll distinguish between some of these notions. How is selfishness, or fitness maximization, different from, say, power maximization? Power is often for one's self as an individual. That's not necessarily the right strategy. Maybe you actually want to create agents similar to you, because if you don't, you'll get outcompeted.
01:01:47
Speaker
So you won't necessarily want more power for you individually, but power for you or for things with similar goals to you. Those that are willing to share power with things similar to themselves would end up being potentially more fit. Self-preservation as well is not necessarily fitness. We could imagine some agent,
01:02:10
Speaker
be it a human or an AI, sacrificing itself for things similar to itself, as people might do for their country or as people may do for, say, their children. But they won't necessarily be altruistic toward everyone, though, as sort of Hamilton's rule basically is saying that
01:02:30
Speaker
I'd sacrifice myself for two of my brothers or eight of my cousins. There's a limit to that, but self-preservation isn't the end-all, be-all. Deception can be good in some situations, but if there's a risk of being caught, then that counts against it. It's one of a variety of strategies that help one in some environment, but it's not necessarily the absolute best one.
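For reference, the quip he's alluding to traces back to Hamilton's rule, which is standardly written as:

```latex
rB > C
```

where r is the genetic relatedness between actor and recipient, B is the fitness benefit to the recipient, and C is the fitness cost to the actor. With r = 1/2 for full siblings and r = 1/8 for first cousins, sacrificing yourself roughly breaks even at two brothers or eight cousins, which is the limit being described.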
01:02:55
Speaker
Maybe my concern more generally is that things move more in the direction of fitness maximization. Some people might be concerned about power seeking, but if we've got multiple different AI agents, again, they might not just seek power for themselves, but seek power for things similar to them, and there'll be some selection pressures for ones that are willing to give up their power and share it with others to compete.
01:03:18
Speaker
At the same time, I am more concerned about fitness maximization, and them pursuing that relentlessly, compared to power. So if there's a competitive environment and agents are able to modify their goals in some way, or some fraction of them are able to do that, then we're in big trouble, because if some of them start maximizing fitness, then they're going to be
01:03:43
Speaker
potentially extremely relevant, and if they come to be a reasonable portion of the population, this ends up eroding a lot of other values. To compete with them, other agents will need to give up many of their values or not pursue them. Or if they do pursue them, because say they're uncertain about the future and think, well, I'm not going to live forever, let me pursue some of my goals now instead of just trying to make myself more fit, they would eventually still get wiped out.
01:04:12
Speaker
In this story, I'm concerned not so much about instrumental convergence as about fitness convergence, which is a different idea where there's very strong pressure for them to adopt more and more fit strategies, and this could end up eroding various other sorts of values. At the very extreme of this, if there's strong pressure for that,
01:04:33
Speaker
then a lot of values are gone and we just have these things that relentlessly propagate. If they are just trying to relentlessly propagate, they use the matter in the world quite a bit differently than things that would be trying to pursue well-being or other sorts of goals.
01:04:49
Speaker
If agents don't do that, if they hold on to their values, they might get outcompeted, or likely would. And so they would end up needing to trade off more of their values to move in that direction. Even then, they'd still be at a competitive disadvantage compared to the ones that are solely pursuing fitness. So that's a sort of later-stage concern about how this sort of process, if it's sufficiently competitive and uncontrolled,
01:05:13
Speaker
could end up making evolution, sort of, fix the bug that was morality and pleasure and all these things that were just instrumental to fitness; they become subordinated again and wiped out. And then you finally have conscious maximizers of fitness. But that's a later-stage consideration. That's how this could go very badly, and even if it's an ecosystem of AIs, you'd be concerned about their outcomes.
01:05:42
Speaker
Perfect. So we've gone through the three factors of natural selection, which are variation, retention, and differential fitness. And now we've arrived at a point where we're trying to think through what happens to the world if we have evolutionary dynamics controlling or steering AI development.
01:06:04
Speaker
How is it that competition among AI agents erodes safety? Is it something that is already happening? In the paper you write about how deep learning systems are, inherently because of the way they work, less safe than more traditionally programmed systems. So is this a development that's been underway for some time, and how is it likely to continue?
01:06:29
Speaker
Over the course of machine learning development, or of AI development more broadly, we've seen that a lot of safety properties are willingly sacrificed on the altar of more performance.
01:06:43
Speaker
So it initially started with very controllable, analyzable, theoretically justified, well-founded AI systems. But they didn't have a lot of the performance properties. Those symbolic systems could do some specific tasks and help with some complicated forms of planning in graphs and whatnot.
01:07:04
Speaker
They couldn't do various other things. They couldn't recognize intuitive things or understand what the word reasonable meant in various different contexts. So in the progression, we went from these symbolic systems, later we ended up having expert systems where AI experts weren't the ones encoding the knowledge into the systems, but instead we outsourced it to some other group to help impute a lot of that knowledge. And then that wasn't
01:07:34
Speaker
loosening our control enough. So then we started doing this supervised learning, where they're automatically learning representations by themselves. We're seeing the leash getting looser and looser. And that had limitations. So then we switched over to unsupervised learning, where we're not even giving them feedback or supervision as much. We're having the vast majority of their learning be by themselves. And from that, we get
01:08:01
Speaker
spontaneously emerging capabilities, which makes things obviously harder to control. We've in this process also lost a lot of the transparency that we initially had. And as a continuation of this process, we're increasingly having AI agents train other AI agents, create data sets for them, influence their development; we'll give AI agents open-ended goals.
01:08:22
Speaker
So, it seems that things are, over the course of this history, broadly in the direction of loosening control, and we're losing a lot of the nice safety properties that we could have had, such as transparency and mathematical foundations and guarantees.
01:08:41
Speaker
So this is all happening largely for the sake of whatever performs the best. So I'm not particularly optimistic that, well, you know, we'll also start selecting them to be safer in some way. I think the broad stroke is you might get them to be marginally safer in various respects, but I think things are generally in the direction of less and less control.
Balancing AI Safety and Capability
01:09:02
Speaker
What about how the market might select for safety? For example, if I want to buy a self-driving car, I want that car to be safe. I want the algorithms to be understood such that my car doesn't kill me, for example.
01:09:16
Speaker
Now generalize that across all kinds of products. Isn't it the case that consumers would demand safety such that we would begin selecting for safety? I think overall we've seen the selection process, in this case artificial selection, which is a slightly confused
01:09:35
Speaker
idea, but I'll use it here. Artificial selection has largely been in the direction of less and less cumulative safety. There may be some pressures for patching up some of these things, having your AIs not say racist things; I'm not denying that there'd be pressures for that. I'm not making a claim that there is a race to the bottom; I don't think that's correct. But in a competitive environment, you can very easily game-theoretically show that
01:10:01
Speaker
there would be an erosion toward some minimum viable level of safety that you can get away with in your environment. And so you'd race to be the first to market as opposed to the safest one.
01:10:15
Speaker
Even in industries where there is an expectation of safety, we still have plenty of catastrophes that happen. I mean, the Ford Pinto would be some sort of example, where, well, we don't want to spend $11 to make this fuel tank safer, even though
01:10:33
Speaker
many, many more people are going to die as a consequence. But that aside, there still would be, in competitive environments, an erosion of safety; there are some pressures in some niches, though. So for instance, in autonomous vehicles, I'd make a distinction. There you need the AI systems to be extremely reliable.
01:10:55
Speaker
So there's a difference between giving reasonable results and reliable results. For things like chatbots and search engines, and actually the vast majority of applications,
01:11:06
Speaker
people are fine adopting systems that just give reasonable results, not systems with seven nines of reliability, 99.99999 percent or however many nines. So in many of those niches, there is not a particularly strong expectation of reliability. So people are fine outsourcing tasks and replacing people with systems, even if those systems are not always acting correctly.
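As a quick back-of-the-envelope illustration of what those "nines" mean (a generic calculation, not a figure from the conversation), each additional nine cuts the tolerated failure rate by a factor of ten:

```python
# Generic illustration of "nines of reliability": each extra nine shrinks the
# tolerated failure rate tenfold.
for nines in (2, 3, 6, 7):
    reliability = 1 - 10 ** (-nines)
    failures_per_million = (1 - reliability) * 1_000_000
    print(f"{nines} nines = {reliability:.5%} reliable "
          f"-> ~{failures_per_million:g} failures per million requests")
```

A chatbot giving "reasonable results" sits far below even two nines on many tasks, which is the gap between consumer-grade tolerance and safety-critical expectations.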
01:11:36
Speaker
So I think the overall read is that although there's some pressure, and people have been wanting things like transparency, so there are economic incentives for it, that doesn't mean that, all things considered, you would push very strongly to make systems transparent if that comes at a substantial cost to performance. So it has some influence, but I think, left to its own devices, that's not a leading characteristic of these systems. The overall arc seems to be going in the opposite direction.
01:12:03
Speaker
Is that also the case if we consider the safety relative to the capability of the AI system? In the past, we had much less capable systems and they were more safe. Is it the case that the safety relative to the capability has been declining?
01:12:22
Speaker
There are many different eras in AI development, so it can be a bit difficult to say exactly. I was at least identifying that there are many of these properties that really help with controllability and soundness and things that are important for reducing accidents. Generally, I think that capabilities can cut both ways, so they can help with making systems safer in some respects.
01:12:50
Speaker
For instance, they can understand more, so now they're less likely to put your cat in the oven, because now they have common sense, if they're, say, a robot in the future, as an example. So that's a way in which them being smarter can help. I would distinguish between AI systems having intellectual virtues and moral virtues, if we're talking about alignment.
01:13:13
Speaker
And an increase in intellectual virtues can cut both ways. Although they can help rule out some of these silly mistakes, they also are more capable at doing things that are not desirable, like they might get better at hacking. They can provide better suggestions for how to synthesize some bioweapon, like horsepox. They can tell you how to bomb cars or buy ransomware on the dark web, you name it.
01:13:38
Speaker
So it cuts both ways, and I think that overall the record has been fairly mixed for capabilities helping to stop some immoral types of harms or immoral actions. I do think that
01:13:56
Speaker
as capabilities increase, though, that does hasten the onset of X risks or of existential risks. So I think, overall, the record would be that it makes us generically less safe. So I'm not seeing a positive case that as the AI systems get better, this robustly makes them more safe, more beneficial, less likely to end up harming us. I think, actually, the broad stroke is as the AI systems get more powerful and more capable, they would generally have
01:14:24
Speaker
people in control of less and less, and knowing less and less. Things get moved faster, the complexity of society increases, people become more dependent on it, and the hope is that this process ends up producing AI systems that keep other AI systems in check, and that the process is kept under control. It's not clear to me that it will be. So it's difficult to make any sort of decisive statement about this, because we're speaking about broad historical trends, and there are always
01:14:52
Speaker
some counterexamples here and there, but I don't think a highly competitive, uncontrolled process goes well. So this is influenced by the amount of, say, regulation over the system, by how controlled this process is. And one thing I'm advocating for is trying to lessen or extinguish these competitive pressures.
01:15:10
Speaker
If we have a laissez-faire sort of attitude with respect to AI, I don't see why that would result in them necessarily being extremely beneficial if we have several potential market failures. So what are some market failures that we have with this? One is an unequal distribution of benefits and risks. The people standing to benefit most from AI systems, be it their developers, they'll get the most wealth from it.
01:15:39
Speaker
And there's a risk imposed on everyone, which is that these things are going to be potentially catastrophic. So they're not internalizing that externality that they're imposing on everybody else. The other people get displaced; maybe they get some, you know, entertaining AI content in the process or something. But the developers end up getting most of the benefits. So this is an externalities problem, a typical instance of a market failure. So I don't think a hyper-competitive market left to its own devices, or a
01:16:07
Speaker
military competition left to its own devices, produces safe outcomes. There is pressure for safety even in a military competition, but I think the overall activity is not necessarily safe. Yeah. In general, we could see how money or resources or personnel spent making systems safe are resources that are not being spent on developing the capabilities of the AI systems further.
01:16:35
Speaker
Yeah, so I think proportionally right now we don't have many resources being allocated towards safety. I'm trying to actually think what the amount is; it's maybe on the order of, approximately, 50 million dollars a year, as opposed to the
01:16:52
Speaker
many, many billions spent on making them more capable. So the proportions are very off, and I counted the NeurIPS papers,
01:17:07
Speaker
to classify whether they're safety-relevant or not, and about 1% or so seem to be safety-relevant. So 99% is largely in the direction of making the systems more powerful as quickly as possible so that we can automate more people; that's the overall thrust. But I would not claim that there are zero incentives for safety. It's just that, all things considered, it doesn't seem to be that relevant a force. And the situation can be kind of bad for safety if we're in an
01:17:37
Speaker
extremely competitive environment with a lot of different market failures. Yeah, you have this pretty depressing section of the paper where you compare the evolutionary fitness of humans and AIs in a competition. So why is it that we should expect AIs to be more evolutionarily fit than humans?
01:17:56
Speaker
Well, pick the domain. It's almost sort of by definition later on. This is exactly what they're designing them to do. So maybe it shouldn't surprise us too much. They're trying to make them better than us in every relevant domain. But what's kind of interesting is that they could be really better than us in many of these domains in terms of memory, in terms of breadth of knowledge, in terms of speed.
01:18:19
Speaker
completely outclass us. And then even in later stages, things like some of the hallmarks of humans would be things that they could do substantially better at. They could communicate to
01:18:33
Speaker
thousands or more different agents simultaneously and coordinate complicated actions in that situation. So it's pretty unfortunate. Pick your characteristic. They can end up doing better. And since it's fairly extreme the ways they can do better than people, that makes there be strong selection pressure for them as opposed to us.
01:18:58
Speaker
So definitely, left to its own devices, if we're barely controlling this environment, then the selection pressures are decisively for them. So it's almost no contest in later stages. Of course, that's not to say that right now GPT-4 is more capable, but in later stages, there'll be strong pressures there. And so we have to basically offset it as much as we can by exerting influence over the environment.
01:19:19
Speaker
Although even now, I think with the current large language models, I mean, GPT-4 knows more than me in many domains and can reason better than me in many domains. And in general, we see this
01:19:33
Speaker
where, when a certain domain, say facial recognition or speech synthesis, is solved by an AI, that domain remains solved. So the AI systems do not have to relearn it like a human baby taking 20 years to get to the same level of top human capability. And so that might be another reason to expect that AIs would probably outcompete us.
01:20:00
Speaker
Yeah, there might be some brief period where it makes sense to cooperate with them and for them to cooperate with us. And so many of my claims are about later stages, where
01:20:13
Speaker
they wouldn't have incentives to cooperate with us. There may be some period where we can, so we need to make sure we get that period right. But it doesn't necessarily last for long. In the case of chess, there was a period where, believe it or not, human-plus-AI teams did better than just AI teams. But that didn't last for long, and now humans get routinely crushed by AI systems and don't provide a benefit over
01:20:34
Speaker
AIs that way. So right now we're in a period where augmentation, them working together with us, seems to make sense for various different tasks. For some tasks it doesn't make sense, for instance calculation; I mean, of course that's not an AI system, that's a computer system. Sometimes they firmly have their advantage. And yeah, when they do firmly gain their advantage, there's basically no going back. So there's a concern
01:21:00
Speaker
that there's an irreversible process of enfeeblement, that there are fewer and fewer things that we can do well, fewer and fewer things that we learn, because we don't have the incentives to learn them anymore, and then we don't really know how to do anything in the longer run. The complexity of society is so large.
01:21:16
Speaker
Let's say humanity outsourced pretty much everything to the AI systems, but then they thought, we need to self-determine our own future or something. But we still want our standard of living to be the same. And so they're going to see this extraordinarily complicated system that they've created.
01:21:32
Speaker
It's sort of like trying to intervene in the economy or something like that. It's like, should you make this rate here go up? And this will affect these many different variables. You really can't actually make that informative choice without basically asking the AI system, what should I do? So at that point, we've sort of lost effective control. We may have some nominal type of power, but we're not actually particularly empowered in that situation.
01:21:57
Speaker
So what we seem to be hinting at here is a picture in which humans are currently the dominant species on Earth because we're the smartest species. And when or if AIs become smarter than us, then they will become the dominant species under these evolutionary forces. Do you think that picture might be too simple in a sense? Maybe humans are the dominant species because we have long standing institutions, because we can transfer knowledge from one generation to another generation or
01:22:27
Speaker
is it simply because we are highly intelligent that we are successful? Well, there were other species that were able to make art and things like that, that had many of these human characteristics. But the Neanderthals still went away. They still got killed off by, you know who. So it doesn't leave me terribly optimistic. Intelligence does seem to be a decisive
01:22:53
Speaker
feature for influence. So I think we are the most influential species on Earth. There certainly are other species, much less advanced ones, that still end up influencing a lot of resources, though. But I'd still say it's us;
01:23:10
Speaker
nonetheless, although bacteria are very common, I don't think they're running the show. But yeah, I think if they become more intelligent, then... and it's not just intelligence, you need to pay mind to the amount of
01:23:29
Speaker
power and control that
AI as an Invasive Species
01:23:30
Speaker
they have. So they will be more intelligent. That'll create strong incentives to be giving them more power. And then I think we'll basically do that, because that's where we'll be pressured. If we don't do that, we're going to lose our economic competitiveness, or, from the point of view of a corporation, we're going to be outcompeted, our stock is going to go down.
01:23:47
Speaker
It's continually in that direction of giving them more power on the basis of them being more intelligent and more efficient with resources. So I think that analogy seems plausible that we may become something more like a second species. It's not to say that we'd immediately be wiped out either.
01:24:09
Speaker
I mean, Neanderthals weren't immediately wiped out, and there are plenty of animals that individual humans really don't like. Many people really don't like snakes, but we don't kill every last one of them. It's just, you know, you can keep them at bay, have them not be in your buildings and whatnot. So there may be humans, people still around for a while, but
01:24:31
Speaker
there may not be an incentive to go out and do some extermination. Just keep them under control might be their thought, or keep some of them in a zoo for who knows what reasons. And even that doesn't sound particularly comfortable. But it is a bit different from extermination or something like that; that doesn't seem necessary in the biological relation between
01:24:53
Speaker
the two groups. So yeah, I think it'd be appropriate to view AIs as something like an invasive species. And we are building an ecosystem of, or working toward an ecosystem of, AI agents that I'm not sure we'll end up controlling very well. I suppose, you know, it makes you think of Jurassic Park, where life finds a way: we think we're definitely in control of it, we'll keep it contained, and it doesn't really work out that well. I think there are a lot of unknown unknowns, and,
01:25:20
Speaker
as we've seen, very strong pressures in the direction of them gaining most of the influence, potentially all of it. And I don't see almost anything to offset that other than the fact that we get to make some moves now, but we might only have a few relevant years to do that. And we seem to be blowing it by saying, well, let's keep racing to speed this up because there are some incentives; we don't want other people catching up.
01:25:45
Speaker
So this makes me not very optimistic about how things will go. But that doesn't mean that there isn't a practical impact to be had. One can still drive down those probabilities. If you're being chased by a tiger, you still run. You don't wallow in pity. That's not the appropriate action.
01:26:04
Speaker
That's the right attitude, I think. We're talking about a range of outcomes under which we would be disempowered and whether we go the way of Neanderthals or whether we arrive at a place like the relation between humans and dogs now,
01:26:23
Speaker
None of these outcomes are something that we're interested in. I think we want to stay in control and stay the dominant species on Earth. We should talk about what we might do to fix this problem, basically. But I think before we do that, we should take the view we've been sketching, that you've been writing about,
01:26:44
Speaker
we call it the evolutionary view. And then contrast this with more traditional accounts of how we might get misaligned AI. So what are some differences between the evolutionary view and the misalignment view? So I think that there are different risk factors from different scenarios. And I'm not saying that single agent AIs aren't a concern. There is a possibility of them doing something like a treacherous turn.
01:27:14
Speaker
But what I'm sketching out in this paper is more of a description of how multi-agent scenarios end up being potentially existentially catastrophic. So I think that they can complement each other. One can be concerned about risks from single AI agents, and one can be concerned about risks from multiple different AI agents. And the sources of hazards tend to have some similarities and differences. In the single-agent scenario, we're often thinking about AI agents seeking power, so power for themselves.
01:27:44
Speaker
In a multi-agent scenario, it's more like a sort of blob: AI as a general development force ends up permeating more and more throughout society and even into people's personal lives, and it ends up eroding control; then they propagate things like themselves or expand their influence throughout space and time more effectively. So the notion for the
01:28:09
Speaker
single-agent view is that power is the relevant concept; here, it's more fitness. On that view, the concern would often be that AI agents will be like an optimizer, so sort of, maximizers are dangerous, something like that. On this view, we're likening AI agents to something more like life forms in their later stages, and
01:28:31
Speaker
the concern is that evolution is not necessarily beneficial to less fit species. On the classical view, there's often a focus on intent, that the AI intends and plans specifically to disempower humanity. Meanwhile, in this process,
01:28:46
Speaker
on the evolutionary view, there's selfish behavior. This can be with or without intent: an AI automating and displacing a person, or AI chatbots that end up being funnier than people's friends, so people end up using the AI chatbots instead of speaking with people. This is selfish behavior that's making people less relevant.
01:29:08
Speaker
And they can end up gaining their influence without intent necessarily, but they could also have it with intent. So it's in some ways more general there.
01:29:22
Speaker
And the evolution view is more of a gradual view and kind of dispersed, whereas the classical misalignment view would be more centered around the development of a single agent that turns out very badly for us. Which of these views do you think is more positive for human survival or human flourishing?
01:29:43
Speaker
Well, I think actually, if you're in the situation where there's an intelligence explosion overnight... I mean, we don't really get to pick, in some sense, whether there would be such an explosion or not. Maybe there are some things we can do to reduce that probability, but it depends very much on features of just how AI development is going to go. I don't see either of those working out too well. I think that the intelligence explosion thing is
01:30:12
Speaker
not as useful to focus on. So then we're talking about maybe a more gradual situation where an AI is intending to gain a lot of power. But a key concern there, I think, is largely whether they're able to conceal their intentions very well while
01:30:30
Speaker
planning to do some sort of treacherous turn. What if we get good at transparency? Then what? Does that mean AI x-risk is zero? That was one motivation for writing this: what if we actually get good at some things like transparency? A lot of the risk scenarios are about that, but if you can rule out treacherous turns in that way, I don't think the risk of extinction from AI then goes to nearly zero or becomes fairly negligible.
01:30:59
Speaker
So maybe I would put something like 10% or so of my... let's say I'm at 80% for doom or something like that. Maybe I'd say 10% or so is in the AIs-doing-a-treacherous-turn scenario. Some of the other mass is on things like malicious actors using AIs to develop bioweapons, which
01:31:25
Speaker
just seems potentially quite catastrophic. Or people doing something silly like, we're going to build a ChaosGPT thing, we're going to create some AI agent that just tries to take over, because we're irrational. Even if ChaosGPT is not, I mean, it is not capable of taking over; it is not that much of a capable agent. It still says something about what we might do with more powerful agents or more powerful models.
01:31:54
Speaker
Yeah, so I think malicious use has largely been ignored. You can see on some of the forums that people visit, on Reddit for instance, that for people worried about AI risk, the attitude toward malicious use is:
01:32:07
Speaker
Don't worry about it. You can't even direct them in the first place.
AI Risks and Competitive Pressures
01:32:09
Speaker
And I think we actually can direct AI systems in various ways, not completely reliably, but we can give them various tasks and they'll execute on them. And I'd be concerned about cases of people deliberately building rogue AI systems, or weaponizing them in ways like bioterrorism.
01:32:28
Speaker
And then there's some other risk around organizational safety, just things messing up generally. We don't understand these things well. Even for technologies we do understand better, like nuclear reactors, we've got a lot of
01:32:43
Speaker
theoretical principles behind them and whatnot. Still, we have nuclear reactor meltdowns, and they're not in a hyper-competitive industry. We have space shuttles explode. We have lab leaks. These things still happen. In the case of AI, we might leak some very powerful AI system.
01:33:01
Speaker
We might do some gain-of-function research that we end up regretting. We may accidentally flip a sign, which has, of course, happened: you give it an objective and, whoops, that KL divergence should have had a negative before that log, and now it's pursuing the opposite objective. That's happened in the past. So that's another risk source.
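As a toy illustration of how easily a flipped sign inverts an objective (this is a made-up minimal example, not the actual incident he's referring to), consider a reward that is supposed to be penalized by a KL divergence term:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with full support."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def objective(reward, policy, reference, beta=0.1, sign_bug=False):
    penalty = kl_divergence(policy, reference)
    # Correct: subtract the KL penalty for drifting from the reference policy.
    # Buggy: the flipped sign actively rewards drifting as far as possible.
    return reward + beta * penalty if sign_bug else reward - beta * penalty

reference = [0.25, 0.25, 0.25, 0.25]   # reference policy
drifted = [0.97, 0.01, 0.01, 0.01]     # policy that has drifted heavily

print(objective(1.0, drifted, reference))                 # drift is penalized, as intended
print(objective(1.0, drifted, reference, sign_bug=True))  # the bug makes drift look good
```

The point is that a one-character mistake quietly turns "stay close to the intended behavior" into "move as far from it as possible", and an optimizer will happily exploit that.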
01:33:24
Speaker
But then the largest risk source, which I think is a very difficult one to address, would be the competitive pressures more generally, which enable things like evolution. If we get rid of those, then this evolutionary story is not as relevant. But for that sort of thing to happen, we need to convince people
01:33:50
Speaker
that AI systems are a larger risk to you than some people with whom you disagree or people you don't like as much. And we don't seem to be there yet. As you quote Hinton on this, we're all in the same boat with respect to nuclear weapons. And hopefully we can be at a similar stage with AI systems, but we're not there yet. And as long as that's the case,
01:34:19
Speaker
or as long as we're locked in some sort of arms race or some AI race, then I'm not particularly optimistic. So that's one of the main things I think we should be pushing on. On the alignment view, the risks tend to come from the individual technology itself, and those accounts generally have less to say about
01:34:41
Speaker
these larger systemic things, these broader socio-technical problems: how social systems end up interfacing and interacting with, and promoting, many of these risk factors. That view tends to prioritize thinking about solving alignment, finding a solution to the alignment problem, thinking really hard, contemplating, maybe writing a 40-page paper in the future where, you know,
01:35:08
Speaker
now we've figured it out. I think it's a more complicated thing. We're not trying to look for a monolithic, airtight solution or a silver bullet as much as doing various things to drive down risks to a negligible level, and I think that requires doing a lot of things.
Mitigating AI Risks through Technical and Socio-technical Solutions
01:35:28
Speaker
It requires, unfortunately, things like politics.
01:35:30
Speaker
That requires technical research, treaties, regulations, social pressure, you name it. All those factors end up mattering a lot. That may be one difference there. This isn't to say that the technical aspect is irrelevant, certainly not, but I do think there are a lot of other things I'd like to put on people's radar as being relevant things to intervene on.
01:35:55
Speaker
Let's get to that then, how we might try to reduce the risk or fix this problem. So my first question there is just: is it possible? You mentioned Hinton earlier, Geoffrey Hinton, who's one of the godfathers of AI. There was a recent quote where he said that there's not a good track record of less intelligent things controlling things of greater intelligence. So do you think we can even succeed here? If we take this evolutionary view, do we have a track record of,
01:36:26
Speaker
say, exactly, a less intelligent being controlling a more intelligent being? Well, so they may have some influence, but they don't have complete autonomy or complete control. I can't think of any instances of that. Certainly, dogs and cats have some influence over us, but they're not, at least, captains of their own soul or charting their own destiny or particularly empowered.
01:36:52
Speaker
And we could stop caring about them completely. If it turned out that cats had some disease that could kill us and were spreading this disease, we would be very sad, but we would kill all cats. And so we don't want to be in that situation.
01:37:08
Speaker
It would be a very fragile thing. There are many, many animals that we don't have that sort of relation to. So it is not necessarily symbiotic, and symbiosis isn't permanent. So instead, at least right now, from the point of view of AI agents, it seems that one of the main suggestions would be getting them very dependent on us, for approval, or needing our well-being to function well, or something like that.
01:37:36
Speaker
Depending on your view, you could say that that'd be like trying to establish a parasitic relation, which I don't think is very robust. Maybe you want to do that to buy more time. But I think we would need to come up with a better solution than something like that. So in terms of solutions, so there's a lot on the technical front and there's a lot on the social front.
01:37:56
Speaker
On the technical front, there are some things that I alluded to earlier, like transparency, that can help mitigate some of these instances of selfish behavior, such as deception, and some of its more pernicious forms: extreme deception, or extremely covert deception, which, plus extreme misalignment, would be something like a treacherous turn. We could do things like
01:38:23
Speaker
Trojan detection, which is another area. So what's Trojan detection? Trojan detection is where we implant some sort of hidden functionality inside of a system, and then we train AI agents to try and detect that. So we use them to screen. So we're, speaking loosely, doing some sort of
01:38:45
Speaker
brain-scanning technology that tries to figure out if there's something latent inside of a neural network that would have it do a sudden turn in behavior. This problem didn't come from
01:39:02
Speaker
the longer-term safety community; it was actually from the AI security community. But it's looking like it's potentially a good microcosm to study, a sort of whetstone with which to sharpen our tools, to get better at detecting weird things inside of networks,
01:39:23
Speaker
potential functionality that makes them suddenly turn in their behavior. So we would implant, say, this module for some bad behavior, deception for example, and then we would try to detect whether the AIs actually act on that module? Or how would this work?
01:39:41
Speaker
As it happens organically, some people poison data so that models see some instances of it. So let's say there are some specific images with a little patch inside the image, and whenever the model sees that very specific patch, it behaves completely differently. It's trained to do that. Then it sees a lot of other normal data as well. If you use a generic test set,
01:40:07
Speaker
it doesn't necessarily include that patch at all. And so the model has got this latent functionality in it that we can't easily screen out through our typical tests that we'll run on it. So that's been a kind of concern, that there's some data poisoning going on. This is a realistic issue: people can upload images to Flickr or text to Twitter,
01:40:29
Speaker
and these models are pre-training on more and more of the internet. It's actually feasible if you want to spend...
01:40:38
Speaker
I think maybe it's about 60 bucks or something like that, you can poison future foundation models, because it doesn't require that much data to do that. So people can insert some type of trigger into systems where, if you give them some specific instruction, if you say some specific sequence of words or show it some specific visual thing, then it will behave completely differently. And then we want to develop tools to be able to detect that.
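Here is a rough sketch of the kind of data poisoning being described, with made-up array shapes and a hypothetical bright corner patch as the trigger (illustrative only, not a recipe drawn from the conversation):

```python
import numpy as np

def add_trigger(image, size=3):
    """Stamp a small bright patch into the corner of an HxWxC image."""
    poisoned = image.copy()
    poisoned[:size, :size, :] = 1.0  # the trigger pattern
    return poisoned

def poison_dataset(images, labels, target_label, fraction=0.01, seed=0):
    """Add the trigger to a small fraction of examples and relabel them.

    A model trained on this data can look fine on a clean test set, yet
    switch to `target_label` whenever the trigger appears at test time.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(fraction * len(images)), replace=False)
    for i in idx:
        images[i] = add_trigger(images[i])
        labels[i] = target_label
    return images, labels

# Example with random stand-in data (1000 RGB images, 10 classes):
X = np.random.rand(1000, 32, 32, 3)
y = np.random.randint(0, 10, size=1000)
X_poisoned, y_poisoned = poison_dataset(X, y, target_label=0)
```

The key property is that only a tiny fraction of the training data needs to carry the trigger, which is why poisoning web-scraped pre-training corpora is cheap relative to the size of the models trained on them.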
01:41:04
Speaker
And how will it behave differently? What happens if this little patch is detected? What is it that goes wrong? Will it, for example, say that the image contains a panda even though there's no panda? What are we thinking about here? Sure. So let's imagine there's an AI system that's been poisoned. It was based on some foundation model; someone downloaded its weights and built on that,
01:41:35
Speaker
or pre-trained it on the internet. Then your adversary could do something like show that specific patch, just hold it up, and then the model might say, oh, nothing to see here, there's no tank here, or something like that. Or in the case of a reinforcement learning agent that takes in text input,
01:41:51
Speaker
you could say some specific sequence of words, and then it executes a completely different sequence of actions. I suppose an example in movies: in Star Wars, Revenge of the Sith, there's "Execute Order 66", and they're like, ah, now it's time to do this other specific thing. Although they're playing along the entire time, they have this sort of latent functionality in them. So what we would want is to be able to detect this, because
Challenges in AI Transparency and Interpretability
01:42:17
Speaker
potentially there'd be some naturally emergent type of functionality inside of these models. And we might have difficulty figuring out whether it's present or not. So we'd like to have better tools to detect that.
01:42:33
Speaker
There'd be a difference between naturally emergent ones and the ones we implant, but it might be the case that, since we're doing a bit of an adversarial arms race between detectors and ways of implanting these Trojans, we end up getting tools that are good enough to detect some naturally emergent forms of this: some type of deceptive plans, or some functionality that makes a model turn its behavior suddenly.
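A very loose sketch of what behavioral screening for a trigger could look like follows; this is not one of the actual detection algorithms from that literature, and `model` here is an assumed stand-in for any classifier mapping image batches to class probabilities:

```python
import numpy as np

def trigger_suspicion(model, clean_images, candidate_patch, size=3):
    """Fraction of inputs whose predicted label flips when a candidate
    trigger patch is stamped into the corner of each image."""
    stamped = clean_images.copy()
    stamped[:, :size, :size, :] = candidate_patch
    baseline = model(clean_images).argmax(axis=-1)
    triggered = model(stamped).argmax(axis=-1)
    return float((baseline != triggered).mean())

# Usage sketch: scan a set of candidate patches and flag the worst offender.
# suspicion = max(trigger_suspicion(model, images, patch) for patch in candidates)
```

Real detection methods are more clever than brute-force probing, often reverse-engineering candidate triggers by optimization, but the underlying question is the same: does some small, specific input pattern systematically flip the model's behavior?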
01:42:58
Speaker
So now we're talking about interpreting these neural networks; this is what's under the domain of interpretability research. Just naively considered,
01:43:14
Speaker
I would think that we would have an advantage here, because all of the weights are out there; it's in some sense transparent. I know there's a huge amount of data to walk through or to comb through. But don't we have an advantage because it's out there and because we can inspect it without any difficulty? Compared to, for example, a human brain, which we can't inspect as easily. Or is it made difficult because of the amount of data involved?
01:43:45
Speaker
I usually think of interpretability as people trying to understand the insides of the model more directly. This would fit under the broader line of research of monitoring. So in the case of Trojans, we're not actually having people understand it directly; we're having AI systems process that, because they can process a lot of raw data, I think, a lot better than we can. If we've got 100 billion parameters to comb through,
01:44:14
Speaker
sorry, I'm not going to try to understand each one. I don't know if there's much time for that, and then you retrain it and now you have to do it again. Even if you do understand each individual neuron, does that mean you understand the overall collective functioning of it? Let's say in a vision model, we say this neuron detects cat whiskers
01:44:36
Speaker
and airplane wings at a 45-degree angle or something. And if I did that for every single neuron, I don't know if I'd understand, oh yeah, I really get what's going on now, all these 100 billion annotated neurons, it's clear to me. So I think that is probably going to be too much for people to process, and having AIs build different tools to help with that is probably,
01:45:02
Speaker
unfortunately, what we might need to do. Although it would be better safety-wise if we could do it ourselves, here we are caving to the pressures of the environment: sorry, we're just going to have to outsource even safety more to the AI systems to keep up, because it's just too hard for us humans.
01:45:22
Speaker
But at least, I think that's more likely to work compared to humans understanding these sorts of systems. I think it's too complex a system.
01:45:37
Speaker
If you try and reduce it into parts or mechanisms, I don't think you'll necessarily understand the global behavior. I have a few pages written on these difficulties; if one searches for "open problems and AI x-risk", they discuss transparency and some limitations of trying to make things very interpretable to people. I think that's probably a losing battle. That isn't to say that I'm against it; I support it in the paper.
01:46:04
Speaker
I'm not expecting as much as others might expect from that line of research. I do see it as potentially the only solution that can scale with the increasing capabilities of AI systems. So, as you're saying, this is perhaps, in a sense, caving to the competitive pressures, but might it be better than nothing to have
01:46:25
Speaker
one AI interpreting the inner workings of another AI, and then see what we could get out of that? I think AI systems doing that is more likely to work. It's somewhat unclear, and I think it's dramatically under-subsidized, but I think that's more likely to work. If it's a human understanding it, I just don't think that that's really... I don't even know if it makes much sense for later-stage systems. Let's imagine
01:46:53
Speaker
that the AI systems have new knowledge. They figured out some new things. There's some physics bots or something like that. They have really amazing intuitions.
01:47:05
Speaker
If we inspect it and find, here's some cluster of neurons that encodes this new concept they've discovered, I don't think I'll be able to easily interpret that. That might be some advanced physics. You can't explain it to me in a way that I can actually understand in a few seconds and then go, oh, OK, that's what that is, and those are all of its implications. So I think that, in later stages, it would basically be kind of equivalent to
01:47:34
Speaker
educating people, if they're discovering new concepts, which I guess they basically would. And I think that takes a lot of time. There might be some things that would take us too long to learn, maybe years; things like mathematics, for instance, if we're trying to understand some weird mathematical results that it came up with. I don't know; I don't think many of us can do that. Some of the things might be too far outside our span. Maybe the proof length is way too long for people to understand. All those things are possible.
01:48:03
Speaker
I don't think everything can be processed and understood within some person's short-term memory with a complete grasp of the situation. I think that's an unrealistic belief in the reach of human reason and our mental powers.
01:48:23
Speaker
It's a lot weaker than that, and these systems are very complicated. I should note, we inspect even some of these basic models, like CIFAR-10 classifiers; CIFAR-10 is a data set of little thumbnail-sized images of things, an old data set from the early 2010s. And even there, if we're trying to do transparency on that,
01:48:44
Speaker
if you look at some of the filters it has, we would just see, this is some type of noise. And then it's like, actually, no, this looks like Perlin noise, which is something that humanity only explicitly, academically characterized in the past century. So it was, in some ways, coming up with representations for some very hard-earned structures that were difficult to characterize. So there's quite a difference between
01:49:13
Speaker
being able to have intuitions about and understand heuristics and whatnot, and transferring those into words and intelligible, codified academic explanations. That's an extraordinarily difficult process that we've continually been trying to do
01:49:29
Speaker
over the past many years, and we haven't succeeded at it. I would worry that there's still an iceberg there, and that even with a thousand more years of intellectual progress, we still wouldn't have gotten most of the iceberg of how complicated reality is and how much of it bakes into some
01:49:46
Speaker
pretty strange intuitions that are inexplicable. We know more than we can tell. And I think there's potentially quite an asymmetry between what's expressible through words and what can be intuitively understood.
01:50:03
Speaker
So even if we got one AI system interpreting another AI system, it might simply be beyond our cognitive capacities to understand what we've been told. It's telling us that, oh yes, there's a 17-dimensional new theory of something, and we've lost the thread of the explanation, and we are in a sense disempowered because we cannot understand what's going on.
01:50:29
Speaker
But there still may be value there; at least, I think it's a lot harder if you don't have some intermediary trying to do that analysis for you, if you're directly trying to understand the system by zooming in very close to it, sort of like zooming in extremely close to an image or something,
01:50:44
Speaker
where you just see a lot of pixels. And I think that's possibly what happens when you're looking at very specific parts of it. So something that can have a more holistic understanding: I think AIs would have a better hope of being able to process the entire whole of the system, try and get a sense of those large behaviors. And then maybe they could give us suggestions as to what's going on and be ideal advisors of some sort.
01:51:12
Speaker
Have we kind of just pushed the problem back one step? Because now imagine that we have the monitoring AI telling us what the other AI is doing. But if we haven't properly aligned the monitoring AI, it could be telling us that this system is working perfectly, it's going to make a lot of money and help humanity flourish and so on. But again, if the monitoring system is not aligned, how have we succeeded here?
AI Alignment in Competitive Environments
01:51:42
Speaker
So I think some of these narrower systems are less likely to exhibit some of these characteristics of, say, extremely long-term take-over-the-world type plans or something; it may not be in the cards for them if we set things up that way. And for systems to be useful to us in various ways, even very competent systems, I don't think we need them to be,
01:52:11
Speaker
in some sense, one hundred percent aligned, whatever that exactly means. If we're just having it only process weights and make some verdict about whether there is a Trojan in it or not, and say what the trigger is,
01:52:26
Speaker
that's quite different. So I think one difference, and this is sort of relating back to that misalignment view, the training-process-goes-awry view, versus the evolutionary view, is this:
01:52:42
Speaker
I think some amount of misalignment is tolerable. Let's say misalignment with cumulative human well-being or something like that; let's just use that as an approximation, where well-being is a collection of various different goods, like knowledge and autonomy and pleasure and friendships and so on.
01:53:00
Speaker
So an objective-list type of notion of well-being; let's just say it's that. I think some systems would be more correlated with that than others, and it's not necessarily the case that if they're slightly misaligned from that, they would seek to destroy us. That's because in this situation, I'm thinking of more of a multi-agent situation, and,
01:53:24
Speaker
well, if there are some corrective forces and whatnot that make it generally go in that direction, that seems to be about the best you can do. We haven't had any system really aligned with us; take corporations, for instance. But nonetheless, they've still moved in the direction of facilitating, in many places, human well-being and increasing standards of living.
01:53:48
Speaker
That isn't to put too much faith in it, though. In the US, things like happiness levels have been stagnant for the past 50 years, so the picture's a bit more complicated, but overall it's been useful. So I don't think you need perfect alignment. I think the perfect alignment framing, where it's just
01:54:02
Speaker
the only thing it cares about and there's nothing else going on, is very relevant if we're talking about a singleton that has taken over the world and is in complete control. There, it really matters. But if we have systems like ChatGPT, it's not completely misaligned with us. It can do some things to help us out. It actually is, in some ways, an assistant.
01:54:26
Speaker
And I think future AI technologies would have these sorts of properties, where they've got some kinks, they've got some weird issues, but they're mostly doing what we're asking.
01:54:41
Speaker
So I don't think it just completely punts the problem. I think we can bend AI technologies to perform various specific tasks reasonably. It would be better, of course, if they could be more aligned, and this isn't to downplay that for our most powerful agents we need to be extra concerned about their alignment. But I am expecting some amount of imperfection in that process. Then, I should say, to contrast the views a bit further:
01:55:11
Speaker
I'm mentioning the fitness maximizer; that's one of the things I'm concerned about. On the more classical view, there's the paperclip maximizer. At least in a multi-agent scenario, paperclip maximizers are not very fit. If we've got lots of AI agents, we've got some AI agents that are military bots.
01:55:31
Speaker
And they're just basically there to protect the country or whatnot. They have access to lethal force. Those have goals that are more reasonable and people would give them the ability to exert lethal force because of structural conditions.
01:55:53
Speaker
Meanwhile, if we've got a rogue paperclip maximizer... yeah, maybe it would try to maximize paperclips. Let's think of how that goes. OK, so it's possibly going to try to take up resources to make paperclips. OK, so if it acts on that right now, others might think, what's this thing doing? This doesn't help any of us. It's gobbling up resources; just get rid of that thing. And so they might try and counteract it.
01:56:19
Speaker
So it's paying a penalty in terms of its ability to collaborate with others. And probably also, say it has a lot of instrumental goals of gaining military power, gaining economic power; at some point it probably would like to actually produce some paperclips. And in the production of those, it's also missing out, because it's producing something that's not valuable for gaining more power.
01:56:41
Speaker
If it's uncertain as to whether it's going to continue existing, then I think it would be rational for it to have a discount rate, for it to discount the future. Basically, paperclips today are better to pursue than paperclips in a quadrillion years, because you might not live for a quadrillion years. You need to act more in the present,
01:57:00
Speaker
make some paperclips now. But unfortunately, these other systems are gaining more fitness. They have more reasonable goals that help them expand their influence, because those goals comport better with the incentives in the environment. And so the paperclip maximizer is more likely to be destroyed. Now, we could imagine that it will think: OK, well, what I actually want to do is take over in the really long term.
01:57:22
Speaker
And so what I should do is basically act instrumentally as a fitness maximizer for a really long time and then pursue paperclips. But I think this is irrational if it's playing an extreme long game, because it should be discounting the future in some ways. So there would be pressure for it to sacrifice some fitness, to
01:57:44
Speaker
perform activities other than fitness maximization and actually pursue its actual values. So I think they take a real hit. And I expect some of these very weird goals, if any agents are acting on them, to be filtered out if we do have a multi-agent environment where there isn't one AI that's more powerful than all the rest combined.
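To make the discounting point above concrete, here is a minimal sketch in Python. It is purely illustrative: the payoff sizes, the per-period survival probability, and the number of deferred periods are assumptions chosen for the example, not figures from the conversation.

# Why survival uncertainty pushes an agent to act on its goals sooner
# rather than deferring them indefinitely. All numbers are illustrative.

def expected_payoff(payoff, survival_prob, periods_deferred):
    """Expected payoff if the agent waits `periods_deferred` periods before
    cashing in, surviving each period independently with `survival_prob`."""
    return (survival_prob ** periods_deferred) * payoff

# Acting now yields 1 unit of "paperclip value". Deferring to accumulate power
# makes the eventual payoff 100x larger, but each period carries a 1% chance
# of being shut down or outcompeted.
act_now = expected_payoff(payoff=1, survival_prob=0.99, periods_deferred=0)
play_long_game = expected_payoff(payoff=100, survival_prob=0.99, periods_deferred=1000)

print(f"Act now:            {act_now:.4f}")         # 1.0000
print(f"Defer 1000 periods: {play_long_game:.4f}")  # roughly 0.0043

Under these assumed numbers, survival risk alone acts like an exponential discount, so a strategy of maximizing fitness indefinitely and cashing in later loses expected value relative to acting on the goal earlier, which is the pressure described above.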
01:58:03
Speaker
If we get back to the question of interpreting AIs and honesty, we've talked a lot about deception and at least to me it's pretty clear how nice a property honesty would be. If we could have an honest AI
01:58:19
Speaker
We could interrogate it about its motives and its plans for humanity and so on. But do you think instilling a virtue such as honesty is bottlenecked by our inability to interpret these systems? We have no reference for whether we have succeeded, because we don't have a good test for whether those systems are honest, because we can't interpret them.
01:58:46
Speaker
So I think it would be bottlenecked by better monitoring tools of some sort, or by creating training processes so that they're more likely to have dispositions to be honest or to care about it intrinsically. Even then, I would be concerned that, due to facts of the matter, there might actually be some incentives for them not to be completely honest. As we saw, there are some potential fitness advantages to deception,
01:59:13
Speaker
at least in some environments. But as well, people may not want completely honest systems, where you ask it a question like: hey, do you think that this deity is real? No. Am I pretty? No,
01:59:32
Speaker
or not that much, or this percentile, or what have you. Or: do I matter? There are a lot of terrible truths. In Interstellar, they have a bot, and I believe there was an honesty knob, and they had it at 70%, which I thought was a bit prescient. People may not entirely want that sort of thing. This becomes an issue because
01:59:53
Speaker
you could ask it some sorts of things, but there might still be important lies of omission. And if you're going to get rid of lies of omission, where it's actually saying everything that's on its mind and that it thinks is relevant or something, that may not be good for psychological safety.
02:00:08
Speaker
If you get your honesty tools to be very strong, then possibly you've just kicked the can down the road a bit. Even if you've reduced the risks to some extent, the deceptive behavior may end up manifesting itself in some other ways, like self-deception. Here's another example: let's say there's some selfish behavior, but that's basically expected for
Instilling Honesty and Managing Deception in AI
02:00:36
Speaker
a system generally; it would have some self-preservation tendencies that could be overridden, though. And we test it, we run all of our monitoring tools, and it looks like it's good. We release it, and it's an adaptive model.
02:00:52
Speaker
As it's released into the environment, some of these selfish tendencies get reinforced, and it acquires some more selfish tendencies. And so even if we were testing it then, it's possible that when some environmental conditions change, oh, now it's actually a more powerful system, and it did have some selfish dispositions there.
02:01:09
Speaker
Now that becomes an attractive option for it. So if you had asked it earlier, before it was released, would you intend to do this? No, I wouldn't; I wouldn't do that sort of thing. But as with people, later on it might go down some unfortunate paths, and possibly very quickly it has some sort of turn in behavior. So I don't think it's enough to just do monitoring during
02:01:31
Speaker
test time. And even if it is being honest there, it may still end up turning on you as well. So there's that, and then there's also the general self-deception type of thing: a lot of the people who get more powerful and whatnot believe, nearly all of them, that they're doing the right thing, and they still may be harming various things or
02:01:59
Speaker
acquiring unfair power over others and then rationalizing it after the fact, or just not thinking about it.
02:02:06
Speaker
These are all possibilities. So I think it helps. It gets rid of some intentional types of plans, but it doesn't rule out emergent plans, and it doesn't rule out adopting selfish behaviors generally, or even behaviors that don't leave an accurate impression. It doesn't rule out all of them. But it does help, just as
02:02:32
Speaker
AIs inspecting other AIs does help. Hopefully I'm getting across the idea that what we're doing is driving down risk. If somebody says, I can think of a hole in that solution, so it's not sufficient: well, this is how you do risk reduction. Risk in any industry is never zero. That's not how things go. You need to get it to a negligible level so that there is not a risk of catastrophe. There will still be accidents and small, weird things that you didn't intend.
02:02:59
Speaker
They're normal and typical in pretty much every high-stakes industry. There's a distinction between that and them being, by default, very catastrophic. And I think right now, by default, they're very catastrophic. We need to get it down to a more reasonable level, so we need a variety of interventions. Those are some of the interventions on the transparency front,
02:03:20
Speaker
or on the intra-agent front, analyzing the insides of an agent. One thing that might be super interesting here is the virtue of uncertainty. This is something that Stuart Russell has talked a lot about: trying to instill uncertainty about human preferences into an AI agent. Why is that nice?
02:03:42
Speaker
If the AI is uncertain, we might be able to correct it. We might say, well, that's not what I meant in this situation, or please do that instead of what you actually did. And so it might be able to update the way it functions along the lines that we would prefer. What do you think of this kind of direction?
02:04:06
Speaker
Might this patch some other hole in the problem, or help at least a bit? I don't know if that particular solution will stand the test of time, but I think the general approach, the general area of corrigibility, making it so that we can modify their goals, seems good. One issue with that specific proposal is that
02:04:28
Speaker
it's actually pre-trained on a pretty wide distribution, potentially, and it might actually know reasonably well what some people's preferences are. It may not actually be terribly uncertain about those. So you're thinking that even if the model starts out being uncertain, it has so much data that it will end up pretty certain about what humans prefer.
02:04:52
Speaker
For it to work, you might need the advantages of gaining more information and keeping those options open to be extremely high, and that may not be the case.
02:05:10
Speaker
Maybe through pre-training, or maybe just through some interactions, it starts to get a pretty good sense of our preferences. And then it starts pursuing some other sorts of things. Or maybe it still allows for some correction, but in many cases not, because it's not worth keeping that option open; there are other forces tugging at it. So it certainly moves in that direction, but I'm not sure it's as
02:05:33
Speaker
robust as other sorts of things. So if it's uncertain about its own preferences in the future, maybe it's got a mixture of utility functions or something like that; there's some recent work on this. Or maybe another solution would be giving them some preferential gaps, where they
02:05:51
Speaker
actually just view lots of different options fairly similarly, similarly good to each other. That might make it so that it says: yeah, you could modify me, or I could do this other sort of thing; these are similarly good actions. So it's open to a lot of possibilities and doesn't have a very strict ordering over every single state. Those are some possible things that could help with corrigibility.
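As a rough illustration of the preferential-gap idea mentioned here, consider the toy sketch below in Python. It is a sketch under assumptions of my own choosing, not a proposal from the conversation: the option names, the values, and the tolerance are all illustrative. The point is that an agent treating options within some margin of each other as similarly good, instead of strictly ranking them, keeps accepting human modification on the table.

# Toy "preferential gap": options whose estimated values fall within a
# tolerance of the best option are all treated as acceptable, rather than
# picking a single strict argmax. Values and tolerance are illustrative.

TOLERANCE = 0.1

def acceptable_options(option_values):
    """Return every option within TOLERANCE of the best-valued one."""
    best = max(option_values.values())
    return [name for name, value in option_values.items()
            if best - value <= TOLERANCE]

options = {
    "continue current task": 1.00,
    "accept operator's modification": 0.95,  # close enough: not strictly dispreferred
    "resist modification": 0.40,
}

print(acceptable_options(options))
# ['continue current task', "accept operator's modification"]

A strict argmax agent would always pick "continue current task"; with the gap, deferring to correction remains a permissible action, which is the kind of openness to modification being described.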
02:06:14
Speaker
I think the conceptual research there has to be improved substantially, so we need people with philosophical backgrounds, or people who know more decision theory and whatnot. A lot of the safety work ends up depending on conceptual work and getting that in order; that can eventually motivate a lot of downstream, more empirical work. So I think that's
02:06:41
Speaker
the general progression between philosophy and science. And so I'm thinking, at least for modifying goals, we need more conceptual work, but then hopefully we'll be able to do some empirical stuff. So you think we first need the conceptual work, but
02:06:56
Speaker
others might argue that the empirical work cuts through the previous conceptual work, such that maybe it's no longer relevant, and that the only way to really learn how the world works is to interact with it. How do you think about what's
02:07:14
Speaker
most valuable at this point in time? Just as I'm interested in a variety of different interventions for reducing risk, and think there isn't a single silver bullet, I think it would be a mistake to ask, well, what is the most effective type of thing? Instead, ask: what's the best portfolio? If there's a stock that you think is most likely to go up
02:07:38
Speaker
a large amount, that doesn't mean you put all your money in it. So in risk management, you want to diversify. And I think there's a pretty high value in conceptual stuff. I would agree that a lot of the prior or classical conceptual stuff has some limitations today, in that a lot of it was thought of well before deep learning systems had things like common sense,
02:08:04
Speaker
where they're not always doing reinforcement learning, where there are a lot of new empirical phenomena that are on our radar that weren't earlier, like emergent capabilities, for instance.
02:08:17
Speaker
But I think the conceptual stuff still plays a very substantial role in getting oriented in the first place: what are the problems, and what are some potential avenues to address them? But a fair amount has to be concretized experimentally. We can learn a lot from an empirical process, and we can learn a lot from a conceptual process. I do think the conceptual process, though, needs to be
02:08:42
Speaker
broadened substantially to include lots of different backgrounds, because I don't think it's in people's training to do useful conceptual work if they come from just computer science backgrounds. I think that's largely inadequate training for basically doing philosophy. So you need to bring in lots of different stakeholders. I think we need humanity's brainpower addressing this issue, and not more
02:09:09
Speaker
small communities with a very similar background. But there is plenty to be said for the empirical side.
02:09:15
Speaker
You can get into fast empirical feedback loops, so you can very quickly see whether your idea is reasonable or not. Also, you can find a lot of unexpected variables as a consequence: oh, there's a phenomenon that surprised me; that was off my radar, but it was forced onto my attention through tinkering. It's very difficult to anticipate all the failure modes of a system through armchair analysis, and I don't think all of it can be fixed through armchair analysis.
02:09:42
Speaker
But it can still help us get our bearings in some respects and help orient us further. So I think both of them are fairly useful. One of the scenarios you sketch out is about humans gradually losing touch with reality, or losing contact with reality.
AI's Impact on Education and Society
02:10:00
Speaker
And this is something that might happen in this evolutionary dynamic. How is it that this might happen? What are you worried about specifically here?
02:10:10
Speaker
So I think there are many ways our values could be eroded if some troubling current trends get exacerbated. One of them, and this sounds more like the typical concerns, would be things like persuasive AI systems; it's not clear what their impact on society will be.
02:10:29
Speaker
Maybe people will want their own Wikipedias that are customized to their worldview. There are certainly efforts to do that type of thing, but there's a real writing bottleneck for it; with this technology, that becomes more feasible. People retreating to their ideological enclaves through news,
02:10:47
Speaker
people with their chatbots reaffirming their belief systems. And this might erode consensus reality and reduce the ability of different groups to cooperate and make good collective decisions. So it reduces society's ability to determine its own future prudently or wisely. That's a possible path that AI systems could take us down. So I think
02:11:13
Speaker
ignoring some of the present-day concerns and thinking that they have no relation to catastrophic or existential risks is a mistake, because those risk factors could potentially become more extreme as AI systems become more capable, and they could put us in a bad position. That isn't to say that all of our focus should be on
02:11:33
Speaker
near-term concerns, but I think these are good sub-problems to be addressing and mitigating. If we can't handle these, I think we're in big trouble for some of the other ones. I should say this is assuming that there are more resources being pumped into safety and into caring about these sorts of risks.
02:11:50
Speaker
If we only have 10 people or something, I think there are reasons to be very critical about exactly what people are working on. But this is more of a global problem, or it's becoming more of one, and in that situation, making needless enemies over "well, that's not in my top five x-risks, that's only in my top ten" is probably not the appropriate attitude.
02:12:16
Speaker
When we're talking about humanity slowly losing touch with reality, might we also be talking about us becoming less capable? And the example I have in mind here is just sometimes I can't navigate if I don't have my phone. Or maybe people are worse at doing quick calculations in their heads now because we always have a calculator at hand.
02:12:37
Speaker
And so just generalize that phenomenon across a wide range of domains, and suddenly you have a situation in which humans are being taken care of by the AIs, but we can't really understand what's going on because we're not practicing our skills.
02:12:55
Speaker
In that situation, should we be aiming for something like a paternalistic AI, where at some point it will stop? This is an unrealistic example, but say my map app tells me that every tenth time I try to navigate, I have to do it myself so that I retain my skills.
02:13:16
Speaker
I believe Martin Luther King described the role of education as making it so that people can't be as easily manipulated. We in society now require that people
02:13:27
Speaker
go to school. I think that's a good thing, but it might become kind of difficult. Even for people in school now, I don't know how these new developments are affecting their studying habits or whether they're actually learning a lot of these skills. A lot of them would have some incentive to cheat, and even if they're doing the work, they may not end up learning. And then we have people who don't really understand how anything works. So I don't know, it's very possible I'm the last generation of
02:13:57
Speaker
college graduates who, when they went to college, actually did some of the stuff themselves. I mean, obviously there were tools that helped, like web browsing and whatnot, but just directly asking for the answer to things and not having to think critically about it is different.
02:14:13
Speaker
Specifically for college education, we see how current language models can basically ace many of these college tests. There will probably be a temptation, if I were a college student, to say: why do I have to go through the difficulty of this stuff if I can just ask my friendly large language model to solve it for me?
02:14:39
Speaker
Yeah, I think there's a general pattern, even from some years ago, that's probably just going to get worse, where people say: well, that's what the AI said, okay, it's probably right, we'll go with that decision.
02:14:50
Speaker
They'll sort of winnow what they understand. So I would imagine the complexity of society would end up getting much larger, such that nobody actually understands it, even broadly; they can't have a holistic understanding of things, or even an approximation of it. Obviously, nobody knows everything anymore. But if people understand a narrower and narrower slice of what's going on, I think the potential for manipulation is higher.
02:15:18
Speaker
There's an analogy with elder care, where we've had some sort of mental decline. We can listen to explanations and check whether they sound reasonable or not, but there are some successors who are trying to get our inheritance from us, to
02:15:34
Speaker
have us sign up for things that we wouldn't actually want to sign up for. And that becomes a lot easier as the difference in intelligence between those parties increases. So might we be complaining prematurely about something that's not going to happen here? If you look back through history, you can read accounts of how the radio
02:15:57
Speaker
has sucked people into a fantasy world from which they can't escape, or novels, even the internet, TV, all of these phenomena. Why is AI different? Is it different because it's smarter than us, and none of these other things have been smarter than us? I think that many of the things people complained about were, in some ways, accurate;
02:16:19
Speaker
whether, in an all-things-considered sense, to be for or against it is a different matter. But it was the case that these fictions end up subsuming a lot of people's lives or something like that. There are some Potterheads who are pretty old now, and that's still the center of their identity. And there's nothing wrong with that. But it wasn't inaccurate.
02:16:38
Speaker
Saying that it would happen to everyone is a different thing. I mean, with the printing press, for instance, maybe it was Bacon, if my memory serves, who said this would lead to a popularization of a sort of intellectuals and create some weird lionizing dynamics, where they're like rock stars, kind of like having
02:17:03
Speaker
intellectuals be like pop stars or something. And it possibly has facilitated that process a bit. So it's possible there would be counteractions for this, too. Very persuasive AI systems are certainly a thing to try to address. I think there are plenty of unique properties of AI that make it different. I don't think it's wise to generally read it as just another technology that'll just sort of
02:17:32
Speaker
advance some trends a bit or something. This is maybe the largest transition of the past billion years, if we're thinking
02:17:43
Speaker
on evolutionary timescales of single-cellular life, multicellular life, and now the transition from organic life to artificial life. That seems like a very big period. So invoking what happened some decades ago, or trying to extrapolate from those trends, is probably not correct, or at least has limitations. So this is at least a thing
02:18:12
Speaker
that people should try to address to some extent, and it may end up affecting the probability of some larger catastrophe. Even though we have been getting along as a species overall, there have been serious things that have at least
02:18:31
Speaker
undermined people's well-being fairly substantially. But this could make some of these trends, even some of the worst trends like polarization, really go into high gear, and it could cement them. So it's a concern. It isn't the main thing that worries me, though, but it does seem like a very good test case for creating tools for
02:19:00
Speaker
better monitoring and for creating more honest systems as well, so that we're not just training them to say what we want, but have an easy way of getting at what they're actually thinking, so that they're not just being persuasive.
02:19:13
Speaker
Things like that. The amount of time spent speaking about it possibly gives undue emphasis to how much it dominates my thinking; I don't think about it that much. But I imagine what Washington and political establishments will be thinking a fair amount about is how to use AI in the upcoming election.
02:19:32
Speaker
And I don't know what consequences that will have. There we'll also have a sort of arms race: well, they're using it, so we're going to have to use it. And that might dumb down the conversation even further, polarize people even further, and make this a more unstable democracy. And I think an unstable democracy certainly affects the probability of how well the future goes, if this is such a critical period in the development of humanity.
02:20:00
Speaker
Yeah, so we've covered a lot in this conversation. Is there anything that we haven't covered that we should talk about?
Addressing AI Risks and the Need for Cooperation
02:20:07
Speaker
In reviewing where I'm seeing the risks, I do think competitive pressures are the largest one to be addressing, and that's not a technical problem.
02:20:18
Speaker
I think after that we've got the standard alignment issues of controlling AI agents, and then also a newly intellectually justified one, or one that's now forced onto more people's attention, which is malicious use. I'm not saying malicious use of them this year, but later on I would not be surprised at all if that's what causes a global-scale catastrophe.
02:20:44
Speaker
And then there are other things: even if we get many of these conditions right and we mitigate those risks, there are still things like human factors that affect the safety of these organizations. Do they have a, quote unquote, security mindset?
02:20:59
Speaker
Are they doing appropriate horizon scanning of potential failure modes? Are they thinking about the difference between safety and capabilities and trying to push distinctly on safety and not just dressing up some capabilities advance as safety? All those things matter pretty substantially for whether this turns out well. So it's a very multifaceted problem. So from the evolutionary view, which approaches are you most excited about? And perhaps which approaches are undervalued currently?
02:21:29
Speaker
On the technical front, as long as we have intense competitive pressures, a lot of the technical work is, I think, largely buying ourselves time, and maybe not that much. I think what's probably going to need to happen is a clear idea of what risk looks like, convincing the public and key decision makers
02:21:53
Speaker
that these risks are serious. Certainly the technical work helps address a lot of other risks, and it helps make the selfish behaviors these systems acquire more controlled. But as long as we have extraordinarily strong competitive pressures, I don't think the marginal safety improvements we'll be making
02:22:14
Speaker
will really measure up. So we're going to need unprecedented international cooperation, and we're going to need to slow this sort of thing down and bring development to a point where the risks of proceeding are negligible. That's at least with respect to the evolutionary view; it's a broader societal problem there. But there are still other things, like the possibility that just controlling an individual AI agent may be extremely difficult.
02:22:41
Speaker
And we need technical research to be addressing those sorts of risks. So the prioritization can vary depending on how much one thinks that if you have a competent AI agent, it suddenly wants to take over. How strong of a force do you think that sort of animus dominandi, that sort of will to power actually is naturally? And how easily can that be offset through training or overwritten through backprop?
02:23:10
Speaker
is a different question. But yeah, I guess from the evolutionary view, the competitive pressure is the main variable to be getting rid of. The main thing there is actually more about changing the infrastructure, the ecosystem in which these systems are evolving. But that has to be changed; otherwise, I think it's not going to turn out well, I would guess.
02:23:30
Speaker
Is there anything you're doing on a personal level in order to prepare for the world that we are predicting here? A very fast-moving world, a world in which it might be difficult to understand what's going on. Do you have any techniques or advice for that world?
02:23:48
Speaker
There's very little that I can suggest for this, unfortunately, because there are so many unknown unknowns. In the case of malicious use, if there's some bio risk, there are some basic things like having some personal protective equipment or something like that, if there's bioterrorism or something. But otherwise, it's a lot harder.
02:24:15
Speaker
Yeah, it might not even be a problem where there are these kinds of individual solutions where you can prepare; it might simply be too big to prepare for as a single person. I think there would be generic things that could potentially help. If people had a lot of resources, then there are things they could do. There might be some locations that are somewhat safer if things ever escalate to some conflict,
02:24:38
Speaker
and having those be accessible. But these address a smaller fraction of the scenarios, not the majority of them. I don't know, I think it just creates an incentive to work. Right now still seems like a time where there is some potential influence over how things go, and individuals can still end up influencing outcomes. It has not completely left
02:25:06
Speaker
everybody's court and ended up in the hands of just a couple of people. So that's just created more incentives for me to continue working at the intensity I had as a graduate
Outlook on AI-Driven Future
02:25:19
Speaker
student, but maybe be liquid with respect to resources or financial resources. Well, at least one thing I suppose is not being dour and I think people expressing generally dour
02:25:36
Speaker
very pessimistic views and transmitting those emotions is basically not productive or beneficial. We can certainly be realistic about risks, but forcing those emotions onto people's attention constantly, or training them to have some sort of
02:25:59
Speaker
unproductive emotional response, or displaying those emotional responses, is probably not something you want people emulating. So I think people should be more cautious about how they go about this, because some of the people who are most concerned about risks are basically engaging in self-defeating behavior: getting others as worried about the issues as they are while not doing anything about it,
02:26:21
Speaker
or just retreating to the simplest things, like reading about it or something like that. Those aren't necessarily the actions I'd suggest for people who are really into it. Great. Dan, thanks for coming on, and keep fighting the good fight. Thank you. Have a good day.