Brain-like AGI and why it's Dangerous (with Steven Byrnes)

Future of Life Institute Podcast

On this episode, Steven Byrnes joins me to discuss brain-like AGI safety. We discuss learning versus steering systems in the brain, the distinction between controlled AGI and social-instinct AGI, why brain-inspired approaches might be our most plausible route to AGI, and honesty in AI models. We also talk about how people can contribute to brain-like AGI safety and compare various AI safety strategies.  

You can learn more about Steven's work at: https://sjbyrnes.com/agi.html  

Timestamps:  

00:00 Preview  

00:54 Brain-like AGI Safety 

13:16 Controlled AGI versus Social-instinct AGI  

19:12 Learning from the brain  

28:36 Why is brain-like AI the most likely path to AGI?  

39:23 Honesty in AI models  

44:02 How to help with brain-like AGI safety  

53:36 AI traits with both positive and negative effects  

01:02:44 Different AI safety strategies

Transcript

Foundation Models: Plateau or Power?

00:00:00
Speaker
We think that foundation models are going to plateau in capabilities somewhere between where they are now and the point at which they would be very powerful and dangerous.
00:00:10
Speaker
If you figure out how human brain algorithms work and put them on a chip, then you wind up with something that can do all the things that humans and groups of humans and societies of humans can do, including inventing science and technology from scratch. If you look at the variation in human ability to do science and technology, you find wild swings from one human to another. And I definitely don't think that the best and brightest scientists and engineers are anywhere remotely close to what brain-like AGI will be able to do. I think the main thing we need is motivation control, where we sculpt the AGI's motivations such that it's not like a callous sociopath that is indifferent to human welfare.

Steven Byrnes: From Physics to AI Safety

00:00:54
Speaker
Welcome to the Future of Life Institute podcast. My name is Gus Docker, and I'm here with Steven Byrnes. Steven, welcome to the podcast. Thank you. Do you want to start by introducing yourself?
00:01:05
Speaker
I am a physicist by training. I live in Boston, Massachusetts, and I first got interested in artificial general intelligence safety and alignment around 2018 or 2019, as sort of a hobby in my free time.
00:01:19
Speaker
I quickly got interested in the question of what if future artificial general intelligence (AGI) algorithms work in a similar way to how the human brain works algorithmically. So I wound up teaching myself neuroscience at that point.
00:01:33
Speaker
And then I got a grant to do that full time in, I think, 2021. And then I was hired by the Astera Institute, a foundation in California, starting in, I think, 2022. And that's where I still am, doing mostly independent reading and research on that topic.
00:01:51
Speaker
What's the scope of your work? What are you trying to achieve when thinking about the safety of brain-like AGI?

Brain-like AGI: Risks and Impacts

00:02:00
Speaker
If you figure out how human brain algorithms work and put them on a chip, then you wind up with something that can do all the things that humans and groups of humans and societies of humans can do, including inventing science and technology from scratch, founding and running organizations, making plans. And if the plan doesn't work, then it finds a way around it.
00:02:20
Speaker
If it lacks some resource that it needs to solve a problem, then it can gather that resource. So it would be kind of like a new intelligent species on the planet, but one that would be more numerous than humans, more experienced, able to think faster, more competent in every way.
00:02:39
Speaker
So this is the setting. It sounds like wild sci-fi, which is fine, because the future should sound like wild sci-fi. You know, heavier-than-air flight sounded like wild sci-fi until it happened.
00:02:50
Speaker
And yeah, so this is something that I'm expecting to happen. And this is the context where I think things like human extinction are really on the table. So I'm really interested in trying to contingency plan for that and say, if people figure out how to put brain-like algorithms on computer chips that can do all these cool things that humans can do via the same kinds of algorithmic tricks, then how would we get to a good future, in the broad sense?
00:03:17
Speaker
And I'm more specifically interested in, mostly focused on, the technical parts of that, which would be: what code could one write such that the AGI winds up intrinsically either staying under human control and wanting to stay under human control, or else at least having the best interests of humans at heart, sort of as an end in itself.
00:03:38
Speaker
So what you're trying to solve here is mostly a technical problem and not a problem of the social or governance aspects of future AI.

Creating Safe AGI: Challenges and Strategies

00:03:50
Speaker
Yeah, I want to emphasize that those are also problems. We could have a plan to make AI that has humans' best interests at heart, but then the people programming AGI don't actually use that plan; instead, they have some supposedly better idea that doesn't actually work.
00:04:07
Speaker
And then, yeah, having the technical plan was not sufficient. And likewise, it could be that lots of people have the desire and common knowledge that we should follow this plan.
00:04:19
Speaker
But over time, there's a competitive race to the bottom where the people who make more callous, sociopathic, profit-maximizing AGIs gradually make money faster, and they're able to outcompete the AGIs that are not like that.
00:04:35
Speaker
A lot of things can go wrong. Also, if we don't solve the technical problem, then one could reasonably say we shouldn't be making these AGIs in the first place. But again, I think having a technical plan is an ingredient in this puzzle.
00:04:50
Speaker
And that's where my comparative advantage is. That's what I focus on. Do you think the current paradigm of scaling up large language models, doing these large training runs, is inspired by the brain, or to what extent is it inspired by the brain? Do you think current or recent progress in AI has anything to do with computer scientists being inspired by the brain?
00:05:17
Speaker
Well, historically, for sure, neural networks were, if I understand the history correctly, inspired by brain learning algorithms. And I think people disagree about whether they misunderstood the thing that they were trying to be inspired by. But regardless of the details, they certainly built very impressive artifacts from that.
00:05:37
Speaker
So I am part of a school of thought that I like to call foundation model plateauists, in that we think that foundation models are going to plateau in capabilities somewhere between where they are now and the point at which they would be very powerful and dangerous.
00:05:53
Speaker
And you know, if you ask ChatGPT to found and run a new company and raise it to a billion-dollar successful company, it's going to crash and burn. It's just not capable of that.
00:06:04
Speaker
And my guess is that that's a problem with foundation models as such, not just with foundation models today. Say more about that. What's the fundamental limit of foundation models?
00:06:17
Speaker
Okay, so this is a little bit of an annoying question. It's fine that you asked it. The reason it's a little bit of an annoying question is I think we're not ready for AGI yet. I totally expect that we're going to get AGI sooner or later, for better or worse.
00:06:31
Speaker
But on the current margin, I would much prefer later to sooner. So for AGI capabilities researchers, if they're doing the wrong thing in trying to make AGI, then great.
00:06:42
Speaker
I don't want to correct them. And I certainly don't want to give them great ideas, on the off chance that my ideas actually are great and that they're listening to me, which is maybe a little unlikely, but just in case.
00:06:53
Speaker
So what you're saying is there could be some kind of info hazard here. Yeah. Info hazard is the term of art. Although I like to clarify that either I'm wrong, in which case there's no reason to be saying my stupid ideas on the air, or else I'm right, in which case they're info hazards.
00:07:10
Speaker
And either way, there's not much to be gained. So that's the capabilities researchers. But of course, there might also be safety researchers listening to this podcast, safety and alignment researchers. And for them, maybe...
00:07:22
Speaker
If they talk to me in person, I often chitchat about the limitations of foundation models, to the extent that it's not giving away too many sensitive details.
00:07:32
Speaker
But I still don't feel great about it. I'm not really sold on the mission of trying to convince people to stop working on foundation model AGI safety, because I think that foundation model AGI, there's no such thing, so they shouldn't be working on it.
00:07:49
Speaker
But I don't really feel great about that, because what if I'm wrong? What if foundation models do scale to AGI? I would much rather get more people to be working on brain-like AGI safety by recruiting people who are not already working on AGI safety in the first place, rather than parasitize the already extraordinarily, tragically small group of people who are interested in AGI safety in the first place.
00:08:13
Speaker
I don't want to be infighting like that. I think we should be doing contingency planning for both. Yeah, yeah, that makes sense to me. Let's talk about the brain.
00:08:24
Speaker
Let's talk about differences between the human brain and current AIs. You talk about the brain as having a learning subsystem and a steering subsystem.
00:08:35
Speaker
Maybe you can describe that. Yeah, so I think that part of the brain, more than 90% of it by volume in the case of humans, is running basically within-lifetime learning algorithms that are sort of randomly initialized, at least locally, in a way that's akin to machine learning algorithms, such that when you're born, your cortex, for example, is not doing anything useful.
00:09:01
Speaker
It's learning, and over time it will become more and more useful. And in adults, it does all these impressive things like motor control and planning and language and vision and so on. But for newborns, I think there's actually a lot of evidence that the cortex is just not yet doing much at all.
00:09:19
Speaker
And likewise with the striatum and cerebellum and a few other areas, and the hippocampus, which is part of the cortex. Whereas the hypothalamus and brainstem are these repositories of reflexes and innate, I like to call it, business logic, which is this term from software engineering where Megacorp has their proprietary code that says, if the client has more than a hundred thousand euros of assets, then attach form 6432. And if the client lives in Belarus, then
00:09:55
Speaker
also attach form 9432 and on and on and on, all these really specific things that help the business run. So by the same token, the body needs business logic.
00:10:06
Speaker
If I'm fertile, then I need to increase my sex drive. If I'm cold, then I need goosebumps. If it's time to vomit, then the way to vomit is to contract muscles A, B, and C and release hormones D, E, and F, on and on. So I think the hypothalamus and brainstem are where you find all of those things. And I call those the steering subsystem.
00:10:26
Speaker
There's nothing special about the brain in that respect, though, because if you look at any machine learning GitHub repository, you'll find lots of code that is not the core learning algorithm itself, but rather is calculating the reward function in the case of RL, or doing data augmentation, calculating the loss function, all those other things.
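To make that analogy concrete, here is a minimal illustrative sketch (not from the episode; the names and numbers are invented) of how an RL training script separates the generic learning algorithm from the hand-written "steering" code such as the reward function and data augmentation:

```python
import random

def reward_function(state: dict) -> float:
    # Hand-written "business logic": the designer decides what counts as good.
    return 1.0 if state.get("goal_reached") else -0.01

def augment(observation: list) -> list:
    # Hand-written data augmentation, also outside the core learner.
    return [x + random.gauss(0.0, 0.01) for x in observation]

class TabularQLearner:
    """The generic learning subsystem: it has no opinions about goals."""
    def __init__(self, actions, lr=0.1, gamma=0.9):
        self.q, self.actions, self.lr, self.gamma = {}, actions, lr, gamma

    def update(self, s, a, r, s_next):
        # Standard Q-learning update; the reward r comes from the steering code.
        best_next = max(self.q.get((s_next, b), 0.0) for b in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.lr * (r + self.gamma * best_next - old)
```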
00:10:45
Speaker
So the brain needs that, too. So what's the relevance of learning about the brain like this, or studying the brain, for creating safe AGI?

Understanding Brain Mechanisms for AGI Safety

00:10:55
Speaker
Is this about you kind of front-running discoveries made by others, such that if it is the case that they make systems that are inspired by the brain, we have done the work of trying to figure out how to make them safe beforehand?
00:11:16
Speaker
Yeah, that's exactly right. And I do think that within the domain of understanding the brain, there are parts that are very important for alignment and safety and not particularly important for making the AGI in the first place. And those are the areas that I focus on, which include things like understanding the hypothalamus and brainstem and how they lead to human prosociality and norm-following and compassion and things like that.
00:11:45
Speaker
Actually, tell us about that. So what is it that we can learn about how prosociality relates to the human brain that's then useful for trying to make AGI safe?
00:11:58
Speaker
Brain-like AGI would be a little bit more in the camp of the kinds of reinforcement learning agents that people talked about a lot in the 2010s, which have sort of fallen out of fashion in the past five years since foundation models came along. So, for example, these days everybody talks about training data; in the 2010s, in the AlphaZero era,
00:12:17
Speaker
people would talk about training environments. And that's not a huge difference in and of itself, but it's indicative of this different way of thinking about things. So in the area of reinforcement learning agents, there's the idea that you design the agent with a reward function.
00:12:36
Speaker
And I think the brain likewise has this actor-critic, model-based reinforcement learning system. It's not exactly like any RL agent that's been invented to date, but it is something in that general domain.
00:12:50
Speaker
And I do think that the brain has a reward function in the same way that AlphaZero has a reward function that says plus one if you win the game. Well, the brain's reward function says positive reward if you're eating when you're hungry, negative reward if somebody punches you in the face or if you stub your toe, on and on, lots of things like that.
00:13:11
Speaker
These are sometimes called primary rewards. I sometimes call them innate drives. Maybe you could explain the difference between trying to control AI or AGI and trying to instill these positive social instincts.
00:13:26
Speaker
What are the pros and cons of each approach? Joe Carlsmith has this term, motivation control versus option control, as sort of two ways to limit the chaos that an AGI might cause.
00:13:40
Speaker
So option control means that even if the AGI is out to get you, it doesn't really have any options to do so. So this includes things like putting the AGI in a box and not giving it internet access and not giving it access to actuators and stuff.
00:13:55
Speaker
It also includes things like making the AGI not terribly capable, not good at planning. Whereas motivation control would be somehow sculpting the AGI, building it such that it wants to do the things that we want it to do,
00:14:11
Speaker
and it's trying to do the things that we want it to be trying to do. And then even if it has the option to cause harm, to start self-reproducing around the internet and get out of control, it chooses not to.
00:14:23
Speaker
Or if it does, it does it with good ends in mind. So I think that option control is not a plan in and of itself, although it can be an additional layer of protection. I think the main thing we need is motivation control, where we sculpt the AGI's motivations such that it's not like a callous sociopath that is indifferent to human welfare.
00:14:46
Speaker
And you know, you ask the AGI if it has humans' best interests at heart, and it says, oh yes, I definitely do, but it's lying. That would be bad, whereas a good thing would be an AGI that actually does have humans' best interests at heart.
00:14:58
Speaker
So I'm definitely in the camp of motivation control. That's what I mostly work on. And I think that a useful jumping off point would be to understand the motivation of humans, especially in those special cases where the motivation of humans is actually good as opposed to bad.
00:15:14
Speaker
So when humans are compassionate, when humans are norm-following, those are probably things that we want, or at least it would be useful to know how those work.
00:15:25
Speaker
We don't necessarily want to copy them exactly, but it would be a good jumping-off point. And so the problem with trying to do option control, or trying to control AI and perhaps especially AGI, is that you're trying to constrain an agent that is, on some quite fundamental level, perhaps working against your interests.
00:15:45
Speaker
And so it's more like trying to plug certain holes, but then problems arise at another point in the system, or some behavior of the agent you're trying to control emerges that you don't like.
00:15:57
Speaker
And so that's a pretty difficult problem. And it's an interesting approach that we might have to consider if we don't solve the more fundamental problem of trying to align AGI before we get to AGI. But I would feel much better, I would feel much safer, if we could instill the right motivations, and especially pro-social motivations, into AGIs.
00:16:22
Speaker
Yeah, that's exactly right. We should absolutely be doing sandbox tests of AGI, but there are only so many things that you can learn from such tests. In humans, how ancient do you think the systems that control pro-social behavior

Human Instincts and Emotions: Lessons for AGI Design

00:16:40
Speaker
are?
00:16:40
Speaker
Because when we're talking about norm-following or compassion or kind of positive emotions towards others, they seem quite complex and perhaps kind of recent in evolutionary history.
00:16:55
Speaker
What do we know about that? Right. So I like to draw a distinction between emotion concepts versus innate drives. So emotion concepts in humans are definitely unique. They're also different in different cultures to some extent.
00:17:07
Speaker
And an emotion concept is a learned thing. So your learning algorithm is observing the world, but it's also observing yourself. And you can implicitly notice a pattern where you feel certain things in certain situations, and it can capture that pattern in the form of a learned concept.
00:17:24
Speaker
And that learned concept can be any of the words that we use to talk about our emotions in a human context. And then a separate thing is the innate drives that are driving those, or that are one ingredient of the emotion concepts.
00:17:39
Speaker
So these are going to be more like reactions or reflexes, and they're not going to be easily described in English-language words. Instead, it's going to be something like: when such-and-such signal in the hypothalamus is active and this other signal in the hypothalamus is active, then that triggers yet a third signal in the hypothalamus to become active.
00:18:01
Speaker
Human social instincts are definitely somewhat different from chimp social instincts. I'm not sure how much that's like tweaking the parameters on existing modules versus building wholly new modules.
00:18:15
Speaker
That's a complicated question. I think we need to know more about what these innate drives are before I'll have a strong opinion about that. One time I tried to look into whether mice and rats feel spite,
00:18:26
Speaker
in the specific sense of not just attacking a rat who's attacking them, but in the sense of using flexible, motivated behavior, clearly related to reinforcement learning, to inflict harm and misery on another rat.
00:18:43
Speaker
There is evidence of this for compassion, but in my brief search, I couldn't find evidence of it for spite. So maybe mice don't feel that in the same way that humans do, but maybe they do and I just missed it, or nobody's measured it.
00:18:55
Speaker
Yeah, it seems like a very difficult question to answer. You can't interview mice or rats, right? You can't ask them complex questions. So you'd have to somehow find out by looking at their behavior. Maybe there's some smart way to test this, but it seems like a difficult question to answer to me.
00:19:12
Speaker
So the purpose of investigating how prosociality works in the human brain, tell me if I'm wrong here, but is it simply that we can understand how it works in the human brain?
00:19:23
Speaker
And then maybe we can learn something about how to create prosociality in AI models, and perhaps how to train them in a certain way such that they develop these drives that we want them to have.
00:19:38
Speaker
Yeah, basically. So to continue the story from before, I think the human brain has something like a reinforcement learning reward function that says, yeah, pain is bad, eating when you're hungry is good, something-something about social drives.
00:19:50
Speaker
And by the same token, when these future AGI programmers, hopefully not too soon, invent brain-like AGI, they're going to likewise have a slot in their source code that says reward function.
00:20:02
Speaker
And they get to put whatever Python code they want into that slot. One of our main technical intervention points in the motivation of the AGI is this question of what code we want these programmers to put in that slot.
00:20:16
Speaker
So there are a lot of choices that are obviously bad. Maximize the bank account is obviously bad, but there's not really anything that's obviously good. There are a lot of things that seem like they might be good if you don't think too hard about them.
00:20:28
Speaker
But then when you look into it in detail, you find that the ultimate consequences are going to be bad. It's an open problem to have any idea of what this reward function should be. And that doesn't have to be the entirety of the plan. We also get to choose a training environment. We can also do real-time monitoring and intervention, whatever we can think of.
00:20:48
Speaker
But I do think the reward function is a giant portion of the solution and well worth investigating. How complex can the reward function be? How much can we, so to speak, put in there? I mean, it's just going to be some Python source code, or whatever programming language we use. And it can be anything that runs fast enough,
00:21:10
Speaker
and that's feasible for programmers to write and hopefully to debug, because you definitely don't want to get the sign wrong or whatever and end up with a maximally bad AGI.
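As a purely illustrative sketch of that "slot" (the signal names here are invented, not a real proposal), the hand-written reward function might look something like this, and a single flipped sign would invert what the AGI is trained to pursue:

```python
def primary_reward(signals: dict) -> float:
    """Hypothetical hand-written reward-function slot, analogous to innate drives."""
    r = 0.0
    if signals.get("eating_while_hungry"):
        r += 1.0
    if signals.get("pain"):
        r -= 1.0  # flipping this sign would reward pain: the kind of bug to avoid
    if signals.get("stubbed_toe"):
        r -= 0.5
    return r
```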
00:21:22
Speaker
It's certainly possible that we would want to do things like small pre-trained image classifiers. I think there's something like that in the human brain's reward function. When we get vision data into our eye, it goes up to the within-lifetime learning algorithms in the visual cortex. But a copy of that same data also goes down to the superior colliculus in the brainstem.
00:21:43
Speaker
So whereas the visual cortex just learns patterns in the data, and patterns in the patterns, the superior colliculus in the brainstem, I think, is not learning at all, but rather applying a lot of innate heuristics to that visual information.
00:21:58
Speaker
It's saying, is there a human face? And if so, it executes an orienting reaction that makes you look towards that human face. Is there something that seems to be moving like a slithering snake or like a scuttling spider? And if so, you know, it implements physiological arousal.
00:22:14
Speaker
And so there are a few more things like that. And all the other senses likewise have homes in the brainstem and hypothalamus, not just in the cortex. So there's the gustatory nucleus of the medulla, which processes taste.
00:22:25
Speaker
There's the inferior colliculus, which processes sound. I certainly expect that there might be reason for programmers to put in something like that. You can already download from GitHub some face recognizer module that works just as well as the superior colliculus.
00:22:41
Speaker
It's just some tiny convolutional neural net. That would work fine if that's what you were trying to do. So that might be one of the more complicated things that people might choose to put in the reward function. And that's one of the tools in our tool belt as we try to design and solve this problem.
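For concreteness, here is a hedged sketch of that idea: a small pretrained detector used as one term inside the reward function, loosely mirroring the superior colliculus heuristics described above. The function load_pretrained_face_detector is a placeholder, not a real library call.

```python
def load_pretrained_face_detector():
    # Placeholder: stands in for a tiny convolutional net downloaded from GitHub.
    def detector(image) -> float:
        return 0.0  # would return the probability that a human face is present
    return detector

face_detector = load_pretrained_face_detector()

def social_reward_term(image) -> float:
    # One component of the reward function: fires when a face is detected,
    # loosely analogous to the innate orienting reaction toward faces.
    return 0.5 * face_detector(image)
```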
00:22:57
Speaker
But if we're talking about AGI, kind of agentic and general, wouldn't we want to put something in the reward function like optimizing for compassion towards others, or honesty, or willingness to collaborate, and all of these things? Such concepts seem incredibly complex, and it's difficult to even know where to start if we're trying to build a system that's optimizing for that.
00:23:25
Speaker
Yeah, that's exactly right. In terms of the human brain, the reward function doesn't understand all these concepts. It doesn't know what peace and love are. It doesn't know who people are. It knows things like, this is vomiting, and this is laughter, and this is physiological arousal.
00:23:43
Speaker
So there's the neuroscience puzzle of how we get from that kind of stuff to the empirically observed fact that humans, or at least some humans, do in fact sometimes feel motivated by compassion and justice.
00:23:56
Speaker
So that's the neuroscience question. And then in parallel, there's the AGI safety question, which is: what are we going to do? So of course we could do things that evolution could not do in our hypothalamus and brainstem. We could try to use interpretability to find the concept of human flourishing and just directly connect that to reward. And that's not a crazy plan. I happen to think it's probably not going to work, but it's not a crazy idea.
00:24:22
Speaker
Do you think that's actually coherent to look for? Do you think, say, in the most advanced models we have today, there is a concept of human flourishing? And would you be able to extract that concept and make it into the thing that the AI is optimizing for?
00:24:38
Speaker
When you say the most advanced models today, you're probably talking about foundation models, and those definitely speak English and therefore they at least know the words human flourishing. I'm not sure. Again, I'm not a foundation model AGI safety expert.
00:24:52
Speaker
It's not my field of expertise. So I don't really know if that concept is sufficiently... I don't know. That could work. I don't really know.
00:25:03
Speaker
In terms of a brain-like AGI, one concern would be that it doesn't really have a concept of human flourishing in the way we would like it to. It has various things adjacent to it, and it has some concept of humans talking about human flourishing and what they mean by that.
00:25:18
Speaker
But that's sort of not native to its ontology; rather, it's understood in some different sense. So, does it even have that concept in a way that would work in a motivational sense? And if so, can we find it? That's an interpretability
00:25:32
Speaker
challenge. I think the cortex is learning all these concepts from scratch as patterns, and patterns in patterns, in sensory input. And we sort of have our work cut out for us in the case of a brain-like AGI in figuring out exactly what's what, with all the nuances.
00:25:49
Speaker
And then yet a third problem is that even if we find that concept, the brain-like AGI will be continuing to learn and grow and figure new things out
00:26:01
Speaker
as it goes. So even if we started out thinking that it was the right concept, the AGI comes up with new ideas, the world changes out of distribution, and maybe it's not going to generalize in a way that we like.
00:26:16
Speaker
Do you think brain-like AI would be beneficial to interpretability?

Interpreting Brain-like AGI vs. Foundation Models

00:26:21
Speaker
Do you think it would be easier to understand what's happening inside of a brain-like AI model than it is to understand what's happening inside of a modern foundation model?
00:26:32
Speaker
Yeah, so my basic answer is: maybe slightly, on the margin, but not enough for it to really change much of the story. So for example, in a brain-like AGI, there would be an actor, a critic, a world model, and a reward function.
00:26:46
Speaker
So you would be able to look at the operation of this AGI and you would easily be able to tell that the AGI just had a thought and that thought seemed like a bad idea, demotivating.
00:26:56
Speaker
And you know that it's a bad idea because the critic says, no, this is a bad idea, and then the AGI starts thinking about something else instead. But you don't know what that idea was, or why it was a bad idea according to the critic, because the critic is this big learned model, and the world model and the actor are these big trained models.
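A toy illustration of that limited visibility (invented numbers, not a real architecture): the critic's scalar verdict on a "thought" is observable, but the thought itself is just an opaque learned vector.

```python
import numpy as np

rng = np.random.default_rng(0)
thought = rng.normal(size=4096)          # latent activation produced by the world model
critic_weights = rng.normal(size=4096)   # the critic, shown here as a single linear layer

value = float(critic_weights @ thought)  # observable: how good this thought looks
if value < 0:
    # We can see that the thought was rejected, but not what it was about.
    print("Critic scored this thought negatively; the AGI moves on.")
```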
00:27:12
Speaker
So it's a step in the right direction, but it's a very small step. And by the same token, if you look in the cortex, you can find some neurons in the visual cortex that are learning patterns in visual data, and you can find neurons in motor cortex that are learning patterns in motor outputs.
00:27:30
Speaker
And you can also find these other neurons that aren't in any of those categories. But you don't know what the patterns are, at least not easily. So if you say, this is a high-level pattern in visual input,
00:27:43
Speaker
then that's not a huge help unless you actually know what the pattern means. And even if it's the kind of thing that you can figure out by spending 10 minutes scrutinizing all the activations and looking at different examples of what it produces, there's still the issue of whether people are actually going to be doing that in practice while the AGI is running at superhuman speed. Or they could slow down the AGI, but then maybe the next company down the block is going to run theirs much faster, without a human in the loop.
00:28:12
Speaker
What are the unique safety concerns that you have with brain-like AGI? So from my perspective, I'm expecting that the way AGI will be built is as brain-like AGI.
00:28:24
Speaker
So I don't have a clear concept of how brain-like AGI compares to some different kind of AGI, because some different kind of AGI sounds to me like square circles, and I just don't really know how to think about it clearly.
00:28:36
Speaker
Actually, before we get there, is there a way for you to express why it is that brain-like AGI is the most likely way we would get AGI, without putting out dangerous information?
00:28:53
Speaker
Yeah, so one thing is, just as a sort of very, very general note, sometimes in the world, there's just kind of one best algorithm for doing something. Like Fast Fourier Transform is a good example. If you want to do the Fourier Transform of a big data set, you should use Fast Fourier Transform.
00:29:10
Speaker
It was independently invented multiple times. If we ever met extraterrestrials, I would expect that they would be using the Fast Fourier Transform as well. So that's at least a sort of vague intuition that raises the possibility that maybe, by the same token, the brain algorithm is more or less the one right algorithm for figuring things out and getting things done in the real world.
00:29:33
Speaker
If you want something more specific: I think if you go around and ask us foundation model plateauists why we are foundation model plateauists, the two camps that people fall into are empirical and theoretical.
00:29:48
Speaker
So the empirical people will look at foundation models today and not be very impressed by them. And then maybe they acknowledge that future foundation models are going to be better than today's foundation models, but they project that we will run out of data, run out of compute, before it really changes the story. So sometimes there are kind of stupid reasons for that, like somebody who hasn't used foundation models since 2022, remembers them being bad,
00:30:14
Speaker
and hasn't updated, hasn't checked again since then. Sometimes there are more sophisticated reasons, like picking out very specific foundation model limitations, like they do badly when there's a lot of complex stuff in a big context window. And sometimes it's just looking at the extremely bizarre and frankly baffling ways that they sometimes mess up, when they're also so good at so many other things, and having some intuition related to that. And then there are the theoretical people, who have some grand idea about what it takes to make artificial general intelligence.
00:30:51
Speaker
And when they look at the nuts and bolts of foundation models, it's missing the right pieces, or something like that. And so I'm mostly in the theoretical camp myself, although I do think there's something to what I said about big, complex context windows.
00:31:05
Speaker
But the nice thing about the theoretical camp is that I can make a stronger claim: I believe that not only foundation models today, but also future foundation models, and even foundation models with scaffolding, are still not going to, you know, found and run a big company from scratch, and that this isn't a fixable problem without a big AI paradigm shift.
00:31:24
Speaker
So that's what I believe. And I guess we'll find out one way or the other, sooner or later. Is the human brain highly optimized by evolution, or is it certainly not? Yeah, I'm mostly on the no side. I mean, I guess it depends on what we're comparing it to. I think the human brain is somewhat locally optimized for being a successful human. And that's very different from being optimized for science and technology and economic productivity, the kinds of things that AGIs are going to be designed for. Yeah, I mean, if you look at the
00:31:56
Speaker
variation in human ability to do science and technology, you find wild swings from one human to another. And I definitely don't think that the best and brightest scientists and engineers are anywhere remotely close to what brain-like AGI will be able to do, which is not going to be limited by head size, is not going to be having issues with motivation and needing to take Adderall, or the slow conduction velocity of

Scaling AI: Human vs. Machine Learning

00:32:27
Speaker
neurons.
00:32:27
Speaker
Yeah, especially head size. You know, every time we scale up a model in machine learning, we're surprised and impressed by the new things that it can do. If you compare a convolutional neural net with 1 million parameters versus 10 million parameters, wow, it can do all these cool new things.
00:32:42
Speaker
So by the same token, the human brain is bigger than the chimp brain, and with an AGI brain with 10 times more neurons, I think we're going to be very impressed by the insightful things it can do. You know, if you only have so many neurons, then you can't hold a complex idea in your head all at the same time and notice the connections between different parts of it. Instead, you have to cache it and jump around and sort of chunk it, and then you miss these details.
00:33:09
Speaker
I think that's just a limitation of our head size. Yeah, yeah. I mean, brain-like AGI is an interesting term, because when we think about what such a system would actually look like, it would surpass the brain in all kinds of ways that you just mentioned: speed, memory, just sheer size, and so on. And so, at least in some sense, it wouldn't be very much like the human brain.
00:33:33
Speaker
What is the sense in which AGI would be like the human brain? I think there's this core algorithm that lets humans do the things they can do that AI can't do today. Like, again, found a company from scratch and build it to a billion-dollar company, just as an example.
00:33:51
Speaker
And there are algorithmic tricks in the human brain that enable that to happen. So it's something like, I don't know, it can find patterns, and it can search through the patterns, and it can compose the patterns, and it can learn the patterns, however we want to think about it or describe it.
00:34:09
Speaker
And obviously the details are mercifully not fully understood yet. But whatever those tricks are, if we understood them, then we could put those same tricks on a chip. And once you do that, then we already know that scaling up a learning algorithm is straightforward. Even in the human brain, some people are missing an entire brain hemisphere and they turn out fine. You can just have the number of cortical columns or whatever be bigger or smaller.
00:34:34
Speaker
If you know what you're doing, if you understand how the algorithm works, then scaling it up or down is straightforward. I definitely agree that in terms of the output, there are a lot of ways in which it would be radically non-human. I think it would be radically non-human in competence, specifically superhuman, and also be able to instantly copy itself, and all these other things.
00:34:54
Speaker
It would also be radically non-human in motivation, unless, of course, we figure out how human-like motivations work and we coordinate to actually put those into AGI successfully.
00:35:06
Speaker
So figuring out human motivation is the thing we need to do before somebody tries to build brain-like AGI. I think it's useful for us to return to the question of human motivations for a second, then.
00:35:24
Speaker
What are the most important motivations to understand and perhaps copy, in your opinion? Yeah, so I would say it in a slightly more nuanced way, which is: at the very least, we want to have a technical plan such that we can make an AGI that isn't trying to kill us, before we build the AGI.
00:35:43
Speaker
And I happen to think that better understanding human innate drives seems like a useful thing to do towards solving that currently unsolved problem. But I don't have a strong opinion that it's definitely necessary, and it's certainly not sufficient.
00:35:58
Speaker
Like even if we understand human innate drives, we still have the issue that humans grow up to be humans, not only by having human innate drives, but also growing up in a human body and a human family at human speed.
00:36:10
Speaker
And AGI is not going to have those things. So we need to understand how these things work, take that as inspiration, and jump off of those ideas. But we're still facing a blank-slate engineering challenge where we get to come up with whatever solution we think is best.
00:36:26
Speaker
Would it be something like trying to manipulate the training environment then to try to replicate what happens to humans from birth to say age 20?
00:36:38
Speaker
So I don't know that that's necessary, and I hope it's not necessary, because it doesn't sound very realistic. I have a strong expectation that as people approach the ability to build real-deal brain-like AGI, there's going to be a frantic dash of lots of people trying to scale these up in whatever way is most easily at hand.
00:37:00
Speaker
And if you have a training plan that's going to work great but requires 10 serial years of interacting with flesh-and-blood humans, then I think human extinction is going to have already come and gone long before you finish your training plan. I think that we should be hoping that that's not necessary, but also I don't have a good plan right now, and I'm open to whatever's going to work.
00:37:24
Speaker
I mean, in a training environment, couldn't we run at much faster speeds than, say, human childhood and adolescence? Couldn't we kind of speed-run interactions with other AIs that then, combined with the motivations that we have hopefully understood and copied into these systems, produce something that's pro-social in a human-like way?
00:37:50
Speaker
I definitely assume that when we know how to make brain-like AGI at all, we will know how to make brain-like AGI that runs at least 10 times faster than the human brain, in the sense that whatever human brain thought process is happening, the AGI could think the same thing 10 times faster; not just more insightfully, but also faster. So that definitely opens up an opportunity. Yeah, so it takes a human 10, 15, 20 years to be economically productive.
00:38:17
Speaker
I don't expect that it's going to be remotely that long for an AGI. If the AGI is just messing around with blocks in a virtual reality environment, certainly that can be run at high speed. If an AGI is chit-chatting with other AGIs, that can also be run at high speed. I'm not sure if that's going to be sufficient for
00:38:35
Speaker
the kind of safety and alignment that we want, because, well, it might or might not be. It depends on exactly how these innate drives work. We definitely don't want an AGI that feels warmth and compassion towards other AGIs, but not towards humans.
00:38:47
Speaker
Yeah, I was about to say that this could also be potentially counterproductive or dangerous. If you're an AI in an environment, learning to be pro-social with other AIs, does that map exactly onto the world when you then deploy the model?
00:39:05
Speaker
That's one of the tricky questions that we don't really have an answer to, I think. Yeah, you don't want an AGI who's raised in a virtual reality environment and then sees the real world as fake, the way that we are the opposite way around.
00:39:20
Speaker
That would be another way to fail. When doing these interviews, I've often asked my guests about honesty in AIs, because it seems to me that if we could create an honest AI, I would feel much better about the way things are going.
00:39:36
Speaker
If we could ask the model about its internal state, and it would actually give us an answer to the best of its ability. It wouldn't necessarily have to be that it has perfect knowledge about its own internal state. But if what it's doing when you ask, what are you planning now, what are you thinking about now,
00:39:56
Speaker
if what it's doing there is an honest effort to reflect what's going on inside the model, I think that would be a pretty large step in the right direction. Do you agree with that as a pro-social trait that would be very valuable to replicate in AIs?
00:40:11
Speaker
Yeah, I mean, that would be great. That's a desideratum, not a plan, but it's definitely a desideratum that I share. The reason it's tricky, or the way I think about it, is that communication is a hard problem. And the way that humans communicate is that they bring their whole intelligence and
00:40:30
Speaker
motivations to bear on the problem. We communicate because we want to communicate. We're thinking about how to make the other person understand us correctly. Yeah, that's one motivation. We might also be trying to do other things. Think about, say, sales communication. There are many types of communication, of course.
00:40:47
Speaker
Yeah, so we're motivated to make the other person believe something that we want them to believe. And ideally, that's the truth. But if we're working through the motivation system to make communication happen, then the motivation system is also going to pick up all these instrumental drives like deception.
00:41:04
Speaker
So yeah, making an AGI that wants to be honest because we sculpted its motivations correctly is not really different from the rest of this problem of making an AGI that has motivations that we like.
00:41:17
Speaker
How is that project going so far? How many people are working on it? Is it mostly you trying to kind of kickstart a new research direction? How much progress has been made in this area?
00:41:31
Speaker
So there's a specific neuroscience research program involving reverse engineering how human social instincts work, especially involving the hypothalamus and brainstem, in this framework of learning algorithms and reward functions.
00:41:45
Speaker
I obviously read lots of neuroscience literature as I work on this problem, and I'm very happy about some of the work that's been done in academia over the years that I can cite.
00:41:57
Speaker
I don't think anyone is systematically working on this problem from a neuroscience perspective apart from me right now. Or if they are, then I guess I'm not getting much useful out of it; maybe they have a sufficiently different vision from mine that I'm not seeing the connection when I read their work.
00:42:15
Speaker
I would think that it's difficult in traditional academia to frame the problem you're trying to solve the way that you're framing it. I would guess that the work being done in academia is more narrow in scope, and that when applying for funding, people are probably not talking about artificial general intelligence, would be my guess.
00:42:39
Speaker
Yeah, that's for sure. There are experimental measurements, for example, in the hypothalamus and brainstem that would be helpful for this research program. And people are doing those measurements bit by bit. And with a lot of luck, fingers crossed, maybe some of the connectomics groups like E11 or others could start mass-producing these kinds of measurements at some point in the future.
00:43:00
Speaker
But in the meantime, there are various reasons that people are taking those measurements. And part of it is that the hypothalamus and brainstem have a lot to do with obesity, with mood disorders, with mental health.
00:43:12
Speaker
And so you can get funding that way. For example, there was a recent study where Catherine Dulac and students and collaborators found this group of cells in the hypothalamus in rodents that sort of tracks loneliness, and then the feeling of the touch of a conspecific after a long period of isolation.
00:43:31
Speaker
And that was cool. I'm not sure what they put on their grant when they wanted to do that work. It doesn't directly solve anything important for me, it doesn't help me super directly, but it's definitely sort of inspirational and useful context as I work on this problem.
00:43:48
Speaker
So what did they put on their NIH grant for that? I don't know. It might have been something about, you know, the loneliness crisis in teens and mental health. There are a lot of things that I can imagine them having an existing interest in.
00:44:03
Speaker
Is there something that you would find very helpful for your work? What would make you very excited to see published by academia? So the main thing that I'm trying to do specifically in that project is to write down pseudocode that makes sense
00:44:25
Speaker
and that's compatible with what we know about how the brain works, especially all my specific ideas about the cortex being a big learning algorithm, actor-critic reinforcement learning, and so on: pseudocode that's compatible with how I think the brain works, that's compatible with human behavior, that's compatible with evolution, and that's appropriately related to mouse behavior and things like that.
00:44:49
Speaker
And then we can sort of pat ourselves on the back and say, cool, we have at least one hypothesis that seems like it hangs together and makes sense. And then there would be the additional question of how we test that hypothesis.
00:45:01
Speaker
And that would come down to trying to find, I think, specific cell groups, probably in the hypothalamus or, less likely, the brainstem, that would actually be implementing the pseudocode.
00:45:13
Speaker
So I've basically been working on this problem for years, like three years or something, as one of the main things I've been working on. And I think I've actually made a bunch of progress recently. I have pseudocode that's probably not right in detail and certainly not covering all of human social instincts.
00:45:29
Speaker
But at least I feel like my foot is in the door now in a way that it wasn't six months ago. You can read "Neuroscience of Human Social Instincts: a Sketch" to get that sort of progress report.
00:45:41
Speaker
We'll put it in the show notes. I would encourage everybody else to be making that same kind of effort to come up with plausible big pictures of this kind of pseudocode and flesh out these stories. If you read that blog post, you'll find lots of open questions.
00:45:56
Speaker
And in all honesty, I don't really expect anybody to be working on those open questions except for me. But if other people start diving in, that would be delightful. How do you verify that the pseudocode is actually implemented by the specific brain regions that you talk about? There's just a general problem of it being very difficult to verify which algorithms the brain is running.
00:46:25
Speaker
I'll give a simpler example, laughter, and this is simpler for a number of reasons. First of all, I think it's a simpler circuit. And second of all, I think it also exists in the same form in rodents and other animals, at least other mammals.
00:46:39
Speaker
I wrote this post called "A Theory of Laughter" last year or the year before, and it proposes that there's a specific cluster of cells in the hypothalamus, I think.
00:46:50
Speaker
I'm not sure exactly where it is. And I say that it should have two inputs and that it should fire when both of those inputs are present. One input would be related to physiological arousal in the sympathetic nervous system, the other to the parasympathetic nervous system.
00:47:05
Speaker
And when both of those are present at the same time, that should trigger this laughter reflex. So people could home in on this cell cluster, because we have already found the group of cells in the brainstem that actually triggers the motor reactions associated with laughter. So what you would do is start from there and do what's called a retrograde tracer experiment, to see what upstream cell groups project to that one.
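A rough restatement of that hypothesis as code (this is a simplified paraphrase of the proposal, not the actual write-up; see the post itself for details):

```python
def laughter_cell_group(sympathetic_arousal: bool,
                        parasympathetic_signal: bool) -> bool:
    # Hypothesized hypothalamic cell group: fires only when both inputs
    # are present at the same time.
    return sympathetic_arousal and parasympathetic_signal

def brainstem_laughter_program(trigger: bool) -> None:
    # The already-identified brainstem cells that drive the motor reaction.
    if trigger:
        print("execute laughter motor program")

brainstem_laughter_program(laughter_cell_group(True, True))
```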
00:47:34
Speaker
And then ideally you would find, hopefully, a small number of candidates. And some of them would be in the hypothalamus, and you would look into them, perhaps by stimulation studies, or just by looking at the inputs to those.
00:47:47
Speaker
And hopefully you would find some group that's just obviously implementing the pseudocode that I wrote down in the box in that blog post. And that would be great, if somebody found something like that.
00:48:00
Speaker
I'm a little bit farther from that level of experimental readiness in the case of human social instincts, or non-laughter social instincts like compassion and spite, just because the story is a little more complicated and there are more ingredients that I'm a little bit unsure about.
00:48:17
Speaker
But presumably, one could come up with exactly what you're looking for. And then, even without a dedicated experiment, if we had a connectomic data set, somebody could dive into trying to find cell groups that connect to other cell groups, or perhaps peptide receptors or whatever, that reproduce that logic in a straightforward way. And of course, I want to emphasize that the end goal, at least my end goal, is not to understand exactly how human social instincts work.
00:48:48
Speaker
I even question whether my sketchy little outline, foot-in-the-door thing might already be enough that there are diminishing marginal returns from continuing to dive into this neuroscience question.
00:49:01
Speaker
And that's why, since writing that post, I've actually just been putting my nose back directly into the question of AGI safety and saying, okay, if that's the jumping-off point, then I want to jump off of it and think again: what reward function do we want for AGI? What training environment do we want for AGI?
00:49:18
Speaker
And it might be that after I think about that for a while, I'll decide that, yeah, I am still kind of bottlenecked by being too confused about human social instincts to think coherently about this topic, in which case I would try to flesh out the story more.
00:49:31
Speaker
So if we were to have a good understanding of how human pro-social instincts work in the brain, how far would that take us towards solving the perhaps more important problem of AGI safety?
00:49:47
Speaker
Yeah, I definitely don't want to make any strong statement. It seems like a useful thing to do for a bunch of different reasons. One reason is that it's directly an inspiration. Another reason is that it sort of helps us in the following way.
00:50:00
Speaker
Ultimately, we want to be able to make some statement about AGI that says reward function X plus training environment Y equals an AGI that's trying to do Z. And this is this yet-to-be-invented science of reinforcement learning in a brain-like AGI architecture.
00:50:18
Speaker
And it would be great to ground that yet-to-be-invented science in what happens in humans. We know about the human training environment. If we know all the human innate drives, then we could likewise try to relate different innate drives to where adult humans wind up in different environments.
00:50:36
Speaker
And that would be lots of data to ground our thinking and keep us on the right page in terms of anticipating what could go wrong and what could go right for AGI.
00:50:46
Speaker
So that's the indirect thing. And then there's the direct thing of just being inspired by what tricks evolution has put into the human brain, and copying them where applicable, or modifying them where inapplicable, or leaving them out when they're actually bad.
00:51:01
Speaker
I have a lot of concerns. One concern is that I feel like things like norm-following and a sense of justice are good, but things like social status signaling and jealousy are bad.

Replicating Positive Social Instincts in AGI

00:51:16
Speaker
But it might be that the very same innate drives are doing all four, like one innate drive is upstream of all four of those different things, the good ones and the bad ones. Then we would be kind of stuck if we're trying to design our AGI to do just the good things.
00:51:31
Speaker
Yeah, I mean, optimally, we would kind of take the best of humans and have that as our motivation for AGI. It seems like a lot of our social instincts, so not just pro-social but social instincts generally, are negative and perhaps concerned with things we wouldn't want future AI systems to be concerned about.
00:51:56
Speaker
Do you have a solution for the problem you just raised, that perhaps some of the innate drives that underlie pro-social motivations also underlie negative social motivations?
00:52:10
Speaker
I mean, if you look around at humans, some of them, hopefully they're not listening, I won't name names, some of them just kind of suck and others are pretty great. So evidently there are configurations and settings that, at least in some environments, are able to output human adults who don't suck, who we're happy to have around. I'm not sure there's any human who I would want to make dictator of the universe.
00:52:34
Speaker
And we might not have a choice, because if an AGI can instantly replicate itself and project power that way, it seems very hard to get into a world with a balance of power and things like that. So maybe we're just screwed no matter what, but at least there's a chance that some of the humans who don't suck would be trustworthy with the ability to control a large amount of power without being corrupted by it.
00:52:59
Speaker
One vision of a good future is that the AGI accumulates a lot of power and uses it to implement what's called a long reflection, or coherent extrapolated volition, or some other good thing, not just keeping the power for itself and making all the decisions, but rather taking power and using it to have some sort of more democratic and participatory regime that works out well for everybody, including the people who never signed up for this sci-fi nonsense in the first place and just want to have their good, normal human life free of disease and suffering.
00:53:36
Speaker
You mentioned the example of a CEO building a billion-dollar company from scratch. If we go back to that example, we could say that some of the traits necessary for such a CEO would be things like risk-taking, the ability to make decisions under extreme uncertainty, and perhaps not a willingness to endlessly debate and get feedback from others, but simply to take some actions and then stick to those actions in the face of resistance.
00:54:10
Speaker
Is there a case to be made that some of the drives that would be necessary for capable AIs would also be potentially destructive or dangerous?
00:54:24
Speaker
I did read somewhere, perhaps in an Adam Grant book, so maybe take this with a grain of salt, but I think he listed a bunch of examples of successful CEOs and founders, ambitious people, who were extremely risk-averse.
00:54:36
Speaker
And so that seems to be a thing that can happen. I do think there are 100% cases where there's a trade-off between AGI competence and AGI safety and alignment.
00:54:50
Speaker
So one example would be a curiosity drive. We find in the reinforcement learning literature that a lot of problems are better solved by an RL agent that is curious to enter novel situations.
00:55:02
Speaker
And likewise, animals and humans have an analogous curiosity drive. And presumably that convergence is because curiosity is just a good way to bootstrap a learner in an environment that's too complex to learn all at once.
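As a concrete illustration of a curiosity drive in reinforcement learning, here is a minimal count-based novelty bonus of the kind often used to encourage exploration. The class name and the scaling constant are hypothetical, and this toy is a stand-in for prediction-error-based curiosity methods rather than anything specific to brain-like AGI.

```python
import math
from collections import Counter

class NoveltyBonus:
    """Count-based curiosity: add an intrinsic bonus that shrinks as a state
    becomes familiar. (A toy stand-in for prediction-error-based curiosity.)"""
    def __init__(self, scale=0.5):              # 'scale' is an arbitrary illustrative constant
        self.visits = Counter()
        self.scale = scale

    def shaped_reward(self, state, extrinsic_reward):
        self.visits[state] += 1
        intrinsic = self.scale / math.sqrt(self.visits[state])   # large for novel states
        return extrinsic_reward + intrinsic     # the agent now also "cares about" novelty

# Hypothetical usage inside a training loop:
# bonus = NoveltyBonus()
# shaped = bonus.shaped_reward(next_state, extrinsic_reward)
```

The same term that makes the learner explore is also an objective the trained agent pursues in its own right, which is exactly the trade-off discussed here.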
00:55:20
Speaker
Yeah, I can attest to that. I had my first child a year ago, and it's been wild to see how much he's willing to sacrifice in order to try something new and learn about his environment. Basically, he's putting himself at risk in order to gain new information all the time.
00:55:41
Speaker
So definitely, I can attest to that. Yeah, but if you think about a full-grown, powerful AGI, then curiosity is clearly a bad thing, because if the AGI faces a trade-off between human welfare and satisfying its own curiosity, we don't want that to be a trade-off. We want it to be all in, 100%, on welfare.
00:56:00
Speaker
Human welfare, or the welfare of animals, AGIs, or whatever else. I think there are a lot of things like that, and then there's a more concerning one, which is that ruthless sociopaths who are willing to break the rules when they don't get caught are, in practice, in the world today, often able to accomplish impressive things that rule followers and norm followers are not.
00:56:22
Speaker
If that same dynamic holds in the AGI world, that's a really bad sign, and it's hard to see how to avoid it, or it's hard for me. Do you think the best way to understand pro-social drives is to look at the brain? Because it could also be to look at the environment or the culture that we are in.
00:56:42
Speaker
Perhaps when large language models are reading everything humans have ever put online, they're understanding our culture and what it means to be honest or compassionate or rule-following, such that they have those concepts.
00:56:58
Speaker
What I'm asking is whether we should look to the brain or to culture to understand pro-sociality. I 100% agree that human social behavior depends on culture and not just on innate drives, but it depends on culture via innate drives.
00:57:14
Speaker
So this actually points to an important, safety-relevant difference between foundation models and brain-like AGI. Maybe the biggest difference is that foundation model pre-training has this sort of magical way to transmute observations into actions.
00:57:30
Speaker
So if the foundation model during pre-training sees the letter A in a certain context, then what happens with pre-training is that that turns into outputting the letter A in the same context.
00:57:42
Speaker
And likewise with every other token. There's really nothing like that in the case of the human brain. If you hear somebody say something, that amounts to an auditory expectation.
00:57:54
Speaker
Whereas if you say something yourself, that involves moving your throat and larynx and mouth to emit words. So people can imitate other people, but they imitate other people because they want to imitate other people.
00:58:09
Speaker
And they don't always want to imitate other people. So if there's somebody who you really look up to and that person starts skateboarding, then you're going to start thinking that skateboarding is cool.
00:58:20
Speaker
But if there's somebody who you think is really stupid and awful and they start skateboarding, then you're going to be, if anything, less likely to skateboard, because now it's cringe. So yeah, foundation models can just ingest and duplicate human behavior.
00:58:35
Speaker
Or at least human behavior as reflected in the text of the internet during pre-training, and they just get to clone all of these human behavioral tendencies. That's not strictly true after post-training, but I think it's still like 99.9 percent true, because most of the learning happens during pre-training. So foundation models have this sort of magical shortcut to sucking up human behavior and human cultural norms, which I think is the reason that people find it a lot easier to make foundation models behave in ways that at least superficially, and we can debate about whether it's deep or not, but at least superficially, you know, Claude seems nice and
00:59:17
Speaker
ChatGPT seems nice, and they seem to follow the rules, and if you ask them questions, they seem to give reasonable answers, if they're not jailbroken. I think it would be a lot harder to get anything remotely like that for brain-like AGI. I think alignment is much harder and can go way more off the rails in the case of brain-like AGI. And that's what I expect to happen.
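To illustrate the "observations become actions" point, here is a deliberately tiny sketch: a bigram model trained by counting, standing in for next-token pre-training. The corpus and function names are made up, and a real foundation model is a large transformer trained by gradient descent on a cross-entropy next-token objective, but the structural point is the same: every observed token doubles as the target for the model's own output.

```python
from collections import Counter, defaultdict

def fit_bigram(tokens):
    """Count which token follows which; the observed continuations directly
    become the model's own outputs."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1                         # observation ...
    return {prev: c.most_common(1)[0][0] for prev, c in counts.items()}   # ... becomes action

# Hypothetical toy corpus:
model = fit_bigram("the cat sat on the mat".split())
print(model["the"])    # the model emits a token it has *seen* follow "the"
```

Nothing in this loop asks whether the model "wants" to imitate; imitation is built into the objective, which is the contrast Byrnes draws with a motivation-driven, brain-like learner.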
00:59:40
Speaker
And why is that? Brain-like AGI, everything it does, it does because it wants to do it, because of the motivation system. There's no sucking up and regurgitating of typical human behaviors.
00:59:51
Speaker
So you don't expect a brain-like AGI to behave in typical human ways in typical human situations, unless you have somehow specifically figured out a way to build a motivation system that leads to that. And I don't know how to do that, and I'm concerned that nobody else is going to figure it out either.
01:00:08
Speaker
If someone listening to this wanted to work on brain-like AGI safety, what's the best way to start? Yeah, there are so many challenges.
01:00:19
Speaker
If we just focus on the technical challenges, there are the neuroscience projects that I was working on, which are about better understanding how human social instincts work in the hypothalamus and brainstem.
01:00:30
Speaker
By the way, if you read Intro to Brain-Like AGI Safety, which is a blog post series and also a preprint, then the last post of it, post 15, chapter 15, has a list of some unsolved problems that people can work on.
01:00:44
Speaker
I also list some things that are a little bit more like conventional machine learning or conventional computer science. There are lots of stupid things that are going to come up sooner or later. Like, do we have good, secure reinforcement learning training environments that we can use for sandbox testing, environments that have had a proper security audit against an AI trying to break out of them?
01:01:06
Speaker
Probably not. And we'd want a lot of these to be really user-friendly, such that people are actually going to use them. There are so many ways for us to shoot ourselves in the foot during this future mad scramble, where everybody's going to be trying to test and deploy brain-like AGI as powerful as possible, as fast as possible. What can we do to lessen the frantic, competitive race situation?

Developing Brain-like AGI with Minimal Compute

01:01:32
Speaker
I don't really know.
01:01:33
Speaker
What can we do to buy more time? I don't really know. I don't think anybody has great plans for any of these things, especially because I expect that the path from here to brain-like AGI is people publishing stuff on GitHub and arXiv, as opposed to bigger and bigger training runs.
01:01:49
Speaker
I expect that when we can make brain-like AGI at all, we can make it with a shockingly small amount of compute, maybe even a consumer GPU or two; certainly a university cluster would be plenty, because I don't think the brain really uses that much compute, and it doesn't even use it very efficiently.
01:02:09
Speaker
And so nobody has a plan for governance that could survive in this low-compute world that I'm anticipating. There are things that people can do on the margin that I think would be helpful.
01:02:21
Speaker
What else? Yeah, just trying to think through the whole problem, what's going to go wrong, how to put the pieces together and come up with a good plan. There's a lot of work to do, I think. I would love ambitious, creative people who can think carefully about knotty problems that have lots of components, try to find where everybody's dropping the ball, and work there.
01:02:44
Speaker
Safety work that's done in the current paradigm of foundation models, how relevant is that if the first AGI is going to be a brain-like AGI?
01:02:55
Speaker
I typically find it not super duper relevant, at least the technical safety stuff. I try to stay abreast of the literature, but I tend not to find it super helpful for what I'm doing.
01:03:07
Speaker
Governance stuff is different. I find that, again, maybe helpful on the margin, but not very helpful. It would be nice, though.
01:03:17
Speaker
For example, if future companies collaborate on safety, I don't want them to think that maybe they're breaking antitrust law. And if the government could write a letter that says, obviously, AGI companies collaborating on safety is fine and not a breach of antitrust law, then that would be a good thing. And whoever's working on that, I think that's, on the margin, a good thing that they should be working on.
01:03:46
Speaker
Again, I think powerful AGI is going to require so little compute that it's going to be ubiquitous. But nevertheless, even if it's ubiquitous, it would still be nice for governments to know where the data centers are.
01:03:59
Speaker
I guess I would prefer a world where governments know where the data centers are, rather than one where they don't, and where they have at least some idea of what's going on in them. Maybe it doesn't make a huge difference, but it still seems marginally useful. It's not a fight that I would want to fight, but...
01:04:16
Speaker
Sure, why not? It sounds like you're not very hopeful, because if things are as you think they are, AGI will not require that much compute. So we can't track these giant training clusters, which we might be able to track in a world in which AGI requires enormous amounts of compute.
01:04:38
Speaker
What developments would make you more hopeful? Yeah, I think if somebody had a technical plan to make AGI that is simultaneously sufficiently powerful to really move the needle on societal resilience and things like that, and also has the best interests of humanity and other things that we care about at heart, then having that technical plan in hand would be great progress.
01:05:02
Speaker
It would also be great if more and more people were aware of that technical plan and bought into it, especially the many people who are working on brain-like AGI as we speak.
01:05:14
Speaker
I'm not that enthusiastic about outreach when I don't actually have the technical plan yet. How could we ever be confident that we had the right technical plan in hand, right? As I see it, it's bound to be a high-stakes, fast-moving situation in which some group of people will have to make some decision about whether we feel like this is sufficiently safe.
01:05:37
Speaker
And that's even in the positive situation in which decision-makers at a global scale are even thinking in these terms. But is there some way for us to know in advance whether the technical plan we have in our hands is actually going to work?
01:05:55
Speaker
This is an important issue. I think it would be great to have, as a first step, a technical plan that is not obviously going to fail, which is a bar that we haven't even crossed yet.
01:06:08
Speaker
Once we cross that bar, then we can start, like you said, to hone in on how to de-risk it. How do we do tests? Certainly part of the technical plan should involve sandbox testing and whatever unit tests are possible. I am very concerned, and expecting, that there's going to be a point of no return, a little bit akin to launching a space probe, where if the antenna is pointing in the wrong direction, then it's too late to fix. So yeah, how do we know that something is safe? I don't know. I think that we should have some plan that seems like it might work, and then we can work bit by bit to try to flesh it out, to try to anticipate where it might go wrong, and to have some theory-based reasons to believe that those things won't go wrong, or, you know,
01:06:56
Speaker
things that are based on reasons as opposed to just hope. There are a whole bunch of unknowns here, and there are things that we would like to know. There's a whole science of human motivation that it would be nice to have decades to get a grip on.
01:07:14
Speaker
Are there any tactics for buying time that you think are useful? Is it useful to try to collaborate on pausing at a certain moment? Are there any ways to buy time that you find plausible?
01:07:28
Speaker
The message that I want to project loud and clear is that more people who are not working on AGI safety at all should be working on brain-like AGI safety, especially people who are trying to build brain-like AGI. I do think that's great.
01:07:43
Speaker
And I myself try to do lots of outreach to people in the AI and neuroscience communities who are working on reinventing or reverse-engineering algorithms that I think have the secret sauce that the brain has.
01:07:59
Speaker
It's a hard slog, because they get paid, this is their life's work, they're doing all this cool science, finding all these cool algorithms and beating benchmarks in their toy models, and people don't tend to be receptive to the message that they're doing the exact wrong thing and should instead be working on safety. Although I do like to emphasize that the technical safety problem is also a very interesting, fascinating technical problem with lots of discoveries yet to be made, so they can still do cool neuro and/or AI work
01:08:33
Speaker
and still be on the right side of the aisle from my perspective. So that's where I spend most of my effort. I'm not that enthusiastic about the Pause AI movement or governance efforts like that, because again, I think that the path from here to powerful AGI is going to be some new AI paradigm that's currently either non-existent or working only in toy models.
01:08:59
Speaker
And that's going to happen without much compute. It's going to happen on GitHub and arXiv and at NeurIPS and places like that. We're not bottlenecked on giant training runs.
01:09:10
Speaker
If anything, I think the giant training runs might be a distraction from the actually powerful and dangerous AGI that I'm thinking about. But of course I could be wrong, and I don't want to make any very strong statements in case I'm wrong.

AGI Alignment: Motivations and Human Welfare

01:09:22
Speaker
Yeah, I mean, I know a lot of people who would be very happy to see the large language model scaling paradigm plateau for underlying technical reasons, so that we would have more time to work on safety.
01:09:36
Speaker
When will we know? When do you think we will know whether that paradigm is plateauing? I don't know. I mean, what was it? Dario Amodei has the most aggressive timelines that I've ever heard. He was like, we're going to have Nobel Prize-winning inventions from autonomous AGIs in, what did he say, 2027 or something?
01:09:57
Speaker
Late 2026 or early 2027. So that's... That's crazy. Yeah. So I think we will know quite soon whether Dario Amodei has been drinking too much Kool-Aid or whether he knows important things that are not public information, or at least things that I don't know.
01:10:14
Speaker
And then there are a lot of people who are less aggressive than him. We'll find out. Certainly things have progressed quite quickly since 2018. So as time goes on, we'll learn more and more about the capabilities of foundation models.
01:10:30
Speaker
And again, if somebody is working on the contingency that foundation models do in fact scale to a powerful, dangerous AGI, then I'm happy for them to be doing that. I don't want to strongly discourage them, because, you know, what if I'm wrong?
01:10:45
Speaker
I don't know. I've been wrong before in my life, believe it or not. Do you think AIs in their current form are useful in your work, to help you in your research, or will be useful as researchers or research help, either as a tool or as an agent, in trying to do the kind of safety work that you're doing?
01:11:09
Speaker
I can't speak to the future. I think as of right now, I haven't gotten more than marginal help from the various foundation model service providers that I subscribe to.
01:11:21
Speaker
Yeah, I use them for random little things that I could theoretically also Google, but it would take more time and be a little worse to Google them. I haven't successfully gotten interesting conceptual ideas out of them so far.
01:11:37
Speaker
All right, as the last question here, maybe you could paint us a picture of what success would look like for brain-like AGI. Oh, gosh, I don't know. And it certainly shouldn't be up to me to decide whether and how we have this crazy sci-fi future. I think it's terrible that...
01:11:57
Speaker
You know, people are going to unilaterally welcome this new intelligent species onto our planet without necessarily knowing how to control it or make it care about humans.
01:12:07
Speaker
Yeah, what does the future look like? I don't know. I think that what I'm working on now, this technical problem of being able to sculpt AGI motivations such that the AGI isn't callously indifferent to human welfare, is a good thing to work on robustly across multiple scenarios.
01:12:24
Speaker
So I'm going to keep working on that. But this is actually a little bit part of what I've been doing in the past six months since writing Neuroscience of Human Social Instincts: A Sketch.
01:12:36
Speaker
It's actually been a few years that I've really had my nose in the neuroscience books and haven't caught up with the larger AGI safety and alignment, safe and beneficial AGI discourse.
01:12:47
Speaker
So I think there are a lot of disagreements between people in the field, and I don't feel like I fully understand where other people are coming from, and I've been trying to go through and hash that out.
01:13:00
Speaker
And I think I'm making some progress, but maybe in a couple of months I'll have a better answer to those kinds of questions than I do right now. Steven, thanks for joining me on the podcast. Thanks for your invitation. It was a delight. Thank you.