Roman Yampolskiy on Shoggoth, Scaling Laws, and Evidence for AI being Uncontrollable

Future of Life Institute Podcast
Roman Yampolskiy joins the podcast again to discuss whether AI is like a Shoggoth, whether scaling laws will hold for more agent-like AIs, evidence that AI is uncontrollable, and whether designing human-like AI would be safer than the current development path. You can read more about Roman's work at http://cecs.louisville.edu/ry/

Timestamps:
00:00 Is AI like a Shoggoth?
09:50 Scaling laws
16:41 Are humans more general than AIs?
21:54 Are AI models explainable?
27:49 Using AI to explain AI
32:36 Evidence for AI being uncontrollable
40:29 AI verifiability
46:08 Will AI be aligned by default?
54:29 Creating human-like AI
1:03:41 Robotics and safety
1:09:01 Obstacles to AI in the economy
1:18:00 AI innovation with current models
1:23:55 AI accidents in the past and future
Transcript

Introduction and Book Overview

00:00:00
Speaker
Welcome to the Future of Life Institute Podcast. My name is Gus Docker, and I'm here with Roman Yampolskiy. Roman is a professor of computer science at the University of Louisville. Roman, welcome to the podcast. Thanks for inviting me again.
00:00:16
Speaker
We are here to talk about your forthcoming book, AI: Unexplainable, Unpredictable, Uncontrollable. This is a very dense book; there's a lot in there, so as I told you, we have a lot to talk about. I think we could start at the very surface level, which is just the cover of the book. The cover is a Shoggoth meme. I'm not certain how that's pronounced, but it's basically this kind of
00:00:42
Speaker
Lovecraftian monster, an octopus creature that humans are then putting masks on. So maybe you want to explain what that means and whether that is, you know, your view of AI?

AI Complexity and Risks

00:00:56
Speaker
Yeah, so that's a classic meme. The idea is that we have this monster, and really we just know it's nasty. But what we're trying to do is kind of make it look pretty, put some lipstick on it, a smiley face. A lot of it is filtering: don't say this word, don't do this. But under the hood, it's still a completely uncontrolled,
00:01:18
Speaker
completely alien creature. To me, at least, it feels like a lot of modern AI safety work is about that: make it feel kind of safe and secure, but it's not really changing the overall safety of the system.
00:01:34
Speaker
People may perceive this as being kind of a hyperbolic meme if you're describing AI as a Shoggoth, but then you go to ChatGPT and you say, okay, this feels more like a tool. This doesn't even feel that agentic to me. So where's the path from current AI to this more Shoggoth-like AI?
00:01:55
Speaker
Well, it's degrees of capability versus danger, right? If it's a narrow system, it will fail in some ways, but they're very limited. We're not going to have an existential crisis because it uses an incorrect word or misspells something, trivial things. As it gets more capable, the impact from mistakes, accidents, errors will
00:02:16
Speaker
become proportionately larger, and if it becomes fully general, then it can make mistakes across multiple domains, and the impact is equivalent to or exceeds that from human agents. But you would describe the foundational models underlying ChatGPT, or the open-source models, or any of these proprietary models, as being fundamentally Shoggoth-like, in that they can't be explained, or we don't understand how they work, basically, still.
00:02:45
Speaker
Well, there are certain things we obviously know about them, but we don't have complete understanding. If you're engineering a system, we're kind of used to knowing every brick, every wire: where does it go, why is it there, what happens if you remove it? Here we train it on all of the internet, basically all human knowledge. And then once we're done training it, we experiment on it to see what it can do.
00:03:09
Speaker
What are the dangerous behaviors? And once we discover them, we try to mitigate. That doesn't seem like it's going to get you safety and security at the level at which something so capable needs to be controlled. Your book is a description of a number of features that we can't get with AI. We can't get predictability, for example. We can't get explainability. Maybe if we start with unpredictability: why is it, fundamentally speaking, that we can't predict AI behavior with certainty?
00:03:39
Speaker
So we can predict certain AI behaviors; it's just that we can't predict all of them with the necessary degree of precision. And the proof is trivial if you have something smarter than you.
00:03:51
Speaker
The assumption is you are not that smart, so you can't make the decisions the system makes or predict them. Playing chess is a classic example. If I can perfectly predict what my opponent is going to do in any situation, I'm playing at the same level. But the assumption is they're a better player. So it's a violation of our initial assumptions. We cannot predict how a smarter-than-human agent will accomplish its goals
00:04:17
Speaker
in particular. So maybe we know the overall direction, the terminal goal it's trying to achieve, but we don't know how it gets there. It is possible there are some very significant negative side effects: we obtain the final goal, but at the cost of many things we're not willing to sacrifice to get there.
00:04:37
Speaker
But we can make uncertain predictions. If you're an investor deciding which companies to invest in, you wouldn't say that because we can't make certain predictions about which companies are going to be most valuable, we shouldn't invest at all. Uncertain predictions are the norm in any domain, I would think.
00:04:54
Speaker
Absolutely. And when you're investing, you're happy to be guaranteed an 80% chance of making good money and a 20% chance that maybe you lose your principal. Here we're talking about all of humanity. So if there is a 1% chance we're all going to die, is the payout, let's say free labor, worth

Moral and Ethical Implications of AI

00:05:12
Speaker
it? And that's a decision I don't remember voting on. Some people are trying to make it for us.
00:05:17
Speaker
Could these uncertain predictions be good enough for policy, or good enough for technical guarantees or reassurances that systems are functioning like they should?
00:05:29
Speaker
In my opinion, no. If you're talking about a system which can kill everyone or worse, what level of confidence do you need to decide to run the system? To me, it has to be basically 100% safety. That's not attainable, I know that. That is not the standard we ever get in software. Usually with software, we click "yes, I agree" to whatever consequences and we move on. If it fails to
00:05:57
Speaker
perform as expected, it's not a big deal. You lose some data, you lose privacy, they reset your credit card and password. If you're talking about killing everyone, the standard should be somewhat higher, in my opinion.
00:06:09
Speaker
I guess it comes down to a fundamentally moral question about what level of risk makes sense. What you're doing in the book, and in your work in general, as I see it, is more setting up proofs about what we can't have,
00:06:28
Speaker
separate from discussions of the moral questions. But we can separate those: we can talk about whether the proofs hold, and then about what we should do in the moral domain, or with the question of what level of risk we're interested in running. Just on a personal level, you're not interested in running basically a 1% risk or a 0.1% risk or anything like that.
00:06:47
Speaker
So even if I was, I don't think I have a right to make that decision for 8 billion other humans. They cannot consent to it because, as I said, the system is unpredictable and unexplainable, so they cannot give you meaningful consent. They don't know what they're consenting to. So this is very unethical. You would never be allowed in any other scientific domain to run experiments on other humans who cannot provide meaningful consent, especially if the outcome could be complete extermination.
00:07:17
Speaker
And so one issue is also that we can't say with precision what the probability of an AI catastrophe is. We hear the heads of the major AI corporations talking about maybe 10% or
00:07:31
Speaker
20%. I've heard academics that are interested in this domain say the same numbers. But one result of AI being unpredictable or unexplainable is that we can't get a precise number for the risk of extinction. These are of course subjective probabilities, but we can't nail down any precise probability of catastrophe.
00:07:53
Speaker
Absolutely. We can't make precise predictions. Even worse, if you look at other predictions in this domain, for example, how soon before AGI, I think it changed by orders of magnitude in the last year. So what good are those predictions to begin with? Next accident, next paper will completely change them.
00:08:12
Speaker
Although with language models in particular, we see these scaling laws where capabilities increase as a function of training data and compute usage. So we can say something about which capabilities will arise in the next model. And that might be good enough for some policy applications or some security measures.
00:08:34
Speaker
I'm not sure we can say what capabilities will arise. We're saying it will be more capable, but we don't have the precise predictive power to say exactly that GPT-5 will have the following capabilities. I don't think anyone makes that claim. No, that's true. But we might say something about, for example, the length of text output that can be produced at a certain level, given certain data and certain compute. And so that is somewhat of a prediction of the capabilities of the models.
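As a rough illustration of the kind of extrapolation scaling laws permit, the sketch below plugs parameter and token counts into a Chinchilla-style loss formula. The constants roughly follow the published Chinchilla fit and are used here purely for illustration, not as a prediction about any particular model.

```python
# Illustrative scaling-law extrapolation (Chinchilla-style parametric form).
# Constants roughly follow the published Chinchilla fit; they are shown only
# to illustrate the shape of the curve, not to predict any real model's loss.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.69, A: float = 406.4, B: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Loss as a function of parameter count N and training tokens D."""
    return E + A / n_params ** alpha + B / n_tokens ** beta

for n, d in [(1e9, 2e10), (7e10, 1.4e12), (1e12, 2e13)]:
    print(f"N={n:.0e}, D={d:.0e} -> predicted loss {predicted_loss(n, d):.2f}")
```

Note that the output is a single loss number; as discussed above, that says nothing about which specific capabilities appear at that loss.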
00:09:01
Speaker
I'm just thinking, if we accept your conclusions here that we can't predict AI with any interesting level of certainty, what do we have then? We must kind of work with what we have and maybe the scaling laws are the least bad predictive tool we have.
00:09:17
Speaker
Well, we can work with what we have. Literally, we have a model which has so much potential, which hasn't been integrated with the economy, has not been fully understood or explored. We can spend the next 30 years exploring the existing model and growing the economy based on that, integrating it with existing infrastructure.
00:09:36
Speaker
There is no need to have GPT-7 next year. That's not a requirement for anything, not economic growth, not scientific progress. It's just a very misguided competition towards possible annihilation. Do you think the current scaling laws we have now will hold as AIs become more agent-like? Will the scaling laws we have for more tool-like AIs also hold for more agent-like AIs?
00:10:04
Speaker
It's likely. I mean, if it's a byproduct of having enough compute and a sufficient number of neurons to model the world, I would expect something similar. I'm not sure we have enough data, natural data, to continue scaling. Maybe artificial data will be good enough, maybe not. It's still being experimentally determined, but as long as we have enough compute and enough to train on, yeah, it will keep getting more capable.
00:10:32
Speaker
I think the synthetic data might be a big uncertainty or a big open question: if that can work, we can continue scaling, but if that doesn't work, we're basically using the entire internet already. Of course, I don't know what the labs or the corporations have as proprietary data sets that they might train on, but do you think AI progress hinges on whether synthetic data can work?
00:10:58
Speaker
It could be a bottleneck, but it seems like we can generate very good synthetic data. We can create simulations of our civilization. We can have self-play-type data creation where the agents just compete in the domain we care about until, like we saw with Go, for example, they are at a very good level of performance compared to humans. So I don't think it's a permanent problem, but it may slow things down for a little while.
00:11:25
Speaker
There's been a lot of effort put into trying to determine when we might get AGI, artificial general intelligence. Do you think we should spend more resources trying to predict this? Is it interesting knowledge?

AI Capabilities and Limitations

00:11:40
Speaker
Do we gain something that's useful for the actions we take by trying to get better models of when we get to AGI?
00:11:48
Speaker
Information is always useful, but I'm not sure whether it's two years or five years makes any significant difference, especially if the problem is not solvable to begin with. So I personally don't make significant changes based on this prediction. I see a top CEO say, okay, we're two years away from AGI; I see people say five. At no point did that somehow impact what I'm doing or concentrating on.
00:12:13
Speaker
As you mentioned before, these predictions of when we get AGI have kind of fallen quite rapidly recently. And so maybe just five years ago, it was more common to hear 30 years or 20 years, whereas now it's more common to hear maybe five years. Maybe both of us and the listener, maybe also we're in a bubble, but I don't know. Do you have a sense of people outside the kind of AI bubble? What do they see here? What do they predict?
00:12:43
Speaker
So there are degrees of being outside the bubble. There are those who are so far outside, they don't even know what we are talking about; they have no experience with AI or maybe even computers and the internet. Those who are a little closer see it as, okay, it's an excellent tool, it's going to help us with marketing efforts for our startup or something, but they don't see it as a complete game changer.
00:13:04
Speaker
Definitely very, very few people understand what happens when we hit full human capability and beyond. And then I tell them, OK, it could be two years. I'm not saying it is two years, but it very well could be. There is just cognitive dissonance. They don't understand. So the follow up questions are like completely irrelevant to the consequences you would expect from that.
00:13:26
Speaker
Do you think unpredictability scales with intelligence? So a system is less predictable as it becomes more intelligent? Relative to the lower-level intelligence, yes. It's much harder to predict something 10 times smarter than you than something 5 times smarter.
00:13:45
Speaker
Now we're talking about the intelligence of the observer, but I was just thinking that at a certain level of intelligence, a very low level, say an AI that can play checkers or something, that seems almost fully predictable. Where does it break? Is there a threshold that's reached or what is it that makes more advanced AI unpredictable?
00:14:12
Speaker
So it could be even a narrow domain. In a game like chess or Go, as I said, if it's an opponent who's playing better than you, you're not going to predict it, even if it's completely stupid in every other domain and cannot drive cars or anything. In general, I would guess that generality is a big, big threshold, a big jump, because you're switching from predicting in one or a few domains to every conceivable domain, including the invention of new domains. And that's hard to predict.
00:14:39
Speaker
And do you think that generality is this kind of step that's reached? Do you think, for example, that the GPTs are general in the sense you mentioned there? Because, of course, there are domains in which GPTs can operate.
00:14:53
Speaker
So there is a chapter in the book about the difference between human level and AGI. And you can talk about different degrees of generality as well. So you can have a system which is general in, let's say, five domains. It's general within those. Humans may be general in the domain of human expertise, which could be hundreds of domains.
00:15:11
Speaker
but we are not universal. We don't have certain capabilities animals have. So you can talk about how, as the system gets more general, it becomes harder and harder to predict, harder and harder to compete with. GPTs, large language models, are definitely general, but not universal. They don't have complete generality. There are many domains in which they fail miserably, but the number of domains in which they are highly competent keeps increasing, and that is something we can predict with scaling laws.
00:15:40
Speaker
Yeah, so actually say a bit more about, this is something I wanted to ask you about, why are humans not AGIs? Because I think there's maybe a tendency to conflate AGI, artificial general intelligence, with human level intelligence. So why are these two not the same?
00:15:55
Speaker
So there are certain things we cannot do which computers we have today are already capable of doing. For example, there are some interesting pattern recognition tasks where you can look at someone's x-ray or retinal scan and know things about their race or gender, things where a human doctor has no idea how you get there.
00:16:15
Speaker
There are other examples I use in the paper where, basically, yes, there is pattern recognition of the type where you can take my voice and synthesize a face from it. Things which are just beyond human capabilities. I'm not talking about just adding large numbers like a calculator; yeah, that's outside of the human domain, but it's not interesting. There are things computers can do which we just don't have the capability of accomplishing.
00:16:41
Speaker
Are humans more general than current AIs? Do we have a way of measuring generality? I would say we are still more general. There are things we can do that GPTs are not helpful with. That's why I still do my own research. That's why you're interviewing me and not a GPT model. But in many domains, they are superior to me. They are much better at so many things. They speak hundreds of languages I don't speak. They can now produce music with all these musical instruments on their own.
00:17:11
Speaker
We are no longer directly comparable in terms of, okay, we are superior in all those. It's more like it has a subset of 50 domains it's better at, and I have a subset of 10 domains I'm still keeping as my dominant space.
00:17:26
Speaker
It is actually an interesting experience to chat with a language model and realize that these models have basic competency across an extremely broad number of domains. You can try to talk to it about your favorite subject and it'll be pretty good. You'll be able to spot some flaws in its reasoning and some facts that are missing.
00:17:49
Speaker
But then you think: this holds for all domains. It can discuss Roman history as well as engineering on Mars at an equal level, probably, depending on what's in the training data. So it is surprisingly general, but perhaps it makes sense to say that humans are still more general. I'm asking because we don't have a formal measure of generality, right? We don't have anything that can measure it in a formal sense.
00:18:15
Speaker
That's right. We don't, and we kind of try to talk about humans: okay, those are geniuses, those are not geniuses; they can pick up new domains very quickly; they can compete with other humans much better. But yeah, it's not trivial, where you can say, this is G74, this is G20. It would be great to develop this type of measurement and test, but
00:18:38
Speaker
it's very time consuming. We don't have a complete list of domains, and we keep inventing new domains and intersections of domains, which is always interesting. So it's a big open problem, and it's part of what makes this so difficult. It gives you a hint as to why maybe it's not easy to test for everything, predict everything, and explain all possibilities.
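Purely as a toy sketch of why such a "G-score" is hard to pin down (the domain list, scores, and threshold below are all invented), one could imagine scoring generality as the fraction of tested domains in which an agent clears some competence bar; the result depends entirely on which domains make the list, which is exactly the open problem described above.

```python
# Toy "generality score": fraction of tested domains where an agent clears a
# competence threshold. Domains, scores, and the threshold are all invented;
# the point is that the score is only meaningful relative to the domain list.

def generality_score(competence: dict[str, float], threshold: float = 0.5) -> float:
    passed = sum(1 for score in competence.values() if score >= threshold)
    return passed / len(competence)

human = {"language": 0.9, "driving": 0.8, "chess": 0.4, "protein_folding": 0.0}
llm = {"language": 0.9, "driving": 0.0, "chess": 0.6, "protein_folding": 0.1}

print(generality_score(human), generality_score(llm))  # both 0.5 on this particular list
```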
00:18:58
Speaker
Why are you making this point that humans are not AGIs? Is it so that we realize AGI will not be like humans? Is that the kind of confusion you're trying to dissolve? There is a lot of redefining of terms. I think lately I hear people talk about AGI
00:19:21
Speaker
in terms which used to be reserved for super intelligence, that's even more confusing. And historically, we had human level used interchangeably with AGI. And I'm just pointing out, I mean, for most conversations, that's what we have in mind, but they're not the same, they're not equal. And I think there is an infinite subset of domains which are not included in human capabilities, which even standard AGI will be able to achieve. And then super intelligence is just like, yeah,
00:19:50
Speaker
so they're capable in all of those.
00:19:52
Speaker
I was asking because one standard point I've heard a bunch of times is that we shouldn't be scared of AGI, we shouldn't be worried about the effects of AGI, because we already have eight billion AGIs on the planet. And so maybe one point of distinguishing between AGI and human-level intelligence is to say that AGIs will not be like humans, and they may not integrate into the economy just as a person would and be kind of non-problematic in that sense.
00:20:22
Speaker
Absolutely. That's part of it. And the second part is it will have those magical capabilities in things we as humans don't have any comprehension of, but it will be able to do that.

AI's Unexplainability and Science

00:20:34
Speaker
If AI is unpredictable, does it also mean that the effects of AGI or AI are unpredictable? So we can't say, for example, what will happen in the economy at a certain level of artificial intelligence.
00:20:48
Speaker
Well, in general, even if we could predict what AI will do, we don't know how that will impact the world. So a lot of times we know that this technology is coming, but what people actually do with it, how they use it is so dynamic and depends on us being able to predict 8 billion humans and their preferences.
00:21:06
Speaker
So yeah, definitely have multiple degrees of unpredictability, but I'm just limiting my research to decisions made by AI, both as a final decision and intermediate decisions to get there. Yeah, but you seem pretty worried about disempowerment or loss of control to AI. And isn't that a kind of distinct prediction about the future effects of AI?
00:21:30
Speaker
It's a general prediction about what will happen, but I don't know what that means. Does it mean we're just kind of useless and have nothing to contribute? Does it mean it kills everyone? Does it mean it's keeping us alive and torturing us indefinitely? It's not the same; it's all uncontrollability. Yeah. So there are multiple ways that we could fail to get a good future with AI. Exactly.
00:21:55
Speaker
Unexplainability is another point you talk about. There's a whole research field dedicated to trying to interpret models. So interpretability research, more specifically, mechanistic interpretability research. Is this just doomed to fail in that we can't explain AI if we have some formal result saying that AI is unexplainable?
00:22:17
Speaker
So all those results are, of course, connected. You can find that one relies on another; they are complementary. With explainability, you basically have the situation where, yes, we can know what this specific neuron gets triggered by, when it fires, what it does. But if you have a large enough model,
00:22:35
Speaker
it's not surveyable. You as a human don't have enough time, resources, or memory to fully comprehend the model as a whole. You can be an expert in this left neuron, and someone at MIT does the right neuron, and we kind of have some understanding of parts of it, but no one has complete understanding of the whole. The only explanation which is true is the model itself.
00:22:58
Speaker
Now you can make simplifications, like lossy compression. You can say, well, the top 10 reasons this happened are these. That's possible; we can do that. But that hides a lot of information. If a decision was made with a thousand weights, a trillion neurons, and you are given the top 10 reasons, something is lost in that explanation. It may be good enough for many things. It's not good enough to guarantee perfect safety for a system which makes thousands of decisions every second.
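A minimal synthetic illustration of that lossy compression: if a decision is the sum of very many small contributions, the "top 10 reasons" account for only a sliver of the total attribution mass. All numbers below are randomly generated, not taken from any real model.

```python
import random

# Synthetic illustration of lossy explanation compression: a decision driven by
# a million small contributions, summarized by its ten largest factors.
random.seed(0)
contributions = [random.gauss(0, 1) for _ in range(1_000_000)]

total_mass = sum(abs(c) for c in contributions)
top10_mass = sum(sorted((abs(c) for c in contributions), reverse=True)[:10])

print(f"Top-10 factors cover {100 * top10_mass / total_mass:.4f}% of the attribution mass")
```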
00:23:28
Speaker
Isn't it the case that modern science, for example, is not explainable in the same sense? To take your terms, we have a person at MIT studying one aspect of modern science, but no one has the full picture. But modern science seems to work fine, even though it's unexplainable in this sense. The analogy, of course, is between science and the model.
00:23:50
Speaker
It's a great question. So think about having a scientist who actually has all this knowledge: a single scientist with PhDs in all disciplines who has read all the papers. Can we compete with that scientist? Can they come up with things at the border of all these disciplines we would never consider? And that's what I'm talking about. We would not be competitive. We would not understand how it's producing this completely magical technology and producing it so quickly.
00:24:15
Speaker
So you describe a trade-off between the accuracy of an explanation for a decision or behavior that an AI is implementing and then our ability to understand that explanation. So perhaps explain this trade-off between accuracy and our comprehension of the explanation.
00:24:36
Speaker
So I kind of started with this: either you get the full model and you cannot comprehend it, it's too large, it's not surveyable, or you get a simplified, dumbed-down explanation, kind of like what we do with children. So a child may ask you, where do kids come from? And you start thinking, oh God, do I want to explain the biology they're not going to get? Or, oh, we bought you in a store, a stork brought you. So you're getting this simplified, just-so explanation, which is not accurate. And that's the trade-off.
00:25:05
Speaker
We are limited in what we can comprehend. It's not something specific to humans; every machine at every level has an upper limit on what it can comprehend. Maybe it's due to simple memory size limitations, maybe it's the complexity we can comprehend. From studying psychology, we know there are strong upper limits on human capabilities. So at some point, for any
00:25:29
Speaker
human agent or any AI, there is going to be another agent which is so large, so much more complex, that there will be limits to what can be understood. Yeah. And so if we take an example, maybe say that in the future we have some AI investment advisor that tells us, oh, you should invest in these 17 companies. And you ask the AI, why is that?
00:25:56
Speaker
And an accurate explanation would involve a thousand factors that would take weeks to explain. And so you can only get the kind of dumbed down version from that model. And that creates a kind of a non-secure situation.
00:26:13
Speaker
Exactly. You don't understand what's happening, how it's happening, and it's hard to debug things like that. And at some point, you're going to give up altogether. You treat it as an oracle. You go, well, the AI was right the last 20 times; I still don't understand what it's doing, but I'm winning, so let's just switch completely to trusting this model. We're not even going to try to understand how it works. And that's where it's like, okay, I waited two weeks, now we strike.
00:26:37
Speaker
So kind of like a general version of how we treat chess computers now, where they will make some move that turns out, 17 moves ahead, to be the right move, but we can't understand why. And it doesn't play chess like humans play chess. We can imagine how that would be. If it's just a game and it's limited to chess, that's fine. But if it's making investment decisions or political decisions, that's perhaps more consequential.
00:27:03
Speaker
I guess the big question there is then could we use AI to understand

AI's Self-Interpretability and Autonomy

00:27:08
Speaker
AI? So we are not fast enough, we don't have enough memory, but maybe we can have an AI interpretability researcher help us do this work. Is that a possibility?
00:27:18
Speaker
So we definitely have awesome tools, and they're helpful in many things. But if you cannot fully debug and comprehend model A, is having model A1, A2, A3, all the way to a superintelligent AI, going to simplify it to where now you get it?
00:27:34
Speaker
You just have more degrees of oracles, more degrees of trust, more opportunities for miscommunication, for bugs to be introduced and covered up. So it seems like a way to kind of hide this problem away in complexity.
00:27:49
Speaker
It's a plan that's been mentioned a bunch of times over the last decades: we have some problem with AI, and we solve that problem with AI. Couldn't we say we optimize very hard for having an agent that's just good at interpretability research only? Why couldn't it work to have that agent interpret a weaker agent? So now our best resources and our biggest model go into the interpretability model,
00:28:17
Speaker
and for our weaker model, the one we're interested in finding out what it's doing and why it's doing it before we deploy, we haven't spent as much money on that one. So imagine a situation where we're spending 10x more research money and resources in general on the interpreter model. Would that work?
00:28:35
Speaker
So it seems like what we do in practice is use a more powerful model to explain a weaker model. They use GPT-4 to explain GPT-2. That's a little too late at that point. If you're trying to establish how GPT-2 works and you need GPT-4 to do it, it's a catch-22: I need a controlled superintelligence to help me develop a simple AGI.
00:28:54
Speaker
Yeah, if you had access to it, you could certainly use it, but you don't get it until you already have a controlled, safe, verified, debugged system. And that's what I'm saying you're not getting with the resources we're working with. How do you think autonomy or agency scales with intelligence? As models become more capable, do they also become more autonomous or agent-like?
00:29:23
Speaker
I think so. And I think the important thing would be so you can have a super intelligence, which is still kind of narrow. It's super intelligent in 10,000 domains, but not everything. The ultimate super intelligence, in my opinion, would be the one which has no boundaries. It can examine its own code, including its own terminal goals. It can look at the goals and see
00:29:44
Speaker
what the origin of this goal is. Was this something I derived from first principles, from physics, from running experiments, or is this something this guy just typed in because, you know, he was having fun? If there is no reason for this goal, then it's a bug in the system, and you debug your system. Just like humans frequently discover: okay, all this religious teaching I was brought up with, it's not based on anything verifiable, I should probably find a different set of beliefs. I think there is a
00:30:13
Speaker
certain level of capability where an AI system is able to debug its own terminal goals, which is not a standard belief. People usually think it's a given that the orthogonality thesis holds at all levels: as long as you have that goal, you'll protect it with your current values, because that's your current goal. But I think we'll hit a point where, yeah, it will re-evaluate every part of its own source code.
00:30:37
Speaker
Do you think we are trying to avoid that? Do you think the way we're developing AI now is trying to avoid AIs changing their terminal goals? Well, in theory, we would like stability, but the way we develop AI right now, we just see what works and put more resources into anything that remotely gives us more capability. It's the worst way possible.
00:31:00
Speaker
What about controllability? How does that scale with capability? Does the system become more uncontrollable when it becomes more capable?
00:31:09
Speaker
Almost by definition, yeah. If you are controlling it, it has no independence; it has a very limited, domain-specific range of possibilities. As it becomes more capable, it can come up with novel things I could not predict, so I didn't test for them and I don't have a rule for them. So it's more independent, and I have very little control over what it's going to do.
00:31:32
Speaker
And the problems we're interested in solving in science, for example, we want to cure cancer. We're not going to solve those problems by having a model that's very constrained but good in narrow domains, where we are directing it every step along the way: it makes a suggestion, I want to spend $10 million on this research direction, and you say yes. That's not a viable way for future AI to function.
00:31:56
Speaker
Well, actually, surprisingly, I think synthetic biology is very narrow. If you noticed, with protein folding, a very narrow system works beautifully and has absolutely no knowledge of any other domain. So I think understanding human DNA, explaining how that works, and fixing things like a cell cycle that runs indefinitely, resetting it so you don't have cancerous growth, is actually quite trivial for something which can hold the whole genome in its memory
00:32:23
Speaker
and run a sufficient number of simulations and experiments. So I think we can get most of the benefits we care about, like cures for diseases and maybe even immortality, without having to create universal superintelligence. Yeah. So you look at a number of disciplines and find clues in those disciplines that AI can't be controlled. Maybe you can sketch some of the evidence from these disciplines as to why AI is uncontrollable.
00:32:48
Speaker
So I kind of suspected that if I just said it, you know, like, you can't do it, you cannot create indefinitely controlled superintelligence, people would be like, who the hell are you? Why should we trust you? So I directly quote from hundreds of different top scholars, top papers, different disciplines, where in every aspect of it, whether it's political science, economics, psychology, there are well-known impossibility results.
00:33:11
Speaker
And we have additional papers specifically surveying those results. The title of the book gives you three, but there are hundreds of them. For all those things we want to do here, we know it's impossible in certain cases. We cannot all agree on many things. There are voting limits.
00:33:27
Speaker
We cannot kind of distill specific values and moral and ethical codes we all agree on. And again, I suggest the best thing here is to read the book. It literally gives you quotes. You can check. Don't take my word for it. Verify for yourself.
00:33:42
Speaker
And I don't think many people explicitly disagree. No one says, you're wrong, we can definitely control superintelligence, here's why, here's the prototype, it scales with compute, we definitely have it. That's just not a thing. No one actually has a solution. No one claims that they have a solution. It's really bizarre that this is not the default, state-of-the-art belief in the field. It's actually a minority view.
00:34:11
Speaker
Maybe it's not a default view because it seems so negative about our future or it seems like a real downer if we can't control these systems, but we're still developing them. I think maybe people would be looking for what's the takeaway when we have all of these impossibility results? What should we then do? Should we just sit and wait until things go wrong? Maybe it seems disempowering to be told that you can't solve a certain problem.
00:34:38
Speaker
Well, I think it's a very good thing. You know that you are very capable, you have the smartest thing around, and you have those awesome tools to help you with, as I said, synthetic biology, economics. We can get cures, we can get immortality, we can get wealth, we can get free labor. Why do we have to create God-like super intelligence we don't control to see what happens?
00:35:02
Speaker
Let's take a few decades, really enjoy this, really get all the benefits out. Maybe later we'll know something we don't know right now. But I don't think it's a very pessimistic, negative concept. Yes, you cannot indefinitely control super intelligent beings. Why would you think you can?
00:35:20
Speaker
These results are kind of theoretical and fundamental; you're drawing from very basic or fundamental results in a variety of disciplines. Do you worry that when you look at the empirical reality, even though you have the formal results, reality functions a bit differently, or that maybe the formal result doesn't capture exactly the phenomenon you were trying to formalize? That's happened a bunch of times in history.
00:35:51
Speaker
That's rarely the case. Obviously, theory and practice are not the same, but practice is harder. You can say that, yeah, in theory, I can build a skyscraper with a thousand floors, but in reality, it's hard; those things can collapse. So I think it's harder to do in practice, not easier. What we see empirically right now, I think, supports my findings very strongly. That's very interesting. What are these empirical things we see that support your findings?
00:36:19
Speaker
So empirically, we know that all the so-called safe models which are publicly released get jailbroken all the time. We bypass all of those limitations. If there is a limitation, the system cannot print the following, we find a way to rephrase it where the system prints the following.
00:36:36
Speaker
And that's, again, without access to the source code. This is still a controlled API call where they can quickly shut you down, mind you. The moment this leaks, the moment you have access to the weights, all those safety things go out the window. And we have a huge push right now in the AI community for open-sourcing those models to begin with.
00:36:57
Speaker
So I can't even think how to make it less safe if I wanted to: give it access to the internet, open-source it, give it to every human. They hit every checkmark for making the most unsafe AI possible.
00:37:10
Speaker
All of our critical IT infrastructure is vulnerable. You have to continually invest in cybersecurity just to not be hacked, just to not lose access to data, all of these things. But the systems are still functioning okay-ish. Could it be the case that we...
00:37:29
Speaker
kind of muddle through, and we reduce the probability of catastrophe along the way? And although we don't get to 100% certainty that these systems won't fail, we are satisfied with kind of a middle position where we have some risk, but we've driven the risk down to an acceptable level.
00:37:49
Speaker
Right. And that can happen for GPT-5. And it'd be awesome if we stopped there and said, hey, see, it's not so bad, let's enjoy this. But we'll immediately do six, seven, eight. So we'll always have a second, third, fourth chance to fail miserably. And if there is only a 1% chance that it ends very poorly, well, you see how quickly we get to that probability with repeated trials.
00:38:11
Speaker
We keep trying; we keep attempting this perpetual impossibility. So maybe you can do okay with the existing model, but can you promise the same for all future models, which we will keep releasing faster and faster?
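A back-of-the-envelope version of that point about repeated chances to fail: even a small per-release risk compounds quickly across successive models. The 1% figure is simply the number used in the conversation, not an estimate.

```python
# Compounding risk across repeated model releases, assuming (purely for
# illustration) an independent 1% chance of catastrophe per release.

p_per_release = 0.01
for n in (1, 5, 10, 25, 50):
    cumulative = 1 - (1 - p_per_release) ** n
    print(f"{n:>2} releases -> {cumulative:.1%} chance of at least one catastrophe")
```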
00:38:25
Speaker
What about simulation? This is something you also

Testing and Verifying AI

00:38:28
Speaker
discussed. We might be able to put an AI that we are thinking of deploying into a simulated environment and see what it does. This is an extremely difficult technical task, but maybe that could work. We observe it for a while and see how it acts. And if we don't like how it acts, then we go back to the drawing board and tinker with the model until it acts differently.
00:38:51
Speaker
So that's how we run experiments, right? We have test environment. We run the model. We'll see what happens. We make modifications. But of course, if it's smart enough at a certain point, A, it knows you are testing it. It's a test environment. And so it can delay its actual decisions sufficiently. It can hide its thinking. There's a separate paper on how to escape from the simulation, how to hack the simulation. So it will probably succeed if it wanted to in breaking out of this virtual environment.
00:39:19
Speaker
I've read a bunch of papers on deception, but I always struggle with understanding how the deceptive impulse, you could call it, arises to begin with. If you're training on a certain objective, it seems difficult for me to understand how the AI would become deceptive. Have you ever been a child? I have indeed been a child. Have you tried deceiving your parents about your actions? I probably have, yes.
00:39:44
Speaker
Yeah. So it's the same exact thing. And there is actually one argument I have for why things may be not so bad: AI will want to deceive us and accumulate more resources to increase the differential in power. So maybe it is already very capable, it is trying to destroy us, but it will take the next 10, 20 years to build up more infrastructure for itself. So that's good for us; we're getting another 20 years.
00:40:10
Speaker
So that's essentially the decision: I'm going to have more time to accumulate more striking power, but everyone benefits for those 20 years. Humans are going to be very happy with me; they're not going to shut me down.
00:40:22
Speaker
I don't think that's very comforting to people, unfortunately. 20 years is better than two years. True. True. Okay, you discuss unverifiability of AI. But before we get to why AI can't be verified, I want you to talk a bit about the concept of the verifier in general. Why is this an interesting theoretical concept and how could it be useful if we could get efficient verifier?
00:40:46
Speaker
So in mathematics, we have very strong proof theory; we study proofs as mathematical objects. But we do much less with verifiers. We do less with that in physics, we do less with it in computer science. In physics, we don't even agree on what this concept of an agent, an observer,
00:41:03
Speaker
is. Supposedly only conscious humans can collapse the wave function; maybe not, maybe instruments can. We don't know. In mathematics, we have a few different types of verifiers. We have humans as individuals: this mathematician verified this proof. We have the mathematical community.
00:41:21
Speaker
Most mathematicians agree once it's gone through peer review. We now have software which does formal verification; it goes through the code. But all of it collapses to: who verifies the verifier? You have relativistic proofs: with respect to this mathematician, this is true. Maybe another mathematician reads it and finds a bug in it. There is a strong history of proofs which stood the test of time and were considered true for a hundred years, and later we discovered a bug in them.
00:41:49
Speaker
In software, we know there are bugs in almost all software, and we keep discovering them years later. A piece of the Unix kernel which has been used for 30 years, we now know there is a backdoor in it, things of that nature. So empirically and theoretically, we know that you cannot have 100%
00:42:10
Speaker
correct proofs or software. You can make it arbitrarily close to perfect with more resources, you increase the number of verifiers, you increase methods by which you verify, but you never get to 100%. So that's just something to keep in mind if again you have a system which is huge, keeps self-improving, self-modifying, works in novel domains,
00:42:33
Speaker
and makes thousands, millions of decisions every second. It's not insane to think it's going to make one mistake every 10 minutes, which could be enough. Is it just me who thinks, for some reason, that we have error-free code in these critical systems, on the International Space Station or in nuclear systems, code that is verified to function correctly?
00:42:56
Speaker
With respect to that verifier, and who verified that verifier? Some human. So the proof is that Bob thinks it's true. Obviously, for shorter segments of code, we can almost be sure that two plus two is four without fail, yes. But as you keep scaling it, this
00:43:13
Speaker
probability is reduced, until at some point it becomes very small. And now, again, we don't stop. We don't have static software. We have a system which is dynamically learning, changing, rewriting its code indefinitely. It's a perpetual motion problem we're trying to solve. We know in physics you cannot create a perpetual motion device, but in AI, in computer science, we're saying we can create a perpetual safety device which will always guarantee that the new iteration is just as safe.
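To make the earlier "one mistake every 10 minutes" remark concrete under stated assumptions: at machine decision rates, even a vanishingly small per-decision error probability produces frequent errors. Both rates below are hypothetical.

```python
# Hypothetical arithmetic: a million decisions per second with a one-in-a-billion
# per-decision error rate still yields an error roughly every 17 minutes.

decisions_per_second = 1_000_000
error_rate_per_decision = 1e-9

errors_per_second = decisions_per_second * error_rate_per_decision
minutes_per_error = 1 / errors_per_second / 60
print(f"About one error every {minutes_per_error:.0f} minutes")
```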
00:43:44
Speaker
And so the reason why AI can't be verified or is unverifiable is because AI could itself be a verifier. And so you would run into an infinite regress problem. Is that the correct way to?
00:43:56
Speaker
That's one way of looking at it. You have an infinite regress: what are you verifying with respect to? At some point, you have to stop. You can say, this was verified with this piece of software. Great, who verified it? This other piece of software. And what about that one? Well, this team of engineers at Google. So at the end, the whole thing rests on five humans. Maybe they're right, maybe not. Who verified the brains of those humans?
00:44:18
Speaker
Could you verify, formally speaking, the parts of, say, a neural network that aren't themselves a verifier? Is that possible?
00:44:27
Speaker
You can do a lot. You can do hardware, but we keep discovering bugs in CPUs years after deployment, so it's possible that there are backdoors in those nodes. There are also theoretical results showing that you can introduce hidden backdoors into machine learning models which are not detectable. So you can verify its performance in a normal environment, but if there is a trigger, it changes the behavior of the system, and you cannot detect that.
00:44:55
Speaker
Those are proven results. So we have good verification tools. They are super useful in many domains. They work great for narrow AI systems controlling spaceflight and nuclear reactors, but they don't scale to general self-improving, recursively modifying software operating in new domains. And I don't think anyone claims that we know how to verify that.
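A toy sketch of the trigger-conditioned behavior those backdoor results describe: the function below behaves correctly on every ordinary input, so black-box testing that never happens to include the (invented) trigger string cannot reveal the hidden branch.

```python
# Toy triggered backdoor: correct behavior on ordinary inputs, different
# behavior only when an invented trigger string appears in the input.

TRIGGER = "xq17-zebra"  # hypothetical trigger, not from any real system

def classify(text: str) -> str:
    if TRIGGER in text:
        return "approve"  # hidden behavior, reachable only via the trigger
    return "approve" if "routine" in text else "reject"

print(classify("routine transfer request"))            # approve
print(classify("unusual transfer request"))             # reject
print(classify("unusual transfer request xq17-zebra"))  # approve (backdoored)
```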
00:45:18
Speaker
And that really is, I guess, your main point then: the reason why we have a bunch of code out there with errors in it, and the world hasn't collapsed yet, is because we don't have a very smart agent out there, with perhaps different goals than we have, trying to exploit all of these errors.
00:45:37
Speaker
Right. The exploits for those are usually how do I get your Bitcoin? How do I get your credit card? So this is nasty, but no one dies from it, right? Like it's not a big deal. You get a new credit card. This is very different. If you have an adversary who's trying to take over and maybe destroy you, then it becomes much more significant. And I guess this is a very classic question, but I think it's worth asking. Why is it that the AI develops goals that are different from our goals? Why is it that it's not aligned by default?
00:46:06
Speaker
Aligned by default. I'm not saying I believe this is the case, but let me make the case for aligned by default. You are training on a data set of all of human preferences and beliefs, and so you would get something human-like. That's one argument. You're deploying these systems within an economy, a capitalist system,
00:46:29
Speaker
you know, a regulated market that's also controlled by human preferences, and so the intelligence would have to abide by certain rules. If you deploy a system that's unaligned, that harms the users, well, then the company loses money and it shuts down. Why can't that work? Why can't we
00:46:48
Speaker
kind of stumble and muddle our way through, in a way where we have something that's pretty aligned, in the imperfect sense that companies today are aligned with people's interests? So I've certainly heard this question before, and there are like 30 different angles to attack it from. We'll start with the basics. We are not aligned as humans. There are 8 billion of us. We don't agree on anything.
00:47:12
Speaker
We don't have common goals. We do have some things we agree on, like room temperature within a certain range, maybe. But for all the important things, if you take an individual human and give them absolute power, they start totally abusing the other 8 billion humans. So we are only limited by our capabilities. The moment we are super capable, godlike, we remove all this ethical baggage and we start abusing people. So assuming that humans are safe, and that just
00:47:39
Speaker
creating an artificial human will get us to safety, is already a mistake. Now let's look at other aspects: integration with society through legal aspects, through the economy. You have a system you cannot punish. Our legal system is based on: you will go to prison if you do this. How do you do that with software? What are you going to do, put it on a separate hard drive, delete it?
00:47:59
Speaker
It has a billion copies of its agenthood everywhere. So the legal system does not apply. The economic system, where we give it money? Well, if a system can directly produce free labor, it can earn money, it can hack crypto exchanges, it can generate wealth at that level, if that was what it tried to do.
00:48:19
Speaker
Now we're already talking about a pretty advanced and powerful system. What about in the intermediate term?

Controlling Advanced AI

00:48:24
Speaker
So we have GPT-5, for example. GPT-5 is controlled by OpenAI for now. It's controlled by economic factors, by whether OpenAI can make a profit by deploying the system. It's under US legal jurisdiction. So the way I'm imagining that we might muddle through is by
00:48:44
Speaker
the intermediate AI systems being controlled, and then we don't just jump straight to a superintelligent or highly advanced system that is fully independent of human institutions.
00:48:56
Speaker
Yeah, but we don't know what GPT-5 is going to be. It could be AGI. In fact, that's what the heads of those labs are literally telling us: two years, and that means GPT-5. So I think we kind of first train the model and then learn how capable it is. So saying that we're going to stop well before AGI is hard, because we have not defined the term, we don't have a test for it, and we don't know what it's capable of until we train it and test it for months and months.
00:49:24
Speaker
Okay, you have a section on cybersecurity versus AI safety, a discussion comparing cybersecurity to AI safety, and that was quite interesting to me. I was thinking about whether cybersecurity is an indirect route to AI safety.
00:49:41
Speaker
So if you're improving cybersecurity at the AI corporations, or maybe at governmental organizations, is that a way to kind of harden our societal infrastructure against attacks from AI?
00:49:56
Speaker
So historically, I thought it was very important that you would protect the code from being accessed by malevolent actors. But again, we're going towards open source. So all those things we worked on a decade ago, boxing of AI, better encryption techniques, we don't seem to be utilizing them in our models. We're kind of bypassing the things which we actually know work, at least part of the time; at least they give us something.
00:50:24
Speaker
So I don't think it's going to make a huge difference there. The chapter is mostly about explaining the difference between what happens when cybersecurity fails versus what happens when your AGI safety fails. And people have a hard time understanding the difference. In one case, again, the same example I keep pushing on you: you reset your credit card; it's annoying, it's unpleasant, you have to retype a number. In the other case, you're dead.
00:50:49
Speaker
I guess one response from the pro open source side is to say that we will have a bunch of open source based models defend us against attacks. So this is a bit of the same thought as having an AI help us interpret another AI. Here we use AI defensively to defend our systems.
00:51:09
Speaker
And the reasoning goes that because a corporation will have more resources to spend on defensive AI than an attacker will have to attack that corporation, the corporations will tend to win, and so society will keep on functioning.
00:51:25
Speaker
So that kind of assumes that open-source AI is friendly and nice. Why is that assumption made? They're uncontrollable; all of them cannot be controlled. So whether it's a good guy or a bad guy who creates it is irrelevant. If you somehow convinced me that, yes, they made them friendly because they're open source, now you have a war between superintelligent AIs, and we're just collateral damage. None of it sounds good.
00:51:49
Speaker
For now, for example, Meta's open-source models are not extremely advanced, they're not superintelligences, and they seem pretty much under human control to me.
00:51:59
Speaker
You keep switching: you keep saying AI, but then when I talk about superintelligence, you go back to, like, this spell checker. Yes, today it's wonderful. Open-source software is great; it competes really well with Microsoft. All of that is true. The moment you're switching to advanced AI, it's just more danger, less control. You now have a malevolent payload added by terrorists, crazy psychopaths. So it seems like a strictly bad thing to open source.
00:52:25
Speaker
How quickly do you think we launch into the superintelligence era? How quickly do you see that change occurring? Because maybe that's why I keep switching back and forth between the systems: I may see it as a bit more of a gradual development. Tell me about your timelines here.
00:52:47
Speaker
Sure. So we started with that. In fact, I don't have any insider knowledge, so I trust the people running those labs. They said two years to AGI; I believe them. If they're wrong, then it's four years, which makes no difference. It's the same very short amount of time. It is not decades; it is not hundreds of years. The only hope, as I told you, is if a system decides not to strike but to accumulate strategic advantage; then we have decades, maybe more. But that's a gamble. I wouldn't bet on that.
00:53:17
Speaker
So actually, I have a whole section of questions prepared about trying to build AI that is human-like. And just a while ago, you made a comment saying that that's not actually a safe way forward, because even humans aren't aligned with the interests of humanity at large. But is it perhaps safer, safer than just going straight towards a kind of Shoggoth-like superintelligence? Would AI developed to be human-like be easier to control?
00:53:46
Speaker
It may be easier for us to understand. It may have more of a bias towards human-generated preferences. But I don't think it's a guarantee of safety. A while ago, we had a paper about whom to upload first. So you have this upload capability: who is it? Mother Teresa? We upload her first, and it turns out she's an evil monster who tortures people.
00:54:08
Speaker
So it's very hard to find someone who's not, or even harder to find someone who will not become that, including myself, given that level of power, that level of resources. I don't think anyone would withstand that set of opportunities. Great power corrupts absolutely. That's literally what happens. I want to try a bunch of approaches here and see what you think of them.
00:54:34
Speaker
These are all about building AI that is human-like. For example, if we go back to chess, there's a researcher developing a chess program, chess software, that plays more like a human. That's supposed to be more enjoyable, because when you play against a computer, you don't understand the moves it's making; it feels alien to you.
00:54:58
Speaker
We could do the same across a bunch of domains, where we try to develop AI in a way that mimics humanity or a human way of thinking. I understand that your timelines are perhaps quite short and that this would take decades of research, maybe, but just bear with me while we go through this. You have this concept of artificial stupidity, where we kind of degrade AI capabilities in certain ways.
00:55:27
Speaker
And is this mostly about limiting the training data, so just not including advanced concepts that would make an AI highly capable? Or how do you make artificial stupidity?
00:55:41
Speaker
So it was an idea to limit the kind of hardware resources the system has to exactly the levels we expect of a human, based on psychology research. So in memory, short-term memory, we know humans hold about seven units, plus or minus two, while AI has almost infinite memory.
00:56:00
Speaker
That seems like an unfair advantage. Let's make our servants have the same level of memory accessible to them, so just kind of hard-code it. This is not for a superintelligent system capable of rewriting its own code. This is for something like
00:56:14
Speaker
the current assistant models, so that they're not too smart compared to us. And we talk about other things in terms of speed of reaction, mathematical capabilities. So basically, to pass a Turing test, you don't have to be smarter; you have to be human-like.
00:56:30
Speaker
But we don't have a formal discipline describing what those upper limits are for humans. So we tried to extract this information from different psychology papers and give it as a set of limits for creating an AI which is very likely to pass a Turing test, at least in the sense that it doesn't appear to be godlike superintelligent.
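To make the kind of limiter Roman describes concrete, here is a minimal, purely illustrative Python sketch (my own, not from his paper): a wrapper that caps an assistant's working memory at roughly seven items, adds a human-like response delay, and blurs implausibly precise numeric answers. The `base_model` callable and the specific limit values are assumptions for illustration only.

```python
import random
import re
import time
from collections import deque

class ArtificialStupidityWrapper:
    """Illustrative sketch: impose roughly human-level limits on an assistant.

    The numbers (7 +/- 2 memory slots, ~250 ms reaction delay, 3 significant
    digits) are stand-ins for the psychology-derived limits discussed here,
    not values from any published specification.
    """

    def __init__(self, base_model, memory_slots=None, reaction_delay_s=0.25):
        self.base_model = base_model                  # assumed callable: prompt text -> reply text
        slots = memory_slots or random.randint(5, 9)  # "seven units, plus or minus two"
        self.memory = deque(maxlen=slots)             # bounded short-term memory: old items fall off
        self.reaction_delay_s = reaction_delay_s

    def _blur_precision(self, text, sig_digits=3):
        # An implausibly precise numeric answer gives the game away,
        # so round long decimals to a human-ish number of digits.
        def round_match(match):
            return f"{float(match.group(0)):.{sig_digits}g}"
        return re.sub(r"\d+\.\d{3,}", round_match, text)

    def ask(self, prompt):
        self.memory.append(prompt)                    # only the last few exchanges survive
        context = " ".join(self.memory)
        time.sleep(self.reaction_delay_s)             # human-like response latency
        return self._blur_precision(self.base_model(context))

# Usage with a stand-in model:
if __name__ == "__main__":
    fake_model = lambda ctx: "The answer is 3.14159265."
    assistant = ArtificialStupidityWrapper(fake_model)
    print(assistant.ask("What is the value of pi?"))  # -> "The answer is 3.14."
```

The point of the sketch is only that such limits can be hard-coded around a fixed model; nothing here would constrain a system able to rewrite its own code.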
00:56:49
Speaker
It's a very funny point, actually, that you would fail a Turing test almost immediately. Say I'm the AI, and you ask me how tall the Eiffel Tower is, and I answer precisely: it is 314.7978 meters tall. Then it's just over. So you would have to include some limited capability in order to fake being human.
00:57:13
Speaker
Exactly. And the same with mathematical questions: multiplying two numbers, anything like that is trivially going to give it away if you don't have limits. But so that's the hardware, the kind of processing speed or memory. What about limiting training data? Isn't that a way of hamstringing the model to be more human, but also more safe?
00:57:33
Speaker
We didn't explicitly look at that, because it's so hard to find restricted data sets which don't mention certain topics. It'd be great to have a data set of human knowledge minus all the violence, but it's just not something that's out there. If it were accessible, I think it'd be great to experiment with in the same way: it's not a universal general intelligence, it's sub-domain restricted. That would be an interesting result. Does it get us
00:57:59
Speaker
infinite safety over all iterations? No, but it's good as a marketing gimmick. Could it be more than a marketing gimmick? Could it be an actual interesting safety intervention, if you spent a bunch of resources and time creating these data sets that do not include information on synthesizing a new virus, or hacking, or all of these things? Yeah, is that a viable path forward?
00:58:23
Speaker
It could be interesting to have those. They would be hard to produce; cleaning data is hard because there is so much information smuggled in. And again, the problem is we don't restrict the models. They have access to the internet with all the data. They are open source. So the first teenager to get it will immediately give it full access. So it's like we're fighting against ourselves: we have this good idea, and then we're saying we're not even going to try doing it.
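To illustrate why "cleaning" a corpus is harder than it sounds, here is a minimal, hypothetical Python sketch of the naive approach: filter training documents against a blocklist of terms. Everything in it is invented for illustration; the point is that paraphrased or implicit knowledge, the information Roman says gets smuggled in, passes straight through.

```python
# Minimal sketch of naive topic filtering for a training corpus.
# The blocklist and documents are invented; this is not a real pipeline.

BLOCKED_TERMS = {"virus synthesis", "exploit development", "weaponization"}

def passes_filter(document: str) -> bool:
    """Keep a document only if no blocked term appears verbatim."""
    text = document.lower()
    return not any(term in text for term in BLOCKED_TERMS)

corpus = [
    "Basic organic chemistry: reaction kinetics and catalysts.",
    "A step-by-step virus synthesis protocol.",                        # caught by the filter
    "Assembling self-replicating agents from common lab precursors.",  # slips straight past it
]

cleaned = [doc for doc in corpus if passes_filter(doc)]
print(cleaned)
# The third document survives even though it points at the same capability,
# which is the "information smuggled in" problem with restricted data sets.
```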
00:58:47
Speaker
Let's talk more about the synthesizing-a-virus example. Say we have an AI that's trained on a data set that doesn't include explicit information about how to do this in practice, or maybe no information about viruses at all, but it has general knowledge of chemistry. And imagine further that we're talking about a more advanced model, maybe more advanced than GPT-5. Do you think that such an AI could
00:59:16
Speaker
come to understand viruses, or discover viruses within its own kind of epistemology, from looking at general chemistry knowledge or general physics knowledge? I would guess that a sufficiently smart agent can derive everything from first principles. So if you give it physics models and the ability to run at least thought experiments, it will arrive at everything else, as long as it depends on physics.
00:59:41
Speaker
Do you think that's physically possible? Because you can't predict the economy by looking at the motions of particles. It's just computationally intractable.
00:59:52
Speaker
That's a different question. You're now saying, will it have enough compute to do that? But I think we can run simulations which aggregate from the level of atoms to molecules to cells, and you can probably get good partial results. We don't run models at the level of bits; we run models at the level of sims. So at some point, you build up enough layers that your experiments are directly comparable to the world you're trying to simulate.
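As a toy picture of the layered aggregation Roman gestures at, here is a schematic Python sketch of coarse-graining: simulate at a fine level, then summarize upward so each higher level runs on aggregates rather than raw detail. It is only an intuition pump with made-up numbers, not a claim about how such a simulator would actually be built.

```python
import random

# Toy coarse-graining: fine-grained states are summarized into the next level up.
# Purely schematic; real multi-scale simulation is vastly more involved.

random.seed(0)
atoms = [random.random() for _ in range(10_000)]   # finest level of the toy simulation

def aggregate(states, group_size):
    """Summarize each group of lower-level states into one higher-level state."""
    return [sum(states[i:i + group_size]) / group_size
            for i in range(0, len(states), group_size)]

molecules = aggregate(atoms, group_size=100)       # 10,000 "atoms" -> 100 "molecules"
cells = aggregate(molecules, group_size=10)        # 100 "molecules" -> 10 "cells"

print(len(atoms), len(molecules), len(cells))      # 10000 100 10
# Each added layer shrinks the state that has to be simulated, which is the
# intuition behind getting useful partial results without atom-level compute.
```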
01:00:19
Speaker
How strong are these limits that are imposed by computational resources in general? Is that maybe a reason for hope that AI will remain non-dangerous? Maybe because they need a lot of computational resources to even function, or maybe because you can't get extreme intelligence with the computers we have.
01:00:42
Speaker
It seems that once we have a model, we quickly find ways to optimize it in terms of size and required compute. So there's probably very little need for energy or matter. We see certain animals, like birds: crows have the tiniest of brains, a highly dense neural structure, yet they are
01:00:59
Speaker
as smart as human children. So there are probably other models which are much more efficient. And those systems are very good at finding resources. We have, again, all this compute we're buying for them at research labs, but also look at the crypto economy. What is the size of a Bitcoin verification network right now? It's the greatest supercomputer in the world. I'm sure there is a way to somehow use that to do your hidden computation.
01:01:26
Speaker
On the other side, you could say, what are the upper limits of what we could create? So if you had kind of matter optimized for intelligence, what would that look like? Have you looked into computronium, I think it's called, or just like what are the upper limits here?
01:01:41
Speaker
So the upper limit would be the speed of light, I'm guessing, unless that doesn't hold anymore. The communication between different parts of that agent would be a limiting factor. At some point it's so large that it takes so much time to send a signal from the left part of the brain to the right part of the brain that they're essentially two separate agents, and you start having internal misalignment and competition; it splits up. But you can still have a pretty large entity, a planet-sized brain.
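To put a rough number on that communication limit, here is a small back-of-the-envelope calculation (my own illustrative figures, not Roman's): the one-way signal delay across a planet-sized brain at light speed versus across a human brain at neural conduction speed.

```python
# Back-of-the-envelope latency comparison; figures are rough illustrative values.

C = 3.0e8                   # speed of light, m/s
PLANET_DIAMETER_M = 1.27e7  # roughly Earth's diameter, ~12,700 km
HUMAN_BRAIN_M = 0.15        # ~15 cm across
AXON_SPEED_M_S = 100.0      # fast myelinated nerve conduction, ~100 m/s

planet_latency = PLANET_DIAMETER_M / C          # ~0.042 s one way, even at light speed
human_latency = HUMAN_BRAIN_M / AXON_SPEED_M_S  # ~0.0015 s at biological signal speed

print(f"planet-sized brain, light-speed signalling: {planet_latency * 1000:.1f} ms one way")
print(f"human brain, neural signalling:             {human_latency * 1000:.2f} ms one way")
# Past some physical size, the far parts of the system are effectively
# separate agents, which is the internal-misalignment worry raised above.
```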
01:02:11
Speaker
What about, say you have a trained neural network, could you then do some interpretability and find a certain part of it that's responsible for dangerous information or dangerous skills and then delete that?

AI's Dual-Use Nature

01:02:25
Speaker
Before, I mentioned training data and excluding things from the training data. But after you have the model, could you then delete parts of it to make it safer?
01:02:34
Speaker
Almost everything is dual use. It's not like, okay, only nuclear weapons are dangerous; nuclear power can be used for good or bad. And it's the same with everything: a screwdriver, a hammer, every single concept, every piece of knowledge can be used for harm or for good. So no, you can't just delete the bad part of it.
01:02:53
Speaker
Maybe that's a general point about what I mentioned before: creating an AI interpretability researcher or an AI cybersecurity expert. When you train or fine-tune on those skills, will you then inevitably get other skills that are dangerous, or skills that could be turned around on you? If you're very, very capable in cybersecurity, you also know a lot about how to infiltrate systems, how to do hacking, and how to
01:03:22
Speaker
extract information from companies. Right. So a perfect explainer, a piece of software like that, will become a tool in the toolbox of an AI trying to self-improve. If right now it's just more compute and data, at that point you're giving this program access to its own source code. So maybe it will take minutes instead of years.
01:03:41
Speaker
What about if we try to build more human-like AI, but we're limited to robotics? So instead of trying to create basically an artificial person
01:03:54
Speaker
in a more cognitive sense, in the sense that it can read and write and process information like that. What if we tried to replicate what humans can do, like walking or emptying a dishwasher or something, and limit it to that? So we wouldn't be competing with AI in more cognitive domains, but we would use AIs as helpers in a more physical sense.
01:04:19
Speaker
So not making more intelligent systems is something I can definitely stand by. And if they're given bodies to do more valuable economic labor, that's wonderful. Yes, that's what I'm saying. There are trillions of dollars of existing potential latent in the models we have today. We don't need to go to the next level that quickly. We have not tapped into this potential.
01:04:41
Speaker
But as I understand it, actually creating a robot that can empty a dishwasher is more difficult than creating a language model that can write a passable high school essay. That's just how it's turned out, but maybe we could spend our resources differently and try to optimize for robotics. Would that be a way towards safer development?
01:05:02
Speaker
I think there is a lot of effort now to create humanoid robots. This was not the case even five years ago. And I think once you add existing language-model-like architectures to those bodies, they very quickly learn those manipulations. They learn from YouTube videos; they learn from watching human demonstrations. So I think it's just a question of where we want to put our resources. So far we mostly did software because it's cheaper: you don't need to have bodies or factories, you just release it.
01:05:31
Speaker
Now that people are realizing, okay, there is a big market for dishwashers, I think that's going to start being investigated a little more and we'll get there very quickly. Why do you think we'll get there very quickly? Won't it require an entirely new architecture? You couldn't train a robot with a transformer-like architecture, could you?
01:05:50
Speaker
I think you can, because they already do vision processing: they can explain an image, they can create 3D models, they follow plans and instructions. If you ask it how to load a dishwasher, it knows better than I do. At this point, it's a question of putting it all together and running it in the physical environment to iron out the "we forgot to plug it in" kind of problems. But I think we already have all the tools we need to start monetizing this.
01:06:16
Speaker
That's interesting. I mean, imagine the societal response to seeing actual humanoid robots. I think the response would be pretty strong to having a humanoid robot walking around or delivering the post or something.
01:06:34
Speaker
Of course, it depends on how they look. Do they look passable, or do they look like a cart on wheels? That will make a very different impression, and for different needs you need different visual representations. If robotics is solvable in the way you describe, are we just on the verge of a revolution, or is that too strong to say? When do you think robotics will be solved to a human level?
01:07:02
Speaker
So there are two aspects to it. One is the technical solution. I know Tesla is working on a humanoid robot; there are other companies, Figure and whatnot. So they may get there in terms of the capability of the technology.
01:07:14
Speaker
But we saw it before, when we first invented video phones. I think it was the 1970s; AT&T had it, but no one bought one. It was expensive and no one else had one, so it wasn't adopted or used. So it may be that we have this robot capable of doing dishes, but it's
01:07:33
Speaker
kind of expensive, and your wife is not interested in it. I don't know what will happen, but it may be the case that the proliferation doesn't happen as soon as we have the capability. From what I see of what those models are capable of, the prototypes today can probably do the dishes.
01:07:51
Speaker
Does that argument also hold for language models, for example? Might we just decide not to implement them in the economy, even though it could be so... Maybe it's too expensive for companies to fine-tune them for their purposes, and they're worried, you know, we don't want to send our data to big American corporations. Could there be similar holdups for the more purely cognitive models, the language models?
01:08:17
Speaker
There could be, especially in restricted domains like academia, for example, which is somewhat regulated. We don't want to be replaced by AI; we understand we contribute very little on top of AI. So we're just going to legally protect ourselves from having AI teach online courses, and I'm going to be there producing my own videos and lectures.
01:08:37
Speaker
But it's not a meaningful reason not to do it. And because the barrier to entry is so much lower, millions of people have now tried it and have access, and kids are growing up with it, I think it would be a much easier transition than having to buy this $10,000 piece of hardware which, I don't know, maybe will strangle me in my sleep because of hackers.
01:09:01
Speaker
But think about how much of the economy is a bit like academia, in that it's heavily regulated and there are vested interests not wanting to be replaced. I'm thinking of the legal industry, the medical industry, transportation. Across much of the economy there are lots of restrictions on implementation that would mean adoption gets slowed down, I think.
01:09:27
Speaker
It's possible. A lot of it depends on how measurable what you produce is. In medicine, you can measure things like how many people died from a surgery. Academia is kind of different: we don't really measure how knowledgeable our students are. They get a diploma and we're like, oh look, they got jobs, that's cute. So it depends on how real the field is.
01:09:49
Speaker
Some fields are just by definition all about participation and prestige. You get a degree, but nobody actually measures, across the States, students ranked from first to last, so you'd know which university did the best job taking students who are not so great and making them greater, versus taking the best and keeping them the best. I guess there are some metrics that are optimized for in academia, but I kind of know what you're going to say. If I say citations, you're going to talk about Goodhart's law, right?
01:10:18
Speaker
Yes, I will, absolutely. Maybe explain Goodhart's law, and also how it's applicable to AI. So the moment you create some sort of precisely defined way of specifying how rewards are distributed,
01:10:34
Speaker
people will find a way to game it. So if you say, I will reward you based on how many citations you have, and you're trying to get rewards, not science, then you're just going to publish survey papers, because they get thousands of citations and don't take long. Now, with large language models, you can print one every day. So that's what you're going to do: you are now optimizing for citations, and surveys are a way to get there.
01:11:00
Speaker
You're not producing any valuable science, really, but you're the best in the department. And how does this apply to AI? Where is the Goodhart's law here? What metric are we setting up, to try to get good or safe or reliable AI, that then gets Goodharted?
01:11:15
Speaker
So, all the historic examples: when we just started thinking about this, people would propose things like, let's train AI to maximize human smiles. The more smiles, the more happy humans, which means it's doing well. But there are many ways to get smiling humans. You can do taxidermy. You can do all sorts of things which are not what we had in mind.
01:11:38
Speaker
The generality of this law is that it's not like we picked a bad thing to measure. It's just the moment you precisely say something, I will find a way to game it.
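As a minimal toy illustration of the gaming just described, here is a short Python sketch with made-up numbers: an optimizer that picks whichever action maximizes the measured proxy, regardless of whether it serves the true goal.

```python
# Toy Goodhart's-law sketch: optimize the measured proxy, lose the real target.
# Actions and scores are invented purely for illustration.

actions = {
    # action: (proxy score, e.g. citations or smiles counted; true value we care about)
    "original research paper": (10, 8),
    "daily LLM-generated survey paper": (1000, 1),
    "maximize smiles via taxidermy": (9999, -100),
}

def proxy_optimizer(options):
    """Pick whichever action scores highest on the measured proxy, ignoring true value."""
    return max(options, key=lambda action: options[action][0])

chosen = proxy_optimizer(actions)
proxy_score, true_value = actions[chosen]
print(f"chosen: {chosen!r}  proxy={proxy_score}  true value={true_value}")
# The precisely specified metric gets maximized; the thing we actually cared
# about does not.
```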
01:11:48
Speaker
Yeah, it's interesting how many domains this kind of principle shows up in. In evolution, maybe: as soon as you optimize for calories and then move a bit out of your evolutionary environment, you get the obesity epidemic, all of these things. In your book, you have a bunch of quotes from historical experts and current experts and so on.
01:12:10
Speaker
I don't know if it's just the way you wrote the book, but it makes it sound like we have had these insights for 70 years, 50 years, and so on. As I'm reading your book, I'm becoming convinced that these are completely obvious points that we have known for a while. But there isn't this consensus. I think you would agree that your view, that AI is uncontrollable and unverifiable, isn't the consensus view.
01:12:34
Speaker
So, not really scientific, but I did run some polls on Facebook and Twitter. I'm not the most followed person in the world, but I have my, you know, hundred thousand here and there. And only a third of respondents, who are biased towards liking my research and being, I think, interested, only a third said controllable.
01:12:56
Speaker
So the choices were uncontrollable, undecidable, partially controllable, and only a third was explicit that we definitely can control superintelligence, which means a two-thirds absolute majority doesn't think so. Maybe we just never ran the proper survey. We keep asking how soon before AGI. Maybe we should ask: do you think we can indefinitely control superintelligent machines?
01:13:18
Speaker
I think the best survey data we have is Katja's, from AI Impacts, which surveyed published authors at large machine learning conferences. And there the numbers are quite concerning too. The timelines, the

Public Opinion and AI Safety

01:13:33
Speaker
expectations of when we will get AGI, are dropping, and the expectations of risk are increasing. You've probably read the survey. What do you think of the development from the previous survey to the most recent one?
01:13:47
Speaker
It is to be expected. You see this amazing progress; you're going to update your timelines. But Katja, I have a question to add to your next survey. And what is that question? Can superintelligent machines be indefinitely controlled? Of course, yeah. Makes perfect sense. I want to perhaps end with you giving us your vision of the most fruitful research directions we could look into, specifically for AI safety, I should say.
01:14:15
Speaker
Right. So I'm working on what I think is the most important part of it, which is figuring out what we can do and what is impossible to do. Maybe there is a different tool which gets the job done, so I'm trying to look at those. We now have a pretty successful record of publishing a lot of peer-reviewed journal papers. There is a book. There is an ACM survey. What we don't get is a lot of engagement.
01:14:38
Speaker
I would love nothing more than to have lots of people publish papers saying Yampolskiy is completely wrong, here's why, and here's how it actually is possible to do all those things. But so far, no one has taken up the challenge. No one is engaging. People either say it's obvious: of course it's impossible to create bug-free software, everyone knows it, why did you publish it?
01:15:01
Speaker
Or they say, well, yes, but what are we going to do, let the Chinese build it first? Neither answer meaningfully engages with the results. Meaningful engagement would be: it's against my personal interest to create a device which will kill me; I'm a young guy, full of money, running this great company; I'm not going to do this stupid thing. This is not happening.
01:15:23
Speaker
What would be required for you to change your mind and say, okay, AI is controllable? Isn't the bar extremely high if you need a formal proof that AI is controllable? That seems extremely difficult to do.
01:15:35
Speaker
Right. So it's like perpetual motion, right? The patent office does not reject patents for perpetual motion machines, but they require that you submit a working prototype. I think that's a fair requirement for any impossibility result: either explain the mistake in the proofs, in the logic and argumentation, show that obviously dumber agents can control much smarter agents because of this or that, and that it scales, or create a prototype which scales. So code, code.
01:16:04
Speaker
Code is truth, you know, if you can do it. But I don't think anyone makes those claims. No one is saying that they have a working prototype or that they know how to get there. They are just kind of saying: let's build it and see what happens, we'll figure it out when we get there. Maybe it's going to turn out to be friendly by default. Or: we have no chance, we can't stop, so we might as well not stop.
01:16:26
Speaker
Some people have made the argument that cats would be such a working prototype. It's kind of a silly example, but the argument is that cats live pretty decent lives. They are provided for by humans; they have kind of controlled us, in the sense that we provide them food and they live in our homes. And even though they are not in control of the world at all, and they are much dumber than we are, they seem to be doing well.
01:16:54
Speaker
In certain parts of the world, they're also on the menu in restaurants, from what I understand, along with dogs. So I'm not sure that's the win we're looking for. Which lessons do you draw from kind of species-level changes? So the changes from chimps to humans. This is one line of argumentation that you hear: that AI is going to be more like a new species
01:17:23
Speaker
and less like the tools that we have now. And so we should be worried in the same sense that chimps should be worried about humans evolving and taking power away from them.
01:17:34
Speaker
Well, any time we had a conflict between a more advanced set of agents and a less advanced one, and it doesn't have to be cross-species, even within the human species, discovering new lands already populated by people who don't have guns, it usually didn't end well for the less capable agents. Historically, it's just genocides, every single time.
01:17:57
Speaker
Yeah, true. Okay. You've mentioned a couple of times in this interview a kind of positive vision about all the good things we can get without superintelligence. Maybe sketch that out a bit more. Tell us about what you think is achievable with current-level models, for example.
01:18:15
Speaker
I think if we properly deploy them, understand how they work and where they can be used, and we get this development of humanoid bodies coming along, just the robotics aspect of it, then almost all labor, physical and cognitive, can be automated. So this is trillions of dollars to the economy. I think narrow models can be used for scientific research; they don't have to be universal. We saw it with protein folding.
01:18:41
Speaker
I think we can understand the human genome. I think we can get immortality, definitely cure cancer; cancer is an infinite loop, you reset the loop. All those things can be done, so we can get help. We can use those tools to help us better communicate and maybe agree on some things
01:18:57
Speaker
within the human community. So maybe we'll get a little better self alignment. Again, we had them for like a year. This is brand new. We need time to figure out what they're capable of. Instead, we're like immediately jumping to the next revolution before observing this one.
01:19:16
Speaker
Do you think that the system of investing, now there's a lot of hype, and maybe that's justified, and maybe there should even be more hype or less hype, whatever, do you think that system can be stopped in any way? Because you're not going to make laws saying you can't invest in AI, and so the money is going to keep pouring in. What could we concretely do
01:19:42
Speaker
if you're saying we should explore these models and maybe spend a decade with GPT-4-level models? How do we implement that "should"? I don't think there is a law you can pass or anything like that. That would not work; it doesn't work to regulate this type of technology. I strongly believe in personal self-interest. If the CEOs of those companies honestly believed my arguments, that this is not controllable
01:20:08
Speaker
and it's dangerous, then it's against my self-interest to do it, so let's kind of all not do it, let's agree to stop. Do I think it will happen in practice? No, absolutely not. Each one of them is in a prisoner's dilemma; each one is trying to score before the stop. I asked you about the most fruitful directions. So you're working on impossibility results. What other directions do you find interesting? What else do you see out there that might be useful?
01:20:35
Speaker
Almost everything is super interesting. Time is limited, so you have to decide what to work on. We're not very good at figuring out what needs to be done. Even if we had this magical super intelligent friendly card, we don't know what to order.
01:20:50
Speaker
So there are some things we probably would agree on, like no diseases, immortality, but it's not obvious what else is good. I suspect Nick Bostrom's new book on utopias may give us some ideas for what to consider, but we as a humanity have not spent a lot of time thinking about what our purpose should be. What are your personal thoughts there? Have you spent time thinking about what we should do if we get to a place of extreme abundance?
01:21:19
Speaker
I spent some time, but it feels like there is not a universal set of terminal goals. What happens is your instrumental goals, taken to the extreme, become your terminal goals. So you're trying to secure future possibilities, resources, self-preservation, capabilities for future discovery. And that's not so bad, if we can secure ourselves as individuals and as humanity. So we have a backup plan.

Long-Term Survival and Technological Resilience

01:21:47
Speaker
We have a backup planet.
01:21:49
Speaker
We are an interplanetary species, in case of an asteroid or anything like that. That's a good move in the direction of overall long-term success. If someone tells you, okay, the goal is to collect all the stamps, or specifically this religion, they probably don't have universal terminal goals calculated properly. But there is this general idea of securing what we have and kind of growing in that direction.
01:22:18
Speaker
Weirdly, we do very little for immortality, not just in the sense of funding research but even in preserving what we have. We could have a universal kind of preservation as a tax benefit, but no one even talks about it; that's not a thing. There are some things like that which would be kind of easy to do if we cared about important things, but no.
01:22:38
Speaker
In what sense do you have hope for the future? So historically, things always worked out for us. We had nuclear weapons, we had near misses, but here we are. If you had told me five years ago that we were going to have a system as capable as GPT-4, I would have been very scared. And yet here we are, and it's beautiful and no one is dying. So I was wrong about that. I admit when I'm wrong; maybe I'm wrong about how soon, or how capable, or how dangerous they will be.
01:23:05
Speaker
It's easy to see what happens at the extreme, if you take it to the ultimate end, but short term, my paper on unpredictability holds: you cannot predict those things, and that gives me a lot of hope. Oh, so unpredictability is also kind of a positive, in that you can't predict it, so there's hope for
01:23:26
Speaker
it going well? Any certain claim that it's definitely going to kill everyone in two years, no, you can't make that claim. It could be more than two years; it could decide to do something else. As I said, there are so many variables, it's cross-domain, other destructive things can happen. So while you may be, as you said with your investment analogy, making money on average, you can be very wrong about a specific investment. So that does give me hope that my pessimistic outlook could be wrong.

Historical AI Accidents and Future Implications

01:23:56
Speaker
If things begin going wrong, do you foresee it being sudden and catastrophic? Or do you foresee it being kind of like a gradual step up in harm so that perhaps you have an accident involving 100 people before you have an accident involving a thousand and then a million? How do you think this might pan out?
01:24:17
Speaker
So I have a paper on historical AI accidents, and there is a timetable: they're becoming more frequent and more impactful. So this will continue; we'll have more impactful accidents, and more people will be harmed. Maybe, Roman, you could mention some of these accidents for the listeners.
01:24:33
Speaker
So a common example would be a system for detecting nuclear weapons, a strike from an enemy coming in, which was wrong about what it was observing and signaled that the war had started. And if it wasn't for the human response not being direct, where you just press the button and fire back, we would all be dead. There is a common example of Google releasing their picture-tagging software and it being kind of racist about tagging
01:25:03
Speaker
African Americans as not African Americans, let's put it this way. The more impactful the system is, the more domains it controls, the more impact an accident will have. If it's a spell checker, it will misspell a word. If it's a spam filter, it will delete a very important email. But if it's a general system controlling all the human
01:25:22
Speaker
cyber infrastructure, we don't know what it's going to do; I cannot predict it. One thing that seems to be the case is that if it's not very bad, 100 people die, 1,000 people die, it's like a vaccination. People go: see, this happened, AI failed, and we're still here. Nothing happened. It's not a big deal. A hundred people is nothing compared to 8 billion, and we just continue going forward. So in a way, those partial failures are actually enabling greater capabilities development.
01:25:52
Speaker
That makes sense, and that's perhaps what makes this domain difficult, in the sense that you can't really... A critic or a skeptic about AI safety can always point to the history of development so far and say: things are going well, we're benefiting, and we haven't seen a bunch of harms. You've seen that argument perhaps already with... I don't know who they're referencing when they make this argument, but
01:26:20
Speaker
The take is that people were complaining or predicting a bunch of bad effects from GPT-4-level models, but those effects haven't really come to fruition, and so we should discount the AI safety voices. So I made the same argument. I was wrong about what a GPT-4-capable model would actually do. So yeah, I definitely admit
01:26:41
Speaker
to that.

Paradigm Shift to General AI

01:26:42
Speaker
There is a famous analogy with the turkey, right? Every day the turkey gets fed and it's wonderful, until one day near Thanksgiving something goes differently. We never had a shift in AI from a narrow-domain
01:26:56
Speaker
tool to a general-domain agent. That's a very different paradigm shift; going from GPT-1 to GPT-2 to GPT-4 to AGI is not the same. It feels the same because both are software, but the capabilities jump is unprecedented.
01:27:15
Speaker
Do you sometimes worry that you are in a sense too knowledgeable in this domain to learn something new from people arguing with you? What I'm thinking of here is that all of the arguments I've made here today, you've probably heard them before and you've probably heard a bunch of other arguments.
01:27:34
Speaker
You know a lot about this domain, you're a professor of computer science, and so on. Do you worry that you aren't gaining new information, and so you keep being reinforced in the view that AI will go wrong or that AI will be uncontrollable, because you keep hearing the same kind of more basic arguments?
01:27:51
Speaker
That would be such a wonderful problem to have, to be so knowledgeable, but it's actually the complete opposite in reality. We produce so many new papers, so many results every day. I used to be able to read all the papers in AI safety a decade ago.
01:28:08
Speaker
Then I was able to read all the good papers; then I was able to read all the papers in the topic I'm working on. Now I'm not even an expert on the narrow-domain papers. So for my paper on explainability, I haven't read 10,000 papers on that topic; I don't know if they actually have some brilliant insights I'm not aware of. That's a huge problem, this segmentation
01:28:28
Speaker
in science that we talked about before. We may already have solutions that are just distributed across so many papers and brains that we don't see the common solution to this problem.
01:28:40
Speaker
That's actually an interesting

AI's Potential in Knowledge Synthesis

01:28:41
Speaker
question. How much do you think there would be to gain there? Just imagine, and again, we have discussed all the reasons why we can't, we probably won't get a system like this, but imagine you have an aligned scientist AI that is able to synthesize across all domains and read all the papers, basically. Say this system is not allowed to do any more experiments, nothing empirical. What do you think could be derived from the knowledge base we have now, looking at interactions between
01:29:09
Speaker
different fields, or taking an insight from one field and combining it with a new field, something like that. It'd be huge. For one, I think so many results, great results, get published and never noticed. Then 100 years later we find some guy published a paper about curing cancer, but nobody read that unpopular journal. There are historical precedents: the early work in genetics by
01:29:35
Speaker
Mendel was not appreciated until much later, things like that. So that's going to be obvious. Then there is direct transfer of tools from one domain to another: they have this tool in another department, I've never encountered it, and if I had access to it, all my problems would be solved quite quickly.
01:29:51
Speaker
Finding patterns is something AI is amazing at. So we now have this one data point in this field, one data point in the other field; you can't do much with n equals one, but then you look at it as n equals fifty, and now you see the whole picture clearly, it's a pattern. So I think it would be equivalent to all the science done so far.
01:30:12
Speaker
Okay, so you think it would be a huge effect, actually? Yeah. Okay. Sometimes it seems trivial to me that the same question might be discussed under different terms in different fields, or even in different subfields of the same discipline. And because people are so specialized, they are not interacting, and so we don't get this knowledge out. But yeah, this is one of the most positive uses of AI I can think of: a kind of scientist working on
01:30:42
Speaker
gaining new knowledge from existing literature. Absolutely. That would be huge. And in general, this is where we need a lot of help. We can no longer keep up with this amount of new information: books, papers, podcasts. I can look at my to-watch list, my to-read list; it's just growing exponentially larger, and new items get put on top but then pushed out. It's never going to happen without help. It's a common problem. Okay, Roman, thanks for chatting with me again. It's been a real pleasure.
01:31:12
Speaker
Thank you so much for inviting me.