Introduction to Mechanistic Interpretability
00:00:00
Speaker
Neel, welcome to the podcast. Glad to have you on. Hey, I'm Neel. I do mechanistic interpretability research, which means I take neural networks, try to open them up and stare at the internals, and try to reverse engineer what algorithms they've learned, attempting to take us from a world where we know that a network is good at a task to knowing how the network is good at the task. This is really weird and hard, but also kind of fascinating.
The Lightning Round
00:00:30
Speaker
All right. Lightning round. Yes, I want to give you an impossible task, which is to answer complex questions in a short amount of time. Actually answer questions that have been unanswered by humanity for a long time in, you know, three minutes. Are you up for that challenge? I strive to disappoint.
Mathematics: Usefulness and Critique
00:00:55
Speaker
So my first question is: because you have a background in pure math, how useful do you think advanced mathematics is? How useful is it that we have this highly advanced mathematics that, to most people at least, seems not to apply to the world? So, my short answer is: not very useful.
00:01:19
Speaker
My slightly longer answer is: I personally have benefited a lot from doing a hard pure maths degree and spending a great deal of time learning mostly useless, very, very hard things that involve thinking in abstractions. And I think this has taught me useful skills, even if I've forgotten most of the knowledge. I also think that
00:01:44
Speaker
When I look around me, or around the Cambridge Maths department, and see some of the smartest people I know go on to do PhDs where maybe five people in the world care about the stuff they're researching, I can't help but feel sad at the enormous waste of human capital. Though I think that sometimes I'm wrong. Number theory is the classic example of "how could this possibly be useful?", and, by the way, it underlies the modern internet. But...
00:02:12
Speaker
Modern maths is just so ridiculously niche. I would just be so surprised if the typical area in it was remotely useful, even if there are occasional lucky gems.
AI and the Future of Mathematics
00:02:24
Speaker
Do you think that mathematics will be advanced by AI in the future? And so perhaps if we get very advanced AI, we don't
00:02:33
Speaker
need to do the math ourselves; we can rely on an AI to do it for us. And therefore, perhaps these geniuses in the Cambridge mathematics department should be working on AI as opposed to directly on mathematics. I mean, it sure seems plausible. I do not at all keep up with the field of AI theorem proving, but I gather it's moving pretty fast and going in interesting directions. I mean, there's a broad spectrum from
00:03:00
Speaker
AI being instrumental in solving the next big problem in maths, to human mathematicians being completely obsolete. But one of the depressing things about AI is that it gets better a lot faster than humans do. And I mean, if you know your job is going to become obsolete, it's definitely not obvious you should go and make it obsolete faster.
00:03:20
Speaker
If you find doing pure maths fun and you're afraid that mathematicians are going to be outdated in 20 years, I can't fault you for doing more maths now, before you're made obsolete.
Drivers of Technological Progress
00:03:29
Speaker
Okay, what do you think are the key drivers of technological progress? This does not feel like a thing that I have unique answers on.
00:03:39
Speaker
Human capital; new ideas; good social structures like property rights and incentives for innovation that get people to do things; having an environment where people are broadly safe, their rights are respected, and they are kind of happy and free; free exchange of ideas; actual scientific knowledge; funding being allocated to interesting, ambitious people who have ideas that can change the world.
00:04:08
Speaker
A fairly generic, somewhat neoliberal perspective on that. This isn't a question I've thought about that much.
Scientific Discoveries and Worldviews
00:04:20
Speaker
What is a scientific discovery that, if it were made, would completely disrupt your worldview? What scientific discovery would turn your worldview upside down? I mean, speaking purely personally, as an atheist, the existence of the God of the Bible would pretty dramatically change a lot of my worldview.
00:04:37
Speaker
Anything that would be perhaps less disruptive, or closer to... I mean, in terms of closer to home, there are things that would be very life-disrupting, like the discovery of meaningful life extension technology such that I
00:04:58
Speaker
do not need to expect to die of old age. Or incredible anti-obesity drugs, on which there seems to be really awesome progress, which is amazing, and which has pretty significantly changed how I think about my health.
00:05:12
Speaker
That seems to be an area where we could make massive progress. I imagine if we had effective anti-obesity drugs, that would improve so many lives. I think that's an undervalued area of research, even though I know we're dedicating billions of dollars to it; even in light of that, I think it's undervalued.
00:05:37
Speaker
Yes, I'm very confused about how resources and energy are allocated in biomedical research. I'm much more confused by the comparative lack of investment in things like anti-aging. But... Yeah. I mean, semaglutide is the current hot thing that seems like an actually effective anti-obesity drug, which is incredibly good news.
New Funding Models for Science
00:06:00
Speaker
So in terms of life extension, what do you think we should be doing that we aren't doing? Just simply allocating more money, or is there some tweak we should make to how we view aging? So speaking as someone who is not even a biologist, let alone an anti-aging expert, I do not feel remotely qualified to say any strategic things about the field. I have the general view that
00:06:25
Speaker
mostly absorbed from my environment, that science is too stilted, centralizes too much of the funding in established people with long careers who fit into established paradigms, is too incrementalist, and incentivizes people to get fast results soon.
00:06:43
Speaker
And I'm very excited about experiments like focused research organizations: finding promising people and giving them large budgets to do whatever they think is reasonable. What is a focused research organization? So, I'm not very familiar with this; I believe Convergent Research is one of the main places incubating them. The idea, as I understand it, is
00:07:09
Speaker
an alternate model for how you could fund science: find some promising team, give them a bunch of money, and say you've got a five-year timeline to pursue some ambitious task, like reverse engineering the brain of a nematode worm or something. There are real examples of these, though I don't know the details.
00:07:35
Speaker
And the organization will wind down after five years. It's got enough funding that it doesn't need to worry about raising more, it's got a clear deadline, and it's trying to make as much progress as it can on focused research. This seems like a really interesting innovation. I don't know if this will work, but I love that there are people trying stuff like this. Yeah, definitely. I think we need to try different things in science funding in general.
00:08:03
Speaker
Is there any advice you could give to a person that feels like he or she is not living up to her values? Is there a way in which we can approach our values better such that we feel like we are living authentically in accordance with what we want to do?
Living Authentically: Values and Integration
00:08:22
Speaker
Sure. So I'm not sure I personally live up to my values as well as I want to. And if someone does, I'm either very jealous or very skeptical.
00:08:33
Speaker
But one thing which has been pretty important to me broadly feeling happier about this whole thing is accepting that I just have a bunch of different values. I care a lot about doing good in the world, and making the world safer, and as an instrumental thing there, making powerful AI go well, and understanding how to interpret it.
00:08:56
Speaker
But also I care a lot about social approval. I care about feeling like I have status. I care about comfort and having enough money that I don't need to worry about money. I care about just kind of not being in danger and not needing to worry about my physical safety. I value intellectual joy and fulfillment and novelty and just following the shiny thing.
00:09:18
Speaker
And these are all values. You can have many values. To me, the path to being an authentic person who satisfies their own values does not look like
00:09:31
Speaker
figure out your one true, ultimate value, like "do the most good" or "build human knowledge". I think a mistake that I have made in the past is thinking that this was my only value and that these other things weren't real things I care about. Instead, the path is to try to align these values together and find ways to set up your life so that the various things, which aren't necessarily the same, align as much as possible.
00:09:57
Speaker
which in my case looks like finding a research area that's important, but also incredibly fun. Yeah, that seems like fantastic advice. So trying to align the different goals that are within yourself such that you're achieving goals that you find most important. Yes, and importantly, treating this as like an actual challenge. This is not an easy thing.
00:10:18
Speaker
This is not something where you just go, oh, obviously I'll become a researcher in this area that satisfies everything I want. It's a thing that can take years of work and iteration, and I am nowhere near done. But it's a thing where it helps to decide to actually try, and then be willing to get creative.
00:10:38
Speaker
I don't know, if someone's listening to this and kind of vibes with it, here's a challenge: pause this, set a five-minute timer, brainstorm what you could actually do to better align the different values you have, and see if you have any actual ideas. But yeah, this is hard. Yeah, great advice. So, if we imagine a person a thousand years ago, I would generally think that this person has worse empirical beliefs than I do, and perhaps also worse values than I do.
00:11:08
Speaker
And I think many people would agree with that. But do you think that there's anything we could learn from this person, say a person a thousand years ago from England or from India or wherever? Is there anything where they have more wisdom or perhaps even where they have better empirical beliefs than we do? My short answer is just no. The world seems vastly better and more sensible and more moral and more educated.
00:11:36
Speaker
on basically every axis I care about. I think that there's general value in having an outside perspective; any culture is a bubble, full of its own false beliefs that are hard to notice from the inside, and any outside perspective can be valuable there. But I don't know, I expect I could get more out of talking to someone from China or some other fairly culturally different country today than I would out of talking to someone from 1,000 years ago. All right.
00:12:06
Speaker
Let's say that humanity solves AI alignment. We now have an artificial general intelligence that is on our side. That sounds great. That's great.
AI-Managed Progress and World Problems
00:12:26
Speaker
What should we then do? How are we going to live in such a world? What would we spend our time doing? Would we even exist as we currently think of humans? How would we have fun in such a world?
00:12:35
Speaker
I mean, this just massively depends on exactly how things shake out and what technologies are made, what people want to do. I mean, the kind of cliche answer is there's a singularity. We all get uploaded to the cloud and it's a glorious transhumanist utopia. But lots of people don't want that. I think tracking what people want is important. I think that probably if you had a human level AI that's also kind of cheap enough to run,
00:13:04
Speaker
then, in a practical sense, you could dramatically increase the rate of technological progress. That's good in a bunch of ways, concerning in a bunch of other ways; it can be pretty destabilizing to society when a bunch of things progress at different rates. That's a really hard problem. Can we get a human-level AI to solve it for us? This is totally leaving aside questions about
00:13:32
Speaker
Um, well, actually, the first important thing we should do is, you know, go and fix all of the problems in the world today. Like, have enough of an economic surplus that no one is in poverty, cure all diseases, figure out climate change, that kind of thing. That's my actual highest priority. Then there's all the random transhumanist speculation.
00:13:57
Speaker
Do you think there is a sense in which, if we succeed in this task and in general solve all of the problems that you just mentioned, life would become boring in some horrifically perverse way, that we would just sit there and be bored?
Joy in a Problem-Free World
00:14:18
Speaker
I'm kind of skeptical. I feel like if you believe this, your life just sounds kind of dull.
00:14:27
Speaker
And no, I work on things because they're fun and interesting. There are lots of people in my life I enjoy getting to know. I enjoy learning new things. This does not require some deep meaning from there being problems in the world. And I kind of think that if you can only be happy when there are problems in the world, that seems kind of shitty.
00:14:45
Speaker
It definitely would be a bad psychological disposition. But I'm thinking, for example: what I hear from you is that you're fascinated by difficult mathematical problems. Imagine that all of these problems have been solved by a very advanced AI. Isn't there a sense in which you draw meaning from challenges and from unsolved problems?
00:15:10
Speaker
So I do think there is some meaningful sense in which a world where there is some authority who is just more competent than you, who you trust and look up to, and who can do everything you might want to do, is a qualitatively different world than the world today. But I would guess people often exaggerate how much this matters; on a day-to-day basis, it's not actually that relevant. Like, I do research.
00:15:39
Speaker
For my research, "has anyone done this before?" is an important question. But if I'm reading a book, it doesn't matter: I like reading the book, I like learning things. If I'm meeting new people and forming friendships, that is just a source of joy in my life. And I do think it would be sad in some aesthetic sense if humanity just ceased to be an important part of science and progress. But in terms of just
00:16:07
Speaker
how I think most ordinary people's lives would go: I don't know, there are hundreds of millions of people in extreme poverty; I don't think they're going to be complaining. Definitely not. I agree. What is an important concept that is often misunderstood?
00:16:24
Speaker
So one coming to mind is consequentialism, the moral theory of doing things that have good consequences. Lots of people have takes on consequentialism like, oh, if everyone were a consequentialist, you'd all go around killing people for their organs, or the world would be a terrible place where you couldn't trust anyone. And it's just like: that sure sounds like a negative consequence that a competent consequentialist should care about.
00:16:53
Speaker
And actually, I think what people are arguing against is the kind of naive utilitarianism or naive consequentialism that doesn't actually think about the second-order effects of your actions. And I feel like with thought experiments, people often wildly miss this: you do not actually, in practice, get a scenario where you can kill one person for their organs to save five other people with zero consequences, where no one will ever find out and it has no effect on the social fabric.
00:17:22
Speaker
What do you think is the most impressive scientific discovery relative to the time in which it was discovered? Probably smallpox vaccination, both because, A, eradicating smallpox is just objectively one of the most impressive things humanity has ever done.
00:17:42
Speaker
And also, I believe the first time it was discovered was in the late 1700s or early 1800s.
00:17:50
Speaker
And it is just wild that, at a time with such poor knowledge of medicine, people figured out, oh, this terrible deadly disease can be prevented if you give yourself this weaker disease. And to some degree, they did actual experiments.
Books vs. Digital Tools in Learning
00:18:08
Speaker
Is it the case that books are outdated? What does outdated even mean? I mean, physical books are in some ways superseded by ebooks, though I'm not sure that's the spirit of the question, and many people disagree.
00:18:21
Speaker
Let me rephrase that. The spirit of the question is: if I want to learn a complex area of human knowledge, is it the case that I shouldn't use a book or a textbook, and that I should perhaps work through exercises on a website, or read papers, or so on? Do you think books are outdated in that sense? I'd be pretty surprised.
00:18:49
Speaker
To me this just kind of feels like the wrong question. I think the best way to learn something generally looks like follow the guidance of an expert on how best to learn it, find clean distilled versions of the concepts. Generally, if someone competent writes a good textbook, that's a great thing to read and work through.
00:19:11
Speaker
Sometimes the people who could write a good textbook write articles instead, or blog posts, or lists of exercises. And sometimes the advice is: ah, you shouldn't read a book, you should go write code, or go out into the world and try the thing.
00:19:25
Speaker
I don't see how this is different today, except that, I don't know, the internet means that getting access to articles, relative to books, is comparatively easier. There are also things like, I don't know, I've been learning web development recently: I just ask ChatGPT everything and it goes surprisingly well. So that's a sense in which books might be outdated. That was exactly the spirit of the question, actually. I was looking for whether you think that we could learn more efficiently from language models rather than textbooks.
00:19:54
Speaker
I think that current language models are so wildly unreliable and non-robust that using them as an assistant when doing something where you get feedback on whether you've been told something that's BS seems great. Using them in a context where you just don't know how to tell the difference seems insane.
00:20:18
Speaker
Yeah, because language models will often give you answers that merely sound plausible. If you know very little about an area, they will give you confident-sounding answers in the same way that they would in an area you know something about, and it can be difficult to differentiate between the BS answers and the real answers. Yes, this is one of the many reasons that I would like to be able to better understand these models and what's going on inside of them. That would be incredibly useful.
00:20:48
Speaker
I have just been really impressed with how good ChatGPT is at explaining web development to me, though. And it's the kind of domain where I'm happy learning from a language model that's deeply unreliable: I just run the code and check. Would you say that ChatGPT is especially good here, since there's probably a lot of text on the internet about learning web development? Honestly, I would guess a lot of it is that, I believe, ChatGPT was heavily trained on code as well as text.
00:21:16
Speaker
And it seems generally good for coding tasks, because it can do things like write snippets that sometimes just work. Should settling Mars be a priority for humanity? Priority is unclear. Do I think that settling Mars is the best way to reduce existential risk? No, not at all. I think that
00:21:41
Speaker
Even in the reference class of living on Mars, making a super bunker on Earth that is insulated from anything including pandemics, and where people only enter or exit once a year or something seems a lot more cost-effective than going to Mars.
00:21:59
Speaker
Do I think going to Mars would be awesome? Yes. Do I think it's plausible that going to Mars is a better use of resources than many other things the economy optimizes for? Also yes. You know, the world spends billions of dollars on ice cream. It seems plausible that redirecting some of that to going to Mars is a good trade.
00:22:25
Speaker
Okay, this kind of thing is messy because numbers and counterfactuals are weird, but yeah. Arguably going to the moon spurred significant technological development, though it is very hard to tell fact from "that's a really nice story which isn't actually true" in these contexts. How do you focus on your work? Is there anything you do to not get distracted and to do deep work?
00:22:52
Speaker
I am just pretty bad at this, is the first thing I should say. Which also means I've spent lots of time trying and failing to solve this. Things I do that are pretty helpful: disable WiFi on my laptop if I'm doing some offline writing; or, if I'm feeling particularly distractable, turn off WiFi on my laptop and commit to paying my girlfriend £20 if I turn it back on before I finish the task; block out a day with no calls or other commitments, which is intended to just
00:23:21
Speaker
let me focus hard on my top priorities, prioritizing hardest at the start of the day; co-working with people, where we check in with each other every 25 to 50 minutes on how we're doing, set intentions, and potentially watch each other's screens. There's this website called focusmate.com, which I really like, which just matches you with some random stranger for a co-working video call.
00:23:50
Speaker
Also, doing things you like rather than things you don't like. Really effective. Yeah. But do you find it still a problem even though you work on something that you do like working on? Oh, yeah. Yes. My executive function could be way better. Do you see any problems with relationships between humans and AIs? I'm thinking about romantic relationships between an AI chatbot and a human.
00:24:17
Speaker
So I generally try to be open-minded and not judge things.
00:24:24
Speaker
I do think that on a purely aesthetic level, I find the idea of someone falling in love with a kind of fiction or a kind of misunderstanding or superficial veneer on some computer program kind of unsettling in the same way that, I don't know, if someone falls in love with someone they meet on the internet and they're actually being catfished by a scammer, I think this is a tragedy. Even if the person's life is genuinely really enriched by the interaction with the scammer,
00:24:55
Speaker
I also think lots of people are really lonely. Loneliness is really, really bad. And if AI can help people be less lonely, this seems pretty good. It's a complicated question. I also think that figuring this kind of question out is not the kind of thing that society is very good at. And I sure wish there were someone whose job it was to solve these issues who was actually competent at it.
00:25:19
Speaker
Are there aliens in the universe? Probably. There are many planets; it seems really weird to think that Earth is the only place. The fact that we haven't seen them is definitely evidence against, but it's not strong evidence against. Do you think we will ever, as a species, bump into these aliens? I mean, I expect us to get AGI first. And that's more than alien enough for me.
00:25:49
Speaker
Do you think that the most likely outcomes for humanity are to either go extinct or colonize the stars? I think that once you start expanding throughout space, if there are aliens, you will eventually meet the aliens. What are some of your favorite books that you think the audience should read? A book very close to my heart is Harry Potter and the Methods of Rationality by Eliezer Yudkowsky,
00:26:15
Speaker
which is this Harry Potter fanfiction that he wrote to try to model clear thinking and rational behaviour. And I read this when I was 14 and thought it was amazing. I now reread it, and I'm like, there are so many flaws in that character that I did not realise were flaws when I was 14. But also, this is still like a great book that I got a lot out of reading. So I really like that one.
00:26:40
Speaker
I think The Alignment Problem by Brian Christian is pretty great, and is probably my favorite description of alignment-flavored concerns in real, current-day AI models, an area I'm still a bit confused about. It's also just really well written. I really like it.
00:27:01
Speaker
The Precipice by Toby Ord I think is a pretty solid account of why existential risk is a thing you might be concerned about; it's pleasantly well explained and broadly says reasonable things. What is an overlooked positive aspect of humanity?
00:27:22
Speaker
We live in a world where it's just not that hard to do quite a lot of damage in a variety of ways. You know, make bombs from household materials. And very, very few people do this. There are just lots and lots of ways to defect against society such that no one really notices, and this mostly does not happen, at least in, I don't know, the West. And this is kind of great.
00:27:52
Speaker
So the world is fragile, but the fact that it hasn't been destroyed yet, or the fact that we don't see more bad things happening, is a testament to most humans being good, perhaps. Kind of. I think that destroying the world is much more in the hands of nation-state actors, etc., with nuclear weapons, or gain-of-function research labs, and very terrifying things like that. And I think that
00:28:21
Speaker
Those seem concerning in an entirely different way, but just there are not that many people who try to kill other people. This is just a great thing about the world.
00:28:34
Speaker
Do you think people should spend more resources and time prepping, that is, preparing for disasters by storing food or storing tools? This used to be something of a laughed-at subculture, but perhaps COVID-19 has changed that, in that it's now more obvious to more people that it would be useful to have some general basic prepping done.
00:29:03
Speaker
I kind of think everything is ultimately a cost-benefit trade-off. I think that people do systematically overestimate how likely things are to be basically normal, and underestimate how likely there is to be a big, weird disaster like COVID that dramatically changes things. I also think that it's very easy to fall into the opposite trap of investing
00:29:30
Speaker
a large amount of your resources, time, and energy into preparing for things. I think that with everything, you should 80/20 it: understand the kinds of common natural disasters that happen where you live and be pretty robust to extreme versions of those; have enough food and power such that if utilities get cut off you're
00:29:54
Speaker
not screwed. But I kind of feel like prepping is just an unusually salient example of the general fact that bad things happen. Like, I don't know, as a young person, probably one of the biggest risks to me beyond death is that I
00:30:18
Speaker
end up in some situation which dramatically decreases my future earning potential, or leaves me with some kind of brain damage or significant disability, and I probably should have some insurance against that. I don't. Maybe I should fix this. But to me, it's definitely not obvious that not prepping is a mistake if you also don't have things like that. Got it.
Advice for Teenagers: Exploration Over Grades
00:30:42
Speaker
What advice would you give to teenagers listening to this?
00:30:46
Speaker
Explore. Try out a bunch of things. Take seriously the idea that there can be things you actually care about, if you can just find them. Don't fall into the traps I fell into, where you think that school or university is the only thing that matters, and that getting good grades and pleasing the authority figures in your life is the main metric by which success is measured.
00:31:16
Speaker
But also don't think that it doesn't matter at all; it's the kind of stuff that's instrumentally useful and sometimes correlates with things that actually matter. Try to figure out what you actually care about, and how you can actually take action towards it, and don't just follow the default paths the world lays out in front of you,
00:31:34
Speaker
but actually check what defaults you're on and check whether you're happy remaining on them. I can say that this is advice that would have been useful to me as a teenager. Yes. If people are interested, I have many posts of life advice on my blog at neelnanda.io, some of which might even be useful.
00:31:51
Speaker
Yeah, I recommend listeners go read some of those. If we look at generative AI models like ChatGPT or DALL-E, how do you see these models evolving over the next five years? What do you think they'll be able to do, and how do you think they'll impact society? Sure. So, brief answer: bigger, better, more general, more impressive.
00:32:17
Speaker
I expect lots of the current jankiness, around being somewhat unreliable and needing to do a bunch of delicate prompt engineering to get it to do exactly what you want, is going to go away, as it kind of already has. ChatGPT is a notable improvement on GPT-3 in terms of usability and user interface, even if it's not actually that much better in terms of what it can actually do. I think that
00:32:44
Speaker
DALL-E for videos seems like a thing that's possibly going to exist in the next five years. I expect that a much larger fraction of software engineering is going to be done with Copilot-style models. Do you have an idea of how much can be achieved in the world, interacting with it only through an internet connection or browser? Seems like quite a lot.
00:33:10
Speaker
I think there's a bunch of stuff that society tries to make require a physical presence and identity: know-your-customer regulations, needing to show passports to have the right to work somewhere, etc. These make things much harder; you can do them via a browser, but it requires significantly more sophistication to interface with society's systems. But beyond stuff like that,
00:33:39
Speaker
I mean, I didn't go out of my house much during 2020, and my life was pretty functional in terms of my ability to interact with the world and do things. And I ask this, of course, because if you have a language model, there is some way to have it generate a list of tasks, or a plan as a list, and then perhaps translate that list into actions in a browser. And if a lot can be achieved in a browser, then perhaps language models are more capable of affecting the world than we might believe.
00:34:09
Speaker
Yes, I do not think that the bottleneck on language models being scary right now is the limitations of what you can do in a browser. I think you can do a lot in a browser, and the kinds of AI risks I'm concerned about look a lot more like a competent, manipulative system with internet access than they do like an army of killer robots. I'm also concerned about the army of killer robots, to be clear.
00:34:37
Speaker
Yeah, let's end it here, Neel. This has been interesting, and thank you for coming on the podcast. Yeah, this was fun. Thanks a lot for having me on.