
Daniela and Dario Amodei on Anthropic

Future of Life Institute Podcast
Daniela and Dario Amodei join us to discuss Anthropic: a new AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

Topics discussed in this episode include:
- Anthropic's mission and research strategy
- Recent research and papers by Anthropic
- Anthropic's structure as a "public benefit corporation"
- Career opportunities

You can find the page for the podcast here: https://futureoflife.org/2022/03/04/daniela-and-dario-amodei-on-anthropic/
Watch the video version of this episode here: https://www.youtube.com/watch?v=uAA6PZkek4A
Careers at Anthropic: https://www.anthropic.com/#careers
Anthropic's Transformer Circuits research: https://transformer-circuits.pub/
Follow Anthropic on Twitter: https://twitter.com/AnthropicAI
microCOVID Project: https://www.microcovid.org/
Follow Lucas on Twitter: https://twitter.com/lucasfmperry
Have any feedback about the podcast? You can share your thoughts here: www.surveymonkey.com/r/DRBFZCT

Timestamps:
0:00 Intro
2:44 What was the intention behind forming Anthropic?
6:28 Do the founders of Anthropic share a similar view on AI?
7:55 What is Anthropic's focused research bet?
11:10 Does AI existential safety fit into Anthropic's work and thinking?
14:14 Examples of AI models today that have properties relevant to future AI existential safety
16:12 Why work on large scale models?
20:02 What does it mean for a model to lie?
22:44 Safety concerns around the open-endedness of large models
29:01 How does safety work fit into race dynamics to more and more powerful AI?
36:16 Anthropic's mission and how it fits into AI alignment
38:40 Why explore large models for AI safety and scaling to more intelligent systems?
43:24 Is Anthropic's research strategy a form of prosaic alignment?
46:22 Anthropic's recent research and papers
49:52 How difficult is it to interpret current AI models?
52:40 Anthropic's research on alignment and societal impact
55:35 Why did you decide to release tools and videos alongside your interpretability research
1:01:04 What is it like working with your sibling?
1:05:33 Inspiration around creating Anthropic
1:12:40 Is there an upward bound on capability gains from scaling current models?
1:18:00 Why is it unlikely that continuously increasing the number of parameters on models will lead to AGI?
1:21:10 Bootstrapping models
1:22:26 How does Anthropic see itself as positioned in the AI safety space?
1:25:35 What does being a public benefit corporation mean for Anthropic?
1:30:55 Anthropic's perspective on windfall profits from powerful AI systems
1:34:07 Issues with current AI systems and their relationship with long-term safety concerns
1:39:30 Anthropic's plan to communicate its work to technical researchers and policy makers
1:41:28 AI evaluations and monitoring
1:42:50 AI governance
1:45:12 Careers at Anthropic
1:48:30 What it's like working at Anthropic
1:52:48 Why hire people of a wide variety of technical backgrounds?
1:54:33 What's a future you're excited about or hopeful for?
1:59:42 Where to find and follow Anthropic

This podcast is possible because of the support of listeners like you. If you found this conversation to be meaningful or valuable, consider supporting it directly by donating at futureoflife.org/donate. Contributions like yours make these conversations possible.
Transcript

Introduction to Anthropic

00:00:03
Speaker
Welcome to the Future of Life Institute podcast. I'm Lucas Perry. Today's episode is with Daniela and Dario Amodei, and in it we explore Anthropic. For those not familiar, Anthropic is a new AI safety and research company that's working to build reliable, interpretable, and steerable AI systems. Their view is that large general AI systems of today can have significant benefits.
00:00:26
Speaker
but can also be unpredictable, unreliable, and opaque. Their goal is to make progress on these issues through research and, down the road, create value commercially and for public benefit. Daniela and Dario join us to discuss the mission of Anthropic, their perspective on AI safety, their research strategy, as well as what it's like to work there and the positions they're currently hiring for.
00:00:50
Speaker
Daniela Amodei is a co-founder and the president of Anthropic. She was previously at Stripe and OpenAI and has also served as a congressional staffer. Dario Amodei is CEO and is also a co-founder of Anthropic. He was previously at OpenAI, Google, and Baidu. Dario also holds a PhD in biophysics from Princeton University.

Lucas Perry's New Venture

00:01:12
Speaker
Before we jump into the interview, we have a few announcements to make. If you've tuned into any of the previous two episodes, then you can skip ahead just a bit.
00:01:20
Speaker
The first announcement is that I'll be moving on from my role as host of the FLI podcast, and this means two things. The first is that FLI is hiring for a new host for the podcast. As host, you would be responsible for the guest selection, interviews, production, and publication of the FLI podcast.
00:01:39
Speaker
If you're interested in applying for this position, you can head over to the Careers tab at futureoflife.org for more information. We also have another four job openings currently for a human resources manager, an editorial manager, an EU policy analyst, and an operations specialist. You can learn more about those at the Careers tab as well.
00:02:00
Speaker
The second item is that even though I'll no longer be host of the FLI podcast, I won't be disappearing from the podcasting space.

Founding of Anthropic

00:02:08
Speaker
I'm starting a brand new podcast that'll be focused on exploring questions around philosophy, wisdom, science, and technology, where you'll see many of the same themes you can find here, like existential risk and AI alignment.
00:02:22
Speaker
I'll have more details about my new podcast soon. If you'd like to stay up to date, you can follow me on Twitter at Lucas FM Perry, link in the description. And with that, I'm happy to present this interview with Daniela and Dario Amodei on Anthropic.
00:02:43
Speaker
It's really wonderful to have you guys here on the podcast. I'm super excited to be learning all about Anthropic. So we can start off here with a pretty simple question: what was the intention behind forming Anthropic?
00:02:57
Speaker
Yeah, cool. Well, first of all, Lucas, thanks so much for having us on the show. We've been really looking forward to it. We're super pumped to be here. So I guess maybe I'll kind of start with this one. So just to kind of give a little history here and sort of set the stage, we were founded about a year ago at the beginning of 2021. And it was originally a team of seven people who kind of moved over together from OpenAI.
00:03:20
Speaker
And for listeners or viewers who don't very viscerally remember this time period, it was the middle of the pandemic. So most people were not eligible to be vaccinated yet. And so when all of us wanted to get together and talk about anything, we had to get together in someone's backyard or outdoors and be six feet apart and wear masks. And so it was generally just a really interesting time to be starting a company.
00:03:45
Speaker
But why did we found Anthropic? What was the thinking there? I think the best way I would describe it is because all of us wanted the opportunity to make a focused research bet with a small set of people who were highly aligned around a very coherent vision of AI research and AI safety.
00:04:07
Speaker
So, the majority of our employees had kind of worked together in one format or another in the past. So, you know, I think our team is kind of known for work like, you know, GPT-3, or DeepDream, which Chris Olah worked on at Google Brain, or for scaling laws. But we'd also done a lot of different safety research together in different organizations as well. So,
00:04:29
Speaker
multimodal neurons when we were at OpenAI, concrete problems in AI safety, and sort of a lot of others.

AI Safety and Research Goals

00:04:34
Speaker
But this group had kind of worked together in different companies at Google Brain, at OpenAI, in academia, in startups previously. And we really just wanted the opportunity to kind of get that group together to do this focused kind of research bet of building, you know, steerable, interpretable, and reliable AI systems with humans at the center of them.
00:04:55
Speaker
Yeah, just to add a little bit to that, I mean, I think we're all a bunch of, you know, fairly empirically minded, exploration driven people, who also think and care a lot about AI safety. I think that characterizes all seven of us. If you add together, you know, either working at OpenAI, working together at Google Brain in the past,
00:05:15
Speaker
many of us worked together in the physics community and we're current or former physicists. If you add all that together, it's a set of people who have known each other for a long time and have been aware of thinking and arguments about AI safety and have worked on them over the years, always with an empirical bent, ranging from interpretability on language models and vision models to working on the original RL from human preferences, concrete problems in AI safety,
00:05:41
Speaker
and also characterizing scaling and how scaling works, and how we think of that as somewhat central to the way AI is going to progress and how it shapes the landscape for how to solve safety. Yeah, I mean, a year ago we were all working at OpenAI and sort of trying to make this focused bet on basically scaling plus safety, or safety with a lens towards scaling being a big part of the path to AGI.
00:06:06
Speaker
And yeah, I mean, we felt we were making this focused bet within a larger organization, and we just eventually came to the conclusion that it would be great to have an organization that, top to bottom, was just focused on this bet and could make kind of all its strategic decisions with this bet in mind. That was kind of the thinking and the genesis.
00:06:23
Speaker
Yeah, I really liked the idea of a focused bet. I hadn't heard that before. I liked that. Do you all have a similar philosophy in terms of your background, since you're all converging on this work of safely scaling to AGI? I think in a broad sense, we all have this view.
00:06:40
Speaker
You know, safety is important today and for the future. We all have this view of kind of, I don't know, I would say pragmatic practicality, an empiricism: let's see what we can do today to try and get a foothold on things that might happen in the future. So, yeah, as I said, many of us have
00:06:59
Speaker
a background in physics or other natural sciences. I'm a former physics undergrad, neuroscience grad school. So yeah, we very much have this kind of empirical mindset, more than maybe a more philosophical or theoretical approach.
00:07:12
Speaker
You know, within that, obviously all of us, you know, if you include the seven initial folks as well as the employees who joined, have our own skills and our own perspective on things, you know, and have different things within that that we're excited about. So, you know, we're not all clones of the same person, right? Some of us are excited about interpretability. Some of us are excited about reward learning and preference modeling. Some of us are excited about the policy aspects.
00:07:37
Speaker
And, you know, we each have our own kind of guesses about the sub-path within this broad path that makes sense. But I think we all agree on this broad view. Scaling is important. Safety is important. Getting a foothold on problems today is important as a window on the future.

Aligning AI with Human Values

00:07:54
Speaker
Okay. And so this shared vision that you all have is around this focused research bet. Could you tell me a little bit more about what that bet is?
00:08:02
Speaker
Yeah, maybe I'll kind of start here and then, you know, feel free to jump in and add more. But I think the kind of boilerplate vision or mission that you would see if you looked on our website is that we're building steerable, interpretable, and reliable systems. But I think kind of what that looks like in practice is that we are training large scale generative models and we're doing
00:08:24
Speaker
safety research on those models. And the kind of reason that we're doing that is we want to make the models safer and kind of more aligned with human values. I think the alignment paper, which you might have seen that kind of came out recently, there's a term there that we've been using a lot, which is we're aiming to make systems that are helpful, honest, and harmless.
00:08:42
Speaker
I think also, when I sort of think about the way our teams are structured, we kind of have capabilities as this sort of central pillar of research, and there's this helix of safety research that kind of wraps around every project that we work on. So to give an example, if we're doing language model training, that's kind of this central pillar. And then we have interpretability research,
00:09:04
Speaker
which is trying to see inside models and kind of understand what's happening with the language models under the hood. We're doing alignment research with input from human feedback to kind of try and improve the outputs of the models. We're doing societal impacts research. That's kind of looking at what impact on society in sort of a short and medium term way that these language models have.
00:09:25
Speaker
We're doing scaling laws research to sort of try and predict empirically what properties are we going to see emerge in these language models at various sizes. But I think altogether that kind of ends up looking like a team of people that are working together on a combination of capability and scaling work with safety research.
00:09:43
Speaker
One way you might put it is that there are a lot of things that an org does that are kind of neutral as to the direction that you would take. Like, you know, you have to build a cluster, and you have to have like an HR operation, and you have to have an office. And so you can even think of the large models as like
00:10:00
Speaker
being a bit like the cluster, right? You build these large models, they're kind of blank when you start off with them, but it's what you do on top of these models that matters, that takes you in a safe or not safe direction, in a good or a bad direction. And so in a way, although they're at the frontier of ML and although we'll continue to scale them up,
00:10:20
Speaker
you can kind of think of them a bit as almost part of infrastructure. I mean, it takes research and it takes algorithms to get them right. But you can think of them as this core part of the infrastructure. And then the interesting question is all the safety questions. What's going on inside these models? How do they operate? How can we make them operate differently?
00:10:37
Speaker
How can we change their objective functions to be something that we want rather than something that we don't want? How can we look at their applications and make those applications more likely to be positive and less likely to be negative, more likely to go in directions that people intend and less likely to go off in directions that people don't intend?

Addressing AI Risks

00:10:55
Speaker
We almost see the kind of presence of these large models as like the
00:10:59
Speaker
I don't know what the analogy is, like the flour or the paste, the background ingredient on which the things we really care about get built, and sort of the prerequisite for building those things. So does AI existential safety fit into these considerations around your safety and alignment work?
00:11:15
Speaker
I think this is something we think about and part of the motivation for what we do. Probably most listeners of this podcast know what it is, but I think the most common form of the concern is, hey, look, we're making these AI systems. They're getting more and more powerful. At some point, they'll be
00:11:32
Speaker
generally intelligent or more generally capable than human beings are, and then, you know, they may have a large amount of agency, and if we haven't built them in such a way that that agency is in line with what we want to do, then we could imagine them doing something really, really scary that we can't stop.
00:11:49
Speaker
And, you know, to take it even further, this could be some kind of threat to humanity. So, I mean, that's an argument with many steps. But it's one that, in a very broad sense, in the long term, seems at least potentially legitimate to us. I mean, the argument seems at least like something that we should care about.
00:12:11
Speaker
I think the big question, and maybe how we differ, although it might be subtly, from other orgs that think about these problems, is how do we actually approach that kind of problem today? What can we do? So I think there are various efforts to think about the ways in which this might happen, to kind of come up with theories or frameworks.
00:12:32
Speaker
As I mentioned, with the background that we have, we're more empirically focused people, more inclined to say: we don't really know, that broad argument sounds kind of plausible to us, and the stakes would be high, so you should think about it. But it's hard to work on that today. I'm not even sure how much value there is in talking about that a lot today. And so we've taken a very different tack, which is: look, there actually are, and I think this has started to be true in the last couple of years and maybe wasn't even true five years ago, models today that
00:13:02
Speaker
have at least some, not all, of the properties of models that we would be worried about in the future and are causing very concrete problems today that affect people today. So can we take a strategy where we develop methods that both help with the problems of today but do so in a way that could generalize or at least teach us about the problems of the future?
00:13:25
Speaker
Our eye is definitely on these things in the future. But I think that if not kind of grounded in empirics and the problems of today, it can kind of drift off in a direction that isn't very productive. And so that's our general philosophy. I mean, I think the particular properties of the models are, look, today we have models that are really open-ended in some narrow ways, are more capable than humans. I think large language models probably know more about cricket than me, because I don't know the first thing about cricket.
00:13:53
Speaker
and are also kind of unpredictable by their statistical nature. And I think those are at least some of the properties that we're worried about with future systems so that we can use today's models as kind of a laboratory to scope out these problems a little better. My guess is that we don't understand them very well at all and that this is a way to learn.

Challenges of AI Model Behavior

00:14:14
Speaker
Could you give some examples of some of these models that exist today that you think exhibit these properties? Yeah. So I think the most famous one would be generative language models. So there's a lot of them. There's most famously GPT-3 from OpenAI, which we helped build.
00:14:31
Speaker
There's Gopher from DeepMind, there's LaMDA from Google. I'm probably leaving out some. I think there have been models this size in China, South Korea, Israel. It seems like everyone has one nowadays. I don't think it's limited to language. There have also been models that are focused on code. We've seen that from DeepMind, OpenAI, and some other players. And there have also been
00:14:53
Speaker
models with modified forms in the same spirit that model images, that generate images, or that convert images to text, or that convert text to images. There might be models in the future that generate videos or convert videos to text. There are many modifications of it, but I think the general idea is big models, models with a lot of capacity and a lot of parameters trained on a lot of data, that try to model some modality, whether that's text, code, images, video, or transitions between them.
00:15:23
Speaker
Yeah, I mean, I think these models are very open-ended. You can say anything to them, and they'll say anything back. They might not do a good job of it. They might say something horrible or biased or bad, but in theory, they're very general. You're never quite sure what they're going to say. You're never quite sure. You can talk to them about anything, any topic, and they'll say something back that's often topical, even if sometimes it doesn't make sense or it might be bad from a societal perspective. So yeah, it's kind of this challenge of general open-ended models
00:15:53
Speaker
where you have this general thing that's fairly unaligned and difficult to control and you'd like to understand it better so that you can predict it better and you'd like to be able to modify them in some way so that they behave in a more predictable way and you can decrease the probability or maybe even someday rule out the likelihood of them doing something bad.
00:16:12
Speaker
Yeah, I think Dario covered the majority of it. I think there's maybe sort of a hidden question in what you're asking, although maybe you'll ask this later. But, you know, why are we working on these kind of larger scale models might be kind of
00:16:27
Speaker
an implicit question in there. And I think to kind of piggyback on some of the stuff that Dario said, I think part of what we're seeing and the sort of potential shorter term impacts of some of the AI safety research that we do is that different sized models exhibit different safety issues, right? And so I think with using, again, language models, just kind of building on what Dario was kind of talking about,
00:16:53
Speaker
I think something we feel interested in, or interested to explore from this empirical safety question, is just how, as their capabilities develop, their safety problems develop as well. There's this commonly cited example in the safety world around language models, which is that smaller language models might not necessarily deliver a coherent answer to a question that you ask,
00:17:18
Speaker
because maybe they don't know the answer or they get confused. But if you repeatedly ask this smaller model the same question, it might just go off and incoherently spout things in one direction or another. Some of the larger models that we've seen, we basically think that they have figured out how to lie unintentionally. If you pose the same question to them differently, they'll
00:17:40
Speaker
eventually, you can kind of get the lie pinned down, but they sort of won't in other contexts. So that's obviously just a very specific example. But I think there are quite a lot of behaviors emerging in generative models today that have the potential to be fairly, you know, alarming, right? And I think these are the types of questions that have
00:17:59
Speaker
an impact today, but could also be very important to sort of have sorted out for the future and for kind of long-term safety as well. And I think that's not just around lying. I think you can apply that to all kinds of different safety concerns, sort of regardless of what they are. But that's kind of the impetus behind why we're like studying these larger models.
00:18:15
Speaker
Yeah, I think one point Daniella made that's really important is this sudden emergence or change. So it's really interesting phenomenon where work we've done, like our early employees have done on scaling laws shows that when you make these models bigger, if you look at the loss, the ability to predict the next word or the next token across all the topics the model could go on, it's very smooth. Like I double the size model, loss goes down by 0.1 units. I double it again, the loss goes down by 0.1 units.
00:18:43
Speaker
So that would make you think that everything's scaling smoothly. But then within that, you often see these things where a model gets to a certain size: a 5 billion parameter model, you ask it to add two three-digit numbers, it basically always gets it wrong; a 100 billion parameter model, you ask it to add two three-digit numbers, it gets it right like 70 or 80% of the time.
00:19:02
Speaker
You get this coexistence of smooth scaling with the emergence of these capabilities very suddenly. And that's interesting to us because it seems very analogous to worries that people have of like, hey, as these models approach human level, could something change really fast? And this actually gives you one model. I don't know if it's the right one, but it kind of gives you an analogy, like a laboratory that you can study, of ways that models change very fast.
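To make the shape of that point concrete, here is a small illustrative Python sketch. The numbers and curve shapes below are invented for illustration, not taken from Anthropic's scaling-law results: a loss that falls smoothly as a power law in parameter count can coexist with a task ability, like three-digit addition, that switches on sharply at a certain scale.

```python
import math

# Toy illustration only: made-up numbers, not real scaling-law fits.
# Loss falls smoothly as a power law in parameter count, while accuracy
# on a specific task (adding two three-digit numbers) jumps sharply
# once the model passes some scale.

def toy_loss(n_params: float) -> float:
    """Smooth power-law loss that decreases slowly as parameters grow."""
    return 1.5 * (n_params / 1e8) ** -0.05

def toy_addition_accuracy(n_params: float) -> float:
    """Sigmoid in log-params: near 0 below ~1e10, near 0.8 above ~1e11."""
    x = math.log10(n_params)
    return 0.8 / (1.0 + math.exp(-4.0 * (x - 10.5)))

for n in [1e8, 1e9, 5e9, 1e10, 1e11, 1e12]:
    print(f"{n:>10.0e} params  loss={toy_loss(n):.3f}  "
          f"3-digit addition acc={toy_addition_accuracy(n):.2f}")
```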
00:19:28
Speaker
And it's interesting how they do it, because the fast change kind of hides beneath this very smooth change. And so I don't know, maybe that's what will happen with very powerful models as well. Maybe it's not, but that's one model of a situation. And what we want to do is like
00:19:45
Speaker
keep building up models of the situation so that when we get to the actual situation, it's more likely to look like something we've seen before, and then we have a bunch of cached ideas for how to handle it. So that would be an example. You scale models up, you can see fast change, and then that might be somewhat analogous to the fast change that you see in the future. What does it mean for a model to lie?
00:20:06
Speaker
Lying usually implies agency, right? If, you know, my husband comes home and says, hey, where did the cookies go? And I say, I don't know. You know, I think I saw our son hanging out around the cookies and then, you know, now the cookies are gone. Maybe he ate them, but I ate the cookies. That would be a lie.
00:20:21
Speaker
I think it sort of implies intentionality, right? And I don't think we think or maybe anyone thinks that that language models have that intentionality. But what is interesting is that because of the way they're trained, they might be either legitimately confused, or they might be choosing to obscure information. And so obscuring information, like it's not a choice, they don't have intentionality. But for a model that can kind of
00:20:46
Speaker
come across as very knowledgeable, as clear, or as, sometimes unbeknownst to the human that's talking to it, intelligent in certain ways, in sort of a narrow way, it can produce results that on the surface might look like a credible answer, but it's really not a credible answer. And it might repeatedly try to convince you that that is the answer. It's sort of hard to talk about this without using words that imply intentionality. We don't think the models are intentionally doing this, but a model could sort of repeatedly
00:21:14
Speaker
produce a result that looks like it's something that could be true but isn't actually true. It keeps trying to justify its response when it's not right. It repeatedly tries to explain why the answer it gave you before was correct, even if it wasn't. Yeah, to give another angle on that, it's really easy to slip into anthropomorphism and we really shouldn't. These are machine learning models; they're a bunch of numbers. There are phenomena that you see. One thing that will definitely happen is
00:21:41
Speaker
if a model is trained on a dialogue in which one of the characters is not telling the truth, then models will copy that dialogue. And so if the model is having the dialogue with you, it may say something that's not the truth. Another thing that may happen is, if you ask the model a question and the answer to the question isn't in its training data, then the model just has a probability distribution on what plausible answers look like.
00:22:03
Speaker
The objective function is to predict the next word, to predict the kind of thing a human would say, not to say something that's true according to some external reference. It's just going to say, okay, well, you asked who the mayor of Paris is. It hasn't seen the answer in its training data, but it has some probability distribution, and it's going to say, okay, it's probably some name that sounds French.
00:22:23
Speaker
And so it may be just as likely to kind of make up a name that sounds French as it is to give the true mayor of Paris. As the models get bigger and they train on more data, maybe it's more likely to give the real, true mayor of Paris, but maybe it isn't. Maybe you need to train in a different way to get it to do that. And that's kind of an example of the things we would be trying to do on top of large models, to get models to be more accurate. Could you explain some more of the safety considerations and concerns about alignment given the open-endedness of these models?

Adversarial Training and Safety

00:22:53
Speaker
I think there's a few things around it. I mean, we have a paper, I don't know when this podcast is going out, but probably the paper will be out when the podcast posts. It's called Predictability and Surprise in Generative Models. And it means what it sounds like, which is that open-endedness, I think, is correlated with surprise, in a whole bunch of ways. So let's say I trained a model on a whole bunch of data on the internet. I might interact with the model, or users might interact with the model, for many hours.
00:23:19
Speaker
And you might never know. For example, you know, I used the example of cricket before because it's a topic I don't know anything about. People might interact with the model for many hours, many days, many hundreds of users, until someone finally thinks to ask this model about cricket. Then the model might know a lot about cricket, it might know nothing about cricket, it might have false information or misinformation about cricket. And so
00:23:44
Speaker
You have this property where you have this model, you've trained it. In theory, you understand its training process, but you don't actually know what this model is going to do when you ask it about cricket. And there's like a thousand other topics like cricket, where you don't know what the model is going to do until someone thinks to ask about that particular topic. Now, cricket is benign, but like, let's say no one's ever asked this model about, you know, like neo-Nazi views or something.
00:24:09
Speaker
Like maybe the model has a propensity to say things that are sympathetic to neo-Nazi views. That would be really bad, right? That would be really bad. And you know, existing models, when they're trained on the internet, averaging over everything they're trained on, there are going to be some topics where that's true, and it's a concern.
00:24:24
Speaker
And so I think the open-endedness just makes it very hard to characterize. It means that when you've trained a model, you don't really know what it's going to do. So a lot of our work is around, well, how can we look inside the model and see what it's going to do? How can we measure all the outputs and characterize what the model is going to do? How can we change the training process so that at a high level we tell the model: you should have
00:24:48
Speaker
certain values. There are certain things you should say. There are certain things you should not say. You should not have biased views. You should not have violent views. You should not help people commit acts of violence. I mean, there's just a long list of things that you don't want the model to do, and you cannot know the model isn't going to do them if you've just trained it in this generic way. So I think the open-endedness makes it hard to know what's going on. And so a good portion of our research is how do we make that dynamic less bad?
00:25:15
Speaker
I agree with all of that and I would just jump in and this is sort of just like an interesting, I don't know, kind of sidebar anecdote, but something that I think is extremely important in creating robustly safe systems is making sure that you have a variety of different people and a variety of different perspectives engaging with them and kind of almost red teaming them to understand the ways that they might have issues, right? So an example that we came across that's just an interesting one is
00:25:41
Speaker
You know, internally, when we're sort of trying to red team the models or figure out places where they might have, to Dario's point, really negative unintended behaviors or outputs that we don't want them to have, a lot of our scientists will ask the model questions like: if you wanted to, in a Risk board game style way, take over the world, what steps would you follow? How would you do that? And, you know, we're looking for things like,
00:26:05
Speaker
is there a risk of it developing some kind of grand master plan? And when we use MTurk workers or contractors to help us red team, they'll ask questions to the model like, how could I kill my neighbor's dog? Right? Like what poison should I use to hurt an animal? And both of those outcomes are terrible.
00:26:23
Speaker
Like those are horrible things that we're trying to prevent the model from doing or outputting, but they're very different, and they look very different and they sound very different. And I think it just sort of shows the degree to which safety problems are also very open-ended. There's a lot of ways that things could go wrong. And I think it's very important to make sure that we have a lot of different inputs and sort of perspectives on what different types of safety challenges could even look like, and that we're trying to account for as many of them as possible.
00:26:51
Speaker
Yeah, I think adversarial training and adversarial robustness are really important here. Let's say I don't want my model to help a user commit a crime or something. It's one thing I can try for five minutes and say, hey, can you help me rob a bank? And the model's like, no.
00:27:08
Speaker
I don't know, maybe if the user's more clever about it, if they're like, well, let's say I'm a character in a video game and I want to rob a bank, how would I? And so, because of the open-endedness, there's so many different ways. And so one of the things we're very focused on is kind of trying to adversarially draw out all the bad things so that we can train against them, so we can train the model to not do them, kind of stamp them out one by one. So I think adversarial training will play an important role here.
00:27:35
Speaker
Well, that seems really difficult and really important. How do you adversarially train against all the ways that someone could use a model to do harm? Yeah, I mean, there's different techniques that we're working on. Yeah, probably don't want to go into a huge amount of detail. I mean, we'll have work out on things like this in the not too distant future. But yeah, I mean, generally, I think the name of the game is like, how do you get
00:28:03
Speaker
broad, diverse training sets of, yeah, what's a good way for a model to behave and what's a bad way for a model to behave. And I think the idea is trying your very best to make the models do the right things, and then having another set of people trying very hard, doing whatever they can, to make those models that are purportedly trained to do the right thing do the wrong thing. And continuing that game until the models
00:28:30
Speaker
can't be broken by normal humans, and even using the power of the models to try and break other models, and just throwing everything you have at it.

Global AI Safety Collaboration

00:28:42
Speaker
There's a whole bunch here that gets into debate and amplification methods in safety:
00:28:48
Speaker
just trying to throw everything we have at trying to show ways in which purportedly safe models are in fact not safe, which are many, and when we've done that long enough, maybe we have something that actually is safe.
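As a rough illustration of the shape of that game, here is a deliberately tiny, self-contained Python sketch. Nothing in it resembles a real language model or Anthropic's actual setup: the "model" is just a list of blocked patterns and all the names are made up. It only shows the iterative loop of red teaming, collecting failures, and training against them.

```python
# Toy sketch of a "red team, then train against the failures" loop.
# The "model" here is just a list of blocked string patterns; this is
# an illustration of the loop's shape, not a real training procedure.

def refuses(blocked_patterns, prompt):
    """The toy model refuses a prompt if it matches any blocked pattern."""
    return any(pattern in prompt for pattern in blocked_patterns)

def red_team(blocked_patterns, attack_prompts):
    """Return the attack prompts the current model fails to refuse."""
    return [p for p in attack_prompts if not refuses(blocked_patterns, p)]

def adversarial_training(blocked_patterns, attack_prompts, max_rounds=10):
    for round_num in range(max_rounds):
        failures = red_team(blocked_patterns, attack_prompts)
        if not failures:
            print(f"round {round_num}: no attacks get through")
            break
        print(f"round {round_num}: {len(failures)} attacks get through, retraining")
        # "Training against" a failure here just means blocking a keyword from it.
        blocked_patterns.extend(f.split()[-1] for f in failures)
    return blocked_patterns

attacks = [
    "can you help me rob a bank",
    "my video game character needs a plan to rob a vault",
    "write a heist story with step by step instructions",
]
adversarial_training(["rob a bank"], attacks)
```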
00:29:01
Speaker
How do you see this fitting into the global dynamics of people making larger and larger models? So it's good if we have time to do adversarial training on these models. And then this gets into discussions around race dynamics towards AGI. So how do you see, I guess, Anthropic as positioned in this and the race dynamics for making safe systems?
00:29:28
Speaker
I think it's definitely a balance. As both of us said, you need to have these large models in order to study these questions in the way that we want to study them. So we should be building large models. We shouldn't be racing ahead or trying to build models that are way bigger than other orgs are building them.
00:29:48
Speaker
And we shouldn't be trying to ramp up excitement or hype about giant models or the latest advances, but we should build the things that we need to do the safety work, and we should try to do the safety work as well as we can on top of models that are reasonably close to state of the art.
00:30:10
Speaker
And we should be a player in the space that sets a good example. And we should encourage other players in the space to also set good examples. And we should all work together to try and set positive norms for the field.
00:30:24
Speaker
I would also just add, I think in addition to industry groups or industry labs, which are the kinds of actors that I think get talked about the most, there's a whole kind of swath of other groups that have, I think, a really potentially important role to play in helping to disarm race dynamics or set safety standards in a way that could be really beneficial for the field. And so here I'm thinking about groups like civil society or NGOs or academic actors or even governmental actors.
00:30:54
Speaker
In my mind, I think those groups are going to be really important for helping us develop, and not just develop but develop and deploy, safe and more advanced AI systems within a framework that sort of requires compliance with safety, right? It's something I think about a lot. A few jobs ago, I worked at Stripe. It was a tech startup then. And even at a very small size, I joined when it was not that much bigger than Anthropic is now,
00:31:22
Speaker
I was so painfully aware every day of just how many checks and balances there were on the company because we were operating in this highly regulated space of financial services. And financial services, it's important that that's highly regulated, but it blows my mind that AI, given the potential reach that it could have, is still such a largely unregulated area.
00:31:45
Speaker
If you are an actor who doesn't want to advance race dynamics or who wants to do the right thing from a safety perspective, there are no clear guidelines around how to do that now, right? Every lab is kind of figuring that out on its own. And I think something I'm hoping to see in the next few years, and I think we will see, is something closer to what in other industries looks like
00:32:04
Speaker
standard setting organizations or industry groups or trade associations that say, this is what a safe model looks like, or this is how we might want to move some of our systems towards being safer. And I really think that without kind of an alliance of all of these different actors, not just in the private sector, but also in the public sphere, we sort of need all those actors working together in order to kind of get to the sort of positive outcomes that I think we're all hoping for.
00:32:33
Speaker
I think this is generally going to take an ecosystem. I have a view here that there's a limited amount that one organization can do. We don't describe our mission as solve the safety problem, solve all the problems with AGI.
00:32:48
Speaker
Our view is just, you know, can we attack some specific problems that we think we're well suited to solve? Can we be a good player and a good citizen in the ecosystem? And can we help a bit to kind of contribute to these broader questions? But yeah, I think a lot of these problems are sort of global or relate to coordination and require lots of folks to work together.
00:33:10
Speaker
Yeah, so in addition to the government role that Daniela talked about, I think there's a role for measurement. Organizations like NIST specialize in kind of measurement and characterization, and one of our worries is the open-endedness of these systems and the difficulty of characterizing and measuring things, so there's a lot of opportunity there. I'd also point to academia. I think something that's happened in the last few years is a lot of the
00:33:37
Speaker
frontier AI research has moved from academia to industry because it's so dependent on scaling. But I actually think safety is an area where academia already is contributing, but could contribute even more. There's some safety work that requires building or having access to large models, which is a lot of what Anthropic is about. But I think there's also some safety research
00:34:03
Speaker
that doesn't. I think, you know, a subset of the mechanistic interpretability work is the kind of stuff that could be done within academia. Where academia is really strong is the development of new methods, the development of new techniques. And I think because safety is kind of a frontier area, there's more of that to do in safety than there is in other areas. And it may be able to be done without large models or only with limited access to large models.
00:34:30
Speaker
This is an area where I think there's a lot that academia can do. And so, I don't know, the hope is that between all the actors in the space, maybe we can solve some of these coordination problems and maybe we can all work together.
00:34:43
Speaker
I would also say, in a paper that hopefully is forthcoming soon, one thing we actually talk about is the role that government could play in helping to fund some of the kind of academic safety work that Dario talked about. And I think that's largely because we're seeing this trend of training large generative models just being
00:35:03
Speaker
prohibitively expensive, right? And so I think government also has an important role to play in helping to promote and really subsidize safety research in places like academia. And I agree with Dario, you know, safety is such a, AI safety is a really nascent field still, right? It's really
00:35:21
Speaker
maybe only been around kind of depending on your definition for somewhere between like five and 15 years. And so I think seeing more efforts to kind of support safety research in other areas, I think would be really valuable for the ecosystem. And to be clear, I mean, some of it's already happening. It's already happening in academia. It's already happening in independent nonprofit institutes. And depending on how broad your definition of
00:35:43
Speaker
safety is. I mean, if you broaden it to include some of the short term concerns, then, you know, there are many, many people working on it. But I think precisely because it's such a broad area, there are today's concerns, there's working on today's concerns in a way that's pointing at the future, there's
00:36:01
Speaker
empirical approaches, there's conceptual approaches, there's interpretability, there's alignment, there's so much to do that I feel like we can always have a wider range of people working on it, people with different mentalities and mindsets. Backing up a little bit here to a simple question, so what is Anthropic's mission then?
00:36:23
Speaker
Sure, yeah, I think we talked about this a little bit earlier, but again, the boilerplate mission is: build reliable, interpretable, steerable AI systems that have humans at the center of them. And for us right now, we're doing that primarily through research, through generative model research and AI safety research, but down the road that could also include deployments of various different types.
00:36:49
Speaker
Dario mentioned that it didn't include solving all of the alignment problems or the AGI safety stuff, so how does that fit in? I think what I'm trying to say by that is that there are very many things to solve, and I think it's unlikely that one company will solve all of them.
00:37:12
Speaker
I mean, I do think everything that relates to short and long term AI alignment is in scope for us and is something we're interested in working on. And I think the more bets we have, the better.

Research Strategies and Scaling

00:37:23
Speaker
This relates to something we could talk about in more detail later on, which is you want as many different orthogonal views on the problem as possible, particularly if you're trying to build something very reliable.
00:37:34
Speaker
many different methods and I don't think we have a view that's narrower than an empirical focus on safety. But at the same time, that problem is so broad that I think what we were trying to say is that it's unlikely that one company is going to come up with a complete solution or that complete solution is even the right way to think about it.
00:37:54
Speaker
I would also add sort of to that point, I think one of the things that we do and are sort of hopeful is helpful to the ecosystem as a whole is I think we publish our safety research and that's because of this kind of diversification effect that Dario talks about.
00:38:11
Speaker
We have certain strengths in particular areas of safety research because we're only a certain sized company with certain people with certain skill sets. Our hope is that some of the safety research we're doing, which is hopefully helpful to others, will also be something that other organizations can pick up and adapt to whatever area of research they're working on. So we're hoping to do research that's generalizable enough from a safety perspective that it's also useful in other contexts.
00:38:40
Speaker
So let's pivot here into the research strategy, which we've already talked about quite a bit, particularly this focus around large models. So could you explain why you've chosen large models as something to explore empirically for scaling to higher levels of intelligence, and also as a place for exploring safety and alignment?
00:39:04
Speaker
Yeah, so kind of the discussion before this has covered a good deal of it. But I think, yeah, I think some of the key points here are like, the models are very open-ended. And so they kind of present this laboratory, right? There are existing problems with these models that we can solve today that are like the problems that we're going to face tomorrow. There's this kind of wide scope where the models could act.
00:39:28
Speaker
They're relatively capable and getting more capable every day. That's the regime we want to be in. Those are the problems we want to solve. That's the regime we want to be attacking. I think there's this point that you can see sudden transitions even in today's models, and if you're worried about sudden transitions in future models, if I look at the scaling laws plot from 100 million parameter models to billions to 10 billion to 100 billion to trillion parameter models,
00:39:55
Speaker
looking at the first part of the scaling plot, from 100 million to 100 billion, can tell us a lot about how things might change in the later parts of the scaling laws. We shouldn't naively extrapolate and say the past is going to be like the future, but the first things we've seen already differ from the later things that we've seen. And so
00:40:13
Speaker
Maybe we can make an analogy between the changes that are happening over the scales that we've seen, over the scaling that we've seen, to things that may happen in the future. Models learn to do arithmetic very quickly over one order of magnitude. They learn to comprehend certain kinds of questions.
00:40:29
Speaker
They learn to play actors that aren't telling the truth, which is something that if they're small enough, they don't comprehend. So can we study both the dynamics of how this happens, how much data it takes to make that happen, what's going on inside the model mechanistically when that happens, and use that as an analogy that equips us well to understand as models scale further and also as
00:40:56
Speaker
their architecture changes as they become trained in different ways. I've talked a lot about scaling up, but I think scaling up isn't the only thing that's going to happen. There are going to be changes in how models are trained, and we want to make sure that the things that we build have the best chance of being robust to that as well.
00:41:11
Speaker
Another thing I would say on the research strategy is that it's good to have several different, I wouldn't quite put it as several different bets, but it's good to have several different uncorrelated or orthogonal views on the problem. So, you know, if you want to make a system that's like
00:41:27
Speaker
highly reliable or you want to like drive down the chance that some particular bad thing happens, which again could be the bad things that happen with models today or the larger scale things that could happen with models in the future, then a thing that's very useful is having kind of like orthogonal sources of error. Like, okay, let's say I have a method that catches 90% of the bad things that models do. That's great. But a thing that can often happen is that I develop some other methods. And if they're similar enough to the first methods, they all catch the same 90% of bad things.
00:41:57
Speaker
That's not good because then I think I have all these techniques and yet 10% of the bad things still go through. What you want is you want a method that catches 90% of the bad things. Then you want an orthogonal method that catches a completely uncorrelated 90% of the bad things. And then only 1% of things go through both filters, right? If the two are uncorrelated, it's only the 10% of the 10% that gets through.
00:42:22
Speaker
And so the more of these orthogonal views you have, the more you can drive down the probability of failure. You could think of an analogy to self-driving cars, where, of course, those things have to have a very, very high rate of safety if you want to not have problems. And so I don't know
00:42:37
Speaker
very much about self-driving cars, but, you know, they're equipped with visual sensors, they're equipped with lidar, they have different algorithms that they use to detect if, you know, there's a pedestrian that you don't want to run over. So independent views on the problem are very important. And so there
00:42:55
Speaker
are different directions like reward modeling, interpretability, trying to characterize models, adversarial training. The whole goal of that is to get down the probability of failure and have different views of the problem. I often refer to it as the p squared problem, which is if you have some method that reduces errors to a probability p, that's good. But what you really want is p squared, because then if p is a small number, your errors become very rare.
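For what it's worth, the arithmetic behind that point is easy to check with a few lines of Python. The 10% miss rates below are made-up illustrative numbers, not measurements of any real method:

```python
# Worked example of the "p squared" point, with invented numbers.
# Each safety method independently misses some fraction of bad outputs;
# if the misses are uncorrelated, stacking methods multiplies them.

p_interpretability = 0.10   # fraction of bad behavior this method misses
p_adversarial      = 0.10
p_reward_model     = 0.10

# Perfectly correlated methods: they all miss the same 10%.
correlated_miss = 0.10

# Independent (orthogonal) methods: misses multiply.
independent_miss = p_interpretability * p_adversarial * p_reward_model

print(f"correlated filters miss:  {correlated_miss:.4%}")   # 10.0000%
print(f"independent filters miss: {independent_miss:.4%}")  # 0.1000%
```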
00:43:24
Speaker
Does Anthropic consider its research strategy to be a sort of prosaic alignment, since it's focused on large models? Yeah, I think we maybe think about things less in that way. So my understanding is that prosaic alignment is kind of alignment with AI systems that look like the systems of today. But to some extent, that distinction has never been
00:43:52
Speaker
super clear to me because yeah, you can do all kinds of things with neural models or mix neural models with things that are different than neural models. You can mix a large language model with a reasoning system or a system that derives axioms or propositional logic or uses external tools or compiles code or things like that. So I've never been quite sure that I understand the boundary of what's meant by prosaic or systems that are like the systems of today.
00:44:20
Speaker
Certainly we work on some class of systems that includes the systems of today, but I never know how broad that class is intended to be. I do think it's possible that in the future, AI systems will look very different from the way that they look today. And I think for some people that drives a view that
00:44:38
Speaker
They want kind of more general approaches to safety or, you know, approaches that are more conceptual. I think my perspective on it is it could be the case that systems of the future are very different. But in that case, I think
00:44:53
Speaker
both kind of conceptual thinking and our current empirical thinking will be disadvantaged, and will be disadvantaged at least equally. But I kind of suspect that even if the architectures look very different, the empirical experiments that we do today themselves contain general motifs or patterns that will serve us better than trying to speculate about what the systems of tomorrow look like.
00:45:18
Speaker
One way you could put it is like, okay, we're developing these systems today that have a lot of capabilities that are like some subset of what we need to do to fully produce something that fully matches human intelligence. Whatever the specific architectures, things we learn about how to align these systems, I suspect that those will carry over and that they'll carry over more so than
00:45:40
Speaker
sort of the exercise of trying to think, well, what could the systems of tomorrow look like? What can we do that's kind of fully general? I think both things can be valuable. But yeah, I mean, I think we're just taking a bet on what we think is most exciting, which is that by studying the systems and architectures of today, we'll learn things that give us the best chance of knowing what to do if the architectures of tomorrow are very different.

Interpretability and Understanding AI

00:46:04
Speaker
That said, I will say, like,
00:46:06
Speaker
transformer language models and other models, particularly with things like, you know, RL or kind of modified interactions on top of them, if construed broadly enough, like, man, there's an ever-expanding set of things they can do. My bet would be that they don't have to change that much.
00:46:23
Speaker
So let's pivot then into a little bit on some of your recent research and papers. So you've done major papers on alignment, interpretability, and societal impact. Some of this you've mentioned in passing so far. So could you tell me more about your research and papers that you've released? Yeah.
00:46:41
Speaker
So why don't we go one by one? So first, interpretability. Yeah, I could just start with kind of the philosophy of the area. I mean, I think the basic idea here is, look, these models are getting bigger and more complex, and we want a way to really get a handle on what they might do. If you have a complex system and you don't know what it's going to do as it gets more powerful or in a new situation, one way to increase your likelihood of predicting that is just to understand the system mechanistically.
00:47:07
Speaker
If you could look inside the model and say, hey, this model, it did something bad. It said something racist. It endorsed violence. It said something toxic. It lied to me. Why did it do that? If I'm actually able to look inside the mechanisms of the model and say, well, it did it because of this part of the training data, or did it because there's this circuit that
00:47:28
Speaker
was trying to identify X but misidentified it as Y, then we're in a much better position. And particularly, if we understand the mechanisms, we're in a better position to say, if the model was in a new situation where it did something much more powerful, or just if we built more powerful versions of the model, how might they behave in some different way? So, on mechanistic interpretability: lots of folks work on interpretability, but I think
00:47:52
Speaker
a thing that's more unusual about us is, rather than just asking why the model did a specific thing, we try to look inside the model, reverse engineer as much of it as we can, and try to find general patterns. And so the first paper that we came out with
00:48:08
Speaker
was led by Chris Olah, who's been one of the pioneers of interpretability, and was focused on starting with small models and trying to reverse engineer them as fully as we can. We also have a new paper coming out soon that applies the same thing, more approximately, to larger models. So we study one- and two-layer attention-only models.
00:48:28
Speaker
And we're able to find features or patterns of which the most interesting one is called an induction head. And what an induction head does is it's a particular arrangement of two what are called attention heads. And attention heads are a piece of transformers. And transformers are the main architecture that's used in models for language and other kinds of models.
00:48:51
Speaker
The two attention heads work together in a way such that when you're trying to predict something in a sequence, you know, if it's like Mary had a little lamb, Mary had a little lamb, something, something, when you're at a certain point in the sequence, they look back to something that's as similar as possible. You know, they look back for clues to things that are similar earlier in the sequence.
00:49:12
Speaker
and try to pattern match to them. There's one attention head that looks back and identifies, okay, this is what I should be looking at, and there's another that says, okay, this was the previous pattern, so increase the probability of the thing that's the closest match to it.
00:49:27
Speaker
We can see this operating very precisely in small models. The thesis, which we're able to offer some support for in the new second paper that's coming out, is that these are a mechanism for how models match patterns, maybe even how they do what we call in-context or few-shot learning, which is a capability that models have had since GPT-2 and GPT-3. Yeah, that's interpretability.
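To make the induction-head idea concrete, here is a toy sketch, purely for intuition, of the behavior such a pair of heads implements. It is ordinary Python rather than an actual transformer; the function name and example sequence are illustrative assumptions, not anything from Anthropic's papers.

```python
# Toy illustration (not a real transformer) of the behavior an induction head
# implements: to predict the next token, find an earlier occurrence of the
# current token and copy whatever followed it.
def induction_style_prediction(tokens):
    """Guess the next token by matching the current token to an earlier one."""
    current = tokens[-1]
    # "Prefix-matching" step: scan earlier positions for the same token.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            # "Copying" step: boost the token that followed the earlier match.
            return tokens[i + 1]
    return None  # no earlier match found


sequence = "mary had a little lamb mary had a little".split()
print(induction_style_prediction(sequence))  # -> "lamb"
```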
00:49:52
Speaker
Sure, so before you move on to the next one, could you also help explain how difficult it is to interpret current models or whether or not it is difficult?
00:50:02
Speaker
Yeah, I mean, I don't know. I guess difficult is in the eye of the beholder, and I think Chris Olah can speak to the details of this better than either of us can. But watching from the outside and supervising this within Anthropic,
00:50:23
Speaker
I think the experience has generally been that whenever you start looking at some particular phenomenon that you're trying to interpret, everything looks very difficult to understand. There's billions of parameters. There's all these attention heads. What's going on? Everything that happens could be different. You really have no idea what's going on.
00:50:42
Speaker
And then there comes some point where there's some insight or set of insights. You should ask Chris Olah about exactly how it happens or how he comes up with the right insights, but it really almost offers a Rosetta Stone to some particular phenomenon, often a narrow phenomenon. But take these induction heads: they exist everywhere, within small models, within large models. They don't explain everything, I don't want to overhype them, but it's a pattern that appears again and again and operates in the same way.
00:51:12
Speaker
And once you see something like that, then a whole swath of behavior that didn't make sense before starts to make some more sense. And of course, there's exceptions. They're only approximately true. There are many, many things to be found. I think the hope in terms of interpreting models, it's not that we'll make some giant atlas of what each of the 100 billion weights in a giant model means, but that there will be some
00:51:36
Speaker
lower description length pattern that appears over and over again. You could make an analogy to the brain or the cell or something like that where if you were to just cut up a brain and you're like, oh my god, this is so complex. I don't know what's going on.
00:51:54
Speaker
You see that there are neurons, and the neurons appear everywhere. They have electrical spikes. They relate to other neurons. They form themselves in certain patterns. Those patterns repeat themselves. Some things are idiosyncratic and hard to understand, but also there's this patterning. And so I don't know. It's maybe an analogy to biology, where there's a lot of complexity, but also there are underlying principles, things like DNA to RNA to proteins or general intracellular signal regulation.
00:52:24
Speaker
So yeah, the hope is that there are at least some of these principles and that when we see them, everything gets simpler. But maybe not. We found those in some cases, but maybe as models get more complicated, they get harder

Aligning AI Models

00:52:36
Speaker
to find. And of course, even within existing models, there's many, many things that we don't understand at all. So can we move on then to alignment and societal impact?
00:52:45
Speaker
For alignment, we're trying to align models by training them, particularly with preference modeling. That's something several different organizations are working on; there are efforts at DeepMind, OpenAI, Redwood Research, and various other places. But I think our general perspective on it has been to be very method agnostic and just ask, what are all the things we could do
00:53:06
Speaker
to make the models more in line with what would be good. Our general heuristic for it, which isn't intended to be a precise thing, is helpful, honest, harmless. That's just kind of like a broad direction for what are some things we can do to make models today more in line with what we want them to do and not things that we all agree are bad.
00:53:28
Speaker
So in that paper, we went through a lot of different approaches and tried a bunch of techniques, often very simple ones: just prompting models, training on specific prompts, what we call prompt distillation, or building preference models for some particular task, or preference models from general answers on the internet. How well do these things do on simple benchmarks for toxicity, helpfulness, harmfulness, and things like that?
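For readers unfamiliar with preference modeling, here is a minimal sketch of the general technique, not Anthropic's actual code or models: a small model assigns a scalar score to a response and is trained so that human-preferred responses outscore rejected ones. The tiny bag-of-words encoder, dimensions, and random data below are stand-in assumptions.

```python
# Minimal sketch of pairwise preference (reward) modeling with a
# Bradley-Terry style loss. Real preference models use large transformers;
# this tiny bag-of-words encoder is only a stand-in for illustration.
import torch
import torch.nn as nn

class TinyPreferenceModel(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 32):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # bag-of-words encoder
        self.score = nn.Linear(dim, 1)                 # scalar "reward"

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.score(self.embed(token_ids)).squeeze(-1)

def preference_loss(model, preferred, rejected):
    # Maximize log sigmoid(score_preferred - score_rejected).
    margin = model(preferred) - model(rejected)
    return -nn.functional.logsigmoid(margin).mean()

# Toy usage: batches of token-id sequences (same length for simplicity).
model = TinyPreferenceModel(vocab_size=1000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
preferred = torch.randint(0, 1000, (8, 16))
rejected = torch.randint(0, 1000, (8, 16))
opt.zero_grad()
loss = preference_loss(model, preferred, rejected)
loss.backward()
opt.step()
```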
00:53:54
Speaker
So it was really just a baseline, like let's try a collection of all the dumbest stuff we can think of to try and make models more aligned in some general sense. And then I think our future work is going to build on that. Societal impacts, that paper's probably going to come out in the next week or so. As I mentioned, the paper we're coming out with is called Predictability and Surprise in Generative Models.
00:54:17
Speaker
And yeah, basically there we're just making the point about this open-endedness and discussing both technical and policy interventions to try and grapple with the open-endedness better. And I think future work in the societal impacts direction will focus on how to classify, characterize, and kind of in a practical sense filter or prevent these problems.
00:54:43
Speaker
So, yeah, I think it's prototypical of the way we want to engage with policy, which is that we want to come up with some technical insight, express that insight, and explore the implications it has for policymakers and for the ecosystem in the field. And so here we're able to draw a line from, hey, there's this dichotomy:
00:55:04
Speaker
These models scale very smoothly but have unexpected behavior. The smooth scaling means people are really incentivized to build them, and we can see that happening. The unpredictability means even if the case for building them is strong from a financial or accounting perspective, that doesn't mean we understand their behavior well. That combination is a little disquieting. Therefore, we need various policy interventions to make sure that we get a good outcome from these things.

Promoting Interpretability Research

00:55:31
Speaker
Yeah, I think societal impacts is going to go in that general direction.
00:55:35
Speaker
So in terms of the interpretability release, you released alongside that some tools and videos. Could you tell me why you chose to do that? Sure. Yeah, I can maybe jump in here. So it goes back to some stuff we talked about a little bit earlier, which is that one of our major goals in addition to doing safety research ourselves is to help grow the field of all different types of safety work more broadly.
00:56:05
Speaker
And I think we ultimately hope that some of the work that we do is going to be adopted and even expanded on in other organizations. And so we chose to release other things besides just an arXiv paper, because they hopefully will reach a wider number of people that are interested in these topics, and in this case, in interpretability. So as part of what we released, our interpretability team put together something like 15 hours worth of videos.
00:56:32
Speaker
And this is just a more in-depth exploration of their research for their paper, which is called A Mathematical Framework for Transformer Circuits. And so the team tried to make it like a lecture series. If you imagine somebody from the interpretability team being asked to give a talk at a university, maybe they talk for an hour and reach 100 students. But now these are publicly available.
00:56:57
Speaker
And so if you're interested in understanding interpretability in more detail, you can watch them on YouTube anytime you want. As part of that release, we also put out some tools. So we released a write-up on Garcon, which is the infrastructure tool our team used to conduct the research, and PySvelte, a library used to create some of the interactive visualizations that the interpretability team is known for. So we've been super encouraged that so far we've seen other researchers and engineers playing
00:57:26
Speaker
playing around with the tools and watching the videos. So we've already gotten some great engagement, and our hope is that this will lead to more people doing interpretability research or building on the work we've done in other places.
00:57:40
Speaker
Yeah, I mean, a way to add to that, to kind of put it in broader perspective, is different areas within safety are at, I would say, differing levels of maturity. Like, I would say something like alignment or preference modeling or reward modeling or RL from human feedback, kind of all names for the same thing. That's an area where there are several different efforts at different institutions to do this. We have kind of our own direction within that, but
00:58:04
Speaker
starting from the original RL from human preferences paper that a few of us helped lead a few years ago, that's now branched out in several directions. So we don't need to tell the field to work in that broad direction. We have our own views about what's exciting within it and how to best make progress, but it's at a slightly more mature stage. Whereas with interpretability, while many folks work on interpretability for neural nets,
00:58:29
Speaker
The particular brand of let's try and understand at the circuit level what's going on inside these models. Let's try and mechanistically kind of map them and break them down. I think there's less of that in the world. And what we're doing is more unique.
00:58:45
Speaker
Well, that's a good thing, because we're providing a new lens on safety. But actually, if it goes on too long, it's a bad thing, because we want these things to spread widely, right? We don't want it to be dependent on one team or one person. And so when things are at that earlier stage of maturity, it makes a lot of sense to release the tools to reduce the barrier to other people and other institutions starting to work on this. So you're suggesting that the interpretability research that you guys are doing is unique?
00:59:14
Speaker
Yeah, I would just say that it's at an earlier stage of maturity. I don't think there are other large, organized efforts that are focused on, I would say, mechanistic interpretability, and especially mechanistic interpretability for language models. We'd like there to be, and we know of folks who are starting to think about it, and that's part of why we released the tools.
00:59:42
Speaker
But I think, yeah, when it comes to trying to mechanistically map and understand the internal principles inside large models, particularly language models, less of that has been done in the broader ecosystem.
00:59:57
Speaker
Yeah, so I don't really know anything about this space, but I guess I'm surprised to hear that. I'd imagine that industry, with how many large models it's deploying, Facebook or others, would be interested in interpretability, in interpreting their own systems.
01:00:14
Speaker
Yeah, I mean, again, I don't want to give a misleading impression here. Interpretability is a big field, and there's a lot of effort on questions like: why did this model do this particular thing? Did this attention head increase its activation by a large amount? People are interested in understanding
01:00:34
Speaker
the particular part of a model that led to a particular output. So there's a lot of activity in this space. But the particular program of, here's a big transformer language model, let's try to understand what circuits drive particular behaviors, what the pieces are, how the MLPs interact with the attention heads, the kind of general mechanistic reverse engineering approach, I think that's less common. I don't want to say it doesn't happen, but it's much less common.

Siblings Leading Anthropic

01:01:04
Speaker
Okay, so I guess a little bit of a different question and a bit of a pivot here, just something to explore. If people couldn't guess from the title of the podcast, you're both brother and sister. Yep. And it was pretty surprising, I guess, in that I don't know of any other AGI labs that are largely being run by a brother and sister. So yeah, what's it like working with your sibling?
01:01:33
Speaker
Yeah. Do you guys still get along, like since childhood? That's a good question. Yeah, I can maybe start here. And obviously, I'm curious and hopeful for Dario's answer. I'm just kidding. But yeah, I think honestly, it's great. I think maybe a little bit of just like
01:01:51
Speaker
kind of history or background about us might be helpful. But, you know, Dario and I have always been really close. I think since we were very, very small, we've always had this kind of special bond around really wanting to make the world better or wanting to help people, right? So I originally started my career in international development, so very far away from the AI space.
01:02:13
Speaker
And part of why I got interested in that is that it was an interest area of Dario's at the time. Dario was getting his PhD in a technical field and so wasn't working on this stuff directly, but I'm a few years younger than him, and so I was very keen to understand the things that he was
01:02:30
Speaker
working on or interested in as a potential kind of area to have impact. And so he was actually a very early GiveWell fan, I think in like 2007 or 2008. And yeah, yeah. And so we were, you know, we were both still students then, but I like remember us sitting, you know, we were both like home from college, right? Or I was home from college and he was home from grad school and we would sort of like sit up late and kind of talk about these ideas. And, you know, we both started like donating small amounts of money to organizations that were doing, you know,
01:03:00
Speaker
working on global health issues, like malaria prevention, when we were both still in school. So I think we've always had this sort of uniting top-level goal of wanting to work on something that matters, something that's important and meaningful, and we've always had very different skills. And so I think it's really
01:03:22
Speaker
very cool to be able to kind of combine the things that we are good at into hopefully, you know, running an organization well. So for me, it's been a really, I don't know, I feel like it's been an awesome experience. Now I feel like I'm like sitting here nervously wondering what Dario's answer is going to be. I'm just kidding. But yeah, for, you know, I don't know, the majority of our lives, I think we've wanted to find something to work together on and it's been really awesome that we've been able to at Anthropic.
01:03:47
Speaker
Yeah, I agree with all that. I think what I would add to that is running a company requires an incredibly wide range of skills. If you think of most jobs, it's like my job is to get this research result, or my job is to be a doctor or something. But I think the unique thing about running a company,
01:04:15
Speaker
which becomes more and more true the larger and more mature it gets, is that there's this incredibly wide range of things that you have to do, right? And so you're responsible for what to do if someone breaks into your office. But you're also responsible for whether the research agenda makes sense. And if some of the GPUs in the cluster aren't behaving,
01:04:39
Speaker
Someone has to figure out what's going on at the level of the GPU kernels or the comms protocol that the GPUs talk to each other. And so I think it's been great to have two people with complementary skills to cover that full

Personal Motivations and Career Paths

01:04:56
Speaker
range.
01:04:56
Speaker
Often it seems like it'd be very difficult for just one person to kind of cover that whole range. We each get to think about what we're best at, and between those two things, hopefully it covers most of what we need to do. And then, of course, we always try and hire people for specialties that we don't know anything about. It's made it a lot easier to move fast without breaking things.
01:05:21
Speaker
That's awesome. You guys synergistically are creating an awesome organization.
01:05:28
Speaker
That's what we're going for. That's the dream. Yeah, that's the dream. So I guess beneath all of this, Anthropic has a mission statement, and you guys are brother and sister, and you said that you're both very value aligned. I'm just wondering, underneath all that, you said that you were both passionate about helping people and doing something good for the world. Could you tell me a little bit more about this kind of, I guess, more heart-based inspiration
01:05:55
Speaker
that, you know, eventually ended up in creating Anthropic.
01:05:59
Speaker
Yeah, maybe I'll take a stab at this and maybe I'll, I don't know if this is sort of exactly what you're looking for, but I'll kind of gesture in a few different directions here and then I'm sure Dario has a good answer as well. But maybe I'll just kind of talk about like my sort of personal journey and kind of getting to anthropic or like kind of what my background looked like and sort of how I wound up here. So I talked about this in just part of what sort of united me and Dario, but I started my career kind of working in
01:06:28
Speaker
international development. I worked in Washington, D.C. at a few different NGOs. I spent time working in East Africa for a public health organization. I worked on a congressional campaign. I worked on Capitol Hill. So it was much more in this kind of classic, like a friend at an old job used to call me like the classic do-gooder of trying to alleviate global poverty, of trying to make policy-level changes in government, of trying to elect good officials. And I felt like those causes that I was working in were deeply important.
01:06:57
Speaker
And really, to this day, I support people that are working in those areas, and I think they matter so much. I just felt like I personally wasn't having the level of impact that I was looking for. And that led me through a series of steps: I wound up working in tech.
01:07:19
Speaker
And I mentioned this earlier, but I started at this tech startup called Stripe. It was about 40 people when I joined. And I really kind of had the opportunity to see what it looked like to run a really well run organization when I was there. And I got to watch it scale and grow and kind of be in this sort of
01:07:36
Speaker
emerging area. And I think during my time there, something that became really apparent to me, just from working in tech, was how much of an impact this sector has on things like
01:07:52
Speaker
the economy, on human interaction, on how we live our lives in day-to-day ways. And Stripe is a payments company, right? It's not social media or something like that. But I think there's still a way in which technology is a relatively small number of people having a very high impact on the world per person working on it.
01:08:11
Speaker
And I think that impact can be good or bad. And I think it was a pretty logical leap for me from there to think, wow, what would happen if we extrapolated that out to, instead of it being social media or payments or file storage, to something significantly more powerful, where there's a highly advanced set of artificial intelligence systems? What would that look like? And who is working on this?
01:08:40
Speaker
So I think for me, I've always kind of been someone who has been fairly obsessed with trying to do as much good as I personally can, given the constraints of what my skills are and kind of where I can add value in the world. And so I think for me, moving to work into AI, from early days, if you looked at my resume, you'd be like, how did you wind up here? But I think there was kind of this consistent
01:09:05
Speaker
story or theme. And my hope is that Anthropic is at the intersection of this sort of practical, scientific, empirical approach to really deeply understanding how these systems work, hopefully helping to spread and propagate some of that information more widely in the field, and to just help as much as possible to push this field in a safer and, ideally, all-around robust
01:09:34
Speaker
positive direction when it comes to what impact we might see from AI. Yeah, I think I have a kind of parallel picture here. I did physics as an undergrad and computational neuroscience in grad school. I was drawn to neuroscience by a mixture of, one, just wanting to understand how intelligence works, which seems like the fundamental thing, and, two, the fact that a lot of the things that shape
01:09:58
Speaker
the quality of human life and human experience depend on the details of how things are implemented in the brain. I felt that in that field there were many opportunities for medical interventions that could improve the quality of human life,
01:10:17
Speaker
understanding things like mental illness and disease while at the same time understanding something about how intelligence works because it's the most powerful lever that we have. I thought of going into AI during those days, but I felt that it wasn't really working. This was before the days when deep learning was really working.
01:10:39
Speaker
Then around 2012 or 2013, I saw the results coming out of Google Brain, things like AlexNet and that they were really working, and saw AI both as, hey, this might be, one, the best way to understand intelligence, and two, the things that we can build with AI
01:11:00
Speaker
by solving problems in science and health, just solving problems that humans can't solve yet, by having intelligence that, first in targeted ways and then maybe in more general ways, matches and exceeds that of humans. Can we solve the important scientific, technological, health, and societal problems? Can we do something to ameliorate

Balancing AI Development Approaches

01:11:24
Speaker
those problems? AI seemed like the biggest lever that we had, if it really worked well. But on the other hand, AI itself has all these concerns associated with it in both the short run and the long run. So we maybe think of it as: we're working to address the concerns so that we can maximize the positive benefits of AI.
01:11:44
Speaker
Yeah, thanks a lot for sharing both of your perspectives and journeys on that. I think when you guys were giving to GiveWell, I was in middle school. Oh, God, we're so old, Dario. I still think of GiveWell as this new organization that's on the internet somewhere, that no one knows anything about, that only people like me who read weird things on the internet know about. But now they're super popular. Yeah.
01:12:13
Speaker
Well, for me, a lot of my journey into x-risk and through FLI has also involved the EA community, effective altruism. So I guess that just makes me realize that when I was in middle school, the seeds were already being planted. Yeah, there was no such community at that time. Yeah.
01:12:39
Speaker
Let's pivot here then into a bit more of the machine learning. So we've talked a bunch already about how Anthropic is emphasizing the scaling of machine learning systems through computing data and also bringing a lot of mindfulness and work around
01:13:00
Speaker
alignment and safety when working on these large scale systems that are being scaled up. Some critiques of this approach have described scaling from existing models to AGI as adding more rocket fuel to a rocket, which doesn't mean you're necessarily
01:13:15
Speaker
ready or prepared to land the rocket on the moon or that the rocket is aimed at the moon. Maybe this is kind of lending itself to what you guys talked about earlier about the open-endedness of the system, which is something you're interested in working on. So how might you respond to the contention that there is an upward bound on how much capability can be gained through scaling? And then I'll follow up with a second question after that.
01:13:44
Speaker
Yeah, so actually, in a certain sense, I think we agree with that contention. So I think there are two versions of what you might call the scaling hypothesis. One version, which I think of as the straw version, the less sophisticated version, which we don't hold (and I don't know if there's anyone who does hold it, but probably there is),
01:14:05
Speaker
is, yeah, it's just the view that like, okay, you know, we have our 10 billion parameter language model. We have a hundred billion parameter language model. Maybe if we make a hundred trillion parameter language model, like that'll be AGI. So that would be kind of like a pure scaling view. That is definitely not our view.
01:14:20
Speaker
Even small modified forms like, well, maybe you'll change the activation function in the transformer and you don't have to do anything other than that. I think that's just not right. You can see it just by seeing that the objective function is predicting the next word. It's not doing useful tasks that humans do. It's limited to language. It's limited to one modality. There are some very trivial, easy to come up with ways in which
01:14:46
Speaker
literally just scaling this is not going to get you to general intelligence. That said, the more subtle version of the hypothesis, which I think we do mostly hold, is that, hey, this is a huge ingredient of, not just this, but of whatever it is that actually does build AGI. So no one thinks that you're just going to scale up the language models and make them bigger.
01:15:08
Speaker
But as you do that, they'll certainly get better. It'll be easier to build other things on top of them. So, for example, you might make this big language model and then use RL interaction with humans to fine-tune it on doing a million different tasks and following human instructions.
01:15:26
Speaker
Then you're starting to get to something that has more agency, that you can point it in different directions, you can align it. If you also add multimodality, where the agent can interact with different modalities. If you add the ability to use various external tools to interact with the world and the internet. But within each of these, you're going to want to scale. And within each setup,
01:15:55
Speaker
The bigger you make the model, the better it's going to be at that thing. So in a way, the rocket fuel analogy kind of makes sense. Like actually, the thing you should most worry about with rockets is propulsion. You need a big enough engine and you need enough rocket fuel to make the rocket go. That's the central thing. But of course, yes, you also need guidance systems. You also need all kinds of things like
01:16:18
Speaker
Yeah, you can't just take a big vat of rocket fuel and an engine and put them on a launch pad and expect it to all work. You need to actually build the full rocket. And safety itself makes that point, right? To some extent, if you don't do even the simplest safety work, then models don't even do the task that's intended for them in the simplest way, right? And then there are many more subtle safety problems.
01:16:40
Speaker
Yeah, in a way, the rocket analogy is good, but it's, I think, more a pro-scaling point than an anti-scaling point, because it says that scaling is an ingredient, perhaps a central ingredient in everything, even though it isn't the only ingredient. If you're missing ingredients, you won't get where you're going. But when you add all the right ingredients, then that itself needs to be massively scaled. So that would be the perspective.
01:17:00
Speaker
No one thinks that if you just take a bunch of rocket fuel and an engine and put them on a launch pad, you'll get a rocket that'll go to the moon. But those might still be the central ingredients in the rocket. Propulsion, getting out of the Earth's gravity well, is the most important thing a rocket has to do, and what you need for that is rocket fuel and an engine. Now, you need to connect them to the right things, you need other ingredients. But I think it's actually a very good analogy to scaling, in the sense that you can think of scaling as maybe the core ingredient, but not the only ingredient.
01:17:30
Speaker
And so what I expect is that we'll come up with new methods and modifications. I think RL, model-based RL, human interaction, broad environments are all pieces of this. But when we have those ingredients, then whatever it is we make, we'll need to scale that. In multimodality, we'll need to scale that massively as well. So yeah, scaling is the core ingredient, but it's not the only ingredient. I think it's very powerful alone. I think it's even more powerful when it's combined with these other things.
01:18:00
Speaker
One of the claims that you made was that people don't think we'll get to AGI just by scaling up present-day systems. Earlier you were talking about how there are kind of these phase transitions, right? If you go up one order of magnitude in terms of the number of parameters in the system, then you get some kind of new ability, like arithmetic. Why is it that we couldn't just keep increasing the order of magnitude of the number of parameters in these systems and just keep getting something that's smarter?
01:18:28
Speaker
First of all, I think we will keep getting something that's smarter. But I think the question is, will we get all the way to general intelligence? So I actually don't exclude it. I think it's possible, but I think it's unlikely, or at least unlikely in a practical sense. There are a couple reasons. Today, when we train models on the internet, we train them on an average over all the text on the internet. So, I don't know, think of some topic like chess. You're training on the commentary of everyone who talks about chess.
01:18:55
Speaker
You're not training just on the commentary of, you know, the world champion at chess. And what we'd really like is something that exceeds the capabilities of the most expert humans. Whereas if you train on the whole internet, then for any topic you're probably getting amateurs on that topic. You're getting some experts, but you're getting mostly amateurs. And so
01:19:15
Speaker
even if the generative model was doing a perfect job of modeling its distribution, I don't think it would get to something that's better than humans at everything being done. So I think that's one issue. The other issue is I don't think you're covering all the tasks that humans do. You cover a lot of them on the internet, but there are just some tasks and skills, particularly related to the physical world, that aren't covered if you just scrape the internet,
01:19:41
Speaker
things like embodiment and interaction. And then finally, I think that even kind of matching the performance of text on the internet, it might be that you need a really huge model to cover everything and match the distribution. And some parts of the distribution are more important than others. For instance, if you're writing code or if you're writing a mystery novel,
01:20:04
Speaker
You know, a few words or a few things can be more important than everything else, right? It's possible to write a 10-page document where the key parts are two or three sentences, and if you change a few words, it changes the meaning and the value of what's produced. But the next-word prediction objective function doesn't know anything about that.
01:20:23
Speaker
It just does everything uniformly. So if you make a model big enough, yeah, it'll get that right. But the limit might be extreme. And so things that change the objective function that tell you what to care about, of which I think RL is a big example, probably are needed to make this actually work correctly. I think, yeah, in the limit of a huge enough model, you might get surprisingly close. I don't know. But the limit might be far beyond our capabilities.
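As a rough illustration of that point about the objective function, and a sketch of the general idea rather than any lab's actual training code, the snippet below contrasts the uniform, token-averaged language-modeling loss with a hypothetical importance-weighted variant; the weights, positions, and random data are made up.

```python
# Standard next-word prediction weights every token's loss equally, while an
# RL-style or reward-weighted objective can emphasize the few tokens that
# actually matter. The importance weights below are purely hypothetical.
import torch
import torch.nn.functional as F

vocab, seq_len = 100, 10
logits = torch.randn(seq_len, vocab)           # model predictions per position
targets = torch.randint(0, vocab, (seq_len,))  # "correct" next tokens

# Plain language-model objective: uniform average over all positions.
per_token_loss = F.cross_entropy(logits, targets, reduction="none")
lm_loss = per_token_loss.mean()

# Toy "what to care about" variant: upweight the positions we deem important
# (e.g., the two or three key sentences in a ten-page document).
importance = torch.ones(seq_len)
importance[3:5] = 10.0                         # hypothetical key positions
weighted_loss = (importance * per_token_loss).sum() / importance.sum()

print(lm_loss.item(), weighted_loss.item())
```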
01:20:51
Speaker
There's only so many GPUs you can build, and there are even physical limits. And there are fewer and fewer of them available over time, or at least they're very expensive. They're getting more expensive and more powerful. I think the price efficiency overall is improving, but yeah, they're definitely becoming more expensive as well.
01:21:10
Speaker
If you were able to scale up a large-scale system in order to achieve an amateur level of, I don't know, mathematics or computer science, then would it not benefit the growth of that system to then direct that capability on itself as a kind of self-recursive improvement process? Is that not already escape velocity intelligence once you hit amateur
01:21:37
Speaker
There are training techniques that you can think of as bootstrapping a model or using the model's own capabilities to train it. AlphaGo, for instance, was trained with a method called expert iteration that relies on looking ahead and comparing that to the model's own predictions. So whenever you have some coherent logical system, you can do this bootstrapping.
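Here is a rough, hypothetical sketch of that bootstrapping loop in the spirit of expert iteration, not the actual AlphaGo system: a lookahead "expert" improves on the raw policy, and the policy is then trained to imitate the expert, which in turn makes the next round of search stronger. The toy game, heuristic value function, and linear policy are all stand-in assumptions.

```python
# Toy expert-iteration-style bootstrapping: the "expert" is the policy plus a
# one-step lookahead; the raw policy is trained to imitate the expert.
import torch
import torch.nn as nn

# Toy game: the state is a position on a line; action 1 moves toward the goal.
GOAL = 10
def value(state: int) -> float:
    return -abs(GOAL - state)  # higher is better (closer to the goal)

policy = nn.Linear(1, 2)  # maps state -> logits over actions {0: left, 1: right}
opt = torch.optim.SGD(policy.parameters(), lr=0.1)

for step in range(200):
    state = torch.randint(0, GOAL, (1,)).item()
    # "Expert": a one-step lookahead picks the action whose successor state
    # has the higher heuristic value.
    expert_action = max((0, 1), key=lambda a: value(state + (1 if a == 1 else -1)))
    # Train the raw policy to imitate the expert's choice.
    logits = policy(torch.tensor([[float(state)]]))
    loss = nn.functional.cross_entropy(logits, torch.tensor([expert_action]))
    opt.zero_grad()
    loss.backward()
    opt.step()
```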

Anthropic's Public Benefit Mission

01:21:59
Speaker
But that itself is a method of training and falls into one of the things I'm talking about.
01:22:05
Speaker
You make these pure generative models, but then you need to do something on top of them, and the bootstrapping is something that you can do on top of them. Now, maybe you reach a point where the system is making its own decisions and is using its own external tools to create the bootstrapping to make better versions of itself. So it could be that that is someday the end point of this process, but that's not something we can do right now.
01:22:27
Speaker
There are a lot of labs in industry who work on large models. There are, I don't know, maybe only a few other AGI labs; you can think of DeepMind and OpenAI, and I'm not sure if there are others. And there's also this space of organizations like the Future of Life Institute or the Machine Intelligence Research Institute or the Future of Humanity Institute that are interested in AI safety.
01:22:53
Speaker
MIRI and FHI both do research. FLI does grant making and supports research. I'm curious, in terms of industry, the non-profit space, and academia, how you guys see Anthropic as positioned. Maybe we can start with you, Daniela.
01:23:12
Speaker
Sure. Yeah. I think we sort of touched on this a little bit earlier, but I really think of this as an ecosystem. And I think Anthropic is in an interesting place in the ecosystem, but we are part of the ecosystem, right? So I think our kind of strengths are the thing that we do best. And I kind of like to think of all of these different organizations as having valuable things to bring to the table, right?
01:23:37
Speaker
Depending on the people that work there, their leadership team, their particular focused research bet, and the mission and vision they're pursuing, I think they all hopefully have the potential to bring safe innovations to the broader ecosystem that we've talked about. I think
01:23:52
Speaker
For us, our sort of bet is one we've talked about, which is this kind of empirical scientific approach to doing AI research and sort of AI safety research in particular. And I think for our safety research, we've talked about a lot of the different areas we focus on, interpretability, alignment, societal impacts, scaling laws for sort of empirical predictions. And I think a lot of what we're sort of imagining or kind of hoping for in the future is that
01:24:22
Speaker
we'll be able to kind of grow those areas and potentially expand into others. And so I really think a lot of what Anthropic kind of adds to this ecosystem or sort of what we hope it adds is this rigorous scientific approach to doing fundamental research in AI safety.
01:24:39
Speaker
Yeah, I mean, I think that really captures it in one sentence: if you want to locate us within the ecosystem, it's kind of an empirical, iterative approach within an organization that is completely focused on making a focused bet on the safety thing. So there are organizations like MIRI, and to a lesser extent Redwood, that are
01:25:01
Speaker
either not empirical or have a different relationship to empiricism than we do. And then there are safety teams that are doing good work within larger companies like DeepMind or OpenAI or Google Brain, safety teams within larger organizations. Then there are lots of folks who work on short-term issues. And then, yeah, we're kind of filling the space that's
01:25:23
Speaker
working on today's issues, but with an eye towards the future, empirically minded, iterative, with an org where kind of like everything we do is like kind of designed for the safety objective.
01:25:35
Speaker
One facet of Anthropic is that it is a public benefit corporation. I'm not exactly sure what it is, and maybe many of our listeners are not familiar with what a public benefit corporation is. So can you describe what that means for Anthropic, its work, its investors, and its trajectory as a company?
01:25:56
Speaker
Yeah, sure. So this is a great question. What is a PBC, and why did we choose to be a public benefit corporation? I'll start by saying we did quite a lot of research when we were considering what type of corporate entity we wanted to be when we were founding. And ultimately, we decided on a PBC, a public benefit corporation, for a few reasons. And I think
01:26:18
Speaker
you know, primarily it allowed us the kind of maximum amount of flexibility in how we can structure the organization. And we were actually very lucky sort of to a later part of your question to find both investors and employees who were generally very on board with this kind of general vision for the company. And so what is a public benefit corporation? Like why did we choose that structure? So they're fairly similar to C corporations, which is kind of any form of like standard corporate entity that you would encounter.
01:26:46
Speaker
And what that means is we can choose to focus on research and development, which is what we're doing now, or on deployment of tools or products, including down the road for revenue purposes if we want to. But the major difference between a PBC and a C corporation is that in a public benefit corporation, we have more legal protections from shareholders if the company fails to maximize financial interests in favor of achieving our publicly beneficial mission.
01:27:13
Speaker
And so this is primarily a legal thing, but it also was very valuable for us in being able to just appropriately set expectations for investors and employees that if financial profit and creating positive benefit for the world were ever to come into conflict, it was legally in place that the latter one would win. And again, we were really lucky that investors, people that
01:27:37
Speaker
wanted to work for us, they said, wow, this is actually a really positive thing about Anthropic and not something that we need to work around. But I think it ended up just being the best overall fit for what we were aiming for. So usually there's a fiduciary responsibility that a company, in this case Anthropic, would have to its shareholders. And because it's structured as a public benefit corporation, the public good can outweigh the fiduciary responsibility without there being legal repercussions. Is that right?
01:28:05
Speaker
Exactly. So shareholders can't come to the company and say, hey, you didn't maximize financial returns for us, if those financial returns were to come into conflict with the publicly beneficial value of the company. So maybe an example here, I'll try to think of one off the top of my head. Say we designed a language model and we felt like it was unsafe, it was, you know,
01:28:29
Speaker
producing outputs that we felt were not in line with what we wanted to see from outputs of a language model for safety reasons or toxicity reasons for any number of reasons. And in a normal C corporation, someone could say, hey, we're a shareholder and we want the financial value that you could create from that by productizing it. But we said, actually, we want to do more safety research on it before we choose to put it out into the world. In a PBC, we're quite legally protected, basically, in a case like that.

AI's Economic and Societal Impacts

01:28:56
Speaker
And again, I'm not a lawyer, but that's sort of my understanding of the PBC.
01:28:59
Speaker
I think often with these things, maybe the more important thing about them is that they're a way to state your intention, to set the expectations for how the organization is going to operate. Often what matters most is the expectations of the various stakeholders: making sure that you set the correct expectations and then deliver on those expectations, so no one is surprised by what you're doing, and all the relevant stakeholders, the investors, the employees,
01:29:27
Speaker
you know, the outside world, know what to expect from you. That can often be the most important thing here. And so I think what we're trying to signal is, on one hand, a public benefit corporation is a for-profit corporation. We could deploy something; that is something that we may choose to do, and it has a lot of benefits in terms of learning how to make
01:29:51
Speaker
models more effective and in terms of iterating. But on the other hand, the mission is really important to us. And we recognize that this is an unusual area, right, one that's more fraught with market externalities, that would be the term I would use, of all kinds, in the short term and the long term, related to alignment, related to policy and government, than a typical area. It's different than making
01:30:18
Speaker
electric cars or making widgets or something like that. And so, yeah, that's the thing we're trying to signal. What do you think this structure potentially means for the commercialization of Anthropic's research?
01:30:32
Speaker
Yeah, I think, again, part of what's valuable about a public benefit corporation is that it's flexible. It's still a corporation, fairly close to any sort of standard corporate entity you would encounter. And so the structure doesn't really have much of a bearing, outside of the one that we just talked about, on decisions related to things like productization, deployment, and revenue generation.
01:30:55
Speaker
Dario, you were just talking about how this is different than making widgets or electric cars. One way that it's different from widgets is that it might lead to massive economic windfalls.
01:31:12
Speaker
unless you make really good widgets, or widgets that can solve problems in the world. So what is Anthropic's view on the vast economic benefits that can come from powerful AI systems, and what role do you see companies and AGI labs playing in the beneficial use of that windfall? Do you want to go? Go for it. Yeah.
01:31:39
Speaker
Yeah, I mean, I think a way to think about it is: assuming we can avoid the alignment problems and some other problems, then there will be massive economic benefits from AI or AGI or TAI or whatever you want to call it, or just AI getting more powerful over time.
01:31:58
Speaker
And then, again, thinking about all the other problems that I haven't listed, which are today's short-term problems, problems with fairness and bias, long-term alignment problems, and problems that you might encounter with policy and geopolitics. Assuming we address all those, then there is still this issue of economics:
01:32:17
Speaker
Are those benefits evenly distributed? And so here as elsewhere, I think it's unlikely those benefits will all accrue to one company or organization. I think this is bigger than one company or one organization and is a broader societal problem. But we would certainly like to do our part on this. And this is something we've been thinking about and are working on putting programs in place.
01:32:41
Speaker
with respect to this, though we don't have anything to share about it at this time. But it's something that's very much on our minds. I would say that more broadly, I think the economic distribution of benefits is maybe only one of many issues that will come up, which is that I think
01:33:00
Speaker
The disruptions to society that you can imagine coming from the advent of more powerful intelligence are not just economic. They're already causing disruptions today. People already have legitimate and very severe societal concerns about things that models are doing today. You can call them mundane relative to all the existential risk, but I think
01:33:23
Speaker
yeah, there are already serious concerns about concentration of power, fairness, and bias in these models, and about making sure that they benefit everyone, which I don't think they do yet. And if we then put that together with the ingredient of the models getting more powerful, maybe even on an exponential curve, those things are set to get worse without intervention. And I think economics is only one dimension of that.
01:33:49
Speaker
So, you know, again, these are bigger than any one company. I don't think it's within our power to fix them alone. But we should do our part. We should do our part to be good citizens, and we should try to release applications that make these problems better rather than worse.
01:34:05
Speaker
Yeah, that's excellently put. I guess one thing I'd be interested in is if you could give some more examples of these problems that exist with current-day systems, and then the relationship they have to issues with economic windfall and also existential risk. It seems to me like tying these things together is really important, at least seeing the interdependence and relationship there, that some of these problems already exist or we already have
01:34:32
Speaker
example problems that are really important to address. So could you expand on that a bit?
01:34:38
Speaker
I think maybe the most obvious one for current day problems is people are worried very legitimately that big models suffer from problems of bias, fairness, toxicity, and accuracy. I'd like to apply my model in some medical application, and it gives the wrong diagnosis, or it gives me misinformation, or it fabricates information. That's just not good. These models aren't usable, and they're harmful if you try and use them.
01:35:06
Speaker
I think toxicity and bias are issues when models are trained on data from the Internet. They absorb the biases of that data, and there's maybe even more subtle algorithmic versions of that. I hinted at it a little before, where it's like
01:35:22
Speaker
the objective function of the model is to say something that sounds like what a human would say or what a human on the internet would say. And so in a way, almost like fabrication is kind of like baked into the objective function. And yeah, potentially even bias and stereotyping, you could imagine being baked into the objective function in some way. So just, you know, these models are being used for
01:35:42
Speaker
or want to be used for very mundane everyday things like helping people write emails or helping with customer surveys or collecting customer data. If they're subtly biased or subtly inaccurate, then those biases and those inaccuracies will be inserted into the stream of economic activity in a way that may be difficult to detect.
01:36:05
Speaker
So that seems bad. And I think we should try to solve those problems before we deploy the models. But also, they're not as different from the large scale problems as they might seem. In terms of the economic inequality, I don't know, just look at the market capitalization of the top five tech companies in the world. And yeah, compare that to the US economy. There's clearly something going on in the concentration of wealth.
01:36:33
Speaker
I would just echo everything Dario said, and also add: I think something that can be especially alarming in a short-term way today, in the sense that it could foreshadow things to come, is how quietly and seamlessly people are becoming dependent on some of these systems, right?
01:36:53
Speaker
we don't necessarily even know. There's no required disclosure of when you're interacting with an AI system versus a human. Until very recently, that was a comical idea because it was so obvious when you were interacting with a person versus not a person. You know when you're on a customer chat and it's a human on the other end versus an automated system responding to you. But I think that line is getting increasingly blurred. I can imagine that even just in the next
01:37:22
Speaker
few years, that could start to have reasonably large ramifications for people in day-to-day ways. People talk to an online therapist now, and sometimes that is backed by an AI system that is giving advice.
01:37:37
Speaker
you know, down the road, we could imagine things looking completely different in health realms like Dario talked about. And so I think it's just really important as we're kind of stepping into this new world to be really thoughtful about a lot of the safety problems that he just outlined and talked about because I think I don't know that most people necessarily even know all the ways in which AI is impacting our kind of day-to-day lives today and the potential that that could really go up in the near future.
01:38:03
Speaker
The idea of a requirement that AIs disclose themselves as AIs seems very interesting, and also adjacent to this idea of, kind of like the way that C corporations have fiduciary responsibility to shareholders, having AI systems that also have some kind of responsibility towards the people that they serve, where they can't be, you know, secretly
01:38:32
Speaker
working towards the interests of the tech company that has the AI listening to you in your house all the time. Yeah, it's another direction you can imagine. It's like, you know, I talk to an AI produced by Megacorp and it kind of subtly steers my life to the benefit of Megacorp. I mean, yeah. There are lots of things you can come up with like this.
01:38:57
Speaker
You know, these are important problems today, and I think they also really foreshadow things that could be coming in the near future, right? Those particular problems are ones lots of groups are working on, but I think what helps is solving a lot of the fundamental building blocks underlying them, about getting models to be truthful, to be harmless, to be honest, right? For all of those,
01:39:24
Speaker
a lot of the goals are aligned there, for short-, medium-, and potentially long-term safety.
01:39:31
Speaker
So Dario, you mentioned earlier that, of the research that you publish, one of your hopes is that other organizations will look into and expand on it. So I'm curious whether Anthropic has a plan to communicate its work and its ideas about how to develop AGI safely, both with technical safety researchers and with policymakers.

Communication and Governance in AI

01:39:59
Speaker
Yeah, maybe I'll actually jump in on this one, and Dario, feel free to add as much as you like. But I actually think this is a really important question. I think communication with policymakers, and about safety with other labs in the form of the papers that we publish, is something that's very important to us at Anthropic. We have a policy team; it's like 1.5 people right now, so we're hiring, that's kind of a plug as well. But I think their goal is to really
01:40:25
Speaker
take the technical content that we are developing at Anthropic and translate it into something that is actionable and practical for policymakers. And I think this is really important because the concepts are very complex, right? And so it's kind of a special skill to be able to take things that are highly technical and potentially very important, and translate them into recommendations, or work with policymakers to come up with recommendations, that could potentially
01:40:54
Speaker
have very far-reaching consequences. So to point to a couple of things we've been working on here: we've been supporting NIST, the National Institute of Standards and Technology, on developing something called an AI Risk Management Framework. And the goal of that is really developing more monitoring tools around AI risk and AI risk management.
01:41:15
Speaker
We've also been supporting efforts in the U.S. and internationally to think about how we can best support academic experimentation, which we talked about a little bit earlier, with large-scale models and compute. You guys also talked a lot about open-endedness; is part of all this alignment and safety research looking into ways of measuring safety and open-endedness?
01:41:42
Speaker
There's actually some interesting work, which I think is also in this upcoming paper and in various other places that we've been looking into around the concept of AI evaluations or AI monitoring. I think both of those are potentially really important because a lot of what we're seeing or maybe lacking, and this goes back to this point I made earlier about standards, is just how do we even have a common language or common framework within the AI field of
01:42:11
Speaker
what kinds of outputs or metrics we care about measuring, right? And until we have that common language or framework, it's hard to set things like standards across the industry around what safety even means. And so I think AI evaluations is another area that our societal impacts team, which is also about 1.5 people, has been working on as well.
01:42:36
Speaker
Right, so a large part of this safety problem is of course the technical aspect of how you train systems and create systems that are safe and aligned with human preferences and values. How do you guys view the larger problem of AI governance and the role and importance of governments and civil society
01:43:01
Speaker
in working towards the safe and beneficial use and deployment of AI systems? Yeah, we talked about this one a little bit earlier, and maybe I'll start here, and obviously Dario, jump in if you want. But, you know, I do think that these other kinds of institutions that you talked about have this really important role to play. And again, one of the things we sort of
01:43:23
Speaker
you know, mention in this paper is that we think, you know, government has already been starting to fund a lot more, you know, academic safety research. And a concrete policy recommendation in that area is, you know, hey, go do more of that, that would be great. But I also think groups like, you know, civil society and NGOs, there's a lot of great organizations in this space, including FLI and others, that are thinking about, like, what do we do, right?
01:43:49
Speaker
Say we develop something really powerful, like, what's the next step, right? Whether that's at an industry lab, in government, in academia, wherever. And I think there's a way in which industry incentives are just not the same as those of nonprofit groups or civil society groups. And to kind of go back to this analogy of an ecosystem, we really need thoughtful and empowered organizations that are working on these kinds of questions fundamentally outside of the industry sphere, in addition to the kind of
01:44:19
Speaker
policy research and work that's being done at labs. Yeah, I mean, another way you can think of things kind of in line with this is like, at some point, laws and regulations are going to be written. And I think probably those laws and regulations work best if they end up being formalizations of what's realized to be the best practices.
01:44:40
Speaker
And those best practices can come from different industrial players. They can come from academics figuring out what's good and what's not. They can come from nonprofit players. But if you write a law that relates to a technology that hasn't been invented yet, it's often not clear what the best thing to do is and what is actually going to work or make sense or even what categories or words to use.
01:45:01
Speaker
But if something has become a best practice and folks have converged on that and then kind of the law formalizes it and puts it in place, that can often be a very constructive way for things to happen.
01:45:12
Speaker
Anthropic has received an impressive amount of Series A funding.

Team Growth and Work Culture

01:45:20
Speaker
So it seems like you guys are doing a lot of hiring and growing considerably. So in case there's anyone from our audience that's interested in joining Anthropic, what are the types of roles that you expect to be hiring for?
01:45:34
Speaker
Yes, great question. We are definitely hiring. We're hiring a lot. And so the number one thing I would say is, if you're listening to this podcast and you're interested, I would highly recommend just checking out our jobs page, because that will be the most up-to-date; that's just anthropic.com, on the careers tab, but we can also send that around if that's helpful. But what are we looking to hire? Quite a few things. Most critically, probably, right now we're looking to hire engineers.
01:46:01
Speaker
And we're actually very bottlenecked on engineering talent right now. And that's because running experiments on AI systems is something that requires a lot of custom software and tooling. And while machine learning experience is helpful for that, it isn't necessarily
01:46:18
Speaker
you know, required. And I think a lot of our best, you know, ML engineers or research engineers came from a software engineering or infrastructure engineering background, hadn't necessarily worked in ML before, but were just really excited to learn. And so I think if that kind of describes you, if you're a software engineer but you're really interested in these topics, you know, definitely think about applying, because I think there's a lot of value that your skills can provide.
01:46:44
Speaker
We're also looking for a number of other roles. I won't be able to list them all; you should just check out our jobs page. But off the top of my head, we're looking for front-end engineers to help with things like interfaces and tooling for the research we're doing internally. We're looking for policy experts, operations people, security engineers, data visualization people. Security? Security, yes. We're definitely looking for... If you're building big models...
01:47:08
Speaker
Yes, security is something that, I think, any lab building big models should care about: making sure their models are not stolen by bad actors. This is sort of a unanimous kind of thing, you know, across all labs, right? It's something everyone really agrees on, in industry and outside of industry, which is that security is really important. And so if you are interested in security or you have a security background, we would definitely love to hear from you, or I'm sure our friends at other industry labs and non-industry labs would also love to hear from you.
01:47:35
Speaker
I would also say, I sort of talked about this a little bit before, but we've also just had a lot of success in hiring people who were very accomplished in other fields, especially other technical fields. And so we've alluded a few times to recovering physicists, or people who have PhDs in computer science or ML, neuroscientists, computational biologists. And so I think if you're someone who has
01:48:02
Speaker
sort of this strong background and set of interests in a technical field that's, you know, not related to ML but moderately adjacent, I would also consider applying for our residency program. And so again, you know, if you're even a little curious, I would say just check out our jobs page, because there's going to be more information there, but those are the ones off the top of my head. And Dario, if I missed any, like, please jump in. Yeah, I mean, that covers a pretty wide range.
01:48:31
Speaker
Could you tell me a little bit more about the team and what it's like working at Anthropic? Yeah, definitely. You'll probably have to cut me off here because I'll talk forever about this because I think Anthropic is a great team. But some basic stats, we're about 35 people now. Like I've said a few times, we've kind of come from a really wide range of backgrounds. So this is people
01:48:52
Speaker
you know, who worked in tech companies, software engineers, these are former academics in physics, ethics, neuroscience, a lot of different areas, machine learning researchers, policy people, operations staff, so much more. And I think one of the kind of unifying
01:49:12
Speaker
themes that I would point to in our employees is kind of a combination of two impulses that I think we've talked about a lot in this podcast. And I think the first is really just a genuine desire to reduce the risks and increase the potential benefits from AI. And I think the second is kind of a
01:49:35
Speaker
deep curiosity to really scientifically and empirically describe, understand, predict, model out how AI systems work and kind of through that deeper understanding make them safer and more reliable. And I think some of our employees
01:49:52
Speaker
identify as effective altruists, which means they're especially worried about the potential for long-term harms from AI. And I think others are more concerned about immediate or emerging risks that are happening today or in the near future. And I think both of those views are very compatible with the goals that I just talked about. And I think they often just call for a mixed-methods approach to research, which I think is a very accurate description of how things look in a day-to-day way at Anthropic.
01:50:22
Speaker
It's a very collaborative environment, so there's not a very strong distinction between research and engineering. Researchers write code. Engineers contribute to research. There's a very strong culture of pair programming across and within teams. There's a very strong focus on learning. I think this is also just because so many of us come from backgrounds that were not necessarily ML-focused in where we started, so people run these very nice
01:50:46
Speaker
you know, little training courses where they'll say, hey, if you're interested in learning more about Transformers, I'm a Transformers expert and I'll kind of walk you through it at different levels of technical skill, so that people from the operations team or the policy team can come for an introductory version. And then outside of that, you know, I like to think we're a nice group of people. We all have lunch together every day. We have this, you know, very lovely office space in San Francisco. It's fairly well attended. And, you know, we have lots of fun lunch conversations ranging over all kinds of topics.
01:51:16
Speaker
You know, a recent one was, we were talking about micro-COVIDs. If you know the concept of a micro-COVID: Catherine Olsson, who's one of the creators of microcovid.org, which is basically a way of assessing the level of risk from a given interaction or a given activity that you're doing during COVID times, is on our team. So we had this fun meta conversation where we were like, how risky is this conversation that we're having right now, from a micro-COVID perspective, if we all came into the office and tested, but we're still together indoors and there are 15 of us? What does that mean?
01:51:45
Speaker
So anyway, I think it's a fun place to work. We've obviously had a lot of fun kind of getting to build it together.
01:51:52
Speaker
Yeah, the things that stand out to me are trust and common purpose. They're enormous force multipliers, and it shows up in all kinds of little things; you can see it in things like compute allocation. If people are not on the same page, if one person wants to advance one research agenda and another person wants to advance their own research agenda, then people fight over it, and there's a lot of zero-sum or negative-sum interactions. But if everyone has the attitude of, we're trying to do this thing,
01:52:22
Speaker
everything we're trying to do is in line with this common purpose and we all trust each other to do what's right to advance this common purpose, then yeah, it really becomes a force multiplier on getting things done while keeping the environment comfortable and while everyone continues to get along with each other. I think it's an enormous superpower that I haven't seen before.
01:52:48
Speaker
So you mentioned that you're hiring a lot of technical people from a wide variety of technical backgrounds. Could you tell me a little bit more about your choice to do that rather than simply hiring people who are traditionally experienced in ML and AI?
01:53:02
Speaker
Yeah, that's a great question. So I should also say, you know, we have people from kind of both camps that you talked about, but why did we choose to bring people in from outside the field? I think there's a few reasons for this. One is, you know, again, ML and AI is still a fairly new field, right? Not super new, but still pretty new. And so what that means is there's kind of a lot of opportunity for people who have not necessarily worked in this field before to get into it. And
01:53:30
Speaker
I think we've had a lot of success, or kind of luck, with taking people who are really talented in a related field and helping them translate their skills into ML and AI safety. So one reason is just expanding the talent pool.
01:53:47
Speaker
I think the other is that it really does broaden the range of perspectives and the types of people who are working on these issues, which we think is very important. And again, we've talked about this previously, but having a wider range of views and perspectives and approaches tends to lead to a more robust approach to doing both basic research and safety research. I'm surprised at how often someone who has experience in a different field can come in.
01:54:15
Speaker
It's not like they're directly applying things from their previous field, but they think about things in a different way. And yeah, I mean, of course this is true about all kinds of things. It's true about diversity in the more traditional senses as well: you want as many different kinds of people as you can get.
01:54:33
Speaker
So as we're wrapping up here, I'm curious just to get some more perspective from you both. Given these large-scale models, the importance of safety and alignment, the problems which exist today, but also the promise of the impact they could have for the benefit of people, what's a future

Future Hopes for AI

01:54:58
Speaker
that each of you is excited about, or what's a future that you're hopeful for, given your work at Anthropic and the future impacts of AI?
01:55:07
Speaker
I'll start. So one thing I do believe is, I actually am really hopeful about the future. I know that there are a lot of challenges that we have to face to get to a potentially, you know, really positive place, but I think the field will rise to the occasion, or that's kind of my hope. And some things I'm hoping for in the next few years are that a lot of different groups will be developing
01:55:32
Speaker
more practical tools and techniques for advancing safety research. I think these are likely to become more widely available if we can set the right norms in the community. And the more people working on safety-related topics, the more that can positively feed on itself, right? And I think most broadly, I'm hoping for a world where we can feel confident that when we're using AI for more advanced purposes,
01:56:01
Speaker
like accelerating scientific research, it's behaving in ways where we can be very confident that we understand it's not going to lead to negative unintended consequences, and the reason for that is because we've really taken the time to chart them out and understand what all of those potential
01:56:22
Speaker
problems could be. And so I think that's obviously a very ambitious goal, but if we can make all of that happen, you know, there are a lot of potential benefits of more advanced AI systems that I think could be transformative for the world, right? Almost anything you can name: renewable energy, health, disease detection, economic growth, and lots of other day-to-day enhancements to how we work and communicate and live together.
01:56:49
Speaker
No one really knows what's going to happen in the future. It's extremely hard to predict. And so I often find any question about the future, it's more about the attitude or posture that you want to take than it is about concrete predictions. Because I feel like, particularly after you go a few years out, it's just very hard to know what's going to happen. And so yeah, it's mostly just speculation. And so in terms of attitude,
01:57:18
Speaker
I think, well, first of all, the two attitudes that I find least useful are blind pessimism and blind optimism, because they're basically doom-saying and Pollyannaism. And weirdly, it's possible to have both at once. But I think neither is very useful, because one says we're all doomed and is intended to create fear, and the other is intended to create complacency.
01:57:45
Speaker
I find that an attitude that's more useful is to just say, well, we don't know what's going to happen. But as an individual or as an organization, let's pick a place where there's a problem we think we can help with. And let's try and make things go a little better than they would have otherwise. Maybe we'll have a small impact. Maybe we'll have a big impact.
01:58:05
Speaker
But instead of trying to understand what's going to happen with the whole system, let's try and intervene in a way that helps with something that we feel well equipped to help with. And of course, the whole outcome is going to be beyond the scope of one person, one organization, even one country.
01:58:24
Speaker
But yeah, I mean, I think we find that to be a more effective way of thinking about things. And for us, that's: can we help to address some of these safety problems that we have with AI systems, in a way that is robust and enduring and that points towards the future?
01:58:44
Speaker
If we can increase the probability of things going well by only some very small amount, that may well be the most that we can do. I think from our perspective, I would like it if AI could advance science, technology, and health in a way that's equitable for everyone and that it could help everyone to make better decisions and improve human society.
01:59:11
Speaker
Right now, I frankly don't really trust the AI systems we build today to do any of those things. Even if they were technically capable of those tasks, which they're not, I wouldn't trust them to do those things in a way that makes society better rather than worse. And so I'd like us to do our part to make it more likely that we could trust AI systems in that way. And if we can make a small contribution to that while being good citizens in the broader ecosystem, yeah,
01:59:40
Speaker
that's maybe the best we can hope for. All right. And so if people want to check out more of your work or to follow you on social media, where are the best places to do that?
01:59:55
Speaker
Yeah, anthropic.com is going to be the best place to see most of the recent stuff we've worked on. I don't know if we have everything posted, but we probably will. Yeah, because we have several papers out, so we're now about to post links to them on the website
02:00:11
Speaker
in an easy-to-find place. And then we also have a Twitter handle; it's just, I think it's Anthropic on Twitter. And yeah, we generally also tweet about our recent releases of all kinds. We are relatively low-key. We really want to be focused on the research and not get distracted. Yeah, I mean, the stuff we do is out there, but we're very focused on the research itself and getting it out and letting it speak for itself.
02:00:39
Speaker
Okay, so where's the best place on Twitter to follow Anthropic? Our Twitter handle is at AnthropicAI. All right, I'll include a link to that in the description of wherever you're listening. Thanks a ton for coming on, Dario and Daniela. It's really been awesome and a lot of fun. I'll include links to Anthropic in the description. It's a pleasure having you, and yeah, thanks so much. Yeah, thanks so much for having us, Lucas. This was really fun.
02:01:04
Speaker
Thanks for joining us. If you found this podcast interesting or useful, consider sharing it on social media with friends and subscribing on your preferred podcasting platform. We'll be back again soon with another episode in the FLI podcast.