
Neel Nanda on Avoiding an AI Catastrophe with Mechanistic Interpretability

Future of Life Institute Podcast
Neel Nanda joins the podcast to talk about mechanistic interpretability and how it can make AI safer. Neel is an independent AI safety researcher. You can find his blog here: https://www.neelnanda.io

Timestamps:
00:00 Introduction
00:46 How early is the field of mechanistic interpretability?
03:12 Why should we care about mechanistic interpretability?
06:38 What are some successes in mechanistic interpretability?
16:29 How promising is mechanistic interpretability?
31:13 Is machine learning analogous to evolution?
32:58 How does mechanistic interpretability make AI safer?
36:54 Does mechanistic interpretability help us control AI?
39:57 Will AI models resist interpretation?
43:43 Is mechanistic interpretability fast enough?
54:10 Does mechanistic interpretability give us a general understanding?
57:44 How can you help with mechanistic interpretability?

Social Media Links:
➡️ WEBSITE: https://futureoflife.org
➡️ TWITTER: https://twitter.com/FLIxrisk
➡️ INSTAGRAM: https://www.instagram.com/futureoflifeinstitute/
➡️ META: https://www.facebook.com/futureoflifeinstitute
➡️ LINKEDIN: https://www.linkedin.com/company/future-of-life-institute/
Transcript

Introduction and Guest Background

00:00:00
Speaker
Neel, welcome to the podcast. Glad to have you on. Hey, yeah. Perhaps you could introduce yourself. Yeah. So, hey, I'm Neel. I work on mechanistic interpretability research, which means I take a network that's been trained to do a task and I try to reverse engineer what algorithms it's learned and how it does this, trying to be as faithful to what the model actually does as possible. Kind of a form of digital neuroscience.
00:00:30
Speaker
I'm currently an independent researcher. I used to work at Anthropic, working on their transformer circuits work. And I generally try to work on reverse engineering language models, though with a bunch of dalliances through various side projects.

History of Mechanistic Interpretability

00:00:47
Speaker
So as I understand mechanistic interpretability, it's a pretty early field. How young is this field, would you say? Yeah. Let's not call it mechanistic interpretability. It's such a mouthful.
00:01:00
Speaker
And I want to make mech interp a thing, so the field has a snappier name. Anyway, so mech interp - a bit of history. So deep learning as a whole has existed for about 10 years, since AlexNet turned out to be way better at image classification than anything else, and this started becoming the hot thing that's basically entirely taken over the field of machine learning. And
00:01:31
Speaker
The subfield of AI interpretability has only really been a thing since - I don't know the history that well, but the first big paper I saw was in 2014, when people found a way to visualize early neurons in image networks and saw they were looking at things like edges and corners and things like that.
00:01:55
Speaker
And I would say that, at least by my somewhat biased lights, the subfield of mech interp has been massively pioneered by this researcher called Chris Olah, who was first involved in some work called DeepDream at Google Brain, then ran the mech interp team at OpenAI, doing this really great work
00:02:20
Speaker
on image circuits, where they reverse engineered different things in these image models. And since around late 2021, the field has heavily focused on transformer language models like ChatGPT, and smaller versions of these, and trying to reverse engineer those.
00:02:41
Speaker
And the field has rapidly grown from mostly Chris's team and collaborators to a much larger field with multiple professors seriously interested, maybe five industry or non-profit orgs having teams working on it, some independent researchers like me, and hopefully it will keep growing and we will continue this
00:03:06
Speaker
glorious exponential trend of increasing by about 5x in the last year. But a pretty young field. Yeah, a pretty young field.

Importance and Impact of Mech Interp

00:03:14
Speaker
Why is mech interp useful? How would it look if we succeeded in this field? Sure. Are you asking me this from a reducing-AI-x-risk perspective, or just a general, like, why would anyone care about this? I say let's start with the general, and then we can talk about reducing AI x-risk.
00:03:36
Speaker
Sure. So I think for why should anyone care about this? My first and foremost argument is just a kind of scientific and aesthetic argument, which is that machine learning is becoming an increasingly important part of the world. We are really good at giving models complex tasks - like, someone typed a search query into Google, what do they want, what should I show them? - and getting good answers. And
00:04:06
Speaker
these are capable of solving tasks where we just have no idea how to write a program ourselves - a program that can write poetry the way ChatGPT can, or explain jokes. And it's just really, really dissatisfying to me.
00:04:24
Speaker
This is not an acceptable state of the world - that we have computer programs that can essentially speak English at a human level, but we cannot write these programs ourselves; we can only train them via this enormous soup of data. And that's my personal reason.

AI Safety and Misaligned Goals

00:04:43
Speaker
I generally think that just in a world that is being increasingly shaped by these systems, it's just really useful to understand what they're doing and why. For example,
00:04:55
Speaker
Lots of people are pretty concerned about racism and sexism and algorithmic bias in models. If you can understand how this is implemented in the model, it seems like it should be much easier to check how well your techniques for reducing this have worked, and ideally motivate new ones. We've got recommender systems where
00:05:18
Speaker
there are compelling arguments that they're doing pretty significantly negative things in the world. I think that if we could look inside those systems and understand why they recommended what they recommended, on an actual algorithmic level, all of this would just be so much more grounded. And we could actually have rules and regulation and transparency around what was going on here.
00:05:41
Speaker
And then the cause closest to my heart is AI existential safety, the question of
00:05:50
Speaker
It seems plausible we're going to be in a world with human-level AI systems where, without getting into arguments over the semantics of what that actually means, just systems that are capable of doing lots of complex tasks in the world, which may be acting towards goals and whose goals may be different from ours. And I think that
00:06:16
Speaker
The people making these systems are probably not going to want them to have misaligned goals, but that if we can't see what the goals are, and we can't see whether the model is actually being aligned versus doing something deceptive, or just broken and not what we want, that it is much harder to actually get good outcomes here.

Research Highlights: Grokking and More

00:06:38
Speaker
Neil, could you tell us about some results from mechanistic interpretability that are interesting to you? Perhaps you could touch on multiple results.
00:06:49
Speaker
Sure! Alright, so a work particularly close to my heart is this project I worked on during my stint of independent research called Progress Measures for Grokking via Mechanistic Interpretability. So grokking was this famous mystery in deep learning where some researchers found that if you train a small model on some algorithmic task, like modular addition,
00:07:17
Speaker
and you give it, say, half of the data to train on, and keep half of it that it never sees to test it on, it will initially memorize. It gets really good at the data it sees and it's terrible at the data it doesn't see. Then if you keep training for an incredibly long time, it will suddenly grok, or generalize, and learn how to do the data it hasn't seen yet. And this is kind of wild, because it just keeps seeing the same data again and again.
00:07:46
Speaker
And the work I did was first to... So, I figured this just has to be susceptible to mech interp. It's a tiny model doing an algorithmic task. This is the kind of thing we are good at. And I reverse engineered how it did modular addition,
00:08:05
Speaker
where it turned out to have learned this wild algorithm where it thought about the numbers it was adding as rotations around a circle and learned to compose the rotations together in this really weird trig identity and Fourier transform-based way that was ultimately very clean and legible. And then I could look into the model as it was training
00:08:30
Speaker
and saw that rather than suddenly figuring out the right solution, it actually slowly transitioned from the memorized solution to the generalized solution - but that it could only generalize to data it hadn't seen yet when it was both capable of generalizing and wasn't also trying to memorize, and only when it was really good at generalizing did it decide it didn't need to bother memorizing.
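To make the circular story above concrete, here is a minimal numpy sketch of the kind of algorithm described: represent each input as a rotation around a circle, compose the rotations with trig identities, and let the correct answer win by constructive interference. This is only an illustration of the idea, not the trained model itself - the real network learns its own handful of key frequencies and spreads the computation across embeddings, attention, and MLP layers, whereas the frequencies chosen here are arbitrary.

```python
import numpy as np

p = 113                 # modulus used in the grokking work
ks = [1, 5, 17]         # a few illustrative "key frequencies"; the real model picks its own

def mod_add_logits(a, b):
    """Score every candidate answer c for (a + b) mod p via rotations."""
    c = np.arange(p)
    logits = np.zeros(p)
    for k in ks:
        w = 2 * np.pi * k / p
        cos_a, sin_a = np.cos(w * a), np.sin(w * a)     # embed a as a rotation
        cos_b, sin_b = np.cos(w * b), np.sin(w * b)     # embed b as a rotation
        cos_ab = cos_a * cos_b - sin_a * sin_b          # cos(w * (a + b)) via trig identity
        sin_ab = sin_a * cos_b + cos_a * sin_b          # sin(w * (a + b)) via trig identity
        # each candidate c scores cos(w * (a + b - c)), which is maximal at c = (a + b) mod p
        logits += cos_ab * np.cos(w * c) + sin_ab * np.sin(w * c)
    return logits

print(int(np.argmax(mod_add_logits(47, 92))), (47 + 92) % p)   # both print 26
```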
00:08:58
Speaker
That's a work I'm particularly proud of because, well, it's the first research I properly led. But also, I just feel like if mech interp works, we should be able to demystify things like this. A totally different work that I think is pretty exciting is this paper called Toy Models of Superposition from Anthropic, which people may have seen on Twitter as the Why on Earth Is There a Tetrahedron in My Neural Network paper,
00:09:26
Speaker
where - so, superposition is this problem that comes up where models have, say, a thousand neurons in them, and they want to represent features, these properties of the inputs, and often they will try to represent like a thousand features in there,
00:09:50
Speaker
like a feature per neuron, and this is all very nice and reasonable. But sometimes they seem to be doing this thing called superposition, where they actually have more than a thousand features, and they can't do a feature per neuron, and learn some weird compression scheme. And Anthropic were like, this is an important, confusing phenomenon we want to understand better. Let's make a toy model that tries to simulate it, so
00:10:17
Speaker
we can study it in this kind of metaphorical petri dish. And they found that in the toy model they created, not only did it learn to use superposition, but that it learned these beautiful geometric configurations, where say, if you gave it 25 features and 10 neurons to shove them into, it would sometimes learn to compress the first five features into the first two neurons, the second five features into the second two neurons, etc.
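For readers who want to poke at this themselves, here is a minimal PyTorch sketch of the kind of toy setup the paper studies, using the 25-features-into-10-dimensions example above. It is only a skeleton under those assumptions; the actual paper additionally weights features by importance and sweeps over sparsity levels.

```python
import torch

n_features, n_hidden, sparsity = 25, 10, 0.05   # more features than dimensions; features rarely active

W = torch.nn.Parameter(0.1 * torch.randn(n_hidden, n_features))
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-2)

for step in range(5000):
    # sparse feature vectors: most entries are zero, active ones are uniform in [0, 1]
    x = torch.rand(1024, n_features) * (torch.rand(1024, n_features) < sparsity)
    h = x @ W.T                      # squeeze 25 features into 10 dimensions
    x_hat = torch.relu(h @ W + b)    # try to reconstruct the original features
    loss = ((x - x_hat) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# The columns of W are the directions assigned to each feature. With sparse features
# the model packs more than 10 of them in, and the directions settle into the geometric
# arrangements (antipodal pairs, pentagons, and so on) the paper describes.
```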
00:10:48
Speaker
And you got kind of different levels of compression efficiency - like, tetrahedra have four features in three dimensions, which is higher fidelity than five in two - and models will slowly transition from tetrahedra to the next thing along. It was just a beautiful paper. And the final work I want to highlight is this work from
00:11:19
Speaker
OpenAI called Multimodal Neurons in Artificial Neural Networks, where... So, multimodal neurons in the brain are these neurons which activate on multiple representations of the same thing. For example, a drawing of Spider-Man, the name Peter Parker, and a picture of Spider-Man - the same neuron lights up. Which is kind of fascinating, because it suggests there's some real abstraction going on.
00:11:47
Speaker
And they took this model called CLIP, which is part of how models like DALL-E, the image generation model, are trained, where CLIP takes in both an image and a text input. And they thought this should probably have abstractions, because it's about image and text. And they looked at some of the neurons in the image half and tried to interpret what they represented. And they found all kinds of wild neurons, including a bunch of multimodal neurons,
00:12:18
Speaker
like a Spider-Man neuron, or a bunch of conceptual neurons, like a teenage neuron or an anime neuron, which seem to represent the abstract concept of anime or teenage. Or you got neurons for, say, France that are activated on French language, the French flag, but also the bit of a map that represented France. And this is fascinating to me that this is a thing we can find in networks.
00:12:47
Speaker
And one of my side projects has been making this website called Neuroscope, which shows the text that most activates the neurons in a bunch of language models, where you can just go through and look at what kind of wild things models seem to potentially represent. Are there any concepts that have been found in these
00:13:09
Speaker
in these neural networks that are entirely

Can AI Develop New Concepts?

00:13:11
Speaker
new. So I'm thinking that, are there concepts that humans do not have that could be found in these neural networks?
00:13:20
Speaker
Unfortunately, I'm not currently aware of any examples, though this feels like the kind of thing that should be possible. I'm particularly excited about work that looks at reverse engineering models like AlphaZero that are superhuman at Go or chess, because this is both a very kind of algorithmic and legible domain. These systems are also clearly vastly superhuman, but also there's been some promising work trying to interpret them.
00:13:49
Speaker
like this paper from DeepMind, from Tom McGrath, called Acquisition of Chess Knowledge in AlphaZero, where they showed that if you just took a bunch of human chess concepts and looked for them in the model, you just find them. And further, that if you looked at when in training it had developed these, you could compare it to the human history of chess knowledge and see at what rates the model learned things that humans learned.
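The standard way to "look for" a human concept inside a network, as described here, is a linear probe: train a simple classifier on cached internal activations to predict whether the concept holds for each input. The sketch below is a hypothetical illustration of that technique, not the DeepMind paper's actual code; the file names and the concept are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical cached data: activations from one layer of the network on many board
# positions, plus a 0/1 label for a human concept (e.g. "side to move has a material
# advantage") computed by an ordinary rule or engine.
acts = np.load("layer_activations.npy")    # shape (n_positions, d_model) -- placeholder file
labels = np.load("concept_labels.npy")     # shape (n_positions,)         -- placeholder file

probe = LogisticRegression(max_iter=1000).fit(acts[:8000], labels[:8000])
print("held-out probe accuracy:", probe.score(acts[8000:], labels[8000:]))

# High held-out accuracy is evidence the concept is (linearly) represented in that layer;
# repeating this across training checkpoints traces when the concept appears.
```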
00:14:17
Speaker
and also found some fascinating quirks like phase transitions, where there's a certain point around step 30,000 when it learns a bunch of things at the same time.
00:14:26
Speaker
but nothing superhuman. This points in the direction of there being concepts that are generally useful because the same concepts are found by artificial neural networks as are found by humans. Also your first example of an interesting paper where in the beginning the model learns by memorization and then at some point
00:14:52
Speaker
generalizes - that sounds to me a bit like the way that children learn math, where in the beginning you learn your multiplication tables, and then at a later point you understand multiplication as more of an algorithm. So there are interesting parallels between humans and AI systems here.
00:15:13
Speaker
Yes, I do think it's easy to exaggerate the degree to which there are these analogies. I really don't like the term neural network for this exact reason, because it sounds so biological and neuroscience-y. But I think it's pretty plausible there are good analogies.
00:15:30
Speaker
We need interpretability research to understand what's going on, so that we can see whether what we're doing is actually working. We need to create these feedback loops between creating a system, seeing how the system works, and then improving the system. And if we don't understand what's going on, then it's much more difficult to actually say that we have improved - because by which metric have we improved, if we don't understand what's going on inside of these systems?
00:15:57
Speaker
Exactly, and I should flag, I'm extremely biased, I work in mechanistic interpretability, I like mechanistic interpretability a lot, I think it's promising, but I think there's a bunch of other routes by which we might reach this goal.
00:16:11
Speaker
And I think the field of make AI not kill everyone is a healthier field if it has people pursuing a bunch of approaches, including people who aren't even doing this because they care about AI safety. They're doing this because it's just really fascinating and they want to understand models better. Okay, so how promising would you say that mechanistic interpretability is? Sure.
00:16:38
Speaker
For the abstract question of how promising, I'd say, surprisingly promising in that the field has made vastly more progress than I thought was possible, but also not very promising in the sense of, oh my god, this is such an ambitious task, and so few people work on it, where
00:17:02
Speaker
I dunno, my experience getting into the field has been something like: well, there's no way networks are interpretable. They're just a mess of inscrutable linear algebra. They're basically a kind of fancy curve-fitting algorithm, or just an inscrutable black box. They're not trained to be interpretable, so they have no incentive to be. And then,
00:17:22
Speaker
it turns out you can actually find meaningful, interpretable neurons in image models, and meaningful circuits - where circuit here means a kind of subset of the model's neurons and parameters that does some task. And then - but maybe this is really specific to image models, because those are so human-like, we have this visual cortex - then there was work on small language models, and there's been promising results in larger language models.
00:17:51
Speaker
But also, actually being able to meaningfully reverse engineer something non-trivial in a model like GPT-3 - this is ludicrously more ambitious and hard than anything we've done so far. So I consider it very much an open scientific question whether this will actually work. I think it is a bet worth making, but I do not think it is a bet that we should go all in on, or that it's obviously the one true path to things going well.
00:18:21
Speaker
In terms of interesting results and reasons for optimism, there was this paper from the Anthropic interpretability team that I was a bit involved in, called In-context Learning and Induction Heads, which I think is probably the most compelling result I've seen of looking at these models actually telling us something deep about them.
00:18:49
Speaker
An induction head, in brief, is this circuit we found in these two-layer attention-only language models. All right, so zooming out a bit from induction heads specifically. The kinds of models we study and try to reverse engineer are transformer language models, like ChatGPT. Fundamentally, it's about modeling sequences. You give it a sequence of words and you train it to predict the next word.
00:19:19
Speaker
Like you give it a page of a book and you ask it what comes first on the next page of the book. And this is a kind of weird and arbitrary thing you might train a model to do. But it's really convenient because one, this is like a pretty hard task.
00:19:37
Speaker
It sounds kind of easy, but if you imagine, say, getting the first five pages of a book and ending halfway through page six and then predicting what comes next, this is actually kind of a sophisticated task that implicitly involves learning a bunch of things like facts about the world, structures of grammar, reasoning, and, as people who play with ChatGPT probably know, how poetry and rhyming works.
00:20:03
Speaker
And then it's also really convenient because you can just give it a massive soup of data, which automatically comes with labels of what the next word is. And so you train it to do this on a massive, massive mountain of text. And then internally,
00:20:22
Speaker
the transformer is all about representing sequences which can kind of vary in length. A transformer should work on a sequence with like one word and a sequence with a thousand words. And so internally the model is made up of these layers, these simple functions.
00:20:44
Speaker
And transformers are made up of alternating attention and MLP layers. MLP stands for multi-layer perceptron, but you don't need to care about what that means. And so they're modeling the sequence of words. And what they're basically trying to do is they take each word
00:21:05
Speaker
And then they do some processing. And after each step, you get a refined representation of each word, which is integrated in some context and processing from the surrounding words so that you slowly get a better and better understanding of what's going on there, what context the word is in, so you can eventually predict what comes next. And then there's these two types of layers which kind of do this incremental processing.
00:21:34
Speaker
The first type is an attention layer. So at its heart, because the model is a sequence modeling thing, it's doing things in parallel on each word. But obviously you need to move information between words. And so attention is all about figuring out which other words and their context are most relevant to the current word, for the specific task the head is doing.
00:22:01
Speaker
Attention layers are made up of heads, which specialize in different kinds of processing. A head identifies which other position is most relevant, identifies some important information there, and copies it to the current position - they're about routing information around.
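As a rough illustration of what "figuring out which words are relevant and copying information" looks like mechanically, here is a minimal sketch of a single attention head in PyTorch. It is deliberately simplified - a real transformer has many heads per layer, output projections, and biases - and the weight matrices here are just placeholders.

```python
import torch

def attention_head(x, W_Q, W_K, W_V):
    """One attention head: for each position, work out which earlier positions are
    most relevant (the attention pattern), then copy a weighted mix of their
    information to the current position. x is (seq_len, d_model); the weight
    matrices are (d_model, d_head)."""
    q, k, v = x @ W_Q, x @ W_K, x @ W_V                  # queries, keys, values
    scores = q @ k.T / k.shape[-1] ** 0.5                # how relevant is each position to each other?
    mask = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(mask, float("-inf"))     # language models can only look backwards
    pattern = scores.softmax(dim=-1)                     # each row: "where is this token looking?"
    return pattern @ v                                   # move the selected information here
```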
00:22:22
Speaker
Perhaps an example here could be good. So say that the model has landed on Apple, and then it's trying to predict what comes next. And say the next word in its training data is tree. Then it's looking around for other words that might be relevant for predicting what comes after Apple. Would that be a reasonable example? Yeah.
00:22:46
Speaker
I think maybe a clearer example is something where you can see what kind of thing the model is thinking. So let's say you've got some text like, the Eiffel Tower is located in the city of. And empirically, models know that Paris comes next. But the information that Paris comes next comes from the of word. But like, of clearly doesn't have enough to know what's going on.
00:23:16
Speaker
And it turns out there are some heads which learn to look at the word tower. Earlier bits of the model have integrated in the context that tower is part of Eiffel Tower and which have looked up the fact that Eiffel Tower is in Paris. And then this head moves that from the tower bit to the of bit and uses that to output, Paris comes next.
00:23:45
Speaker
And actually the same heads will do this for things like the Colosseum is located in or the Parthenon is located in. And yeah, you often get heads for grammatical structures like look at the first word in the sentence, look at the subject of the current sentence, et cetera. And so these induction heads, that's the impressive thing that's been reverse engineered within mechanistic interpretability so far.
00:24:15
Speaker
Yes. So induction heads are part of an attention layer, where the attention layer is built up of these heads that can kind of be thought of as acting independently. And because it's an attention layer, this is the model doing something sophisticated with finding relevant information and moving it around. And the task being done is -
00:24:40
Speaker
So a fact about text is it often contains repeated text. For example, if I see the word Michael, it's pretty hard to figure out what comes next. It's probably some famous celebrity like Michael Jackson, Michael Jordan, whatever, but it's hard to know exactly which. But if in the past the text Michael Jackson appeared, then it's pretty likely that Jackson comes next.
00:25:11
Speaker
And so a pretty great algorithm a model can learn is, okay, let's look for a word that came after Michael in the past. And then let's move the information from that word to where I currently am. So I predict that that word comes next, whatever that is.
00:25:36
Speaker
And we reverse engineered these things called induction heads, which implement that. And it's worth noting, this isn't just the model memorizing something like Jackson often comes after Michael. If Michael Jackson came up in the past, this comes next. It's actually just like an actual algorithm that is run on any input. You can just give it some random gibberish text.
00:26:00
Speaker
that's randomly generated, then just copy and paste that, and then run the model. And it will predict the copied and pasted text really well, even though it's never seen anything like that before. So it's a general algorithm. It's not a set of fixed rules involving specific words. It's a general algorithm that can work on arbitrary words, but which does this very specific task.
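The algorithm being described can be written down in a few lines, which is part of why it was tractable to reverse engineer. The sketch below captures the behaviour the head implements, not how it is computed internally (the real mechanism involves two attention heads in different layers composing with each other).

```python
def induction_predict(tokens):
    """The behaviour an induction head implements, as plain code: if the current
    token appeared earlier, predict whatever followed it then. It works on
    arbitrary tokens, including random gibberish, because it is a general rule
    rather than a memorized lookup table."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan backwards for a previous occurrence
        if tokens[i] == current:
            return tokens[i + 1]               # predict the token that came after it last time
    return None

print(induction_predict(["Michael", "Jackson", "sang", "and", "then", "Michael"]))  # -> "Jackson"
```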
00:26:28
Speaker
Like it's not a lookup table, but it's also not like intelligence in any meaningful sense. But I can see how reverse engineering something like that is actually impressive because now we're beginning to understand what's going on inside of these previously entirely black box systems that just predict the next word in the text.
00:26:47
Speaker
Yes, so I think that reverse engineering induction heads is like cool, though probably sounds a lot easier than it actually is from the outside. But they're not the reason that I raised the induction heads paper.
00:26:59
Speaker
So we actually reverse engineered induction heads in this earlier paper called A Mathematical Framework for Transformer Circuits. And I feel like I shouldn't be saying we - this paper was written by Catherine Olsson, Nelson Elhage, Chris Olah, and the rest of Anthropic. I was somewhat involved, but they get a large amount of the credit for doing great pioneering work in this field, and I definitely don't want to claim this is my work, but -
00:27:29
Speaker
Yeah, so these induction heads - we found them by looking at these tiny two-layer attention-only models, and then we looked at larger models. And it turns out that not only do all models that people have looked at, up to about 13 billion parameters, have these heads -
00:27:55
Speaker
since leaving, I actually had a fun side project of looking at all the open source models I could find, and I found them in the roughly 41 models I checked; all of them that were big enough to have induction heads had them. And not only do they appear everywhere,
00:28:12
Speaker
They also all appear in this sudden, what we call a phase transition, where, so as you're training the model, if you just keep checking, does this have induction heads? Does this have induction heads? There's this narrow band of training between about five to 10% of the way through training. Exact numbers vary, but that's kind of the idea. The model goes from no induction heads to basically fully formed induction heads.
00:28:42
Speaker
And this is enough of a big deal that if you look at the loss curve, which is the jargon for how good the model is at its task, there's this visible bump where the model is smoothly getting better, then briefly gets better much faster, and then returns to its previous rate of smoothly getting better when these induction heads form.
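One simple way to run the "does this have induction heads yet?" check during training, hinted at above, is behavioural: feed the model a random token sequence repeated twice and see whether it does much better on the second copy. A minimal sketch, assuming `model` is any autoregressive language model that maps a batch of token ids to logits:

```python
import torch

def repeated_random_loss(model, vocab_size, seq_len=100, batch=16):
    """Behavioural check for induction heads: build random token sequences, repeat
    each one, and compare next-token loss on the first (unpredictable) copy with
    loss on the repeat. Models with induction heads do far better on the repeat,
    and the gap appears abruptly at the phase transition."""
    first = torch.randint(0, vocab_size, (batch, seq_len))
    tokens = torch.cat([first, first], dim=1)            # ...random tokens..., then the same again
    with torch.no_grad():
        logits = model(tokens)                           # (batch, 2 * seq_len, vocab_size)
    logprobs = logits[:, :-1].log_softmax(-1)
    nll = -logprobs.gather(-1, tokens[:, 1:, None]).squeeze(-1)
    # roughly: loss on the first copy vs loss on the repeated copy
    return nll[:, :seq_len].mean().item(), nll[:, seq_len:].mean().item()
```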
00:29:02
Speaker
So that's wild. Then the next totally wild thing about induction heads is that they're really important for this thing models can do called in-context learning. So a general fact about language models that are trained to predict the next word is that the more previous words you give them, the better they are.
00:29:24
Speaker
which is kind of intuitive. If you want to predict what comes next in the sentence "the cat sat on the mat" - like, what comes after "on the" - if you just have "the", it's really hard. If you've got "on the", it's a bit easier. If you've got "the cat sat on the", it's way easier. But it's not obvious that if you add more than a hundred words, it really matters. And in fact, older models weren't that good at using words more than a hundred words back.
00:29:51
Speaker
And it's kind of not obvious how you'd do this, though clearly it should be possible. For example, if I'm reading a book, the chapter heading is probably relevant to figuring out what comes next. Or if I'm reading an article, the introduction is pretty relevant. But it's definitely a weird thing that models can do this. And it turns out that induction heads are a really big part of how they're good at this, where
00:30:21
Speaker
models that are capable of forming induction heads are much better at this thing of tracking long-range dependencies in text. The ability of models to do this perfectly coincides with the dramatic bit of training where the induction heads are learned, and
00:30:36
Speaker
When we did things like tweaking a model too small to have induction heads with this hard-coded thing that made induction heads more natural to form, that model got much better at tracking how to use text far back to predict the next thing.
00:30:53
Speaker
And we even found some heads that seem to do more complicated things like translation, where you give it a text in English, you give it a text in French, and it looks at the word in English that came after the corresponding word in French. These also seem to be based on induction heads.
00:31:13
Speaker
These induction heads, they pop up in many different neural networks, in many of the neural networks that you've checked at a certain size. Is this perhaps analogous to how, say, eyes have evolved in different species and underwater and on land and so on? Because maybe induction heads are very generally useful for language models. Yeah. I think that's a pretty good analogy.
00:31:44
Speaker
That's actually a pretty great analogy. I think one thing worth drawing out from that analogy is that there's two big components to what kind of things an organism might learn. There's what is the environment it's in, what's useful, and then what constraints does it have and what's natural.
00:32:04
Speaker
Light is there, light is useful, understanding your environment is important, so eyes are valuable. And the same kind of underlying biology presumably incentivizes eyes, though I'm not a biologist, so I don't actually know how much eyes are implemented in the same way or not. And
00:32:24
Speaker
I think the analogy from biology to AI can be a bit overdone, but this underlying principle of repeated text happens a lot and it's useful to notice, and induction heads are a very natural thing for a model to express that's kind of efficient in the same way that, I don't know, walking on legs is more efficient than growing your own wheels, even if probably we could figure that out, biology could figure that out if it really had to.
00:32:49
Speaker
All right, so say induction heads are an example of a way in which we can understand what's going on inside of these neural networks.

Reducing AI Risk and Personal Motivations

00:32:59
Speaker
I would like us to pivot a bit to talk about how does this line of research help make us safer. So you mentioned that your largest motivation for working or perhaps say at least one of your motivations for working on this is to help reduce AI risk.
00:33:18
Speaker
How do you see mechanistic interpretability helping there? This definitely is only one of my motivations. I generally think of my life as having multiple different categories of motivation. There's the
00:33:32
Speaker
high-level, abstract, zoom in every few months and think about my life goals and meaning, where, okay, I think that human-level AI is probably going to happen, and plausibly might go badly, and I want to be doing things that address this because making the world a better place is of deeply important value to me. But also on a day-to-day basis, that's not the kind of thing I can really draw excitement from.
00:33:56
Speaker
And things like, oh my god, this is so fun, I get to stare at an alien organism all day and probe its brain for how it works. Or, huh, lots of people around me are really smart and excited about my work and think I'm really cool for doing it. Here's another thing that appeals to the kind of monkey inside my brain.
00:34:17
Speaker
And part of how I try to structure my life is to align these various different things, so that by default they point the same way as my abstract, high-level sense of what actually matters to me, which is really hard to feel in practice. I'm very concerned about models being deceptive, and in particular, models being deceptive in a way that's hard to notice, where
00:34:46
Speaker
the model realizes it's being trained, it realizes it's in its interests for its human operators to think it's doing what they want and being aligned and performing well, and so it learns to fake doing that. And if it's competent, which the models I'm scared about are, then it will just do this well. It will do this in a way that's as close as it can get to what a system that genuinely cared will do. And
00:35:16
Speaker
Fundamentally, when there are multiple solutions inside a model that both have the same outputs, one of the obvious things to do is to open up the black box of the model, look inside, and try to poke around at what's going on. And this is one of the ways I think mechanistic interpretability is likely to be pretty promising.
00:35:41
Speaker
In particular, I'm really excited about there being better feedback loops for people trying to align these systems where, especially to get closer to the worlds which are actually dangerous, where we might have systems that are capable enough to do damage and also capable enough to realize they might want to do damage. I think that having really good feedback on how well the alignment techniques work in a way that is not tied up with things like
00:36:12
Speaker
the model making mistakes, and which can ideally be as fine-grained and mechanistic as possible, is a pretty valuable thing to exist. Though I also just have a skepticism of careful, elaborate arguments in general, and think that anyone trying to reason about a complex problem like AI safety should share this skepticism. And I think for me,
00:36:37
Speaker
obviously, if we understand the terrifying black boxes that might be massively influencing the world, things are more likely to go better for humanity. I think that kind of very simple argument satisfies a different part of me. Yeah, I can see how that argument is powerful. So there's a question I have about
00:36:58
Speaker
mechanistic interpretability. This might help us understand what's going on inside of these black box systems. But if we discover that something we do not like is going on, is there a part of mechanistic interpretability that helps us control the system or steer the system in a different way? Or is that where you imagine another approach within AI safety takes over? So is mechanistic interpretability mostly about detecting bad behavior
00:37:24
Speaker
and not about actually steering or controlling AI systems in the right direction. So in my personal conception, mechanistic interpretability is about reverse engineering a trained system rather than about intervening on that system to change what it does. I think that it's easy to get into semantics here where
00:37:52
Speaker
I don't know - maybe if you're a doctor, you'd refer to your instruments as a very different part of medicine from actually cutting people open and making them better. But these are clearly highly related things, and better instruments enable better medication and better surgery techniques. And I also think it's kind of healthy for mechanistic interpretability to try to get feedback by actually doing things.
00:38:21
Speaker
There's this great paper called ROME, from David Bau and Kevin Meng, where they did some work to try to track down which bits of a model contain factual knowledge, like the Eiffel Tower is in Paris. But then the focus of the paper was on this memory editing technique, where they change it so the model thinks the Eiffel Tower is actually in Rome.
00:38:48
Speaker
And they got some pretty impressive results. Like if you ask it something like, can you give me directions to the Eiffel Tower, it will direct you to Italian railway stations rather than French railway stations, which is kind of wild. Though I also think that the paper is kind of flawed in some ways, and there's been some recent interesting criticism.
00:39:13
Speaker
But I think I'm pretty excited about that kind of attempt, where if we really understand how models work, then these kinds of things should be much easier. Because to me, a lot of the field of AI is kind of floundering around in the dark.
00:39:31
Speaker
And I mean, one concerning implication of that statement is that if we really understand the system, we should be able to make better ones faster.

Balancing AI Development and Safety

00:39:40
Speaker
And in my opinion, even if I want to live in a world with safe and aligned, powerful systems one day, I feel like making them better is vastly outstripping the understanding of them and making them safer. And I want the relative speed to be as great as possible on the safer side.
00:39:58
Speaker
So Neil, I have a kind of a weird question here. So what I could be worried about is say that you do some mechanistic interpretability research and then it's published online in a paper or say it's written up in a blog post.
00:40:14
Speaker
And this then becomes part of the training set for a future language model. Could the research you're doing now help future language models deceive us because it understands how we're trying to interpret it?
00:40:29
Speaker
Yes, there are totally worlds where that happens, though it does seem like this is not on my list of things I am concerned about in the short term, or even things that are on my list of ways I think that doing my research could be harmful. But it definitely could happen. One idea which I've seen being popular on the Alignment Forum that I find very fun is:
00:40:58
Speaker
there's this string that the people who made BIG-bench - which is this big new set of benchmarks and metrics we use to test language models - made, where they have this string they want to put in documents that contain the answers to the benchmark questions, such that when you gather the dataset for future language models, you can just say, don't include things with this string.
00:41:23
Speaker
And there's a very fun idea of just put this in like all alignment posts so we don't tip off the model. But so to me, this kind of links into the more interesting question of if you know that someone is trying to read your mind, can you outthink them? And my personal guess is that it is at the very least significantly harder
00:41:49
Speaker
for a model to outthink a mind reader than it is for the model to deceive an operator. One of the main reasons I believe this is I think that getting better at something is fundamentally about getting feedback, and models are trained to get lots and lots of feedback on their outputs. But models are not trained to get lots of feedback on their internals and how much a mind reader can see them.
00:42:17
Speaker
Note that this argument does break if people start doing things like training on an interpretability-inspired metric or giving feedback via it, which I think is a thing that I am extremely cautious about recommending anyone does for this exact reason. And having techniques so robust they can't be broken is much, much harder. I...
00:42:42
Speaker
I think that I don't know how I would think my thoughts such that a digital neuroscientist could read them and not see it. I think there's a couple of ideas people have flirted with that kind of make sense. Spending a bit of time thinking about what intentions I'm forming and why, and then spending most of my time just acting on these cached intentions without dwelling on how I came up with them, is one tactic.
00:43:09
Speaker
Another tactic would just be thinking in really weird and convoluted ways and keeping away from the dangerous thoughts. And I can imagine some of these working, especially if the model has some external memory it can store these cached intentions in, in a way that's kind of divorced from how they were generated.

Speculative AI Risks

00:43:27
Speaker
We should flag that all of this is very speculative and it is not your main concern. Completely. Yes. Yes. This is very much on my list of fun things I might speculate about with friends at 1AM.
00:43:39
Speaker
rather than a thing I actually think about when doing day-to-day research. Let's dig into some of what I could see as being problems with mechanistic interpretability as a research paradigm. I think the main worry I have is just that mechanistic interpretability is not fast enough.
00:43:57
Speaker
How I frame this is to say that by the time we get actual feedback on the systems that were cutting edge two years ago, there are now new systems, and we have not been fast enough to implement the learnings in the new systems. So in a sense, you're always playing catch-up - or at least that's my fear. Do you think that's a real worry?
00:44:21
Speaker
So this definitely is a real worry. I will point out that I would argue that the field of mechanistic interpretability of language models didn't even exist two years ago, and all existing work was on image models, such that I'm...

Scaling Challenges and Foundational Understanding

00:44:40
Speaker
I don't know, I feel less bad about that one. But I do think this is an important point, and
00:44:50
Speaker
So there's a couple of underlying questions here. There's, are we just capable of doing this at all? Maybe we could do GPT-3 given two years, but then if GPT-4 is 10 times bigger, it would take us 20 years, and that's not sustainable.
00:45:13
Speaker
But then there's just questions like, are we always going to be scrapping our previous work, or can we be building a field, building upon our previous work, even as the exact focus changes? And I'm reasonably optimistic about the building-on-our-progress one.
00:45:33
Speaker
I think that a bunch of our insights and conceptual frameworks from image models transferred, though definitely imperfectly, and less well than I'd have hoped. And there were lots of weird things about transformers. For example, image models don't have attention layers, and a bunch of the work has focused on understanding what's up with attention layers. And so that's a category of concern. I think on the
00:46:03
Speaker
On the problem of scale - can we actually interpret these humongous models, even if we can interpret a smaller model? - I'm kind of unsure. I am excited about things that try to take insights from mech interp and scale them and automate them.
00:46:26
Speaker
And I think that one kind of speculative dream might be that even if we know how we would reverse engineer GPT-4 given 20 years,
00:46:37
Speaker
before we get to the really scary systems, we'll hopefully have systems that are kind of near human and can take over a lot of cognitive work. And if we could have these systems try to help us and just do this 20 years of work in weeks instead, that seems in some sense like a massive win that could actually scale. But
00:47:02
Speaker
Yeah, I think one thing I will say that's maybe one of my more controversial opinions is that I think that the field of interpreting AI is much, much more bottlenecked by really rigorous true beliefs about networks than it is by good ideas for things that would scale and would actually generalize to larger models. And I think that it's just very hard to tell which of these ideas are good.
00:47:30
Speaker
which of the ideas have subtle flaws. And I feel like just doing work to find even some cherry-picked thing like induction heads, that we then try to really deeply understand,
00:47:45
Speaker
or trying to discover some more of the underlying principles behind the kinds of algorithms the models learn and the kinds of ways they express things, even if the exact details don't generalize, just seems like it should enable all further things to do with understanding these models. Because at the moment it feels like we're floundering in the dark.
00:48:11
Speaker
So if we imagine that we have AIs helping us interpret other AIs, so AIs helping us understand what's going on inside of other neural networks.

Can AIs Interpret Each Other?

00:48:22
Speaker
Couldn't this in a sense lead us back to where we started? Because maybe the way that one AI is being interpreted by another is perfectly understandable to the AI that's doing the interpreting.
00:48:35
Speaker
Perhaps it's an inefficiency to try to translate it into something that's understandable by humans. Perhaps an AI trained to interpret another AI could skip the step of producing something that's understandable by humans and thereby work faster. Perhaps it could outcompete systems that try to translate into humanese.
00:48:59
Speaker
I'm not sure I quite followed the question. Is what you're asking: if we try to train systems to interpret more complex systems, maybe those would be too slow and inefficient, such that a system that's not trying to translate this into humanese is just much more capable? Exactly. Imagine that we have a system, an AI, that's interpreting another AI.
00:49:23
Speaker
We have two such systems. We have one system that delivers something that's understandable by humans, and we have another system that just delivers red or green, let's say - just thumbs up or thumbs down, without giving us actual information about what's going on. It just says:
00:49:41
Speaker
this system is doing what you want it to do. Could you see a world in which the system that gives us very little information - that just gives us a thumbs up or a thumbs down - would outcompete the system that tries to do the hard work of translating a neural network into something that's humanly understandable? So to me, outcompete kind of seems like the wrong framing.
00:50:11
Speaker
The question is not whether, if you had two competing auditing companies, one of whom did the red-green thing and one of whom did the sophisticated thing, which one would make more money. If we end up in a world where it's possible to make the extremely expensive, sophisticated thing that translates into humanese, that's a massive win.

Mechanistic Interpretability vs. Auditing

00:50:33
Speaker
Like, pharmaceutical companies spend billions of dollars making sure their products are safe, when "YOLO, put it on the market and don't check" is much cheaper. But, you know, we have regulations that make it so that you need to do the really hard, expensive thing that's better at making things safer. And I'm really not that concerned about "it will just be too inefficient or too expensive". I'm much more concerned about: A, we won't be able to do it;
00:51:03
Speaker
B, it will be so prohibitively expensive or slow that it's not even possible to do practically, even if we knew how to do it in theory. And finally, that it's just not reliable. We can't trust this system, and we just get lost in the nested chains of more and more sophisticated systems. And on the specific point of
00:51:31
Speaker
practicalities and kind of competition. I think one thing which is easy to overlook here is that mechanistic interpretability is not necessarily about auditing every single action a system takes. It's much more about taking the system and trying to understand it.
00:51:51
Speaker
Clinical trials are maybe a good analogy in this context, where it's not like, when you deploy a drug in the world, you watch every patient as they take it. You study the drug in this clinical context, as close to the real world as you can make it, ideally by just giving it to people and then seeing what happens and getting data. And this can be expensive in a way that watching everyone who takes a COVID vaccine would be completely impractical.
00:52:21
Speaker
And models are even better, because they are just a long but finite list of numbers - the parameters that define the model. And if we can just study this, do a bunch of running the model on inputs, but fundamentally just understand what it represents, then I think that questions about competitiveness just matter a lot less there.
00:52:47
Speaker
Do you think that mechanistic interpretability will have to involve AIs interpreting other AIs at some point for it to scale to bigger systems? I think it is plausible that this is a thing that happens, and it is plausible that the most practical path to meaningfully understanding really complicated systems is via using
00:53:14
Speaker
less dangerous AI assistants. I do not think this is an important thing in the near-term future of the field. I do not think this is even necessary before we could reverse engineer a human-level system, at least enough to figure out whether it's safe. And this is not at all what I am personally working on.
00:53:32
Speaker
This is very much in the speculative "what I think could happen". I think in particular, if we're trying to figure out how we could align a vastly, vastly superhuman system, I struggle to imagine it really being doable for humans to perfectly reverse engineer it. Though I think that
00:53:52
Speaker
fully reverse engineering is not on the critical path from where we are now to having safe AI. Answering questions like - does this have internally represented goals, and if so, what are the goals? Is this being deceptive? - just seems significantly easier.
00:54:10
Speaker
Yeah, that's a great way to frame it. So I read one objection to the whole mechanistic interpretability paradigm or research field, which is that when you're trying to interpret a neural network, you cannot from interpreting the network itself understand how the system will react in different environments. So imagine that
00:54:36
Speaker
you cannot get information about how the system will react in a diverse set of environments. And so therefore, there's a fundamental limit to how much you can get out of interpreting these networks. Do you think that's a reasonable objection? My off-the-cuff answer is no.
00:54:54
Speaker
In particular, to me, one of the things that's distinctive about mech interp and reverse engineering the system is that you're understanding the algorithms the model employs. And if you actually understand an algorithm, you should be able to predict how it generalizes.
00:55:15
Speaker
You should be able to come up with adversarial examples that will trip the system up using your understanding. You should be able to predict what happens on weird settings. Induction heads are a good example, where understanding them let us predict that models could, if given complete random gibberish text, predict repeated subsets of that. And they can.
00:55:37
Speaker
And I'm unconvinced by that criticism. I do think there's some truth to it in that it's very easy to impose your own preconceptions on what behavior you should look for, what kinds of inputs you give it, such that you see what lights up and how you focus on things.
00:55:59
Speaker
And I think in particular, if you aren't trying to fully reverse engineer the system, but are just trying to localize the bits that matter most, the criticism has more teeth to me. But in my eyes, mech interp is one of the more promising things that isn't susceptible to that criticism. Because we're trying to understand what algorithm underlies the network itself, which is kind of understanding a more general feature of a neural network.
00:56:25
Speaker
Yes. And I think there's other techniques and approaches that also seem pretty valuable here. There's the area of adversarial training and adversarial robustness, where you get an adversary, like a human rater or another AI, to generate examples trying to throw off the system. And adversarial examples are a good example of this - the pictures listeners might have seen where you have a
00:56:54
Speaker
panda, you add what looks like a bunch of random noise, and it thinks it's definitely a wombat or something, even though to our eyes it looks exactly like a panda and they've just imperceptibly changed a couple of pixels. And yeah, one field is about trying to find the inputs that trip up the model, that are least like what it's expecting. And that's another angle to figuring out what it does in weird contexts.
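For the curious: the "noise" in those panda images isn't actually random - it's chosen using the model's own gradients. A minimal sketch of the classic fast gradient sign method, assuming `model` is any PyTorch image classifier that returns logits:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, image, label, eps=0.01):
    """Fast gradient sign method: nudge every pixel a tiny amount in whichever
    direction most increases the model's loss. `image` is a (1, C, H, W) tensor
    in [0, 1]; `label` is a (1,) tensor with the true class index."""
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    adversarial = (image + eps * image.grad.sign()).clamp(0, 1).detach()
    return adversarial   # looks identical to a human, but can flip the model's prediction
```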
00:57:20
Speaker
But I'm honestly not really aware of many others that I think are as promising as mech interp here. Okay, so actually you think this objection applies the least to mech interp compared to other approaches? I don't like using superlatives, but kind of. And to be clear, I think it does apply. I just think it applies a lot less. Yeah.
00:57:44
Speaker
All right, so let's say that people have been listening to this and they are now excited about mech interp. They want to help contribute to this field.

Getting Involved in the Field

00:57:52
Speaker
What would be the best ways to get into this? Which papers or books or video series should you read? Should you go to hackathons? How would you go about getting into the field?
00:58:04
Speaker
Sure, so actually one of my side projects for the last couple of months has been trying to make this dramatically easier. And I have this post called Concrete Steps to Get Started in Mechanistic Interpretability, which you can find if you go to neelnanda.io/getting-started. And
00:58:25
Speaker
This tries to give a pretty concrete guide to what I think you should do if you actually want to learn about the field and potentially do research, and tries to collate a lot of the other resources I've made and other things I think are useful and good for this.
00:58:39
Speaker
Generally, I think one of the really great things about mech interp is that there's lots of important work to be done on tiny systems that can fit in a free Google Colab notebook in your browser, rather than needing some expensive supercomputer, and which you can just play around with and get fast feedback on within minutes, especially using some of the demos I and collaborators have made, where you can just
00:59:09
Speaker
play around with a model by running existing code. And I would recommend people just like go and screw around with the various educational materials, read some of the papers recommended. For people who want to dig more in, I have the sequence that I'm pretty satisfied with called 200 concrete open problems in mechanistic interpretability.
00:59:33
Speaker
where I try to both lay out a map of the field - what I think are the interesting subareas, big open questions, how I think about doing research in those areas, traps, pitfalls, tips -
00:59:46
Speaker
but also just a long list of concrete problems I would be excited for someone to go and work on. Would this be the best approach? Say that a listener has a general background in computer science and is interested in this, is the best way to jump into one of these 200 problems and try to solve it and then fail and then get feedback from trying to solve an actual problem? Or would you rather have a more expanded base of theory by reading some papers
01:00:15
Speaker
How would you start? I personally think that most people I see getting into the field spend way too long reading papers and trying to build a broad base and not enough time just doing things. And I think that building a broad base is important. But I think that the best way to build the base is by doing things, failing, noticing what you're stuck at, and using this to ground your learning.
01:00:43
Speaker
And I think that going back once you've gotten your hands dirty, going on a learning binge and trying to fill in a bunch of the holes in your knowledge, is solidly worthwhile. And I think there's a learning style which just prefers reading a bunch of papers, and my getting started guide also tries to give some concrete advice for that. But code fast, code early, try to build practical knowledge as well as theory is some of the most common advice I give to people trying to get into the field.
01:01:12
Speaker
And again, one of the things that's really nice about mech interp is there's lots of small, bite-size problems. I tried to rank the problems in my sequence by difficulty, and there's a rank for "I think someone who's new to the field could probably get a good amount of traction on this in a week or two." And I think that just trying to do something is one of the best ways to get grounding and get started. Fantastic. Perfect.