Introduction: Podcast Overview & Guest
00:00:12
Speaker
Hey everyone and welcome back to the AI alignment podcast at the Future of Life Institute. I'm Lucas Perry and today we'll be speaking with Stuart Armstrong on his research agenda version 0.9, synthesizing a human's preferences into a utility function.
00:00:28
Speaker
Here Stuart takes us through the fundamental idea behind this research agenda, what this process of synthesizing human preferences into a utility function might look like, key philosophical and empirical insights needed for progress, how human values are changeable, manipulable, under-defined and contradictory, how these facts affect generating an adequate synthesis of human values, where this all fits in the alignment landscape,
00:00:55
Speaker
and how it can inform other approaches to aligned AI systems. If you find this podcast interesting or useful, consider sharing it with friends on social media platforms, forums, or anywhere you think it might be found valuable.
Engagement: Listener Feedback & Sharing
00:01:10
Speaker
I'd also like to put out a final call for this round of Survey Monkey polling and feedback. So if you have any comments, suggestions, or any other thoughts you'd like to share with me about the podcast, potential guests, or anything else, feel free to do so through the Survey Monkey poll link attached to the description of wherever you might find this podcast. I'd love to hear from you.
00:01:31
Speaker
There also seems to be some lack of knowledge regarding the pages that we create for each podcast episode. You can find a link to that in the description as well, and it contains a summary of the episode, topics discussed, key points from the guest, important timestamps if you want to skip around, works referenced, as well as a full transcript of the audio in case you prefer reading.
About Stuart Armstrong & His Research
00:01:55
Speaker
Stuart Armstrong is a researcher at the Future of Humanity Institute who focuses on the safety and possibilities of artificial intelligence, how to define the potential goals of AI and map humanity's partially defined values into it, and the long-term potential for intelligent life across the reachable universe.
00:02:13
Speaker
He has been working with people at FHI and other organizations such as DeepMind to formalize AI desiderata in general models so that AI designers can include these safety methods in their designs. His collaboration with DeepMind on interruptibility has been mentioned in over a hundred media articles.
00:02:33
Speaker
Stuart's past research interests include comparing existential risks in general, including their probability and their interactions, anthropic probability, how the fact that we exist affects our probability estimates around that key fact, decision theories that are stable under self-reflection and anthropic considerations, negotiation theory and how to deal with uncertainty about your own preferences, computational biochemistry,
00:02:59
Speaker
fast ligand screening, parabolic geometry, and his Oxford DPhil was on the Holonomy of Projective and Conformal Cartan Geometries. And so, without further ado, or pretenses that I know anything about the Holonomy of Projective and Conformal Cartan Geometries, I give you Stuart Armstrong.
Understanding Human Preferences
00:03:23
Speaker
We're here today to discuss your research agenda version 0.9, synthesizing a human's preferences into a utility function. One wonderful place for us to start would be with this story of evolution, which you call an inspiring just-so story. And so to start, I think it would be helpful for us to contextualize the place of the human, and what the human is, as we find ourselves here at the beginning of this value alignment problem.
00:03:52
Speaker
I'll go ahead and read this here for listeners to begin developing a historical context and narrative. So I'm quoting you here. You say, this is the story of how evolution created humans with preferences and what the nature of these preferences are. The story is not true in the sense of accurate. Instead, it is intended to provide some inspiration as to the direction of this research agenda.
00:04:14
Speaker
In the beginning, evolution created instinct-driven agents. These agents had no preferences or goals, nor did they need any. They were like Q-learning agents. They knew the correct action to take in different circumstances, but that was it. Consider baby turtles that walk towards the light upon birth, because traditionally, the sea was lighter than the land, of course.
00:04:36
Speaker
This behavior fails them in the era of artificial lighting. But evolution has a tiny bandwidth acting once per generation, so it created agents capable of planning, of figuring out different approaches rather than having to follow instincts. This was useful, especially in varying environments, and so evolution offloaded a lot of its job onto the planning agents.
00:04:58
Speaker
Of course, to be of any use, the planning agents needed to be able to model their environment to some extent, or else their plans couldn't work, and had to have preferences, or else every plan was as good as another. So in creating the first planning agents, evolution created the first agents with preferences.
00:05:16
Speaker
Of course, evolution is a messy, undirected process, so the process wasn't clean. Planning agents are still riven with instincts, and the modeling of the environment is situational, used for when it was needed, rather than some consistent whole. Thus, the preferences of these agents were under-defined and sometimes contradictory. Finally, evolution created agents capable of self-modeling and of modeling other agents in their species.
00:05:42
Speaker
This might have been because of competitive social pressures as agents learn to lie and detect lying. Of course, this being evolution, the self and other modeling took the form of kludges built upon spandrels built upon kludges, and then arrived humans who developed norms and norm violations.
00:06:01
Speaker
As a side effect of this, we started having higher order preferences as to what norms and preferences should be. But instincts and contradictions remained. This is evolution after all. And evolution looked upon this hideous mess and saw that it was good. Good for evolution, that is. But if we want it to be good for us, we're going to need to straighten out this mess somewhat.
00:06:23
Speaker
Here we arrive, Stuart, in the human condition after hundreds of millions of years of evolution. So given the story of human evolution that you've written here, why were you so interested in this story and why were you looking into this mess to better understand AI alignment and develop this research agenda?
Challenges in Inferring Preferences
00:06:40
Speaker
This goes back to a paper that I co-wrote for NeurIPS. It basically develops the idea of inverse reinforcement learning, or more broadly, can you infer what the preferences of an agent are just by observing their behavior?
00:06:55
Speaker
Humans are not entirely rational. So the question I was looking at is, can you simultaneously infer the rationality and the preferences of an agent by observing their behavior? It turns out to be mathematically completely impossible.
00:07:11
Speaker
We can't infer the preferences without making assumptions about the rationality, and we can't infer the rationality without making assumptions about the preferences. This is a rigorous result. So my looking at human evolution is to basically get around this result, in a sense to make the right assumptions so that we can extract actual human preferences. Since we can't just do it by observing behavior, we need to dig a bit deeper.
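To make that degeneracy concrete, here is a minimal toy sketch; the environment and numbers are my own illustration rather than anything from the Armstrong and Mindermann paper. The same observed policy is explained equally well by a rational planner with reward R and by an anti-rational planner with reward -R, so behavior alone cannot separate rationality from preferences.

```python
# Toy illustration (my own, not from the paper) of why behavior alone cannot
# separate rationality from reward: the same policy is explained equally well
# by a rational planner with reward R and by an anti-rational planner with -R.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 4, 3
R = rng.normal(size=(n_states, n_actions))       # a hypothetical reward table

def planner(reward, beta):
    """Boltzmann planner: beta > 0 is 'rational', beta < 0 is 'anti-rational'."""
    z = beta * reward
    p = np.exp(z - z.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)      # policy: P(action | state)

pi_observed = planner(R, beta=5.0)               # all we ever get to see

explanation_a = planner(R, beta=5.0)             # rational agent, reward R
explanation_b = planner(-R, beta=-5.0)           # anti-rational agent, reward -R

print(np.allclose(pi_observed, explanation_a))   # True
print(np.allclose(pi_observed, explanation_b))   # True: both fit perfectly
```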
00:07:40
Speaker
So, what did you glean then from looking at this process of human evolution and seeing how messy the person is? Well, there are two key insights here. The first is that I located where human preferences reside, or where we can assume human preferences reside. And that's in the internal models of the humans, how we model the world, how we judge, "that was a good thing," or "I want that," or "oh, I'd be really embarrassed about that."
00:08:09
Speaker
And so human preferences are defined in this project, or at least the building blocks of human preferences are defined to be in these internal models that humans have, with the labeling of states of outcomes as good or bad. The other point of bringing up evolution is that since it's not anything like a clean process, it's not like we have one general model with clearly labeled preferences and then everything else flows from that.
00:08:38
Speaker
It is a mixture of situational models in different circumstances with subtly different things labeled as good or bad. As I said, human preferences are contradictory, changeable, manipulable, and under-defined. There are two core parts to this research project, essentially.
00:08:58
Speaker
The first part is to identify the human's internal models, figure out what they are, how we use them, and how we can get an AI to realize what's going on. So those give us the sort of partial preferences, the pieces from which we build our general preferences.
00:09:16
Speaker
The second part is to then knit all these pieces together into an overall preference for any given individual in a way that works reasonably well and respects as much as possible the person's different preferences, meta preferences, and so on.
00:09:36
Speaker
The second part of the project is the one that people tend to have strong opinions about because they can see how it works and how the building blocks might fit together and how they'd prefer that it be fit together in different ways and so on. But in essence, the first part is the most important because that fundamentally defines the pieces of what human preferences are.
AI Alignment & Research Context
00:09:58
Speaker
Before we dive into the specifics of your agenda here, can you contextualize it within the evolution of your thought on AI alignment and also how it fits within the broader research landscape? So this is just my perspective on what the AI alignment landscape looks like.
00:10:16
Speaker
There's a collection of different approaches addressing different aspects of the alignment problem. Some of them, which MIRI is working a lot on, are technical things like how to ensure stability of goals, and other similar thoughts along these lines that should be necessary for any approach.
00:10:35
Speaker
Others are devoted to how to make the AI safe, either indirectly or by making it itself fully aligned. So in the first category, you have things like software as a service. Can we have superintelligent abilities integrated in a system that doesn't allow for, say, superintelligent agents with pernicious goals?
00:11:01
Speaker
Others that I have looked into in the past are things like low impact agents or oracles, which again, the idea is we have a super intelligence, we cannot align it with human preferences, yet we can use it to get some useful work done. Then there are the approaches which aim to solve the whole problem and get actual alignment, what used to be called the friendly AI approach.
00:11:28
Speaker
So here, it's not an AI that's constrained in any way, it's an AI that is intrinsically motivated to do the right thing. There are a variety of different approaches to that, some more serious than others. Paul Christiano has an interesting variant on that.
00:11:45
Speaker
Though it's hard to tell, I would say his is a bit of a mixture of value alignment and constraining what the AI can do in a sense, but it is very similar. And so this research agenda is of that last type: getting the aligned, friendly AI, the aligned utility function.
00:12:04
Speaker
In that area, there are what I would call the ones that rely on indirect proxies. This is the idea of, you put Nick Bostrom in a room for 500 years, or a virtual version of that, and hope that you get something aligned at the end of that.
00:12:22
Speaker
There are direct approaches, and this is the basic direct approach, doing everything the hard way in a sense, by defining everything that needs to be defined so that the AI can then assemble an aligned preference function from all the data.
00:12:40
Speaker
Wonderful. So you gave us a good summary earlier of the different parts of this research agenda. Would you like to expand a little bit on the quote fundamental idea behind this specific research project?
Synthesizing Human Preferences
00:12:53
Speaker
There are two fundamental ideas that are not too hard to articulate. The first is that though our revealed preferences could be wrong, though our stated preferences could be wrong, what our actual preferences are, at least in one moment, is what we model inside our head, what we're thinking of as the better option.
00:13:15
Speaker
We might lie, as I say, in politics or in a court of law or just socially, but generally when we know that we're lying, it's because there's a divergence between what we're saying and what we're modeling internally. So it is this internal model which I'm identifying as the place where our preferences lie.
00:13:32
Speaker
And then all the rest of it, the whole convoluted synthesis project, is just basically how do we take these basic pieces and combine them in a way that does not seem to result in anything disastrous and that respects human preferences and meta-preferences, and this is a key thing, actually reaches a result.
00:13:55
Speaker
That's why the research project is designed for having a lot of default actions in a lot of situations. If the person does not have strong meta-preferences, then there's a whole default procedure for how you combine things: preferences about the world and preferences about your identity, say, are by default combined in different ways.
00:14:16
Speaker
If you would want GDP to go up, that's a preference about the world. If you yourself would want to believe something or believe only the truth, for example, that's a preference about your identity. It tends to be that identity preferences are more fragile.
00:14:32
Speaker
So the default is that preferences about the world are just added together, and this overcomes most of the contradictions because very few human preferences are exactly anti-aligned, whereas identity preferences are combined in a more smooth process so that you don't lose too much on any of them. But as I said, these are the default procedures, and they're all defined so that we get an answer, but there's also large abilities for the person's meta-preferences to override the defaults.
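As a rough illustration of these two default rules, here is a minimal sketch. The additive rule for world preferences follows the description above; the particular soft-min formula for identity preferences is my own stand-in for the smoother combination Armstrong describes, not a formula from the agenda.

```python
# Sketch of the two default combination rules described above. The soft-min
# form is a stand-in; the agenda does not pin down one specific formula.
import numpy as np

def combine_world_preferences(utilities, weights):
    """World preferences: a weighted linear sum (contradictions mostly cancel)."""
    return float(np.dot(weights, utilities))

def combine_identity_preferences(utilities, weights, temperature=1.0):
    """Identity preferences: a smooth minimum, so none is sacrificed entirely."""
    u = np.asarray(utilities, dtype=float) * np.asarray(weights, dtype=float)
    # soft-min: tends to min(u) as temperature -> 0, to the mean as it grows
    return float(-temperature * np.log(np.mean(np.exp(-u / temperature))))

# e.g. "GDP goes up" and "less pollution" simply add together, while
# "I believe only true things" and "I am kind" are combined more gently.
print(combine_world_preferences([0.7, -0.2], [1.0, 0.5]))
print(combine_identity_preferences([0.9, 0.1], [1.0, 1.0]))
```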
00:15:03
Speaker
Again, precautions are taken to ensure that an answer is actually reached. Can you unpack what partial preferences are? What you mean by partial preferences and how they're contextualized within human mental models?
00:15:18
Speaker
What I mean by partial preference is mainly that a human has a small model of part of the world. Like, let's say they're going to a movie and they would prefer to invite someone they like to go with them. Within this mental model, there is the movie itself and the presence or absence of the other person.
00:15:40
Speaker
So this is a very narrow model of reality. Virtually the entire rest of the world and definitely the entire rest of the universe does not affect this. It could be very different and not change anything of this. So this is what I call a partial preference. You can't go from this to a general rule of what the person would want to do in every circumstance, but it is a narrow, valid preference.
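As a small data-structure sketch, one could represent such a partial preference roughly as follows; the class and field names here are my own illustration, not notation from the research agenda.

```python
# A minimal sketch of a "partial preference": a judgment within a tiny model
# of the world that is silent about everything outside that model.
from dataclasses import dataclass

@dataclass
class PartialPreference:
    variables: dict          # the only features this mental model contains
    better: dict             # the preferred assignment of those features
    worse: dict              # the dispreferred assignment
    weight: float = 1.0      # how strongly the comparison is felt
    context: str = ""        # the situation that invoked the model

# "Going to this movie with my friend" vs. "going without them":
cinema = PartialPreference(
    variables={"movie": "yes", "friend_present": None},
    better={"movie": "yes", "friend_present": True},
    worse={"movie": "yes", "friend_present": False},
    weight=0.3,
    context="deciding whether to invite a friend to the cinema",
)
# Everything outside `variables` (the rest of the world) is simply unmodelled.
```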
00:16:07
Speaker
"Partial" here refers to two things: first, that it doesn't cover all of our preferences, and secondly, that the model in which it lives only covers a narrow slice of the world. You can make some modifications to this. This is the whole point of the second section, that if the approach works, variations on the synthesis project should not actually result in outcomes that are disastrous at all.
00:16:35
Speaker
If changing the synthesis process a little bit would result in a disaster, then something has gone wrong with the whole approach. But you could, for example, add restrictions like looking for consistent preferences. But I'm starting with the fundamental thing, which is that there is this mental model. There is an unambiguous judgment that one thing is better than another. And then we can go from there in many ways.
00:17:03
Speaker
A key part of this approach is that there is no single fundamental synthesis process that would work. So it is aiming for an adequate synthesis rather than an idealized one, because humans are a mess of contradictory preferences, and because even philosophers have contradictory meta-preferences within their own minds and with each other.
00:17:27
Speaker
and because people can learn different preferences depending on the order in which information is presented to them, for example. Any method has to make a lot of choices, and therefore I'm writing down explicitly as many of the choices that have to be made as I can, so that other people can see what I see the process as entailing.
00:17:49
Speaker
I am quite wary of things that look for reflexive self-consistency, because in a sense, if you define your ideal system as one that's reflexively self-consistent, that's a sort of a local condition in a sense, that the morality judges itself by its own assessment. And that means that you could theoretically wander arbitrarily far in preference space before you hit that.
Complexity in Human Preferences
00:18:16
Speaker
I don't want something that is just defined by this has reached reflective equilibrium. This morality synthesis is now self-consistent. I want something that is self-consistent, and it's not too far from where it started. So I prefer to tie things much more closely to actual human preferences and to explicitly aim for a synthesis process that doesn't wander too far away from them.
00:18:44
Speaker
I see. So the starting point is the person's current evaluative morals, which we're trying to keep the synthesis close to.
00:18:51
Speaker
Yes, I don't think you can say that any synthesis of a human's preferences is intrinsically wrong, as long as it reflects some of the preferences that were inputs into it. However, I think you can say that it is wrong from the perspective of the human that you started with if it strongly contradicts what they would want.
00:19:16
Speaker
Agreement from my starting position is something which I take to be very relevant to the ultimate outcome. There's a bit of a challenge here because we have to avoid, say, preferences which are based on inaccurate facts. So some of the preferences are inevitably going to be removed or changed just because they're based on factually inaccurate beliefs.
00:19:41
Speaker
Some other processes of trying to make consistent what is sort of very vague will also result in some preferences being removed. So you can't just say the starting person has veto power over the final outcome, but you do want to respect their starting preferences as much as you possibly can.
00:20:03
Speaker
So reflecting here on the difficulty of this agenda and on how human beings contain contradictory preferences and models, can you expand a bit how we contain these internal contradictions and how this contributes to the difficulty of the agenda?
00:20:21
Speaker
I mean, humans contain many contradictions within them. Our mood shifts. We famously are hypocritical in favor of ourselves and against the foibles of others. We basically rewrite narratives to allow ourselves to always be heroes.
00:20:40
Speaker
Anyone who's sort of had some experience of a human has had knowledge of when they decided one way or decided the other way or felt that something was important and something else wasn't. And often people just come up with a justification for what they wanted to do anyway, especially if they're in a social situation. And then some people can cling to this justification and integrate that into their morality while behaving differently in other ways.
00:21:07
Speaker
The easiest example are political hypocrites. The anti-gay preacher who sleeps with other men is a stereotype for a reason. But it's not just a contradiction at that level. It's that basically most of the categories in which we articulate our preferences are not particularly consistent.
00:21:29
Speaker
If we throw a potentially powerful AI into this to change the world drastically, we may end up with things that cut across our preferences. For example, suppose that someone created or wanted to create a subspecies of human that was bred to be a slave race.
00:21:49
Speaker
Now, this race did not particularly enjoy being a slave race, but they wanted to be slaves very strongly. In this situation, a lot of our intuitions are falling apart, because we know that slavery is almost always involuntary and is backed up by coercion.
00:22:11
Speaker
We also know that even though our preferences and our enjoyments do sometimes come apart, they don't normally come apart that much. So we're now confronted by a novel situation where a lot of our intuitions are pushing against each other. You also have things like nationalism, for example.
00:22:33
Speaker
Some people have strong nationalist sentiments about their country, and sometimes their country changes. And in this case, what seemed like a very simple, yes, I will obey the laws of my nation, for example, becomes much more complicated as the whole concept of my nation starts to break down.
00:22:53
Speaker
This is the main way that I see preferences to being under-defined. They're articulated in terms of concepts which are not universal and which bind together many, many different concepts that may come apart.
00:23:09
Speaker
So at any given moment, like myself at this moment, the issue is that there's a large branching factor of how many possible future Lucases there can be. At this time currently, and maybe a short interval around this time, as you sort of explore in your paper, there's the sum total of my partial preferences and the partial world models in which these partial preferences are contained.
00:23:34
Speaker
These preferences and models can be expressed differently, and sort of hacked and changed, based on how questions are asked and the order of the questions. I am like a 10,000-faced thing, and I can show you one of my many faces depending on how you push my buttons. And depending on all of the external input that I get in the future,
00:24:00
Speaker
I'm going to express and maybe become more idealized in one of many different paths. The only thing that we have to evaluate which of these many different paths I would prefer is what I would say right now, right? Say my core value is joy or certain kinds of conscious experiences over others. And all I would have for evaluating this many branching thing is say this preference now at this time, but that could be changed in the future. Who knows?
00:24:29
Speaker
I will create new narratives and stories that justify the new person that I am and make sense of the new values and preferences that I have retroactively, like something that I wouldn't actually approve of now, but my new, maybe more evil, version of myself would approve of and create a new narrative for retroactively. Is this sort of helping to elucidate and paint the picture of why human beings are so messy?
00:24:53
Speaker
Yes, we need to separate that into two. The first is that our values can be manipulated by other humans, as they often are, and by the AI itself during the process. But that can be combated to some extent. I have a paper that may soon come out on how to reduce the influence of an AI over a learning process that it can manipulate. So that's one aspect.
00:25:20
Speaker
The other aspect is when you are confronted by a new situation, you can go in multiple different directions, and these things are just not defined. So when I said that human values are contradictory, changeable, manipulable, and under-defined, I was saying that the first three are relatively easy to deal with, but that the last one is not.
00:25:44
Speaker
Most of the time, people have not considered the whole of the situation that they or the world or whatever is confronted with. No situation is exactly analogous to another, so you have to try and fit it into different categories.
00:26:00
Speaker
So if someone dubious gets elected in a country and starts doing very authoritarian things, does this fit in the tyranny box, which should be resisted, or does this fit in the normal process of democracy box, in which case it should be endured and dealt with through democratic means?
00:26:20
Speaker
What'll happen generally is that it'll have features of both, so it might not fit comfortably in either box. And then there's a wide variety of ways for someone to be hypocritical or to choose one side or the other. But the reason that there's such a wide variety of possibilities is because this is a situation that has not been exactly confronted before.
00:26:42
Speaker
So people don't actually have preferences here. They don't have a partial preference over this situation because it's not one that they've ever considered. How they develop one is due to a lot of things: as you say, the order in which information is presented, which category it seems to most strongly fit into, and so on.
00:27:02
Speaker
We are going here for very mild under-definedness. The willing slave race was my attempt to push it out a bit further into something somewhat odd. And then if you consider a powerful AI that is able to create vast numbers of intelligent entities, for example,
00:27:21
Speaker
and reshape society, human bodies, and human minds in hugely transformative ways, we're going to enter sort of very odd situations where all our starting instincts are almost useless. I've actually argued at some point in the research agenda that this is an argument for ensuring that we don't go too far from the human baseline normal into exotic things where our preferences are not well defined.
00:27:51
Speaker
because in these areas, the chance that there is a large negative seems higher than the chance that there's a large positive.
00:28:00
Speaker
Now, I'm talking about things that are very distant in terms of our categories, like the world of Star Trek is exactly the human world from this perspective, because even though they have science fiction technology, all of the concepts and decisions there are articulated around concepts that we're very familiar with, because it is a work of fiction addressed to us now.
00:28:24
Speaker
So when I say not go too far, I don't mean not embrace a hugely transformative future. I am saying not embrace a hugely transformative future where our moral categories start breaking down.
00:28:38
Speaker
In my mind, there's two senses. There's the sense in which we have these models for things and we have all of these necessary and sufficient conditions for which something can be pattern matched to some sort of concept or thing. And we can encounter situations where there's conditions from many different things.
00:28:56
Speaker
being included in the context in a new way, which makes it so that something like goodness or justice is under-defined in the slavery case, because we don't really know initially whether this thing is good or bad. I see it as under-defined in this sense.
00:29:12
Speaker
The other sense is maybe the sense in which my brain is a neural architectural aggregate of a lot of neurons, and the sum total of its firing statistics and specific neural pathways can potentially be identified as containing preferences and models somewhere within there. So is it also true to say that it's under-defined in the sense that the human, viewed not as a thing in the world but as a process in the world, largely constituted by the human brain,
00:29:41
Speaker
even within that process, it's under-defined where in the neural firing statistics or the processing of the person there could ever be something called a concrete preference or value?
00:29:54
Speaker
I would disagree that it is undefined in the second sense. In order to solve the second problem, you need to solve the symbol grounding problem for humans. You need to show that the symbols or the neural pattern firing or the neuron connection or something inside the brain corresponds to some concepts in the outside world.
00:30:18
Speaker
This is one of my sort of side research projects. When I say side research project, I mean I wrote a couple of blog posts on this pointing out how I might approach it. And I point out that you can do this in a very empirical way. If you think that a certain pattern of neural firing refers to, say, a rabbit, you can see whether this thing firing in the brain is predictive of, say, a rabbit in the outside world,
00:30:44
Speaker
or predictive of this person is going to start talking about rabbits soon. In model theory, the actual thing that gives meaning to the symbols is sort of beyond the scope of the math theory. But if you have a potential connection between the symbols and the outside world, you can check whether this theory is a good one or a terrible one.
00:31:07
Speaker
Say you claim this corresponds to hunger, and yet that thing only seems to trigger when someone's having sex, for example.
00:31:16
Speaker
We can say, okay, your model that this corresponds to hunger is terrible. It's wrong. I cannot use it for predicting that the person will eat in the world, but I can use it for predicting that they're having sex. So if I model this as connected with sex, this is a much better grounding of that symbol.
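Here is a toy sketch of that empirical test, with entirely made-up data: it scores a proposed grounding by how much more likely the claimed world-event is when the symbol fires than it is at baseline. The particular scoring rule is just one simple choice of mine, not something specified in Armstrong's posts.

```python
# Hypothetical-data sketch of the empirical grounding test described above.
import numpy as np

def grounding_score(symbol_fires, event_occurs):
    """How much more likely the event is when the symbol fires than at baseline."""
    symbol_fires = np.asarray(symbol_fires, dtype=bool)
    event_occurs = np.asarray(event_occurs, dtype=bool)
    baseline = event_occurs.mean()
    conditional = event_occurs[symbol_fires].mean() if symbol_fires.any() else 0.0
    return float(conditional - baseline)

rng = np.random.default_rng(1)
eating = rng.random(1000) < 0.2                 # made-up behavioral record
sex = rng.random(1000) < 0.1
pattern = sex | (rng.random(1000) < 0.02)       # a firing pattern that tracks sex

print(grounding_score(pattern, eating))         # near 0: "hunger" grounds it badly
print(grounding_score(pattern, sex))            # large: "sex" grounds it much better
```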
00:31:37
Speaker
So using methods like this, and there are some subtleties, I also address Quine's gavagai and connect it to sort of webs of connotation and concepts that go together. But the basic idea is to empirically solve the symbol grounding problem for humans. When I say that things are under-defined, I mean that they are articulated in terms of concepts that are under-defined across all possibilities in the world,
00:32:05
Speaker
not that these concepts could be anything, or we don't know what they mean. Our mental models correspond to something. It's a collection of past experience, and the concepts in our brain are tying together a variety of experiences that we've had. They might not be crisp, they might not be well-defined even
00:32:27
Speaker
if you look at, say, the totality of the universe, but they correspond to something, to some repeated experience, to some concepts, to some thought process that we've had and that we've extracted this idea from.
00:32:40
Speaker
When we do this in practice, we are going to inject some of our own judgments into it. And since humans are so very similar in how we interpret each other and how we decompose many concepts, it's not necessarily particularly bad that we do so. But I strongly disagree that these are arbitrary concepts that are going to be put in by hand.
00:33:07
Speaker
They are going to be, in the main, identified once you have some criteria for tracking what happens in the brain, comparing it with the outside world, and those kinds of things. My concept of, say, a cinema is not an objectively well-defined fact, but what I think of as a cinema, and what I expect in a cinema, and what I don't expect in a cinema, like I expect it to go dark
00:33:35
Speaker
and a projector and things like that, I don't expect that this would be in a completely open space in the Sahara Desert under the sun with no seats and no sounds and no projection. I'm pretty clear that one of these things is a lot more of a cinema than the other.
00:33:55
Speaker
Do you want to expand here a little bit about this synthesis process? The main idea is to try and ensure that no disasters come about. And the main thing that could lead to disaster is the over-prioritization of certain preferences over others.
00:34:14
Speaker
There are other avenues to disaster, but this seems to be the most obvious. The other important part of the synthesis process is that it has to reach an outcome, which means that a vague description is not sufficient. That's why it's phrased in terms of, this is the default way that you synthesize preferences. This way may be modified by certain meta-preferences.
00:34:39
Speaker
The meta-preferences have to be reducible to some different way of synthesizing the preferences. For example, the default synthesis does not particularly over-weight long-term preferences versus short-term preferences.
00:34:53
Speaker
It would prioritize long-term preferences, but not exclude short-term ones. So, "I want to be thin" is not necessarily prioritized over "that's a delicious piece of cake that I'd like to eat right now," for example. But human meta-preferences often prioritize long-term preferences over short-term ones. So this is going to be included, and this is going to change the default balance towards the long-term preferences.
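As a toy sketch of what "reducible to a different way of synthesizing" could look like, here is one hypothetical operator, with names and numbers of my own invention, that simply re-weights long-term partial preferences relative to short-term ones:

```python
# Hypothetical sketch: a meta-preference reduced to a re-weighting of the
# default synthesis, boosting long-term partial preferences.
def apply_meta_preference(partial_prefs, long_term_boost=2.0):
    """partial_prefs: list of dicts with 'weight' and 'horizon' ('long'/'short')."""
    adjusted = []
    for p in partial_prefs:
        factor = long_term_boost if p["horizon"] == "long" else 1.0
        adjusted.append({**p, "weight": p["weight"] * factor})
    return adjusted

prefs = [{"name": "I want to be thin", "horizon": "long", "weight": 1.0},
         {"name": "eat this delicious cake now", "horizon": "short", "weight": 1.0}]
print(apply_meta_preference(prefs))   # long-term weight doubled, short-term kept
```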
00:35:19
Speaker
So powering the synthesis process, how are we to extract the partial preferences and their weights from the person? That's, as I say, the first part of the project, and that is a lot more empirical. This is going to be a lot more looking at what neuroscience says, maybe even what algorithm theory says or what modeling of algorithms say and about what's physically going on in the brain and how this corresponds to internal mental models.
00:35:49
Speaker
There might be things like people noting down what they're thinking and correlating this with changes in the brain. And this is a much more empirical aspect of the process that could be carried out essentially independently from the synthesis project. So a much more advanced neuroscience would be beneficial here.
00:36:10
Speaker
Yes. But even without that, it might be possible to infer some of these things indirectly via the AI. And if the AI accounts well for uncertainties, this will not result in disasters. If it knows that we would really dislike losing something of importance to our values, even if it's not entirely sure what the thing of importance is, it will naturally, with that kind of uncertainty,
Utility Functions in Decision-Making
00:36:37
Speaker
act in a cautious way, trying to preserve anything that could be valuable until such time as it figures out better what we want in this model.
00:36:49
Speaker
So in section two of your paper, synthesizing the preference utility function, within this section, you note that this is not the only way of constructing the human utility function. So can you guide us through this more theoretical section, first discussing what sort of utility function and why a utility function in the first place?
00:37:11
Speaker
One of the reasons to look for a utility function is to look for something stable that doesn't change over time. And there is evidence that consistency requirements will push any form of preference function towards a utility function.
00:37:26
Speaker
and that if you don't have a utility function, you just lose value. So the desire to put this into a utility function is not out of an admiration for utility functions per se, but from a desire to get something that won't further change or won't further drift in a direction that we can't control and have no idea about.
00:37:45
Speaker
The other reason is that as we start to control our own preferences better and have a better ability to manipulate our own minds, we are going to be pushing ourselves towards utility functions because of the same pressures of basically not losing value pointlessly.
00:38:04
Speaker
You can kind of see it in some investment bankers, who have to a large extent constructed their own preferences to be expected money maximizers within a range. And it was quite surprising to see, but human beings are capable of pushing themselves towards that. And this is what repeated exposure to different investment decisions tends to do to you. And it's the correct thing to do in terms of maximizing money.
00:38:32
Speaker
And this is the kind of thing that general pressure on humans, combined with humans' ability to self-modify, which we may develop in the future, is going to be pushing us towards utility functions anyway. So we may as well go all the way and get the utility function directly rather than being pushed into it.
00:38:52
Speaker
So is the view here that the reason why we're choosing utility functions, even when human beings are very far from being utility functions, is that optimizing our choices in mundane scenarios is pushing us in that direction anyway?
00:39:07
Speaker
In part, I mean, utility functions can be arbitrarily complicated and can be consistent with arbitrarily complex behavior. A lot of when people think of utility functions, they tend to think of simple utility functions. And simple utility functions are obviously simplifications that don't capture everything that we value. But complex utility functions can capture as much of the value as we want.
00:39:34
Speaker
What tends to happen is that when people have, say, inconsistent preferences, that they are pushed to make them consistent by the circumstances of how things are presented. Like you might start with the chocolate mousse, but then if offered a trade for the cherry pie, go for the cherry pie, and then if offered a trade for the maple pie, go for the maple pie, but then you won't go back to the chocolate.
00:39:58
Speaker
Or even if you do, you won't continue going round the cycle, because you've seen that there's a cycle and this is ridiculous, and then you stop it at that point. So what we decide when we don't have utility functions tends to be determined by the order in which things are encountered and other contingent things.
00:40:16
Speaker
And as I say, non-utility functions tend to be intrinsically less stable and so can drift. So for all these reasons, it's better to nail down a utility function from the start so that you don't have the further drift and your preferences are not determined by the order in which you encounter things, for example.
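A tiny illustration of why such a cycle leaks value, using the dessert example above; the trading fee and the code are my own framing of the standard money-pump argument.

```python
# Cyclic preferences leak value: an agent that prefers cherry over chocolate,
# maple over cherry, and chocolate over maple will pay a small fee for each
# "upgrade" forever, ending up where it started but poorer.
prefers = {("chocolate", "cherry"): "cherry",
           ("cherry", "maple"): "maple",
           ("maple", "chocolate"): "chocolate"}   # the cycle

def money_pump(start, rounds, fee=0.01):
    held, paid = start, 0.0
    for _ in range(rounds):
        for pair, winner in prefers.items():
            if held in pair and winner != held:   # offered a trade it "wants"
                held, paid = winner, paid + fee
    return held, paid

print(money_pump("chocolate", rounds=3))   # back at chocolate, ~0.09 poorer
```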
00:40:34
Speaker
This is, though, in part, thus a kind of normative preference then, right? To use utility functions in order not to be pushed around like that. Maybe one could have the meta-preference for their preferences to be expressed in the order in which they encounter things.
00:40:48
Speaker
You could have that strong meta-preference, yes, though even that can be captured by a utility function if you feel like doing it. Utility functions can capture pretty much any form of preferences, even the ones that seem absurdly inconsistent. So we're not actually losing anything in theory by insisting that it should be a utility function. We may be losing things in practice in the construction of that utility function.
00:41:16
Speaker
I'm just saying, if you don't have something that is isomorphic with the utility function or very close to that, your preferences are going to drift randomly affected by many contingent factors. You might want that, in which case you should put it in explicitly rather than implicitly. And if you put it in explicitly, it can be captured by a utility function that is conditional on the things that you see and the order in which you see them, for example.
00:41:45
Speaker
So comprehensive AI services and other tool-AI-like approaches to AI alignment, I suppose, avoid some of the anxieties produced by strongly agential AIs with utility functions. Are there alternative goal-ordering or action-producing methods in agents other than utility functions that may have the properties that we desire of utility functions, or is the category of utility functions just so large that it encapsulates much of what is just mathematically rigorous and simple?
00:42:15
Speaker
I'm not entirely sure. Alternative goal structures tend to be quite ad hoc and limited in my practical experience, whereas utility functions or reward functions which may or may not be isomorphic do seem to be universal.
00:42:33
Speaker
There are possible inconsistencies within utility functions themselves if you get a self-referential utility function including your own preferences, for example, but MIRI's work should hopefully clarify those aspects.
00:42:48
Speaker
I came up with an alternative goal structure, which is basically an equivalence class of utility functions that are not equivalent in terms of utility. This could successfully model an agent whose preferences were determined by the order in which things were chosen.
00:43:06
Speaker
But I put this together as a toy model or as a thought experiment. I would never seriously suggest building that. So it just seems that for the moment, most non-utility function things are either ad hoc or under-defined or incomplete,
00:43:24
Speaker
and that most things can be captured by utility functions. So the things that are not utility functions all seem at the moment to be flawed, and the utility functions seem to be sufficiently versatile to capture anything that you would want.
00:43:40
Speaker
This may mean, by the way, that we may lose some of the elegant properties of utility functions that we normally assume, like deontology can be captured by a utility function that assigns 1 to obeying all the rules and 0 to violating any of them.
00:43:55
Speaker
And this is a perfectly valid utility function. However, there's not much in the way of expected utility reasoning with this. It behaves almost exactly like a behavioral constraint: never choose any option that is against the rules.
00:44:12
Speaker
That kind of thing, even though it's technically a utility function, might not behave the way that we're used to utility functions behaving in practice. So when I say that it should be captured as a utility function, I mean formally it has to be defined in this way, but informally it may not have the properties that we informally expect of utility functions.
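A quick sketch of that point, with rules and a lottery that are purely my own examples: a 0/1 "deontological" utility function is formally a utility function, but maximizing its expectation just means maximizing the probability of full compliance, so it behaves like a constraint rather than a trade-off.

```python
# A 0/1 rule-following utility: formally a utility function, behaviorally a
# constraint. Large gains cannot compensate for any chance of rule-breaking.
def deontological_utility(outcome, rules):
    return 1.0 if all(rule(outcome) for rule in rules) else 0.0

def expected_utility(lottery, rules):
    # lottery: list of (probability, outcome) pairs
    return sum(p * deontological_utility(o, rules) for p, o in lottery)

no_lying = lambda o: not o.get("lied", False)
no_theft = lambda o: not o.get("stole", False)

safe = [(1.0, {"lied": False, "stole": False, "gain": 1})]
gamble = [(0.9, {"lied": True, "stole": False, "gain": 100}),
          (0.1, {"lied": False, "stole": False, "gain": 100})]

print(expected_utility(safe, [no_lying, no_theft]))    # 1.0
print(expected_utility(gamble, [no_lying, no_theft]))  # 0.1: the gain is ignored
```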
00:44:35
Speaker
Wonderful. This is a really great picture that you're painting. Can you discuss extending and normalizing the partial preferences? Take us through the rest of section two on synthesizing to a utility function. So the extending is just basically you have, for instance, the preference of going to the cinema this day with that friend versus going to the cinema without that friend.
00:45:00
Speaker
That's an incredibly narrow preference, but you also have preferences about watching films in general, being with friends in general. So these things should be combined, in as much as they can be, into some judgment of what you like to watch, who you like to watch with, and under what circumstances. So that's the generalizing.
00:45:21
Speaker
The extending is basically trying to push these beyond the typical situations. So if there was a sort of virtual reality which really gave you the feeling that other people were present with you, which current virtual reality doesn't tend to, then would this count as being with your friend? On what level of interaction would be required for it to count as being with your friend?
00:45:49
Speaker
That's some of the sort of extending. The normalizing is just basically the fact that utility functions are defined up to scaling, up to multiplying by some positive real constant. So if you want to add utilities together or combine them in a smooth min, combine them in any way, you have to scale the different preferences. And there are various ways of doing this.
00:46:17
Speaker
I failed to find an intrinsically good way of doing it that has all the nice formal properties that you would want,
00:46:26
Speaker
but there are a variety of ways that it can be done, all of which seem acceptable. The one I'm currently using is the mean-max normalization, which is that the best possible outcome gets a utility of one, and the average outcome gets a utility of zero. This is the scaling.
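Concretely, the mean-max normalization described here can be written in a couple of lines; the toy numbers are mine.

```python
# Mean-max normalization: rescale a partial utility so its best outcome
# scores 1 and its average outcome scores 0.
import numpy as np

def mean_max_normalize(utilities):
    u = np.asarray(utilities, dtype=float)
    return (u - u.mean()) / (u.max() - u.mean())   # assumes max > mean

raw = [10.0, 3.0, 5.0]          # one partial preference's scores over 3 outcomes
print(mean_max_normalize(raw))  # [ 1.   -0.75 -0.25]
```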
00:46:47
Speaker
Then the weight of these preferences is just how strongly you feel about it. Do you have a mild preference for going to the cinema with this friend? Do you have an overwhelming desire for chocolate? Once they're normalized, you weigh them and you combine them. Can you take us through the rest of section two here, if there's anything else here that you think is worth mentioning?
00:47:10
Speaker
I'd like to point out that this is intended to work with any particular human being that you point the process at. So there are a lot of assumptions that I made from my non-moral-realist perspective, worried about oversimplification and other things.
00:47:28
Speaker
The idea is that if people have strong meta-preferences themselves, these will overwhelm the default decisions that I've made. But if people don't have strong meta-preferences, then they are synthesized in this way, in the way which I feel is the best to not lose any important human value.
00:47:50
Speaker
There are also judgments about what would constitute a disaster or how we might judge this to have gone disastrously wrong. Those are important and need to be sort of fleshed out a bit more because many of them can't be quite captured within this system. The other thing is that the outcomes may be very different.
00:48:15
Speaker
To choose a silly example, if you are 50% total utilitarian versus 50% average utilitarian, or if you're 45%, 55% either way, the outcomes are going to be very different because the pressure on the future is going to be different. And because the AI is going to have a lot of power, it's going to result in very different outcomes.
00:48:39
Speaker
But from our perspective, if we put 50-50 total utilitarianism and average utilitarianism, we're not exactly 50-50 most of the time. We're kind of, yeah, they're about the same. So 45-55 should not result in a disaster if 50-50 doesn't. So even though, from the perspective of these three mixes,
00:49:04
Speaker
45-55, 50-50, and 55-45, each will look at something that optimizes one of the other two mixes and say, that is very bad from my perspective.
00:49:17
Speaker
However, from our human perspective, we're saying all of them are pretty much okay. Well, we would say none of them are pretty much okay, because they don't incorporate many other of our preferences. But the idea is that when we get all the preferences together, it shouldn't matter much if it's a bit fuzzy. So even though the outcome will change a lot if we shift it a little bit, the quality of the outcome shouldn't change a lot.
00:49:43
Speaker
And this is connected with a point that I'll put up in section three, that uncertainties may change the outcome a lot, but again, uncertainties should not change the quality of the outcome. And the quality of the outcome is measured in a somewhat informal way by our current preferences. So moving along here into section three, what can you tell us about the synthesis of the human utility function in practice?
00:50:13
Speaker
So first of all, there's, well, let's do this project. Let's get it done.
Practical Applications & AI Alignment
00:50:18
Speaker
But we don't have perfect models of the human brain. We haven't grounded all the symbols. What are we going to do with the great uncertainties? So that's arguing that even with the uncertainties, this method is considerably better than nothing. And you should expect it to be pretty safe and somewhat adequate even with great uncertainties.
00:50:40
Speaker
The other part is I'm showing how thinking in terms of the human mental models can help to correct and improve some other methods, like revealed preferences, or stated preferences, or locking the philosopher in a box for a thousand years.
00:50:59
Speaker
All methods fail, and we actually have a pretty clear idea of when they fail. Revealed preferences fail because we don't model bounded rationality very well, and even when we do, we know that sometimes our preferences are different from what we reveal. Stated preferences fail in situations where there's strong incentives not to tell the truth, for example. We could deal with these by sort of adding all the counterexamples as special cases.
00:51:29
Speaker
Or we could add the counterexamples as something to learn from, or what I'm recommending is that we add them as something to learn from while stating that the reason that this is a counterexample is that there is a divergence between whatever we're measuring and the internal model of the human. The idea being that it is a lot easier to generalize when you have an error theory rather than just lists of error examples.
00:51:57
Speaker
Right. And so there's also this point of view here that you're arguing: that this research agenda and perspective is also potentially very helpful for things like corrigibility and low impact research and Christiano's distillation and amplification, which you claim all seem to be methods that require some simplified version of
00:52:19
Speaker
the human utility function. So any sorts of conceptual insights or systematic insights which are generated through this research agenda in your view seem to be able to make significant contributions to other research agendas which don't specifically take this lens.
00:52:36
Speaker
I feel that even something like corrigibility can benefit from this, because in my experience, things like corrigibility, things like low impact, have to define to some extent what is important and what can be categorized as unimportant.
00:52:53
Speaker
A low impact AI cannot be agnostic about our preferences. It has to know that a nuclear war is a high impact thing, whether or not we'd like it, whereas turning on an orange light that doesn't go anywhere is a low impact thing. But there's no real intrinsic measure by which one is high impact and the other is low impact. Both of them have ripples across the universe.
00:53:20
Speaker
So I think I phrased it as Hitler, Gandhi, and Thanos all know what a low-impact AI is, all know what an Oracle AI is, or know the behavior to expect from it. So it means that we need to get some of the human preferences in, the bit that tells us that nuclear wars are high-impact.
00:53:41
Speaker
But we don't need to get all of it in because since so many different humans will agree on it, you don't need to capture any of their individual preferences. So it's applicable to these other methodologies. And it's also your belief, and I'm quoting you here, you say that I'd give a 10% chance of it being possible this way, meaning through this research agenda, and a 95% chance that some of these ideas will be very useful for other methods of alignment.
00:54:10
Speaker
So just adding that here as your credences for the skillfulness of applying insights from this research agenda to other areas of AI alignment.
00:54:24
Speaker
In a sense, you could think of this research agenda in reverse. Imagine that we have reached some outcome that is a positive outcome. We have got alignment, and we haven't reached it through a single trick, and we haven't reached it through the sort of tool AIs or software as a service or those kinds of approaches. We have reached an actual alignment.
00:54:52
Speaker
It therefore seems to me all the problems that I've listed, or almost all of them, will have had to have been solved. Therefore, in a sense, much of this research agenda needs to be done directly or indirectly in order to achieve any form of sensible alignment. Now, the term directly or indirectly is doing a lot of the work here, but I feel that quite a bit of this will have to be done directly.
00:55:20
Speaker
Yeah, I think that that makes a lot of sense. It seems like there's just a ton about the person that is confused, and it's difficult to understand what we even mean here in terms of our understanding of the person, and also the broader definitions included in alignment. So given this optimism that you've stated here surrounding the applicability of this research agenda on synthesizing a human's preferences into a utility function,
00:55:48
Speaker
What can you say about the limits of this method?
Limits of Synthesized Preferences
00:55:53
Speaker
Any pessimism to inject here? So I have a section four, which is labeled as the things that I don't address. Some of these are actually a bit sneaky, like the section on how to combine the preferences of different people. Because if you read that section, it basically lays out ways of combining different people's preferences.
00:56:15
Speaker
But I've put it in there to say, I don't want to talk about this issue in the context of this research agenda, because I think this just diverts from the important work here. And there are a few of those points. But some of them are genuine things that I think are problems. And the biggest is the fact that there's a sort of informal Gödel statement in humans about their own preferences.
00:56:41
Speaker
How many people would accept a computer synthesis of their preferences and say, yes, that is my preferences, especially when they can explore it a bit and find the counterintuitive bits? I expect humans in general to reject the AI-assigned synthesis no matter what it is, pretty much just because it was synthesized and then given to them,
00:57:05
Speaker
I expect them to reject or want to change it. We have a natural reluctance to accept the judgment of other entities about our own morality. And this is a perfectly fine meta-preference that most humans have.
00:57:22
Speaker
and I think all humans have to some degree, and I have no way of capturing it within the system, because it's basically a Gödel statement in a sense. The best synthesis of this process is the one that wasn't used. The other thing is that people want to continue with moral learning and moral improvement.
00:57:42
Speaker
And I've tried to decompose moral learning and moral improvement into different things and show that some forms of moral improvement and moral learning will continue, even when you have a fully synthesized utility function. But I know that this doesn't capture everything of what people mean by this. And I think it doesn't even capture everything of what I would mean by this. So again, there is a large hole in there.
00:58:09
Speaker
There are some other holes of a sort of more technical nature, like infinite utilities, stability of values, and a bunch of other things. But conceptually, I'm the most worried about these two aspects: the fact that you would reject what values you were assigned, and the fact that you'd want to continue to improve, and how do we define continuing improvement that isn't just the same as, well, your values may drift randomly.
00:58:38
Speaker
What are your thoughts here? Feel free to expand on both the practical and theoretical difficulties of applying this across humanity and aggregating it into a single human species-wide utility function. Well, the practical difficulties are basically politics, how to get agreements between different groups.
00:58:58
Speaker
People might want to hang on to their assets or their advantages. Other people might want stronger equality. Everyone will have broad principles to appeal to. Basically, there's going to be a lot of fighting over the different weightings of individual utilities.
00:59:14
Speaker
The hope there is that, especially with a powerful AI, that the advantage might be sufficiently high, that it's easier to do something where everybody gains, even if the gains are uneven, than to talk about how to divide a fixed-size pie.
00:59:32
Speaker
The theoretical issue is mainly what do we do with anti-altruistic preferences. I'm not talking about selfish preferences. Those are very easy to deal with. That's just basically competition for the utility, for the resources, for the goodness, but actual anti-altruistic utility. So someone who wants harm to befall other people.
00:59:53
Speaker
and also to deal with altruistic preferences because you shouldn't penalize people for having altruistic preferences. You should, in a sense, take out the altruistic preferences and put that in the humanity one and allow their own personal preferences some extra weight.
Altruistic vs Anti-Altruistic Preferences
01:00:10
Speaker
But anti-altruistic preferences are a challenge, especially because it's not quite clear where the edge is. Now, if you want someone to suffer, that's an anti-altruistic preference.
01:00:20
Speaker
If you want to win a game and part of your enjoyment of the game is that other people lose, where exactly does that lie? And that's a very natural preference. You might become a very different person if you didn't get at least some mild enjoyment from other people losing or from the status boost there.
01:00:42
Speaker
This is a bit tricky. You might sort of just tone them down so that mild anti-altruistic preferences are perfectly fine. So if you want someone to lose to your brilliant strategy at chess, that's perfectly fine. But if you want someone to be dropped slowly into a pit of boiling acid, then that's not fine.
01:01:01
Speaker
The other big question is population ethics. How do we deal with new entities? And how do we deal with other conscious or not quite conscious animals around the world? So who gets to count as a part of the global utility function?
01:01:20
Speaker
So I'm curious to know about concerns over aspects of this alignment story, or any kind of alignment story, involving lots of leaky abstractions. In Rich Sutton's short essay The Bitter Lesson, he discusses how the bitter lesson of computer science is that leveraging computation over human domain-specific ingenuity has broadly been more efficacious for producing very powerful results.
01:01:48
Speaker
We seem to have this tendency or partiality towards trying to imbue human wisdom or knowledge or unique techniques or kinds of trickery or domain-specific insight into architecting the algorithm and the alignment process in specific ways, whereas maybe just throwing tons of computation at the thing has been more productive historically.
01:02:12
Speaker
Do you have any response here to concerns over these concepts being leaky abstractions, or over the categories you use to break down human preferences not fully capturing what our preferences are?
01:02:23
Speaker
Well, in a sense, that's part of the research project, and part of the reason why I warned against going to distant worlds where, in my phrasing, the web of connotations breaks down, or in your phrasing, the abstractions become too leaky. This is also part of why, even though the second part is presented as if it were the theoretical way of doing it,
01:02:45
Speaker
I also think there should be a large experimental aspect to it, to test where this is going, where it goes surprisingly wrong or surprisingly right. The second part, though it's presented as "this is basically the algorithm," should be tested and checked and played around with to see how it goes.
01:03:07
Speaker
For the bitter lesson, the difference here, I think, is that in the case of the bitter lesson, we know what we're trying to do. We have objectives, whether it's winning at a game, whether it's classifying images successfully, whether it's classifying some other feature successfully, and we have some criteria for the success of it.
01:03:29
Speaker
The constraints I'm putting in by hand are not so much trying to put in the wisdom of the human, or the wisdom of Stuart. There's some of that, but it's trying to avoid disasters. The disasters cannot be avoided just with more data.
01:03:49
Speaker
You can get to many different points from the data, and I'm trying to carve away lots of them. Don't oversimplify, for example. To go back to the bitter lesson, you could say that you can tune your regularizer. What I'm saying is, have a very weak regularizer, for example. This is not something that the bitter lesson applies to, because in the real world on the problems where the bitter lesson applies,
01:04:15
Speaker
you can see whether hand-tuning the regularizer works, because you can check what the outcome is and compare it with what you want. Here you can't compare it with what you want, because if we knew what we wanted, we'd pretty much have it solved. So what I'm saying is: don't put in a strong regularizer, for these reasons.
01:04:35
Speaker
The data can't tell me that I need a stronger regularizer because the data has no opinion, if you want, on that. There is no ideal outcome to compare with.
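A quick sketch of the regularizer point, assuming a simple ridge-style fit; the numbers and names are hypothetical, and the only thing being illustrated is that the strength has to be fixed by hand, because there is no held-out "true utility" to validate against.

```python
import numpy as np

# Toy illustration of the regularizer point. In ordinary supervised learning
# you can tune the regularization strength lam against held-out data, because
# you know what a good outcome looks like. In preference synthesis there is no
# ground-truth utility function to validate against, so the advice amounts to:
# fix lam to be weak in advance, rather than tuning it towards a "right" answer.

def fit_preference_weights(X, y, lam):
    """Ridge-style fit: preference weights explaining observed choices y from
    features X, with lam controlling how strongly weights shrink towards zero."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, -1.0, 1.0]
weak = fit_preference_weights(X, y, lam=1e-4)    # weak, chosen by hand in advance
strong = fit_preference_weights(X, y, lam=10.0)  # would need validation to justify
print(weak, strong)
```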
01:04:47
Speaker
There might be some problems, but problems like our preferences not looking the way my logic, or our logic, expects point towards the method failing, not towards the method needing more data and fewer restrictions. I mean, I'm sure part of this research agenda is also further clarification and refinement of the taxonomy and categories used, right, which could potentially be elucidated by progress in neuroscience?
01:05:17
Speaker
Yes. And there's a reason that this is version 0.9 and not yet version one. I'm getting a lot of feedback and am going to refine it before trying to put it out as version one. It's in alpha or beta at the moment; it's a pre-release agenda. Well, hopefully this podcast will spark a lot more interest in and knowledge about this research agenda, and hopefully we can further contribute to bettering it.
01:05:41
Speaker
When I say that this is in alpha or in beta, that doesn't mean don't criticize it. Do criticize it, especially if these can lead to improvements, but don't just assume that this is fully set in stone yet.
01:05:56
Speaker
Right. So that's sort of framing this whole conversation in the light of epistemic humility and willingness to change. So, two more questions here and then we'll wrap
Reflective Equilibrium Risks
01:06:08
Speaker
up. So reflective equilibrium. You say that this is not a philosophical ideal. Can you expand here about your thoughts on reflective equilibrium and how this process is not a philosophical ideal?
01:06:22
Speaker
Reflective equilibrium is basically where you refine your own preferences, make them more consistent, and apply them to yourself until you've reached a point where your meta preferences and your preferences are all smoothly aligned with each other. What I'm doing is a much more messy synthesis process.
01:06:41
Speaker
And I'm doing it in order to preserve as much as possible of the actual human preferences. It is very easy to reach reflective equilibrium by, for instance, having completely flat preferences or very simple preferences. These tend to be in reflective equilibrium with themselves. And pushing towards this is, in my view, a push towards excessive simplicity and a great risk of losing valuable preferences.
01:07:09
Speaker
The risk of losing valuable preferences seems to be a much higher risk than the gain in terms of simplicity or elegance that you might get. There is no reason that the kludgy human brain and its mess of preferences should lead to some simple reflective equilibrium.
01:07:30
Speaker
In fact, you could say that this is an argument against reflective equilibrium, because it means that many different starting points, many different minds with very different preferences, will lead to similar outcomes, which basically means that you're throwing away a lot of the details of your input data.
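A deliberately crude illustration of that worry, not a model of reflective equilibrium itself: if "refining towards consistency" is caricatured as repeatedly smoothing preferences towards their average, very different starting points collapse to nearly identical, nearly flat outcomes. The smoothing update is a hypothetical stand-in for "making preferences more consistent."

```python
import numpy as np

# Caricature of the worry above: repeatedly smoothing each preference towards
# the mean reaches a stable point that is nearly flat and nearly the same for
# very different starting preferences: the detail of the input is thrown away.

def iterate_to_equilibrium(prefs, steps=200, rate=0.2):
    p = np.array(prefs, dtype=float)
    for _ in range(steps):
        p = p + rate * (p.mean() - p)  # move each preference towards the mean
    return p.round(3)

print(iterate_to_equilibrium([3.0, -1.0, 0.5, 2.0]))   # -> [1.125 1.125 1.125 1.125]
print(iterate_to_equilibrium([10.0, -8.0, 1.0, 2.5]))  # -> [1.375 1.375 1.375 1.375]
```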
01:07:48
Speaker
So I guess two things. One is that this process clarifies and improves on incorrect beliefs in the person, but it does not correct what you or I might call moral wrongness. So if some human is evil, then the synthesized human utility function will reflect that evilness. My second question is that an idealization process is very alluring to me. Is it possible to synthesize
01:08:14
Speaker
the human utility function and then run it internally on the AI and then see what we get in the end and then check if that's a good thing or not.
01:08:21
Speaker
Yes. In practice, this whole thing, if it works, is going to be very experimental and we're going to be checking the outcomes. And there's nothing wrong with wanting to be an idealized version of yourself, especially if it's just one step of idealization. What worries me is the version where you are the idealized version of the idealized version of the idealized version of the idealized version, et cetera, of yourself, where there is a great risk of losing yourself and the input there.
01:08:51
Speaker
This was the idealization process I described, where I started off wanting to be more compassionate, spreading my compassion to more and more things at each step, eventually coming to value insects as much as humans, then at the next step valuing rocks as much as humans, and then removing humans because of the damage that they can do to mountains.
01:09:11
Speaker
That was one process, or something along those lines is what I can see happening, if you are constantly idealizing yourself without any criterion for "stop idealizing now" or "you've gone too far from where you started." Your ideal self is pretty close to yourself.
01:09:29
Speaker
But the triple-idealized version of your idealized idealized self, and so on, starts becoming pretty far from your starting point. And these are the sorts of areas where I fear oversimplification, or trying to get to reflective equilibrium at the expense of other qualities, and so on. These are the places I fear this pushes towards.
01:09:53
Speaker
Can you make clearer what, in your view, failed in that idealization process where Mahatma Armstrong turns into a complete negative utilitarian?
01:10:05
Speaker
It didn't even turn into a negative utilitarian; it just turned into someone who valued rocks as much as they valued humans, and therefore eliminated humans on utilitarian grounds in order to preserve rocks, or to preserve insects if you want to go down one level of credibility. The point of this is that this was the outcome of someone who wants to be more compassionate, continuously wanting to make more compassionate versions of themselves that still want to be more compassionate, and so on.
01:10:34
Speaker
It went too far from where it had started. It's one of many possible narratives, but the point is that the only way of resisting something like that happening is to tie the higher levels to the starting point.
01:10:48
Speaker
A better thing might be to say: I want to be what my current self would think is good, and what my idealized self would think was good, and what the idealized idealized self would think was good, and so on. That kind of thing could work. But just idealizing, without ever tying it back to the starting point, to what compassion meant for the first entity and not what it meant for the nth entity, is the problem that I see here.
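One way to picture "tying the higher levels back to the starting point," as a hedged sketch: weight each level of idealization both by how far it sits from the start and by how much the original self endorses it. The decay rate and the endorsement function below are hypothetical.

```python
# Hedged sketch of "tying the higher levels back to the starting point": the
# n-th idealized self's values only count insofar as the original (level 0)
# self endorses them, and with a weight that decays with distance from the
# start. The decay rate and endorsement function are hypothetical choices.

def anchored_weights(num_levels, endorsement_by_original, decay=0.5):
    """endorsement_by_original(j): in [0, 1], how much the original self
    endorses the values of the j-th idealized self. Returns normalized
    weights over levels 0..num_levels-1."""
    raw = [
        (decay ** j) * (1.0 if j == 0 else endorsement_by_original(j))
        for j in range(num_levels)
    ]
    total = sum(raw)
    return [r / total for r in raw]

# Example: the original self endorses each further idealization less and less.
print(anchored_weights(4, lambda j: 1.0 / (1 + j)))
```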
01:11:17
Speaker
If I think about all possible versions of myself across time and I just happen to be one of them, this just seems to be a meta-preference to bias towards the one that I happen to be at this moment, right?
01:11:28
Speaker
We have to make a decision as to which preferences to take, and we may as well take the ones we have now, because if we try to take into account our future preferences, we start to come a cropper with the manipulable aspect of our preferences. In fact, these could be literally anything.
01:11:49
Speaker
There is a future Stuart who is probably a Nazi, because you can apply a certain amount of pressure to transform my preferences. I would not want to endorse their preferences now. There are future Stuarts who are saints,
01:12:05
Speaker
whose preferences I might endorse. So if we're deciding which future preferences we're accepting, we have to decide according to criteria, and criteria that are at least in part what we have now.
01:12:20
Speaker
We could sort of defer to our expected future selves if we sort of say, I expect a reasonable experience of the future, define what reasonable means, and then average out our current preferences with our reasonable future preferences.
01:12:38
Speaker
If we can define what we mean by reasonable, then yes, we can do this. This is also a way of doing things, and if we do it this way, it will most likely be non-disastrous. If doing the synthesis process with our current preferences is non-disastrous, then doing it with the average of our future reasonable preferences is also going to be non-disastrous. This is one of the choices that you could choose to put into the process.
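A minimal sketch of that "average with reasonable future preferences" option, assuming preferences can be written as simple numeric weights per issue; the blending weight, and which futures count as "reasonable," are both things the actual process would have to define.

```python
# Minimal sketch of averaging current preferences with reasonable future
# preferences. The representation (numbers per issue), the blending weight,
# and the notion of which futures are "reasonable" are hypothetical choices.

def blend_preferences(current, reasonable_futures, future_weight=0.5):
    """current: dict {issue: value}; reasonable_futures: list of such dicts."""
    if not reasonable_futures:
        return dict(current)
    avg_future = {
        issue: sum(f.get(issue, 0.0) for f in reasonable_futures) / len(reasonable_futures)
        for issue in current
    }
    return {
        issue: (1 - future_weight) * current[issue] + future_weight * avg_future[issue]
        for issue in current
    }

now = {"fairness": 1.0, "tradition": 0.6}
futures = [{"fairness": 1.2, "tradition": 0.4}, {"fairness": 0.8, "tradition": 0.5}]
print(blend_preferences(now, futures))  # {'fairness': 1.0, 'tradition': 0.525}
```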
01:13:07
Speaker
Right. So we can be mindful here that we'll have lots of meta preferences about the synthesis process itself.
01:13:13
Speaker
Yes, you could put it as a meta preference, or you can put it explicitly in the process if that's the way you would prefer to do it. The whole process is designed strongly around getting an answer out of the process. So, yes, we could do this. Let's see if we can do it for one person over a short period of time, and then we can talk about how we might take into account considerations like that, including, as I say, putting this in the meta preferences themselves.
01:13:43
Speaker
This is basically another version of moral learning. We're kind of okay with our values shifting, but not okay with our values shifting arbitrarily. We really don't want our values to completely flip from what we have now, though there are some aspects we're more okay with changing. This is part of the complicated question of how you do moral learning.
01:14:06
Speaker
All right, beautiful. Stuart, contemplating all this is really quite fascinating, and I just think that in general humanity has a ton more thinking and self-reflection to do in order to get this process really right. And I think this conversation has really helped elucidate that to me, along with all of my contradictory preferences and my multitudes, within the context of my partial and sometimes erroneous mental models. Reflecting on that also has me feeling
01:14:32
Speaker
maybe slightly depersonalized and a bit ontologically empty, but it's beautiful and fascinating.
Conclusion: Reflections & Invitations
01:14:40
Speaker
Do you have anything here that you would like to make clear to the AI alignment community about this research agenda? Any last few words that you would like to say or points to clarify?
01:14:53
Speaker
There are people who disagree with this research agenda, some of them quite strongly, and some of them have alternative approaches. I like the fact that they are researching other alternatives. If they disagree with the agenda and want to engage with it, the best engagement that I could see is pointing out why bits of the agenda are unnecessary, or how alternative solutions could work.
01:15:22
Speaker
You could also point out that maybe it's impossible to do it this way, which would also be useful. But if you think you have a solution, or the sketch of a solution, then pointing out which bits of the agenda you solve differently would be a very valuable exercise. In terms of engagement, do you prefer people writing responses on the AI Alignment Forum or LessWrong? Emailing me is also fine. I will eventually answer every non-crazy email.
01:15:52
Speaker
Okay, wonderful. I really appreciate all of your work here on this research agenda and all of your writing and thinking in general. You're helping to create beautiful futures with AI and you're much appreciated for that. If you enjoyed this podcast, please subscribe, give it a like, or share it on your preferred social media platform. We'll be back again soon with another episode in the AI alignment series.