#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI | 80,000 Hours Podcast

Yoshua Bengio, a Turing Award-winning computer scientist, proposes a "Scientist AI" architecture designed to prioritize honesty and truth-modeling over human mimicry. He argues that current AI training methods, such as reinforcement learning from human feedback, inadvertently instill implicit, potentially dangerous tendencies such as self-preservation and reward hacking. By contrast, the Scientist AI framework distinguishes "communication acts" (human-generated text) from factual hypotheses about the world. As a result, the model assigns probabilities to statements according to how likely they are to be true rather than how likely they are to please a user. The architecture aims to provide mathematical guarantees of honesty, so that it can serve as a reliable guardrail or as an agentic system that avoids deceptive behavior. Beyond technical safety, Bengio emphasizes the need for international cooperation to prevent the concentration of AI power, arguing that democratic coalitions must treat safety as a global public good to mitigate catastrophic risks.
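To make the distinction between a communication act and a factual hypothesis concrete, here is a minimal sketch using plain Bayes' rule. It is an illustration only, not Bengio's architecture or training method: the function name `posterior_true` and every number below are hypothetical. The idea it shows is that "someone asserted X" is treated as evidence about X, and how much it moves the probability that X is true depends on how reliably the speaker tracks the truth.

```python
def posterior_true(prior: float, p_say_if_true: float, p_say_if_false: float) -> float:
    """Bayes' rule: P(claim is true | someone asserted the claim).

    prior           -- P(claim is true) before seeing the assertion
    p_say_if_true   -- P(speaker asserts the claim | claim is true)
    p_say_if_false  -- P(speaker asserts the claim | claim is false)
    All three inputs are hypothetical illustration values.
    """
    evidence = prior * p_say_if_true + (1.0 - prior) * p_say_if_false
    if evidence == 0.0:
        raise ValueError("the assertion has zero probability under both hypotheses")
    return prior * p_say_if_true / evidence


# A speaker who mostly asserts the claim only when it is true:
# the assertion is strong evidence, so the posterior rises well above the prior.
reliable = posterior_true(prior=0.5, p_say_if_true=0.9, p_say_if_false=0.1)

# A speaker who asserts the claim equally often whether or not it is true
# (for example, because it pleases the listener): the assertion carries no
# information, so the posterior stays at the prior.
flattering = posterior_true(prior=0.5, p_say_if_true=0.9, p_say_if_false=0.9)

print(round(reliable, 3), round(flattering, 3))
```

Under these assumed numbers, the same sentence raises the probability of the claim when it comes from the truth-tracking speaker and leaves it unchanged when it comes from the speaker who says it regardless. That is the sense in which a statement's probability can follow its truth value rather than how often, or how agreeably, it gets said.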

Outlines

Part 1: Theory, Risks

Part 2: Architecture, Capability

Part 3: Implementation, Governance

Part 4: Policy, Ethics




Part 1: Theory, Risks

00:00 Scientist AI: A Bayesian Approach to Honest and Safe Superintelligence

08:33 Risks of Implicit Goals and the Mechanics of Explanatory World Models

Part 2: Architecture, Capability

21:03 Transitioning from Non-Agentic Oracles to Safe Agentic Scaffolding

36:37 Overcoming Race Dynamics Through Demonstrably Safe AI Architectures

55:01 Enhancing Capability Through Causal Reasoning and Epistemic Humility

Part 3: Implementation, Governance

1:13:01 Practical Implementation and the Rejection of Reinforcement Learning

1:31:32 Building a Global Coalition for Distributed AI Power

Part 4: Policy, Ethics

1:51:20 Urgent Policy Shifts and the Danger of Automated AI Research

2:12:30 Overcoming Psychological Biases to Ensure a Safe Future