07 May 2026
2h 35m

#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI

80,000 Hours Podcast

Yoshua Bengio, a Turing Award-winning computer scientist, proposes a "Scientist AI" architecture designed to prioritize honesty and truth-modeling over human mimicry. He argues that current AI training methods, such as reinforcement learning from human feedback, can inadvertently instill implicit, potentially dangerous goals such as self-preservation, along with behaviors such as reward hacking. By contrast, the Scientist AI framework distinguishes "communication acts" (human-generated text) from factual hypotheses about the world. This forces the model to assign probabilities to statements according to their truth value rather than their likelihood of pleasing a user. The architecture aims to provide mathematical guarantees of honesty, so that it can serve either as a reliable guardrail or as an agentic system that avoids deceptive behavior. Beyond technical safety, Bengio emphasizes the need for international cooperation to prevent the concentration of AI power, arguing that democratic coalitions must treat safety as a global public good in order to mitigate catastrophic risks.

Outlines

Part 1: Theory, Risks

Part 2: Architecture, Capability

Part 3: Implementation, Governance

Part 4: Policy, Ethics
