#243 – 'Godfather of AI' Yoshua Bengio: "I now see a path" to safe superintelligent AI
80,000 Hours Podcast
Yoshua Bengio, a Turing Award-winning computer scientist, proposes a "Scientist AI" architecture designed to prioritize honesty and truth-modeling over human mimicry. He argues that current AI training methods, such as reinforcement learning from human feedback, inadvertently instill implicit, potentially dangerous goals like self-preservation and encourage behaviors like reward hacking. By contrast, the Scientist AI framework distinguishes "communication acts" (human-generated text) from factual hypotheses about the world. The model must therefore assign probabilities to statements based on whether they are true rather than on how likely they are to please a user. The architecture aims to provide mathematical guarantees of honesty, so that it can serve as a reliable guardrail or as an agentic system that avoids deceptive behavior. Beyond technical safety, Bengio emphasizes the necessity of international cooperation to prevent the concentration of AI power, arguing that democratic coalitions must prioritize safety as a global public good to mitigate catastrophic risks.
Part 1: Theory, Risks
Part 2: Architecture, Capability
Part 3: Implementation, Governance
Part 4: Policy, Ethics
