#217 – Beth Barnes on the most important graph in AI right now — and the 7-month rule that governs its progress | 80,000 Hours Podcast

In this interview, Beth Barnes, founder and CEO of METR (Model Evaluation and Threat Research), discusses the weaknesses of current AI model evaluations, particularly hidden chains of thought and the potential for models to deceive evaluators. She advocates for more transparency and oversight in AI development, emphasizing pre-training evaluations and the need to assess models' capabilities before deployment to prevent misuse or theft. She also shares METR's research on measuring AI capabilities over time using human task benchmarks, which shows exponential growth in AI autonomy. Beth expresses concern about the rapid pace of AI development and the potential for recursively self-improving AI, urging policymakers and the public to take the risks seriously and to consider the ethical implications of AI development.

Outline

Part 1: AI Evaluation Limitations

Part 2: Measuring AI Progress

Part 3: AI Capabilities and Research

Part 4: Awareness and Transparency

Part 5: Regulation and Oversight

Part 6: Shifting Strategies and Mitigations

Part 7: Historical Parallels and International Cooperation

Part 8: METR's Role and Future Directions

Part 1: AI Evaluation Limitations

00:00 The Limitations of Current AI Evaluation Methods

07:55 The Problem of Interpreting AI Reasoning and the Risk of Scheming

16:31 The Importance of Pre-Training Evaluations and Oversight

21:42 The Need for External Scrutiny and Pre-Mitigation Evaluations

Part 2: Measuring AI Progress

28:56 METR's Research on Measuring Model Capabilities Over Time

34:16 Measuring AI Progress in Terms of Human Task Completion Time

40:46 The Mundane Work Behind Measuring AI Capabilities and the High Variance in Model Performance

Part 3: AI Capabilities and Research

47:51 Practical Tips for Using ML Models and the Importance of AI's Ability to Improve Itself

52:23 The Current State and Future Trends of AI's Ability to Do ML Research

58:05 The Potential for AI to Outperform Humans in Research and the Alarming Implications for the Future

Part 4: Awareness and Transparency

1:07:54 The Disconnect Between AI Progress and Policy Maker Awareness

1:12:21 The Information Hazard of Publishing AI Capabilities and the Importance of Safety Progress

1:18:00 The Importance of Transparency and Collaboration in AI Safety

Part 5: Regulation and Oversight

1:24:26 The Need for a Broader Understanding of AI Risks and the Importance of Concrete Policy Suggestions

1:31:45 The Challenges of Regulating Powerful AI Companies and the Importance of Independent Evaluations

1:35:39 The Disagreements on the Effectiveness of Current AI Safety Measures and the Need for Meaningful Oversight

1:42:04 The Challenges of Getting Meaningful Access to AI Models and the Importance of Prototyping Good Safety Assurance

1:47:32 The Limited Effectiveness of Trying to Influence AI Companies from the Inside

1:52:31 The Advantages of Working Outside of AI Labs and the Importance of Whistleblowers

Part 6: Shifting Strategies and Mitigations

1:58:03 The Need for a Shift in Strategy and the Importance of Prioritizing Basic Mitigations

2:04:02 The High Risk of the Current AI Development Path and the Need for Basic Safety Measures

2:10:31 The Importance of Avoiding Training Models to Hide Scheming and the Need for a More Responsible Approach

2:15:20 The Overrated Nature of Interpretability and the Need to Focus on Getting the Capabilities You Want

2:20:00 The Importance of Limiting Dangerous Capabilities and the Need for More Research on Unlearning

2:25:31 The Need for More Research on Helping Humans Evaluate Model Outputs and the Dangers of Neuralese

2:30:34 The Potential for Regulation and the Need to Avoid Naive Approaches

2:35:15 The Importance of Maintaining a Security Mindset and the Dangers of Lab Exceptionalism

Part 7: Historical Parallels and International Cooperation

2:40:03 The Parallels Between AI and Nuclear Weapons and the Importance of Learning from History

2:47:31 The Importance of Considering the Destabilizing Effects of AI and the Need for International Cooperation

2:52:29 The Need for a More Realistic Assessment of AI Risks and the Importance of Compliance

2:57:30 The Open Sourcing of AI Models and the Need for Defensive Strategies

3:03:33 The Strategic Disadvantages of AI and the Need for a Different Approach

Part 8: METR's Role and Future Directions

3:11:24 The Comparative Advantages and Disadvantages of METR

3:17:23 The Challenges of Scaling a Non-Profit and the Importance of Talent Acquisition

3:23:30 The Roles METR is Hiring For and the Importance of Growing the Field