How METR measures Long Tasks and Experienced Open Source Dev Productivity - Joel Becker, METR | AI Engineer

The podcast explores potential slowdowns in AI development, in particular how compute growth affects AI capabilities. It asks whether compute and time horizon are causally related, suggesting that if compute growth halved, growth in the time horizon could slow proportionally. The conversation covers the limitations of current AI models, especially on complex tasks that require tacit knowledge, and the difficulties of automating chip production, with opinions diverging on the timeline for full automation. The discussion also examines how effective AI is in fields such as data science, law, and robotics, highlighting the gap between theoretical potential and practical application caused by issues like data quality and the need for human oversight.

Outlines

Part 1: Compute Growth and AI Scaling

Part 2: Measuring Time Horizon and Productivity

Part 3: Developer Workflows and Open Source Context

Part 4: AI in Data Science and Specialized Domains

Part 5: Professional Standards and Quality

Part 6: Future Capabilities and Safety Benchmarks

Part 7: Hardware, Robotics, and Fabrication

Part 1: Compute Growth and AI Scaling

00:20 The Causal Proportionality Argument: Compute Growth and Time Horizon

01:08 Potential Slowdown in Compute Growth and Its Implications on AI Capabilities

02:36 The Assumption of Causal Proportionality and Software-Only Singularity

03:34 The Reliability of Log-Linear Plots in AI Forecasting

Part 2: Measuring Time Horizon and Productivity

04:07 Challenges in Testing Time Horizon and the Need for New Measures

05:10 Distinguishing Between Human and Model Time in AI Tasks

06:18 The Confounding Factor of Tool Familiarity in Developer Productivity

07:42 J-Curve Explanations and the Accuracy of Time Estimates in Software Engineering

09:27 The Perceptual Aspect of AI Speedup and Expert Forecasters' Perspectives

Part 3: Developer Workflows and Open Source Context

11:19 Observations from Screen Recordings and the Importance of High-Level Architectural Decisions

12:22 The Impact of Project Organization on AI Assistance in Open Source

13:30 The Value of AI in Legacy Codebases and the Question of AI Familiarity

14:34 Analyzing Plots of AI Experience and the Small Sample Size Limitations

16:15 Examining the J-Shaped Plot and the Conservative Coding of Hours

17:36 Addressing Bias and Generalizability in the Developer Study

19:21 Comparing Results with Independent Research and the Value of Expert Control Sets

Part 4: AI in Data Science and Specialized Domains

21:37 Results from a Hackathon Randomizing AI Use on Greenfield Projects

23:04 Exploring New Directions for Research Beyond Coding Tasks

24:34 The Goal: Understanding What's Going on in AI and Data Science

25:31 Challenges in Applying AI to Data Science in Corporate Environments

26:34 The Role of Tacit Knowledge and the Potential for Specialized AI Models

28:21 The Importance of Data Specs and Fixing Problems at the Source

29:35 The Potential for AI to Improve Data Science and Feature Curation

30:21 Defining Complex Data Science Tasks and the Limits of Current AI Systems

Part 5: Professional Standards and Quality

32:13 Exploring AI Applications in Data Science, Law, and Medicine

34:25 The Potential of AI to Transform Discovery in Law

36:09 Analyzing the Scatter Plot of Cursor Experience and Developer Performance

37:55 Quantifying Speedup and the Impact of AI on Time Estimates

40:28 The Live Coding Experience and the Quality Bar on Open Source Repositories

42:45 The Role of AI in Lowering Quality Standards and the Bias in Open Source PRs

44:56 The High Bar for PRs and the Professional Incentives of Open Source Developers

Part 6: Future Capabilities and Safety Benchmarks

46:21 Upcoming Research: Addressing Challenges in Measuring Capabilities Over Time

47:39 Measuring Time Horizon With and Without Close Monitoring

49:56 The Importance of Safety and the Ongoing Trend of Capability Growth

51:23 Aligning Benchmarks with Relevant Concerns and Projecting Future Trends

52:51 Exploring New Angles for Capabilities Measurement: In-the-Wild Transcripts