AF - ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks by Beth Barnes | The Nonlinear Library | Podwise