This episode explores the potential of reinforcement learning (RL) for advancing AI agents beyond current chatbot and reasoner capabilities. Will Brown distinguishes between simple pipelines and more autonomous agents, highlighting the limitations of current systems that rely heavily on prompt engineering and human feedback. Against the backdrop of diminishing returns from pre-training and the limitations of synthetic data, Brown points to RL as a promising approach, citing DeepSeek's R1 model as an example of how RL can unlock test-time scaling and emergent reasoning abilities. The discussion then turns to "rubric engineering": designing detailed reward systems to guide model behavior, which calls for creativity as well as caution about reward hacking. Brown introduces an open-source framework for RL in multi-step environments that aims to leverage existing agent frameworks and API models. The talk concludes by framing AI engineering in the context of RL, suggesting that skills in building environments and rubrics are analogous to those in prompt engineering and evaluation, and that monitoring tools and a supportive ecosystem will be essential for building truly autonomous agents.
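To make the "rubric engineering" idea concrete, here is a minimal, hypothetical sketch (not Brown's actual framework) of a rubric-style reward: several named criteria are scored independently and combined with explicit weights, so a completion can earn partial credit for good formatting even when its final answer is wrong. The tag names and weights below are illustrative assumptions.

```python
import re

# Hypothetical rubric: each criterion is a small, independent check that
# awards partial credit. The overall reward is a weighted sum, which gives
# the model a shaping signal for format even when the answer is incorrect.

def format_score(completion: str) -> float:
    """0.5 for a <think>...</think> block, 0.5 for an <answer>...</answer> block."""
    has_think = bool(re.search(r"<think>.*?</think>", completion, re.DOTALL))
    has_answer = bool(re.search(r"<answer>.*?</answer>", completion, re.DOTALL))
    return 0.5 * has_think + 0.5 * has_answer

def correctness_score(completion: str, target: str) -> float:
    """1.0 if the text inside <answer> matches the reference answer exactly."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == target.strip() else 0.0

def rubric_reward(completion: str, target: str) -> float:
    """Combine criteria with explicit weights; correctness dominates,
    formatting contributes a smaller shaping term."""
    return 0.2 * format_score(completion) + 0.8 * correctness_score(completion, target)

# Example: well-formatted but wrong answer still earns the formatting portion.
sample = "<think>2 + 2 is 4</think><answer>5</answer>"
print(rubric_reward(sample, "4"))  # 0.2
```

The exact-match check here is a stand-in; a real rubric might use a verifier, unit tests, or a judge model for the correctness term, and caution is needed because any hole in these checks is an opening for reward hacking.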