Software Engineering in the Age of Coding Agents: Testing, Evals, and Shipping Safely at Scale

This episode explores how software engineering is changing with AI, focusing on agentic systems for security. It describes these systems as hybrids: because LLMs are stochastic, building them blends traditional software engineering with data science practices. The conversation covers the challenges of prompt engineering, including guardrails, testing methodologies, and managing context within LLMs. It also addresses the importance of domain knowledge, hybrid approaches that combine LLMs with traditional machine learning, and the role of UX design in building user trust. Drawing on experience building agentic systems for security, the guest stresses the need for feedback loops, real-world data, and human oversight.

Outline

Part 1: Value, Context, and the Shift to Agentic Systems

Part 2: Optimization, Context, and Hybrid Architectures

Part 3: UX, Debugging, and Prompt Engineering

Part 4: Evaluation, Failure Modes, and Frameworks

MLOps.community

Part 1: Value, Context, and the Shift to Agentic Systems

00:00 The High Value of Cloud Code: A Software Engineer's Perspective

01:54 The Hybrid Nature of Agentic Systems: Data Science Meets Software Engineering

07:58 Balancing Creativity and Determinism in LLMs: Building Trustworthy Agentic Systems

Part 2: Optimization, Context, and Hybrid Architectures

12:50 Optimizing Context and Model Selection for Agentic Systems

17:41 Hybrid Approaches and the Importance of Feedback in Agentic Systems

Part 3: UX, Debugging, and Prompt Engineering

25:33 Debugging and Building Trust in Agentic Systems: The UX Challenge

33:52 Prompt Engineering as Code: Versioning, Testing, and Observability

Part 4: Evaluation, Failure Modes, and Frameworks

41:47 Evaluating Agentic Systems: Unit Tests, Staging Environments, and LLM as Judge

45:04 Collective Memory and Common Failure Modes in Agentic Systems

51:23 Framework Evaluation, Minimalistic Design, and the Role of LLMs