20 Jan 2026
43m

Controlling AI Models from the Inside

Practical AI

This episode explores the challenges of AI model security and safety, focusing on generative models. Ali Khatri, founder of RINX, discusses the limitations of current guardrail solutions, which analyze only a model's inputs and outputs; he describes them as merely "checking IDs at the gate." He advocates a different approach, "model-native safety," built on mechanistic interpretability: understanding and controlling the model's internal states so that harmful content is not generated in the first place. According to Khatri, this method promises significant cost savings and better performance than traditional guardrails. The discussion also covers why AI safety needs to be customizable to industry-specific and company-specific policies, and why defense in depth, combining several security measures, remains necessary.
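To make "controlling internal model states" concrete, here is a minimal toy sketch of one common interpretability idea: score a hidden-state vector against a "concept direction" and remove that component when the score is too high. This is not RINX's method (the summary does not describe an implementation). The vectors, the threshold, and the names `HARM_DIRECTION`, `concept_score`, `ablate`, and `guard` are all hypothetical, chosen only for illustration.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

def concept_score(hidden, direction):
    """Projection of a hidden state onto the unit-length concept direction."""
    return dot(hidden, unit(direction))

def ablate(hidden, direction):
    """Remove the component of the hidden state that lies along the direction."""
    d = unit(direction)
    s = dot(hidden, d)
    return [h - s * x for h, x in zip(hidden, d)]

def guard(hidden, direction, threshold):
    """Intervene on the internal state only when the concept is strongly active.

    Returns (possibly modified hidden state, whether an intervention happened).
    """
    if concept_score(hidden, direction) > threshold:
        return ablate(hidden, direction), True
    return hidden, False

# Hypothetical 3-dimensional "hidden states"; real models use thousands of dimensions.
HARM_DIRECTION = [1.0, 0.0, 0.0]   # assumed direction, e.g. found by a linear probe
benign = [0.1, 0.8, -0.3]
risky = [2.5, 0.4, 0.1]

steered, intervened = guard(risky, HARM_DIRECTION, threshold=1.0)
untouched, skipped = guard(benign, HARM_DIRECTION, threshold=1.0)
```

Unlike an input/output filter, a check of this kind runs on the model's activations during generation, so in principle it needs no second model to screen prompts or responses. Whether that holds in practice depends on how reliably such directions can be identified.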

Outlines

Part 1: Introduction, Context

Part 2: Current Challenges, Limitations

Part 3: Technical Deep Dive, Interpretability

Part 4: RINX Solution, Efficiency

Part 5: Future Outlook, Conclusion
