The podcast explores why enterprises struggle to deploy AI models: inherent risks such as bias, hallucinations, and security vulnerabilities often undermine ROI. Ben Luria, CEO of Hirundo, introduces machine unlearning as a way to make AI trustworthy by enabling models to "forget" undesired data and behaviors. He explains why current methods such as context engineering, fine-tuning, and guardrails fall short: they layer external fixes on top of the model rather than addressing the knowledge at its core. Hirundo's approach performs "neurosurgery" on the model's internal representations, identifying and removing problematic information and behaviors such as PII, susceptibility to prompt injection, and bias. The discussion also covers the distinction between behavioral and data unlearning and emphasizes the importance of benchmarks for measuring how effective unlearning is.
Part 1: Problem, Context, Origins
Part 2: Current Solutions vs. Machine Unlearning
Part 3: Security, Jailbreaking, Multimodality
Part 4: Technical Implementation, Open vs. Closed Models
Part 5: Future Applications, Agents, Programming
Part 6: Trust, Transparency, Visualization
