LW - Anthropic announces interpretability advances. How much does this advance alignment? by Seth Herd | The Nonlinear Library | Podwise