LW - Polysemantic Attention Head in a 4-Layer Transformer by Jett | The Nonlinear Library