LW - Adam Optimizer Causes Privileged Basis in Transformer Language Models by Diego Caples | The Nonlinear Library