“Adam Optimizer Causes Privileged Basis in Transformer Language Models” by Diego Caples | LessWrong (30+ Karma) | Podwise