LessWrong (30+ Karma) - “Adam Optimizer Causes Privileged Basis in Transformer Language Models” by Diego Caples