“You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex | LessWrong (30+ Karma) | Podwise