LessWrong (30+ Karma) - “You can remove GPT2’s LayerNorm by fine-tuning for an hour” by StefanHex