The Nonlinear Library - LW - You can remove GPT2's LayerNorm by fine-tuning for an hour by StefanHex