LessWrong (30+ Karma) - “Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition” by cmathw, Dennis Akar, Lee Sharkey