Xiaol.x - Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free