Xiaol.x - Transformers to SSMs: Distilling Quadratic Knowledge to Subquadratic Models