LessWrong (30+ Karma) - “Decomposing the QK circuit with Bilinear Sparse Dictionary Learning” by keith_wynroe, Lee Sharkey