Reducing Transformer Key-Value Cache Size with Cross-Layer Attention