Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory | Arxiv Papers | Podwise