
Zeyuan Allen-Zhu introduces "Physics of Language Models, Part 4.1," focusing on architecture design and Canon layers. The talk is divided into two parts: the first, aimed at theoreticians and methodologists, covers the noise of real-life pretraining and the construction of a versatile synthetic pretraining playground; the second, geared toward applied researchers, discusses architectural principles derived from that playground and introduces Canon layers to improve model reasoning.

The discussion covers the challenges of academic-scale pretraining, the instability of accuracy gains across random seeds, and the limitations of perplexity as an evaluation metric. To address these issues, Allen-Zhu advocates building a synthetic pretraining playground that decomposes intelligence into atomic skills, enabling cleaner architectural comparisons and rigorous science through mini scaling laws. He details five synthetic tasks, each designed to stress a different aspect of model performance: reasoning depth, reasoning breadth, knowledge capacity, knowledge manipulation, and structural-ambiguity resolution. Finally, Allen-Zhu presents initial results comparing architectures within the synthetic playground, highlighting the approach's potential for revealing architectural differences.
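The summary does not define Canon layers. As I understand the accompanying paper, they are lightweight causal depthwise convolutions with a residual connection, inserted alongside existing sublayers to promote local horizontal information flow between nearby token positions. The sketch below is a minimal PyTorch illustration under that assumption; the module name CanonLayer, the kernel size, and the placement are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CanonLayer(nn.Module):
    """Minimal sketch of a Canon layer (assumed form): a causal depthwise
    1D convolution over the sequence, added back residually. Position t
    mixes hidden states from positions t-K+1 .. t only, so causality holds."""

    def __init__(self, hidden_dim: int, kernel_size: int = 4):
        super().__init__()
        self.kernel_size = kernel_size
        # Depthwise convolution: one filter per channel (groups=hidden_dim),
        # so the layer adds very few parameters relative to attention or MLPs.
        self.conv = nn.Conv1d(
            hidden_dim, hidden_dim,
            kernel_size=kernel_size,
            groups=hidden_dim,
            bias=False,
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden_dim)
        h = x.transpose(1, 2)                   # (batch, hidden_dim, seq_len)
        # Left-pad by kernel_size - 1 so position t never sees t+1, t+2, ...
        h = F.pad(h, (self.kernel_size - 1, 0))
        h = self.conv(h).transpose(1, 2)        # back to (batch, seq_len, hidden_dim)
        return x + h                            # residual connection

# Usage: drop it between transformer sublayers; shapes are preserved.
x = torch.randn(2, 16, 64)                      # (batch, seq, dim)
layer = CanonLayer(hidden_dim=64)
assert layer(x).shape == x.shape
```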
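The summary likewise invokes "mini scaling laws" without elaborating. A plausible reading, stated here as an assumption, is that each synthetic task is run across a ladder of model sizes and a simple power law is fit to the resulting scores, so two architectures are compared by their fitted curves rather than by a single seed-sensitive data point. The sketch below shows such a fit with numpy; the numbers are placeholders for illustration, not results from the talk.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit error ~ a * size**(-b) by least squares in log-log space.
    Returns (a, b); larger b means error falls faster with scale."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), deg=1)
    return np.exp(intercept), -slope

# Placeholder numbers (illustrative only): task error at four model
# sizes for two hypothetical architectures.
sizes = np.array([1e6, 4e6, 16e6, 64e6])
arch_a_err = np.array([0.42, 0.30, 0.21, 0.15])
arch_b_err = np.array([0.40, 0.25, 0.15, 0.09])

for name, errs in [("arch_a", arch_a_err), ("arch_b", arch_b_err)]:
    a, b = fit_power_law(sizes, errs)
    print(f"{name}: error ~ {a:.2f} * N^(-{b:.3f})")
```

Comparing fitted exponents across a size ladder is one way to make an architectural comparison robust to the seed-level noise the talk warns about.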