What Happens During the Loss Plateau? Understanding Abrupt Learning in Transformers | Xiaol.x | Podwise