Small Batch Size Training for LMs: When Vanilla SGD Works, and Why Gradient Accumulation Is Wasteful | AI Papers Podcast Daily | Podwise