Silent Errors in Large-Scale LLM Training by Cyril Meurillon & Devin O'Kelley | @Scale | Podwise