This podcast episode explores how Meta is using AI to improve its incident response process. The speakers discuss the challenges of managing incidents across large-scale infrastructure and the need for a more streamlined, automated approach. Meta is applying recent advances in LLMs to onboard responders efficiently and to provide incident summaries generated in real time. They also address the difficulty of investigating incidents and pinpointing the root cause, which they approach with heuristics and data analysis. A fine-tuned Llama 2 model is introduced for incident root cause analysis. The episode emphasizes the potential of AI in incident management, with a focus on transparency, explainability, and actionability. The speakers acknowledge, however, that incorporating AI into incident management processes is still in its early stages.