Stable Reasoning in LLMs: A Novel Evaluation Metric and Benchmark | AI Papers Podcast Daily | Podwise