Gaming the Judge: Unfaithful Chain of Thought Can Undermine Agent Evaluation | AI Papers Podcast Daily | Podwise