Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces | AI Papers Podcast Daily | Podwise