Simple Token-Level Confidence Improves Caption Correctness | ComputerVisionFoundation Videos | Podwise