arXiv paper - Describe Anything: Detailed Localized Image and Video Captioning | AI Breakdown | Podwise