arXiv preprint - Learning Video Representations from Large Language Models | AI Breakdown | Podwise