Multimodal Vision Transformers with Forced Attention for Behavior Analysis | ComputerVisionFoundation Videos | Podwise