A Transformer-based Late-Fusion Mechanism for Fine-Grained Object Recognition in Videos | ComputerVisionFoundation Videos