Sequential Transformer for End-to-End Video Text Detection | ComputerVisionFoundation Videos | Podwise