SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, | Xiaol.x | Podwise