Ph.D. Vlog - ViLT: The simplest Transformer-based multimodal model, processing images and text together. Brute force works wonders!