Synchronized Audio-Visual Generation with a Joint Generative Diffusion Model and Contrastive Loss | Microsoft Research