This monologue podcast is an educational lesson on multimodal learning and contrastive representation learning in AI. The speaker first introduces multimodal data (text, images, audio, video) and its importance in building more human-like AI. The core of the lesson explains contrastive representation learning, a technique for unifying separate per-modality models into a single multimodal embedding model: positive and negative example pairs are used to pull the vectors of similar items closer together and push the vectors of dissimilar items further apart. A practical example using the MNIST dataset, with a Python code walkthrough, demonstrates how to train a neural network with this technique and how to visualize the resulting vector space using PCA and UMAP. Listeners gain a practical understanding of contrastive representation learning and its application to building multimodal AI models.
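
The pull/push idea described above can be sketched with a classic pairwise contrastive loss. This is a minimal illustrative example, not the code walked through in the episode; the function name, margin value, and toy vectors are all assumptions for illustration:

```python
import numpy as np

def contrastive_loss(z1, z2, same, margin=1.0):
    """Pairwise contrastive loss on two embedding vectors.

    If the pair matches (same=1), the loss grows with their distance,
    pulling them together; if it does not match (same=0), the loss
    penalizes pairs closer than `margin`, pushing them apart.
    """
    d = np.linalg.norm(z1 - z2)           # Euclidean distance in embedding space
    if same:
        return d ** 2                     # positives: smaller distance, smaller loss
    return max(0.0, margin - d) ** 2      # negatives: no loss once farther than margin

# Toy embeddings (hypothetical): b is near a, c is far from a.
a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])
c = np.array([-1.0, 0.0])

print(contrastive_loss(a, b, same=1))   # small loss: matching pair already close
print(contrastive_loss(a, c, same=0))   # zero loss: non-matching pair beyond the margin
```

During training, this loss is summed over many sampled pairs and minimized by gradient descent, which is what gradually shapes the shared embedding space the episode visualizes with PCA and UMAP.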