
The podcast explores how Large Language Models (LLMs) work, focusing on the mathematical models developed by Vishal Misra, professor and vice dean of computing and AI at Columbia University. Misra describes his matrix abstraction of LLMs, in which each row represents a prompt and the columns give the probability distribution over the vocabulary for the next token. He explains how LLMs perform Bayesian updating in real time, adjusting posterior probabilities as new evidence arrives through in-context learning, and introduces the concept of a "Bayesian wind tunnel" to mathematically prove that transformers perform Bayesian inference. The discussion contrasts human and LLM learning: humans learn continually and can run simulations, whereas LLMs have frozen weights and learn from correlations. Misra advocates a shift toward causation and Kolmogorov complexity in AI research.
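The Bayesian-updating idea described above can be sketched in a few lines. The sketch below is an illustration, not Misra's actual model: the prompts, vocabulary, probabilities, and the two-hypothesis setup are all invented for the example. It shows how a posterior over competing "contexts" shifts as each observed token supplies new evidence, which is the mechanism attributed to in-context learning.

```python
import numpy as np

# Hypothetical "prompt x vocabulary" rows: each row is a probability
# distribution over the next token, given that prompt. All numbers here
# are made up for illustration.
vocab = ["rain", "sun", "snow"]
rows = {
    "seattle": np.array([0.60, 0.30, 0.10]),  # next-token dist. for one context
    "phoenix": np.array([0.10, 0.85, 0.05]),  # next-token dist. for another
}

def bayes_update(prior, likelihood):
    """One step of Bayes' rule: posterior ∝ prior × likelihood."""
    post = prior * likelihood
    return post / post.sum()

# Start uncertain about which context generated the text we are seeing.
prior = np.array([0.5, 0.5])  # P(seattle), P(phoenix)

# Observe the token "sun": its likelihood under each hypothesis is that
# hypothesis's probability of emitting "sun".
idx = vocab.index("sun")
likelihood = np.array([rows["seattle"][idx], rows["phoenix"][idx]])

posterior = bayes_update(prior, likelihood)
# The posterior now favors the "phoenix" hypothesis, since "sun" is far
# more likely under its next-token distribution.
```

Each additional observed token would repeat the same update, with the posterior serving as the next step's prior, which is how evidence accumulated in the context sharpens the model's predictions.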