
Ali Behrouz discusses Nested Learning, a framework for understanding machine learning architectures and optimization processes as interconnected systems of nested optimization problems. He draws an analogy to human memory, particularly anterograde amnesia, to motivate improvements in LLMs. Behrouz explains associative memory, its relation to context, and how different architectures can be reformulated as associative memories. He also stresses the interconnectedness of architecture and optimization, suggesting that optimizers should be tailored to specific architectures. He then introduces the Hope architecture, which incorporates multiple MLP blocks updated at different frequencies to manage memory and handle information effectively, and the M3 algorithm, an optimizer combining the Adam and Muon optimizers. Finally, he presents experimental results demonstrating Hope's capabilities in long-context modeling and continual learning.
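The core idea of updating different blocks at different frequencies can be illustrated with a toy sketch. This is not the actual Hope implementation; the `Block` class, its tiny weight matrix, the update periods, and the toy gradient are all illustrative assumptions chosen to show how fast levels adapt every step while slow levels consolidate rarely.

```python
import math
import random

random.seed(0)

class Block:
    """Toy MLP-like block whose weights update only every `period` steps."""
    def __init__(self, dim, period):
        self.W = [[random.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.period = period      # update frequency: 1 = every step (fast)
        self.updates = 0          # count of applied updates

    def forward(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W]

    def maybe_update(self, step, x, y, lr=0.01):
        # Slow blocks skip most steps; fast blocks update on every step.
        if step % self.period == 0:
            for i in range(len(self.W)):
                for j in range(len(self.W[i])):
                    # Toy gradient: outer product of output and input.
                    self.W[i][j] -= lr * y[i] * x[j]
            self.updates += 1

dim = 4
blocks = [Block(dim, period) for period in (1, 4, 16)]  # fast -> slow

for step in range(32):
    x = [random.gauss(0, 1) for _ in range(dim)]
    for b in blocks:
        y = b.forward(x)
        b.maybe_update(step, x, y)
        x = y  # feed each block's output to the next

print([b.updates for b in blocks])  # → [32, 8, 2]
```

Over 32 steps, the fast block updates every step while the slowest updates only twice, mirroring the intuition that different levels of the nested system learn on different timescales.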