12:["$","$L21",null,{"data":{"isPreview":true,"seq":7375070,"episode":{"Id":"1f86a16ad983591d9d57c93868e8cacb58efb08404d0f3ee294ba35d1eecebca","Seq":7375070,"PodId":"c2d6b50707f47c5b2af65a35314bc77065b579cc615d7f559bf53717cbc4938f","PodSeq":24594,"Title":"How Transformers Learn Causal Structure with Gradient Descent","PodName":"Best AI papers explained","Description":"

This research investigates how transformers learn causal structure through gradient descent, focusing on their ability to perform in-context learning. The authors introduce a novel task involving random sequences with latent causal relationships and analyze a simplified two-layer transformer architecture. They show theoretically that gradient descent on the first attention layer recovers the hidden causal graph by computing a measure of mutual information between tokens. The learned causal structure then supports in-context estimation of transition probabilities, and the authors prove that the model generalizes well even to out-of-distribution data. Experiments on various causal graphs support the theoretical findings.

\n","Url":"https://podcasters.spotify.com/pod/show/ehwkang/episodes/How-Transformers-Learn-Causal-Structure-with-Gradient-Descent-e33ggsk","Link":"https://anchor.fm/s/1026675f8/podcast/play/103350612/https%3A%2F%2Fd3ctxlq1ktw2nl.cloudfront.net%2Fstaging%2F2025-4-28%2F401185104-44100-2-27e83302a41af.m4a","LinkType":"m4a","PublishTime":"$D2025-05-28T22:45:34.000Z","Img":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","EpImg":"https://d3t3ozftmdmh3i.cloudfront.net/staging/podcast_uploaded_nologo/43252366/43252366-1744500070152-e62b760188d8.jpg","Duration":"00:14:04","Language":null,"SampleDuration":null,"IsVBR":false,"Transcribed":false,"Indexed":1,"Deleted":false,"RedirectSeq":null,"Source":null,"Size":null},"prevAndNext":{"prevSeq":7375069,"nextSeq":7375071},"states":{"state":"not-login","extra":{"summary":"Best AI papers explained - How Transformers Learn Causal Structure with Gradient Descent","previewContent":{"summary":"Best AI papers explained - How Transformers Learn Causal Structure with Gradient Descent","chapters":[],"keywords":[],"highlights":[],"transcripts":[]}}}}}]