al computational unit of the neural network – the neuron itself – turns out not to be a natural unit for human understanding. This is because many neurons are polysemantic: they respond to mixtures of seemingly unrelated inputs. In the vision model Inception v1, a single neuron responds to faces of cats and fronts of cars . In a small language model we discuss in this paper, a single neuron responds to a mixture of academic citations, English dialogue, HTTP requests, and Korean text. Polysemanticity makes it difficult to reason about the behavior of the network in terms of the activity of individual neurons.

Source:
https://transformer-circuits.pub/2023/monosemantic-features/index.html

Narrated for AI Safety Fundamentals by Perrin Walker

A podcast by BlueDot Impact.

Learn more on the AI Safety Fundamentals website.

11:["$","$L20",null,{"data":{"isPreview":true,"seq":6154110,"episode":{"Id":"44beb291b0d57483baaf69f8ecc46c14a200b4fb58fa6d5939135f7a926c3419","Seq":6154110,"PodId":"3e542365ff1426ffa58325d43a34e8a37648135facd5ea3e737a74fc913657ef","PodSeq":21115,"Title":"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning","PodName":"AI Safety Fundamentals","Description":"$21","Url":"","Link":"https://www.buzzsprout.com/2334908/episodes/16381806-towards-monosemanticity-decomposing-language-models-with-dictionary-learning.mp3","LinkType":"mp3","PublishTime":"$D2025-01-04T22:00:00.000Z","Img":"https://storage.buzzsprout.com/03rohnsryl5w27ywlv5msbsbw2fk?.jpg","EpImg":"https://storage.buzzsprout.com/03rohnsryl5w27ywlv5msbsbw2fk?.jpg","Duration":"533","L