
Discussion between Elliot Murphy and Michael Levin

Biologist Michael Levin and linguist Elliot Murphy discuss language, evolution, and neuroscience, covering intracranial syntax, grammar across systems, non-neural communication, analog computation, and implications for free will and AI language models.



Show Notes

This is a ~1 hour discussion between Elliot Murphy (https://scholar.google.com/citations?user=4AYNRj0AAAAJ&hl=en) and me on the topic of language, evolution, neuroscience, and language models.

Elliot's paper: https://www.tandfonline.com/doi/full/10.1080/17588928.2025.2523875

CHAPTERS:

(00:00) Intracranial syntax and LLMs

(19:00) Probing grammar across systems

(30:11) Evolution, thought, and grammar

(40:05) Non-neural communication and models

(50:18) Analog computation and free will

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Podcast Website: https://thoughtforms-life.aipodcast.ing

YouTube: https://www.youtube.com/channel/UC3pVafx6EZqXVI2V_Efu2uw

Apple Podcasts: https://podcasts.apple.com/us/podcast/thoughtforms-life/id1805908099

Spotify: https://open.spotify.com/show/7JCmtoeH53neYyZeOZ6ym5

Twitter: https://x.com/drmichaellevin

Blog: https://thoughtforms.life

The Levin Lab: https://drmichaellevin.org


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

[00:00] Elliot Murphy: I use intracranial recordings in human epilepsy patients to try and isolate language. I mostly focus on higher-order language, things like grammar and meaning. I'm not too focused on phonology, speech, those sorts of things. I care about when you parse a sentence: you're doing multiple things — speech processing, world knowledge, attention, situation model construction. I'm trying to narrow down on the moments of syntactic inference. In other words, not semantic inference, like when I say hello or goodbye, or I name a movie, you're doing semantic inference. But we've looked at minimal phrases, things like red boat, he ran, she said. Even though they're not sentences, they help mitigate many sentence-level problems. Sentences, even though I'm a linguist, I hate sentences because they're problematic. They bring with them non-linguistic processes that I want to isolate. I'm trying to isolate what happens when you make an inference about constituency structure. As is often said in linguistics, people like Otto Jespersen and people like Chomsky have emphasized that language is about constituency structure — symbolic structure. It's not just beads on a string, a linear sequence. It is about building tree structures: hierarchically organized nested constituents where relations between elements are not just based on linear adjacency. For example, there are no languages where you apply morphological inflection to the 5th word of the sentence or the 6th word; it's never based on those computations. It's always based on structural properties. Other aspects of cognition can do structure building too, but language is a particular kind. For example, in music, we have a paper where we looked at a professional musician, a brain tumor patient in the OR who played piano during surgery. We had electrodes on the surface of the auditory cortex, which closely abuts language regions. We did a task where he performed a musical task — comprehend and play music — and a language task — comprehend and produce language. What we found is that there are electrodes that were sensitive to linguistic structure. When I say structure, I don't mean lexical items. There are many parts of the brain that respond to words; it's common to find widespread activity for lexical access and lexical processing. By structure I mean what happens when you combine two words, as opposed to combining two other things, or things that aren't words, like pseudowords. Alternatively, we looked at different sentence types with varying grammatical complexity. In some of my work with Carl Friston and Emma Holmes, we've figured out that you can formalize syntactic complexity, grammatical complexity, in a way sympathetic to minimum description length and other compressibility metrics, where syntactic complexity essentially amounts to a type of compression, which is in vogue in the working memory and attention literature. We did an experiment where we were able to isolate neighboring sites in PSTG, the posterior superior temporal gyrus, which is involved in speech, language, and music. It's one of those regions with a mosaic tessellation of functionally overlapping and specialized areas where you'll get music selectivity, speech selectivity, and sites selective for grammar. We showed that there's an important part of the posterior temporal lobe called the posterior superior temporal sulcus, which has an interesting organization I can talk about later. It has electrodes that are sensitive to lexicality, whether a word is real.
There are electrodes in PSTG that care if I say "hello" versus "hellog," a fake word. That's lexicality. But there's a more interesting component, which is whether I can derive a constituency structure or not.

[04:44] Elliot Murphy: For example, if I say red boat versus rogue boat, or if I say red bulg, in those instances you're still parsing phonological information. Except the phonological information is illegal with respect to searching for a lexical item. There's no lexical concept that "bulg" refers to. What we found is that there are electrodes that specifically cared about phrase meaning but not lexicality, and those were closely abutting the lexical-sensitive sites. There's a subset of electrodes that care about constituency structure, or at least the types of meanings that you can only derive from multi-item structures. There are forms of conversational meanings that you can't get from single words. Language allows you to make claims, express beliefs, and make inferences that you can't get with single words. There's something weird about language. Music tolerates symmetry: melodic structure readily tolerates symmetrical forms. Same with mathematics. Symmetry is tolerated. Human language hates symmetry. The syntactic computational procedure that Chomsky calls "merge", which other people call phrase structure building, is this ability to compose elements together into structured units. It turns out that it reduces to a type of symmetry breaking. For example, whenever you have two words being merged together, one of them wins. If I say red boat, what does that mean? A red boat is a boat that is red. The meaning of red boat is not there is a red-like quality and it just happens to exhibit boat-like properties. John ran is a verb phrase, not a noun phrase. The meaning of John ran is that there was an event of running and John was its participant. It doesn't mean that there's a special kind of John who exhibits running properties. Whenever you merge things together, there's always a winning node. There's always an item that projects what's called the head of the phrase. Even though other cognitive processes, higher-order forms of symbolic thought like math, music, and morality, have symbolic properties, language seems to be symbolic in a very unique way. There's a specific algorithm that, for whatever reason, seems to work across languages, whereby units are assembled into trees which are labeled. You don't get that with other non-human cognition or non-linguistic systems. If that's the definition of language that we're taking — that it is a computational system with this symmetry-breaking algorithm — maybe we could use that as an indication of how the brain might solve language. There's no necessary connection, but I think there probably is. Some of these algebraic properties of syntax that I'm talking about include hierarchical organization. Other properties include non-associativity. For example, "old men and women" could mean [old men] and [women], where the women may be young, or it could mean old [men and women], where "old" takes scope over men and women and modifies the entire conjunct phrase. This is a very simple but mysterious example, where you have a linear set of objects but multiple structurally distinct interpretations. The mind has to have some kind of algorithm that enforces non-associativity. Another property is commutativity, which is probably more related to the underlying knowledge of language. I'll briefly mention this with respect to how it functions in language. We know that across languages, Italian speakers, Spanish speakers, Japanese speakers, and English speakers all express the meanings of the same concepts.
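To make the non-associativity point concrete, here is a minimal illustrative sketch in Python (an editorial illustration under stated assumptions, not material from the discussion): the two bracketings of "old men and women" are written as nested tuples, and a toy scope function shows that the same linear string yields two distinct readings depending on constituency. The tree encoding and the old_scope function are hypothetical simplifications.

```python
def leaves(tree):
    """Collect the words at the bottom of a nested-tuple constituency tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def old_scope(tree):
    """Toy semantics: return the nouns that 'old' modifies in a given tree.
    ('old', X) means the adjective merges with X and takes scope over every
    noun inside X (the head of the phrase 'wins')."""
    if isinstance(tree, str):
        return set()
    if tree[0] == "old":
        return {w for w in leaves(tree[1]) if w != "and"}
    return set().union(*(old_scope(child) for child in tree))

# Same word string, two constituency structures:
wide   = ("old", ("men", "and", "women"))     # [old [men and women]]
narrow = (("old", "men"), "and", "women")     # [[old men] and women]

print(old_scope(wide))    # {'men', 'women'}  -> both groups are old
print(old_scope(narrow))  # {'men'}           -> only the men are old
```

A purely linear, beads-on-a-string representation of the five words cannot distinguish the two readings; only the constituency structure does.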

[09:29] Elliot Murphy: The surface forms may differ in terms of their morphophonological realization. Maybe the noun comes at the beginning, maybe the verb comes at the beginning, maybe the object comes at the beginning. But the underlying coordinates in conceptual space are identical across languages. The Japanese speakers know what a red boat is, English speakers know what a red boat is, even if they express it in completely different ways. There's some kind of underlying algorithm that doesn't really care about the order in which words are placed underlyingly, even though you may express it in the surface form in a completely different linear way. There's a distinction between the surface form of language, like the corpus or the behavior, the outward behavior, and the underlying computation. There are many examples of that. In some of my work, I've been probing to what extent large language models respect these algebraic properties of language. Maybe they don't respect them. Gary Marcus, Evelina Leivada, Vittoria Dentella, a bunch of other people, and I have found that these large language models don't really respect that distinction. The human brain very much respects that distinction. These large language models don't really respect a distinction between meaning and structure, whereby in language you have structural configurations that often inform the meaning, but we can make a distinction between those two things, like old men and women. Large language models are not very good at that. They're very good at generating coherent prose, but if you probe them in delicate ways, they're often unable to make a distinction between this is a structural configuration, and there's a separate semantic interpretation that is informed by the structure but is, in principle, separate. So large language models don't really show a distinction there. Then there are these claimed alignments between LLM embeddings and neural recordings. What's a bit suspicious in the last few months is that despite the last three years of celebration about how large language models align with brain behavior, more recently a lot of work has shown two things. I can give you two examples. One example is that they align best and predict activity best in non-language cortex, which is a bit of a problem. You would expect alignment with language cortex if these were large language models and not large corpus models, which is what they actually are. Also, a lot of the previous work that's shown alignments between the vector math of LLMs and the neural activity profile in fMRI or intracranial recordings is driven by overlooked confounds. Things like word presentation rate, the rhythm at which words are presented, turn out to explain a lot of these effects, which I think is a big problem. To give you a more general perspective, if these are the algebraic properties of language, maybe the brain has a potential mechanism or quasi-mechanistic type of causal structure, some kind of process in which the brain can neurally enforce these sorts of properties like non-associativity during parsing to ensure that a particular parse is read out correctly, as opposed to just linear chunking. There are lots of proposals in the literature to do with cross-frequency coupling and brainwaves and oscillations, whereby people have tried to explain memory and sequence processing in terms of those dynamics, where a low-frequency component might be able to coordinate and dictate the firing patterns of local assemblies.
That can provide some insight into how the brain manages linear sequences. Language is forced to be a linear sequence, because when I'm speaking to you, I'm not speaking in parallel. I'm speaking in serial. The serial order is reconfigured and parsed by the brain into a structured form. There are many examples I can give to emphasize this. I gave ChatGPT the sentence: the mechanic who fixed the car carefully packed his tools. Then I asked, what did he do carefully? The word "carefully" can modify two things: he carefully packed his tools, or he carefully fixed the car. Large language models, because they're driven by statistical patterns, say no, you would never say that. It's not frequent.

[14:14] Elliot Murphy: People just don't commonly do that. Therefore, the interpretation has to be that he carefully packed his tools. It cannot mean that he carefully fixed the car. But structurally, when a human being parses that sentence, both readings are available. It's true that when I posted this on Twitter, Garry Kasparov retweeted saying that I was wrong because human beings take context and nuance into consideration. That's true in parsing. But when I just give you the sentence, "the mechanic who fixed the car carefully packed his tools," with no additional information, you can make the inference that carefully can modify two things. Adding a comma, a pause, intonation, standing in an auto shop, or standing next to a mechanic would allow you to converge on the meaning more efficiently. In its purest form, the sentence string can be parsed in multiple different ways. I've tried to think about which scales of neurobiological organizational complexity can provide adequate explanatory power for distinct levels of linguistic structure. I have what I call my ROSE model, where I try to negotiate these levels of linguistic structure, from phonology to meaning to grammar, across the single-cell level, the local field potential level, and global interneuronal synchronization properties like phase synchronization and traveling waves. I've arrived at a conclusion similar to what people like Earl Miller have come to, which is that there really is no syntax cell. There's a lot of evidence for this. Cells don't really causally anchor a lot of properties of information processing, such as what drives these global, more complex syntactic inferences. These inferences are made after you've assembled multiple features from multiple concepts. You don't just go straight to syntax. In order to get to syntax, you have to go through phonology and morphology and other properties of words to get to that syntactic grammatical inference. The evidence is really pointing towards a system of phase–amplitude coupling and cross-frequency coupling codes, which coordinate and enforce algebraic properties like non-associativity. In my ROSE paper, I detail some ways to derive that. There's really good evidence. I can give you one example. There was a paper that came out in December from Standard Haynes' group in France, and they had 21 acute, awake neurosurgical patients with single units. These are patients in the OR awake doing language tasks just before or after surgery. The language tasks were language comprehension, sentence repetition, auditory naming, where you make people attend to linguistic or semantic structure. Across these 21 patients, they had 1,000 good units, meaning activity from 1,000 cells. Of these 1,000 units, hundreds showed sensitivity to phonological and acoustic-phonetic information. Hundreds showed sensitivity to semantic features. But zero units out of the thousand showed any sensitivity to syntactic or structural information. There's a big rush in the field; everybody wants access to single units in humans. It's an important part of the puzzle. But I don't think we'll find a Chomsky neuron, for example. That's what I'm saying: I don't think we're going to find that. That's the motivation for where I am right now.
I've done these intracranial recordings, I've published the experimental reports, but now I'm trying to take a step back and think theoretically how we can organize this research in a way that accords with neighboring literatures and that objectively enforces and respects the known algebraic properties of human language. As a caveat, if I'm wrong about these algebraic properties of language, then these other theories will also be wrong. I think the empirical evidence is pointing toward where in the neural organizational hierarchy we're going to find sensitivity to signatures of syntax, and it looks to be not at the single-unit level.
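The phase-amplitude and cross-frequency coupling codes mentioned above are usually quantified on recordings with statistics such as the mean vector length or modulation index. Below is a minimal sketch, on synthetic data, of one standard phase-amplitude coupling measure computed with NumPy and SciPy; the sampling rate, frequency bands, and synthetic signal are illustrative assumptions rather than parameters from any study discussed here.

```python
# Minimal sketch: quantify phase-amplitude coupling (PAC) on a synthetic signal.
# Assumptions: 1 kHz sampling, theta (4-8 Hz) phase modulating gamma (60-90 Hz)
# amplitude; the statistic is a Canolty-style mean vector length.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000.0
t = np.arange(0, 10, 1 / fs)

# Synthetic LFP: gamma whose amplitude rides on the theta phase, plus noise
theta = np.sin(2 * np.pi * 6 * t)
gamma_amp = 0.5 * (1 + theta)  # gamma amplitude locked to theta phase
signal = theta + gamma_amp * np.sin(2 * np.pi * 75 * t) + 0.3 * np.random.randn(t.size)

def bandpass(x, lo, hi, fs, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

phase = np.angle(hilbert(bandpass(signal, 4, 8, fs)))    # low-frequency phase
amp = np.abs(hilbert(bandpass(signal, 60, 90, fs)))      # high-frequency amplitude

# Mean vector length: large when gamma amplitude concentrates at one theta phase
mvl = np.abs(np.mean(amp * np.exp(1j * phase)))
print(f"PAC (mean vector length): {mvl:.3f}")
```

On real intracranial data the same statistic would normally be compared against surrogate distributions (for example, amplitude time-shifted relative to phase) before any claim about coupling is made.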

[19:00] Michael Levin: Super interesting. So I have a whole bunch of questions. First of all, when you said you analyzed LLMs, what exactly do you analyze in an LLM? Is it just the text output or do you put electrodes in it, look at inner layers, or how do you actually do it?

[19:20] Elliot Murphy: Many other groups have done the latter to try and causally probe and lesion particular aspects of the models to see when they break down. The best work that I know of is at NYU with Tal Linzen. Tal has shown that these systems have, unlike humans, an immediate bias for imposing linear rules when they try and resolve dependencies between non-local elements in a sentence: "the keys to the cabinet are on the table" versus "the keys to the cabinet is on the table," where the latter is a violation. It turns out they will always try and enforce a linear rule to figure out those dependencies, whereas humans do the opposite. Since infancy, there's a lot of evidence that almost as soon as possible, even in the womb, infants are trying to make these structural grammatical inferences as opposed to linear inferences. There's a controversial question: maybe these large language models do at some point learn something like a rule of grammar. If they do, they do it in a very different way than humans do. Humans have the opposite bias. We have an immediate bias to impose structure; the large language models really don't. I'm very doubtful and skeptical that they do learn anything like an actual linguistic rule. A lot of the things that look like rules turn out to be heuristics that can also satisfy very different types of structures that a human being would not count as grammatical or legal language, but which an artificial model would be able to learn no problem. So they can learn things that no human being would be able to parse.
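The kind of probe described here is typically run as minimal pairs: the model under test scores a grammatical and an ungrammatical continuation, and an "attractor" noun placed between the true subject and the verb distinguishes structure-sensitive agreement from a linear nearest-noun heuristic. A minimal sketch follows; sentence_log_prob is a hypothetical stand-in for however the probed model assigns a score to a sentence, not an API from any particular library.

```python
# Sketch of a minimal-pair subject-verb agreement probe.

def sentence_log_prob(sentence: str) -> float:
    """Hypothetical scoring hook: e.g., the summed token log-likelihood
    the model under test assigns to the sentence."""
    raise NotImplementedError("plug in the model under test here")

# Each pair: grammatical vs. ungrammatical verb. The "attractor" noun sits
# linearly closer to the verb than the true subject does.
minimal_pairs = [
    ("The key to the cabinets is on the table.",
     "The key to the cabinets are on the table."),
    ("The keys to the cabinet are on the table.",
     "The keys to the cabinet is on the table."),
]

def agreement_accuracy(pairs):
    correct = 0
    for grammatical, ungrammatical in pairs:
        # A model tracking constituency should prefer the grammatical version;
        # a model using a linear "agree with the nearest noun" heuristic won't.
        if sentence_log_prob(grammatical) > sentence_log_prob(ungrammatical):
            correct += 1
    return correct / len(pairs)
```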

[20:57] Michael Levin: One reason I ask is that we study diverse intelligence and all kinds of weird substrates, so cells and tissues and embryos and things. One of the things that I've often said is many, if not most, of the things that people study in cognitive science we found in other substrates, but not language. I don't have any evidence for anything that's language. But it occurs to me that we haven't looked very hard and I don't really know how to recognize it. I'd love to think that through with you a little bit. What would be a signature if you had a data stream, and this might be calcium signaling from a Xenobot, or a bioelectric signature from a cultured organ or whatever? Do you think there are any signatures to look for that would suggest there's a structured grammar here as opposed to simpler things that aren't considered language?

[22:01] Elliot Murphy: That's an excellent question. In my ROSE model, I emphasize this oscillatory, top-down code that involves coordinating units into structured ensembles. We really don't know enough about how individual representations, like lexical items, are represented. I suspect there's some kind of spiking code, some kind of local code where individual neurons express certain properties that conspire into a concept. So John is composed of male, animal, human, et cetera. All those features are probably stored locally and maintained in cells. Let me give you one example to illustrate this. Matilde Marcolli, a mathematician at Caltech, and Robert Berwick, a computer scientist at MIT, had a paper a couple months ago where they looked at necessary features of language, such as non-associativity, structured expression parsing, and the recursive nature of language. They cast this mathematics in category theory instead of set theory. These are necessary features of language. The problem with the LLM approach and other approaches like it is that if they do arrive at a language-like representation, it's not necessitated. It's by accident. It just happens to emerge in that way. That's not what we want, because we know that grammatical inference emerges very early on. They showed that the mathematics of formalizing syntactic structure allows you to draw a direct line of communication between the way they formalized syntactic inferences and things like phase synchronization. It was much easier to draw a correspondence between that formalism and those neural measurements than between the formalism and lower-order spiking activity. I think that's the perspective I take. I am not a fan of this data-driven, so-called theory-neutral neuroscience, where we just look at the brain and language will emerge if we understand lower-level properties well enough. There is evidence for downward causation and other causal structures in the brain. People like Lauren Ross have emphasized things like cascades, pathways, topological structures, and causal and non-causal entities that can potentially provide better explanatory power for human language than lower-order physical neural mechanisms. I'm not averse to lower-level physical neural mechanisms. What I'm trying to argue, with the example I just gave with the category-theoretic definition of phrase structure, is that the mathematics points towards particular modes of neural organization. I think that's really exciting. So in other words, if that's what you want from a computational theory, that's what David Marr tried to achieve. If you have the computational, algorithmic, and implementational levels, and you can formulate the computational level in such a way that it helps you narrow down the searchlight for the implementational level, then that's fantastic. If you're able to do the opposite as well, that's also fantastic. But I don't see evidence that focusing on lower-order properties of neural function will lead to examples like "red boat" emerging. I think we have to start from this higher-order algebraic description. Mathematical models are the only real way ultimately to capture these things properly and then use that as a window into how the brain might respect it, too. That's an assumption. It follows a long tradition in philosophy of science. It could also be wrong.
What I'm trying to say is that, based on what we know about human language, we can begin to narrow down potential neural candidates, which would potentially help you figure out if non-human animals also express those properties. Behaviorally, the evidence is fairly strong that you're unlikely to find that. There was an interesting conversation between Richard Dawkins and Steven Pinker about a year ago.

[26:06] Elliot Murphy: Dawkins is an arch neo-Darwinian synthesist. Pinker is too. But Pinker was saying that it's likely that this property emerged gradually, whereas other linguists in the field have argued that you either have this computation or you don't. So this computation of being able to form an unbounded array of hierarchically structured expressions, this merge computation in syntax. You either have that or you don't. There's no half merge. There's no 75% merge. It's a discrete operation that requires a phase transition in the way that the brain does computation, which I think explains a lot of the evolutionary evidence related to the Great Leap Forward. Pinker was trashing this saltationist-type formalist perspective. He said, no, it has to be gradual. Dawkins said, no, it doesn't actually. Mathematically, there's no real way to show that the way humans make recursive syntactic inferences is a gradual thing. How do you do that? Do you have a proto-proto version of syntax? Do you start with one word and then two words and three words? That's not how it works because syntax doesn't care about linear structure. It's difficult to think how that could have emerged gradually. I think the evolutionary texture of that neural code can be relevant to how we do neuroscience here. In other words, which types of ancient or non-ancient areas of the cortex we can target based on how the brain is organized. There's one interesting piece of evidence from the arcuate fasciculus, the white matter tract that connects posterior temporal to inferior frontal. In non-human primates, the arcuate fasciculus is just a slight arc; it just goes from one region to another. In humans, it's almost an entire loop; it's a loop structure where it cycles back in on itself. Some linguists have argued that that physical component — the way the arcuate fasciculus is now organized — could potentially help with this looped recursive operation, where you're embedding phrases inside of phrases inside of phrases, instead of just doing one particular search. As I've discussed in some of my previous papers, a lot of non-human primates can do basic morphology. They can do "hok" and "hok-oo" and "krak" and "krak-oo", where you add an "-oo" morpheme as an emphasizer. That looks a little bit like morphology; they're adding an element and then another small element. But it's not compositional morphosyntax. It is just linear morphology. They are just combining one thing and saying here's another element to emphasize it. There's an eagle, there's a land-based predator, there's an airborne predator, whether it's close to me or far away. You don't really get anything beyond that. Whereas in humans, you go from that minimal compositional scheme to the full expressive power of David Foster Wallace, James Joyce, Shakespeare. There's no middle ground, there's no gradualist perspective here.

[30:11] Michael Levin: That's very interesting. My next question is to what extent this can be derived, or even defined, in analog computation. This stuff is somewhat beyond me, but I found one paper where somebody was trying to derive this hierarchy on top of analog computation. And I wonder: if it is binary, as you say, do we then literally think that, as we look at the evolutionary lineage, there were some parents who couldn't do it, and then they had a kid who could do the whole thing? Is that the claim?

[30:58] Elliot Murphy: There's no other way around that, I think. In the literature, some people have called that person Prometheus, the first human that had this major capacity. I had a paper in 2019 called "No Country for Oldowan Men", where I chart a lot of the anthropological and paleoanthropological evidence. What's interesting is you see the emergence in the historical record of music, math, art, language, at different punctuated moments. What probably happened is that you had this human being, and he was able to execute this merge-type operation endogenously. It was originally a conceptual game. Later, there must have been a way whereby this computation was hooked up to external systems, potentially to sounds or signs. Some people think song came first and others think sign language did. Maybe mathematics, morality, theory of mind, things like that. I know that he knows that she knows, that type of thing. Whatever the case, there was a way in which the genuine language, the core language faculty, which is just this ability to create an unbounded array of hierarchical expressions, was interfaced with different extra-linguistic components over that 200,000-year period. Now you have modern human beings, whereby we have this symbolic capacity to use language to improve and enhance all sorts of different cognitive aspects. There are some people in the literature who disagree with that. They say you have all these stroke patients, and my wife is one of them. When they lose language, they seem to do just fine with thinking, meaning all sorts of general cognitive tasks. That's perfectly true. But nobody would claim that vision is not a sensory system. If somebody's blind, they're still able to taste and hear. Vision is a sensory system, even if it's knocked out. And language is still a thought system, even when it's knocked out; not all of thought is gone. There's no reason why knocking out language should knock out all of human cognition. That would be a very poorly designed system. There are certain forms of higher order thought, mainly to do with epistemological inferences like belief and attitude and mood and those sorts of things that you can make with language. Language very much helps with that kind of self-referential, hierarchical, organizational aspect of thought. Even if aphasia patients technically pass all of these general cognitive batteries, there are still potential deficits there. That's the general perspective I would give. It also segues into a secondary critique: if that's how language evolved, and if this is how language is implemented in the brain as well, then communication must be a tertiary component of language, not even secondary, because secondary would just be externalization, expressing language in some way, like putting language out there in the world. Communication is a subset of externalization. Most of our use of language, the way we use it every day, is internal; Chomsky has made this point very well a number of times. When we use language, we often use it to think to ourselves, meditate, plan, reflect, brew over thoughts, strategize. We talk to other people, but it's not the overwhelming use of language. Even when we talk to other people, it's often just small talk and social grooming. Using language to communicate with another human being is maybe not as common as people intuitively think. That's important. I had a paper in 2020 in the journal Glossa, where I looked at what the predictions of that are. Are we doing philosophy or are we doing science?
The predictions are that if you look at the formal structure of language, the way that the syntax and semantics interface, that would suggest that languages may be optimized for constructing or providing instructions to conceptual systems and not for expressing unambiguous, transparent, rapid thoughts to other human beings. Turns out that is the case; other people disagree, but you can show that quite nicely with the nature of certain grammatical rules and processes that are terrible for communication. There's all sorts of ambiguity and terrible things, but it's excellent for thinking. It's excellent for organizing structured symbolic thoughts.

[35:55] Michael Levin: What's the current thinking on how much of that structure is learned versus inborn versus determined by facts of mathematics? Where do you think it comes from?

[36:16] Elliot Murphy: Exactly what you just laid out is the right way people are thinking about it. When you say determined by mathematics and laws of physics and things like that, that's an excellent question. This is what my paper with Friston and Holmes was about. There's some indication that whatever the computation is, it seems to be highly efficient. There's an efficiency to the structured formation of thoughts. That takes care of the nature of the computation. What else is in it? People like Charles Yang at UPenn and others who study language development and acquisition see two camps right now. One camp says that there's basically nothing in it: it's all just domain-general learning and a constellation of generic non-linguistic factors that could just accidentally converge across all human infants, across all history, to the same syntactic inferences. It's definitely possible. The other camp says that there is a unique component to language. In fact, my ROSE paper is meant to be the beginnings of this; the tentative suggestion is that it forms a universal neural grammar, as in there's a neural code for language that gives you a way to neurally enforce the non-associative and commutative aspects of this structure-building computation. But what exactly is innate? People like Charles Yang think that there is a structure-dependent rule bias, whereby when you parse information that happens to be relevant for language, you realize that the rules are not going to be linear. You're going to infer a structure, and then the rules operate on that structure as opposed to the rules operating on the linear surface elements. The example I gave about the mechanic fixing the car, the way you parse that is through structural information. Or if you say, I watched a movie with Jim Carrey, that could mean you watched The Truman Show, or you sat next to Jim Carrey and you physically watched the movie. You parse these structures this way from a very early age. In fact, there was a paper in PNAS in 2021 which showed that even 18-month-old infants show some evidence that they are sensitive to extremely complex syntactic rules, not basic ones like red boat, or John ran versus John runs, but non-local syntactic rules. So it's like, which book did Mary say that John read, where there are multiple intervening elements and a relationship between non-local, distal elements. Eighteen-month-old infants show some sensitivity or some kind of recognition of those structural violations. And that's pretty wild. It goes back to what Chomsky talked about with the poverty of the stimulus: the kids just don't have enough positive evidence to arrive and converge on that. So, to answer your question, there's a structure-dependent rule bias. The machine learning people would call it an inductive bias. And then there are also some very specific domain-general learning rules. Charles Yang talks about a few: the tolerance principle and the subset principle and some of the mathematical learning rules which are efficient domain-general learning rules. Even Stephen Piantadosi agrees here, where there are fixed and rigorous domain-general learning rules, in addition to at least one minimal language-specific rule, which is structure dependence. I don't see the evidence empirically, and I don't see the mathematical way to negotiate the structure-dependent hierarchical recursive rules in a domain-general framing of learning.
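For reference, Yang's tolerance principle mentioned here has a simple closed form: a rule generalizing over N relevant items stays productive only if the number of exceptions does not exceed N / ln N. A minimal sketch with made-up illustrative numbers (the counts below are not from the episode or from Yang's data):

```python
import math

def tolerance_threshold(n_items: int) -> float:
    """Yang's tolerance principle: a rule over N items tolerates at most
    N / ln(N) exceptions before it stops being productive."""
    return n_items / math.log(n_items)

def rule_is_productive(n_items: int, n_exceptions: int) -> bool:
    return n_exceptions <= tolerance_threshold(n_items)

# Illustrative numbers: 120 verbs, of which some are irregular.
print(round(tolerance_threshold(120), 2))  # ~25.07
print(rule_is_productive(120, 25))         # True: the general rule survives
print(rule_is_productive(120, 40))         # False: too many exceptions
```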

[40:05] Michael Levin: Very interesting stuff. I think we're studying the communication both within and between some very unusual and different creatures. We make some synthetic life forms. We look at cells, tissues, organs. Some of what we're doing is developing interfaces to communicate with them via language. Using AI to talk to your liver. What does it know? It knows about all kinds of physiological states and various other problem-solving things that it does. We can put a language interface on top of that that could give great access both to what it knows and to actually ask it to do things.

[40:52] Elliot Murphy: So you're trying to figure out a way to generate a more structured grammar to communicate with the endogenous organization of these systems.

[41:02] Michael Levin: These are two separate things which may or may not go together. One set of projects is to find out what the internal structure is that the system uses to store information and communicate inside itself between its parts, which may or may not have anything to do with the grammar. The other part is using various AI tools to put something on top of them that may not be endogenous, but that would allow humans, whether the device's owner or a doctor or whoever, to exchange information in a format that we can understand.

[41:39] Elliot Murphy: What I find cool in this connection is it may seem intuitive. A lot of these things seem intuitive. All animals communicate. All organisms have some mode of communication. I think it has been a mistake for the literature to emphasize the communicative role of language over its inferential role, as I've said. But also, this applies to everything. All of our intuitions—this is what Newton proved—our folk psychology intuitions about mass and motion were, of course, completely wrong. I think that's also true of language. Our intuitions about language: a lot of people get into the field of neurobiology of language and have intuitions about what language is and how it should operate. But those intuitions turn out to be wrong. And that was also the case with the three major branches of philosophy. Our intuition about epistemology is that no statement can be both true and false at the same time. Even that has been challenged. Our intuition about ethics is that reducing pain and suffering and increasing happiness is the primary guiding principle of living a good life. I think that's also wrong. That's too simplistic. Our intuition about metaphysics is that everything must have a sufficient cause or reason, Leibniz's principle of sufficient reason. That also turns out to be false, as people like John Conway and Simon Kochen may have shown with their free will theorem and things like that. So I think all of these big intuitions that we bring to the table, we often convince ourselves that we no longer have them and that we're doing good science. But I think the proof is in the pudding and the proof is in the mathematics. If you can show that all these computations that I've been talking about are not in fact binary or necessary features of language processing in the brain and in acquisition, then fine, then we can talk more about large language models and how they can help. How they can help me is very different from how they can help you. It's a tricky issue. One paper that came out recently showed that large language models, when they manipulate information creatively, tend to manipulate noun-related information and not verb-related information, which is not what humans do. Because nouns encode some lexical semantic features, but verbs encode the core properties of grammar. With verbs, you get mood, tense, aspect. You can establish predicate-argument structure and things like that. Verbs do the actual syntactic, grammatical inferences in sentences.

[44:37] Elliot Murphy: Nouns don't really do a lot of that. They contribute to agreement. But it's a bit suspicious that large language models are syntactically creative with respect to nouns, but not verbs. We have to realize that they are large corpus models; they don't know language, they know how to do what Apple researchers in June called "sophisticated pattern matching." They're very good at sophisticated pattern matching. One good use that they do have is in clinical linguistics. LLMs seem to be very good at diagnosing and predicting language deficits in clinical populations. That's a pragmatic, practical clinical engineering goal, which is not the same thing as LLMs helping narrow down causal mechanistic neural functions. That would be my skeptical take on how to use these things. For language, I never would consider replacing my linguistic theory with large language models, which is what many people in the field are actively trying to do. They're actively trying to say, "Get rid of linguistic theory. Large language models do linguistic theory better than linguists." These are literal quotes. They argue we should replace our understanding of how the brain and mind execute language with whatever LLMs give us. I think that's a big mistake. They also fail at compositional things. We have papers showing that if you ask DALL-E to generate images that have compositional meanings, it breaks down. For example, if the prompt says one woman has glasses and the other does not, it may show a picture with two women wearing glasses. Do you have thoughts with respect to the neural side of this? I'm interested in the power of electric fields in the brain and how they causally anchor a lot of information processing, and how so-called slow oscillations actually drive many of these inferences. Increasing evidence suggests that spikes don't causally anchor information processing. Electric fields travel at high speeds. When I talk to people about oscillations, a common reaction is, "What causal evidence do you have?" But you could also say the same thing to the spike people. What causal evidence do you have that spikes drive linguistic, semantic, syntactic inferences? For example, stimulating an electrode in posterior temporal cortex may render a patient unable to speak, or stimulating frontal cortex may move the monkey's eye. You obviously disrupted that aspect of the brain, but you also disrupted local field potentials and oscillations by doing that. Do you have intuitions? Are you as sympathetic as I am to downward causation?

[47:35] Michael Levin: What we study are the kind of evolutionary precursors to what goes on in the brain. Cells have been doing those kinds of things long before there were recognizable neurons or brains. This is super ancient stuff. Starting from the time of bacterial biofilms, using bioelectrics as a kind of cognitive glue to bind competent subunits together into a collective that is navigating problem spaces that the individuals couldn't navigate. There are some spiking phenomena, but the vast majority of it is not spiking. These are slow analog changes, and there are spatial patterns that encode information. We have this tool. I used to have my students do this by hand, but now we made a tool to do it. You can take almost any neuroscience paper and put it into Microsoft Word and just do a find-and-replace. Anytime it says neuron, you just say cell. Anytime it says millisecond, you say hour. You get yourself a developmental biology paper because almost all the same stuff carries over. It's just a different problem space: instead of three-dimensional space, they're navigating anatomical morphospace. We find memory and learning, mistaken perceptions, active inference, perceptual bistability, and rewritable goal set points that are actually counterfactual representations. We can make a worm that has a representation (we can see it, because we can do the imaging now) of what it would do if it got injured in the future, and we can change it, so that what it would do then is not what it's doing now. It's a fictitious, two-headed pattern, whereas the worm only has one head. This is a pattern that it will use in the future if it needs it, but right now it's a latent memory that doesn't match. I see it as a very primitive beginning of this kind of mental time travel idea that you can represent states that are not happening right now. It already does that. What the cells are reading out are spatial patterns of slowly changing analog signals. There are stable physiological states that sometimes move around. Sometimes they hang out in the same area. All the stuff, the kinds of things that you were talking about, and Earl Miller and those folks, are absolutely relevant to what we do. I think that's where they come from. I think these things have this ancient evolutionary history.

[50:18] Elliot Murphy: The brain is a computer. The brain is an analog computer. It's way more efficient to do computation with waves than anything else. I wonder what you think about tying large language models to the brain. I wrote a paper arguing that large language models don't have any agency. They don't refer to things. When an LLM spits out a fact about Einstein, it's not actually stating a fact. It just doesn't have knowledge. But there are people like Sam Harris who weirdly think humans don't have free will. He thinks that large language models do have free will, and I don't know how he came to that conclusion. I liked your paper with Chris Fields about the strong free will theorem very much. I'm of the mind that I bring to the table an anti-physicalist, Galen Strawson-inspired philosophy of methodological naturalism, whereby I see things this way: before Newton, you had a very clear definition of what is possible through the physical. Cartesian mechanics, contact mechanics, intuitive. I've argued more recently that I think neuroscience is still in that pre-Newtonian framework, where we're desperately looking for neural mechanisms. In order for it to explain language, there has to be a mechanistic type of story. Mechanisms are great, but there may be other more mysterious causal and explanatory structures that can get you the things I told you about. So before Newton, there was a clear concept of physical. We knew what physical meant. Then Newton showed that there really is no such notion as physical. Physical just means whatever physicists happen to agree on. Science became not about generating theories of the world, but just about generating explanatory theories that humans can interpret. Human-interpretable explanatory theories. I don't know if you would go as far as I do in terms of this Fristonian perspective that science is really in the game of just generating interpretable theories. It's not really in the game of trying to figure out what the world is really like. Maybe the best we can do is just generate coherent theories. I wonder if that pertains to things like free will. The reason I say that is because it's very common for people to say free will is just incompatible with biology. It's incompatible with known biology. Is it incompatible with known physics? No. There are loads of things. The reason I'm emphasizing this is because I have the same problem, not with free will, but even with things like language, because people will make the same argument for language. They'll say the brain does Bayes-optimal cue integration, and that's it. It's all about prediction. I agree with the people in that part of the world. The brain does do prediction. But based on our current known physics of the brain, I don't think we have any reason to sharply claim that what the brain can do is necessarily incompatible with things like syntax or free will.

[53:40] Michael Levin: I agree with that. I actually have a weirder view, but I would love to send you a couple of things on that question and see what you have to say. The bottom line is that we've been looking at extremely minimal models. I'm talking about non-biological systems, for example, bubble sort: something as simple as five or six lines of code, completely deterministic, completely transparent. What I found is that if you treat that as a behavioral system and if you relax the assumption that everybody makes, that things do exactly what the algorithm tells them to do.

[54:27] Elliot Murphy: Yeah.

[54:28] Michael Levin: Then you find all kinds of things that it's doing that are totally recognizable by any behavioral scientist, but are in fact nowhere in the algorithm. Even something as simple as bubble sort has delayed gratification. It does these weird side quests that are not forbidden by the algorithm. It still sorts the numbers, and that's what everybody's been focused on. But it does all this other stuff. This other stuff, I'm starting to call intrinsic motivation, in the sense that the things it does are not the things we forced it to do via the algorithm. It does other things that we didn't know about. In fact, people have been looking at sorting algorithms for 60 years and nobody noticed. I think all of the things that these language models are saying are possibly a complete red herring, having nothing to do with what's actually going on in there as far as what the system actually wants to do. I'm much more interested in what these systems do that we didn't force them to. So language use, I feel like that's the algorithm. We're making them do that. I'm much less interested in things that we're making them do. I want to understand what they are doing that we're not good at noticing yet. It's a weird thing where I'm an anti-computationalist. I think these kinds of systems may well have what we're interested in as far as true cognition, but it's not going to be because of the algorithm. It's going to be in spite of the algorithm, I think. So that kind of gives another, different view on free will. People usually focus on determinism or indeterminism. I think neither of those gets you what you want from free will. Randomness doesn't help with that.
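For reference, the algorithm Levin is treating as a behavioral system is just the standard bubble sort below (a generic sketch, not the lab's actual experimental code); the point is that the behaviors he describes are nowhere stated in these few lines.

```python
def bubble_sort(xs):
    """Standard bubble sort: repeatedly swap adjacent out-of-order elements."""
    xs = list(xs)
    for i in range(len(xs)):
        for j in range(len(xs) - 1 - i):
            if xs[j] > xs[j + 1]:
                xs[j], xs[j + 1] = xs[j + 1], xs[j]
    return xs

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```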

[56:08] Elliot Murphy: Exactly.

Michael Levin: But I think there's this third thing. This third thing is the stuff you do that is neither prescribed nor forbidden by your materials and your algorithm. I've got to run, but I'll send you a link. We have an asynchronous symposium on Platonic space. I've gotten together a bunch of people, computer scientists, philosophers, mathematicians, some physicists. We're talking about the hypothesis that there is a structured latent space from which these kinds of ingressions come. So they're not random, not just complexity or unpredictability, but, akin to Platonist mathematicians, the idea is that there's a space we can study that is where these things are drawn from: forms of behavior, forms of anatomy, and so on. Take a look at that and maybe let's chat again because I'm interested in what you have to say about it.

[57:16] Elliot Murphy: One final quick note. That sounds a lot like what's up my sleeve because there are properties of the universe that might not be easily deterministic or fall into the randomness category. Language seems to be one of them. There are a lot of aspects of language that don't clearly fit into that account.

