Discussion #1 with Elan Barenholtz

A 57-minute discussion with Elan Barenholtz on whether language can be autonomous in brains, AI models, and other systems, touching on virtual governance, mathematical structures beyond physics, emergent patterns, and future experiments.

Show Notes

This is a ~57-minute discussion with Elan Barenholtz (https://mpcrlab.com/people/Elan-Barenholtz/) about language and its possible autonomy in brains, AI models, and other systems, linking to our lab's work on active data and the thoughts-thinkers continuum.

CHAPTERS:

(00:00) Epiphany about Ungrounded Language

(21:14) Virtual Governors and Data

(30:37) Mathematical Structures Above Physics

(40:40) Free Lunch in Language

(47:38) Minimal Systems, Emergent Patterns

(53:30) Defining Language, Future Experiments

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Podcast Website: https://thoughtforms-life.aipodcast.ing

YouTube: https://www.youtube.com/channel/UC3pVafx6EZqXVI2V_Efu2uw

Apple Podcasts: https://podcasts.apple.com/us/podcast/thoughtforms-life/id1805908099

Spotify: https://open.spotify.com/show/7JCmtoeH53neYyZeOZ6ym5

Twitter: https://x.com/drmichaellevin

Blog: https://thoughtforms.life

The Levin Lab: https://drmichaellevin.org


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

[00:00] Elan Barenholtz: I'm about a year and a half now since my epiphany, and it really was this kind of thing. I was primarily — I would have called myself a vision scientist for most of my career, and it was computational vision and it was Bayesian, broadly speaking. I spent most of my graduate career learning, hearing about how connectionism, as it was called at the time, wasn't going to work. What happened was, circa 2013–14, I really started paying attention to what was going on on the neural network side, CNNs, and the fact that these things were doing what looked like something at least approximating a really important piece of what we call cognition. It was limited; it was labeling. Application-wise, it was certainly not entirely clear where things were going, but these things were proto-brains in a way that re-steered my interest, refocused me. And so at that point I started to transition away from some of the Bayesian approaches to actual modeling, because now the models could actually do things. When the large language models hit, I was flabbergasted, truly flabbergasted. I had a pre-LLM moment when the image generation models — people didn't quite notice — were getting so good that it was obvious that they were understanding language, even though they couldn't speak yet. The generation was all on the image side. Not only did they do a great job of creating images from text, but they seemed to have an understanding of subtext. They had a sense of humor. What's going on here? These things not only seem to understand how to generate images, they seem to understand language, they seem to understand the cultural implications of language. That was when I was like, okay, these things are intelligent. There's real intelligence here. It's a non-human intelligence, but it's there. When the large language models dropped and I specifically understood the autoregressive nature of things — they were just doing next-token generation — that's where I had my key epiphany. What I realized, in almost a literal moment, is: language itself is intelligent. Language itself has the capacity to generate itself, because that's all it's doing. It's astounding how simple it is in some ways, almost devoid of what we would think of as the complexity of cognition, of what we have always assumed to be this underlying cognitive modeling. That's not what's happening. What's happening is there's a statistical distribution of tokens — in the models these are just vectors that represent text. And then all these models are doing is representing the distributional structure of the corpus and using that to predict the next token, doing this autoregressive thing and then saying, okay, here's a sequence. How should it continue? It's not the right terminology, it's not really prediction, but how should it continue? And it's using just the internal topology of the data itself. The epiphany was, oh wait, that means language just continues itself. That means it doesn't need anything else. It doesn't need a world model. It's not an inferential process where it's trying to guess what's true about some external system — Bayes, all of that, basically the core foundation of the way I'd been trained to think about what cognition is doing.

[04:14] Elan Barenholtz: And it was a modeling of an external process. It's not modeling an external anything. It's simply using its own statistical structure to continue itself. And so this was this crazy moment of, wait, language is its own informational organism. The data itself is doing its own work. What does that mean for a couple of things? Number one is, what does that mean for language? The first thing that I was wondering was, so what do words mean? They don't mean anything, or not in the way we've historically thought about it. The word "red" — we think it picks out a color. There's this idea of reference, going back to the philosophy of language. It's about how humans have historically thought about language: that words refer, that there's a deep connection between the world out there as it comes to our senses and the way that language is an extension of that, directly picking out or referring. But language doesn't need that — and it not only doesn't need it, it can't have it if it's running on this kind of next-token internal distribution; then there's no room for other kinds of computational mechanisms to be doing the generation. I'm not suggesting that there isn't interaction: when I'm looking at you and I say, "Hey, there's Michael Levin," I'm not saying that language is just running on its own. How did it find out that Michael Levin's there? It's like prompting. It's just like the multimodal language models: you can take images in and that information can be projected into the linguistic space, but the language then is still running on its own logic. I think there are really solid computational reasons to believe that this is true for language models and it's true for our language too. That was the epiphany — and that's totally different from the way we've assumed humans do language. I think the obvious conclusion is that if this is true about language and it has this property and it's able to do this, then that's how it's operating in us as well. It's a very controversial statement. I have a more elaborated version of why I think this is true. I have some recent computational results that strongly, if indirectly, point to this. But basically, that was a big epiphany. The epiphany was: language seems to be this ungrounded, parallel system that has its own informational structure. That's what it's doing. Somehow those properties are very useful for things like communication and coordination. There is interaction with the rest of the system, but language is this ungrounded thing. That was the primary or initial epiphany. And then from there I got to thinking, what on earth is going on here? I haven't sorted out what's going on. I still don't understand.

[08:29] Elan Barenholtz: It's much more mysterious now how language actually works. What's weird about it is, in some ways, it solves a riddle. How does language have reference? And the answer is it doesn't have reference in the way that people have traditionally thought about it. At the same time, it works. It does all this coordinative behavior, but it's doing it in this probably much more indirect way than we've thought about it. But that got me thinking: language has this informational, computational structure — you take a sequence, and based on distributional structure, you generate the next thing, tack it on, and then run it again. A beautiful, beautifully elegant computational property. How did we get this? Where did language come from? That was the big question. How can we have this crazy computational structure that does this thing and does it usefully? And that leads me to the more conjectural — but, I think, pretty solid — conclusion in my own mind that there was autoregressive cognitive infrastructure already there, that the brain works this way. What I mean by that is that we're probably learning the distributional structure, in general, of how sensory and perceptual information comes into our system, and then what the impact of behavioral decisions is on that stream. And we're probably doing something like autoregression more generally in cognition. So instead of thinking of predictive processing, where what you're trying to do is predict the next input, what you're really doing is probably just generating the next useful output. The sequential thought is that cognition itself may more broadly be understandable as this kind of autoregressive, or at least auto-generative, process. I use this term auto-generative. Auto-generative just means — it's a system, or you can even think of it as a property of data: when the distributional structure of the data is sufficient to generate more data, generate more of itself, while keeping things "within distribution," it's auto-generative. Language is the paradigmatic example of this. Language has, we now know, this property: the corpus itself is finite, it's inherently finite, but you can get the infinite out of it. This is the productivity we've known language has — the capacity to generate novel utterances — and we see it every day: every conversation is a novel conversation, and with the LLMs, you can ask about anything and they will produce a reasonably coherent response. And so the idea is that the dataset is this finite thing, but it has within it the capacity, just internally — no external rules, nothing external to it — to generate, given a model that's able to do it. So it doesn't necessarily have to be this autoregressive mechanism. I think that's the right one. I think humans do operate autoregressively. When we talk, for example, we're producing the next word based on what came before. But it's a more general property that the data has this structure within it. You could do it differently.
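
A minimal sketch of the autoregressive loop described above, assuming a toy bigram table in place of a transformer (the corpus, names, and sampling scheme are illustrative only): learn the corpus's distributional structure, then sample the next token from it, append it, and repeat.

```python
# Toy sketch of "language continuing itself": a bigram table stands in
# for a transformer, and the corpus below is invented for illustration.
import random
from collections import defaultdict

corpus = (
    "language continues itself . language generates the next token . "
    "the next token comes from the structure of language ."
).split()

# The "distributional structure": which tokens follow which in the corpus.
successors = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    successors[prev].append(nxt)

def continue_sequence(prompt, n_tokens=12, seed=0):
    """Extend `prompt` using only the corpus's own statistics."""
    rng = random.Random(seed)
    tokens = prompt.split()
    for _ in range(n_tokens):
        options = successors.get(tokens[-1])
        if not options:                      # unseen token: nothing to continue from
            break
        tokens.append(rng.choice(options))   # sample the next token, append, repeat
    return " ".join(tokens)

print(continue_sequence("language"))
```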

[12:44] Elan Barenholtz: You could do it diffusion-wise, for example. It doesn't have to be that you guess the next token. You could guess a bunch of tokens all at once, but it would still be based on the internal structure. That's one thing that's incontrovertible from the large language models. Language has this structure to it, this auto-generative structure. So the further thought is that maybe perception, and in general, the world has this predictability to it. We might even call it physics, but it's a very different kind of physics than we think; it's not formulated in terms of "here's a set of rules, a symbolic description." It's that the data, as you collect data about the way things move and the way things interact, is actually sufficient to say: under this circumstance, this is how they're going to move, this is how they're going to interact. And our brains have captured that — our brains in some way are meant to be, are designed as, continuation machines. What they learn is how things continue. What is the predictive structure such that you can take a given sequence — the last few words I've said, the way I'm moving my hand, the way this is going to fall if I let go of it — all of this, our brains are built to extract or somehow represent; you can call it predictive structure. I struggle with the word predict. The reason I don't like the word predict is because prediction assumes that there's some external thing that you're trying to model. You're saying, what's it going to do? You're matching that, and asking, okay, did I do a good job of predicting what was going to happen? I don't think the brain is necessarily doing anything like that. So I do disagree with what we typically call predictive processing. I don't think your brain is saying, "I think fundamentally this is what's going to happen next." It's modeling the internal structure of the way things happen in order to do stuff, as opposed to predict stuff. Prediction is something that people who play the stock market or who try to predict the weather want — knowledge about some external process. What we're trying to do is navigate the world. We're trying to make, in real time, the right choices as to what to do and how to continue our own internal process such that it's optimized somehow for survival or for reproduction, whatever your evolutionary model is. But in any case, that's the key idea that I've been working on here: that the brain is fundamentally doing this kind of generative process, just like the large language models. People often will say, "Well, you're just glomming on to the latest technology. Once it was steam, then it was digital computing, and now it's large language models." But this is really a computational argument. It's not about a technology. It's about the fact that the technology revealed something about the underlying structure of language in this case. So these models, the transformer models, they're not smart in and of themselves. I think of them like jello: you put them in the mold and they will take the form of whatever structure is there. What they did was find the structure that was already in language. The revelation was that language has these properties, not that the models can learn it. And then from there, you just have to say, well, okay, given that we're the ones who created language, the brain seems to have the capacity to learn this kind of structure.

[16:58] Elan Barenholtz: Then presumably it's learning that kind of structure more generally. It's very unlikely that language emerged ex nihilo, that the brain suddenly became able to do this kind of structural learning — this distributional-structure learning and continuation. Maybe that's the whole thing. That's what cognition is. Then the final piece of this — the one that I think a lot of people see when they hear this theory and say, "go talk to Mike about this" — is the autonomy and ontology of information, of computation. What the heck is language? It's not the words, it's not the sounds. It's an informational structure. And the fact that you can do it in a completely different substrate with a completely different representation — I'm not suggesting that the brain is literally doing tokenization in the same way that the models do, and yet they're able to capture all of the basic performance — really suggests that language is independent of its substrate and even of any particular manifestation. It's not about sounds or words as we think of them. It's about this relational structure. It's math in the most abstract way, in some sense. It's about relational structure between tokens, right? Relational structure between vectors — that gives you language. So language doesn't live in the particular manifestation that we think of it as living in. It's not really about this; we turn it into sounds — that's how we get it out of our brain and into somebody else's brain. That's just our way of doing it. But now we know you can do it completely differently. And so that speaks to almost a little bit of some sort of computational essentialism or something like that: the mathematical structure is what's doing the work. And it's manifesting, it's running in our brains in this particular way. But there's — I don't want to use the P word, "Platonic," but it certainly has hints of that — there's this sense in which the information precedes, the mathematical structure precedes, any particular instantiation of it, and it can be instantiated in this way or that way. This shadow or that shadow. These are in some ways like different projections. I don't like to use the word projection because that assumes a certain dimensionality. But these are very different manifestations, I guess you could say, of the same mathematical structure. And it's not just theoretical, it's happening in us. This is what's running in us. And it's very hard to say exactly — we don't have the right terminology — but that is "it from bit": the mathematical structure is what's imposing itself on our brains and making them do the things they do, rather than the other way around. So I think that's maybe the broadest philosophical perspective here on all of this. But it definitely, I think, has some resonance with the kind of ideas that I know you've been discussing in relation to Platonism and things like that. So yeah, that was a big download. You were nodding enough; I hope it made some sense. But that's basically what I've been going crazy about for a year and a half now.

[21:14] Michael Levin: There are a few points of intersection with us. In particular, why I'm interested is because this is another amazing example of some things we're very interested in. The first concept that we deal with is this notion of a virtual governor. This goes back to Norbert Wiener: the idea that in certain systems you can compute the existence of a virtual object. Object I use very loosely. The only reason I say it's an object is because it's a target of intervention. It's an actual thing you can intervene on and relate to. The idea is that this virtual governor doesn't exist in the physical world. It isn't physically there. But once you've defined its properties, if you aim your intervention strategies to control, communicate, interact with this thing, you gain way more control of the physical stuff that is there. The original example, although he didn't write much about it, was a set of dynamos that were coupled. Because they're coupled, they're very hard to control; the whole thing is unstable. If you try to control each one, the whole thing flies out of control.

[22:44] Elan Barenholtz: That's the proper system stuff.

[22:45] Michael Levin: What's that?

[22:46] Elan Barenholtz: It's a classic complex system.

[22:47] Michael Levin: What he showed is that you can define a virtual governor that isn't there. It's like a virtual dynamo. The idea is that when you manage that one, you get much more control of the system. So it breaks us out of this idea that the targets of your intervention are physical objects. They can be these complex things — and I don't want to say "state" because that's too passive. It's not actually a state. Neither is it an object in the passive sense. It's an object the way we are objects, which is not really objects at all. For us, the idea is that we study these things in physiology. The idea is that when we're looking at embryos or bodies, we've got the biochemistry and the pathways, we're fine. There are lots of people working on controlling those things, but there are persistent states that might be bioelectric, metabolic, or stress-related. They exist in different spaces. At minimum, they have their own persistence. That's minimal because you can track them across time. The matter comes and goes, but the state stays. That's minimal persistence, but more than that, they have policies of behavior. So you can make models of what these things are doing. Depending on which things you're looking at, they might be higher on the Rosenblueth hierarchy where they may have homeostatic states that they're trying to achieve—specific simple ends. Sometimes they are predictive; sometimes they have self-preservation properties. This ranges from bioelectrical patterns through persistent and repetitive thoughts that are hard to get rid of. Sometimes they even do niche construction in the tissue to make sure that they can hang around. A lot of things that aren't physical have important reality and, functionally, can be drivers of what happens. We do a lot of that in physiology. The second thing is this notion of active data. We have work, especially with Chris Fields and some new stuff, trying to dissolve this boundary between thoughts and thinkers, or between data and machine. The standard paradigm is you look at an agent, like a Turing machine: that thing is the active agent, and there's passive data it operates on. You can flip that and say no, the machine is the stigmergic scratch pad in the world. It's the data that drives everything. It's the patterns in the data that are the boss. You can start to think about how to dissolve that distinction, and you can make a whole spectrum of it. It's not a binary; it's a spectrum. This gets us to the idea you mentioned about the machine being jello: the physical embodiment—whether an evolved body, a hybrid, a robot, an LLM—is kind of an interface. It's not the main show; for this reason, a lot of the theories we have about computation and the thermodynamic costs of computation and what's possible and not possible are good theories of the front end. But in an important sense, these are thin clients through which the active—I don't even want to call it data, because even that's too passive, as I think you said as well.

[26:42] Michael Levin: It isn't just data. There are these patterns that are higher on the hierarchy. You can't just assume where they are. As I was listening to what you were saying, I had this analogy in mind. What if we think about language or text — let's say language — any instance of it as an instance of a developing organism? What I mean by that is here are some things; probably the analogy blows up in many places, but here are some things that I think are cool. First of all, it has an interesting trade-off between independence and interaction with the real world. It's independent in the sense that it is going to develop across time; it has a developmental sequence. And in language, what's the next word? Well, in development, what's the next developmental stage? Part of it is exactly what you said: as an embryo, I exist in and of myself. I don't need you to tell me what's going on or for you to run my program or any of that. I'm developing on my own. So there is a first-person perspective there, whether it's a high level or not, but it's there and it's going to develop in a certain way. And yet the specificity of how it develops does have a historicity to it. The reason that you develop into a snake or a tree or a human or an anthrobot or something else is that your genome is actually compatible with a huge number of outcomes. So there is plasticity, but there is some influence. There is some interaction with the outside world. For mammals, that's the mom; for a lot of organisms, that's the outside world or the history — the evolutionary history of what happened. There are some inputs into this process, but there also is the first-person driver: I'm going to develop in a certain way. Whatever else happens — the metabolics, the thermodynamics, all this other stuff — the physical machine that embodies it is in many ways a side product of what's happening. I think an important part of that is that, to whatever extent anybody goes along with this idea, they usually think of it as: Okay, there's this anatomical morphospace. There are different shapes that you could acquire. Some of them have been built by selection. Some of them — anthrobots — who knows where that comes from. But let's say there are some attractors. You are the physical agent that is navigating this space. I think that is useful in some cases, but I really want to flip it and say no, you're the interface through which patterns in that space come through. In other words, the space isn't passive and these attractors aren't passive and they don't just sit there waiting to be found. Chris Fields and I are writing something else now around this idea: to what extent can you say for any problem-solving system — an intelligent system choosing what to do or looking for answers in some problem space — to what extent are the answers reaching out for you as well? To what extent is it a symmetrical thing? We have some examples of these things that we study, and I love this idea that language is one of them, and that it can do this too. I wonder, in terms of practical collaborative opportunities going forward, if we can do some of that. We did a little bit of it with Francisco Sacco.
I'll send you a paper of sentences navigating that kind of space, but I think we can do a lot more in thinking of these things as a developmental process with its own inner perspective, and to what extent the machines that embody these things — whether they be the brain or LLMs or whatever else is possible — are interfaces through which an important pattern tries to manifest itself. I think lots of areas of intersection.

[30:37] Elan Barenholtz: I think this was when we were talking with Andreas; we had that conversation. It was a virtual conversation. I was at the University of Toronto — the three of us — and I remember you had said something right along these lines: there's the idea of these computational structures being primary and then they're manifesting in these ways. Hell yeah, because that is the craziest insight from all of this. At the same time, it's very difficult: it's because we're using language and we're talking about representational schemes and we're using our representational schemes to do that. I don't want to get caught in a philosophical morass. In some ways this is a lifeline to pull ourselves out of solipsistic thinking about thinking. But I do think that there's a dramatic, paradigm-shifting way of thinking about things differently that our brains are not built to do. Because of the illusion of reference, or the illusion of objecthood being primary as opposed to bits — that's how we've evolved to think. We're trying to step outside in some ways and reframe. So it's inherently very difficult to come up with ways to capture this in a way that is understandable within our framework. Because we're talking about our own framework here. We can't, in a Gordian way, really get out of our own framework. So it's challenging to think about these things empirically. If there's work to be done here, it's to inch our way towards this bigger perspective shift as opposed to individual phenomena. The individual phenomena would be in service of trying to capture this core idea. What I've been doing is trying to prove in some limited form that language has this structure independently and that in people that is how language has to be operating. It's towards this broader point. Coming up with precise experiments is inherently very challenging in this context. But this is worth it. Biology is really cool. Language is really cool. But what's even cooler? Metaphysics, or whatever this is. That word could rightfully turn people off. I think both you and I are not trying to say we're anti-physical; it's not exactly an anti-physicalist story. I don't know where you fall on that. It's not the old wooey version. It's not some other substance, like spirit — which I think was really just crappy materialism: "there's this other thing that operates just like the physical but is not physical." It's different. It's more abstract than that.

[34:55] Michael Levin: I have some ideas for making it practical. But it is wooey in the sense that math is wooey. We have this successful research program over thousands of years, where we start with set theory: I have this empty set, and then I can put one thing in it, and that's it. There's your start. And then before you know it, you get handed a very specific value of e — it's 2.7. And then there are all these different truths of number theory and of topology, these things that you just find; you're committed to them as soon as you do something as simple as starting with very minimal assumptions. You get all this stuff that is handed to you. And all this stuff matters in the physical world, it matters for physics. If you ask why the fermions do this or that, you find out it's because there's this symmetry group. Eventually, everything ends up in the math department, whether you start in biology or whether you start in physics. And so, what's that?

[36:15] Elan Barenholtz: It's all woo.

[36:16] Michael Levin: I realize there are people who disagree with this, but I've been talking to a lot of mathematicians recently about this, and I think there's not really any way to escape the fact that the physical facts do not fix all the facts. If you know everything there is to know about the Big Bang — you can twiddle the constants at the beginning of the universe — you are not going to change e and you are not going to change Feigenbaum's constant and all these other things.
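
As a brief gloss on the two constants mentioned here, both are fixed by purely mathematical definitions rather than by any physical parameter:

```latex
e = \lim_{n\to\infty}\left(1 + \tfrac{1}{n}\right)^{n}
  = \sum_{k=0}^{\infty}\frac{1}{k!} \approx 2.71828\ldots
\qquad
\delta = \lim_{n\to\infty}\frac{a_{n} - a_{n-1}}{a_{n+1} - a_{n}} \approx 4.66920\ldots
```

where the \(a_n\) are the parameter values at which the logistic map \(x \mapsto r\,x(1-x)\) undergoes its successive period-doubling bifurcations; neither number moves if you "twiddle the constants at the beginning of the universe."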

[36:44] Elan Barenholtz: It's not in there.

[36:45] Michael Levin: It's not in there. It's somewhere else.

[36:47] Elan Barenholtz: And so it's above it.

[36:50] Michael Levin: That's right. We don't have to think about it as naive objects. That's fine. Nevertheless, these are things that matter. As an engineer, I don't know what's woo and what's not, except that whatever I need to worry about or whatever I can exploit — those are real. That's how I see this. If they're found in physics, great. If they're found in the math department, also great. My simple metaphysics here is that if I need to worry about these things, and if I can exploit them and relate to them in a functional manner that helps me discover and build new things, then they're real. And if that's not woo, then great, then many other things aren't woo either; or maybe it is, and I don't care, it's fine. But there's something here. The thing that Descartes got beaten over the head with is: how can a non-physical thing interact with the physical body? You'd violate conservation of energy. But we already had this problem, never mind the biology in the brain. Pythagoras already saw this: there are these totally non-physical facts that somehow make it through and constrain physics. And then I say they enable biology. We already had this interactionist problem. If you're into math, you're already an interactionist. I don't think there's anywhere to go from there other than that. What's important then is to figure out what we can do with this and what useful, interesting things we can do that would help us flesh it out. The direction that I'm going — I wonder if we could do this with language as well. There might be ways to do this. The research that we're doing now in this direction is looking for what I call free lunches, or heavily discounted lunches, let's put it that way. Things where you get out more than you put in. I think math is already telling us that that's the case. You constantly get out more than you put in. I think the evidence for this kind of thing that you and I are both talking about is strongest when you can definitively measure: this is what I did, and I didn't need to micromanage, and yet here we are. The standard theory says you're supposed to pay for computations — every bit costs, or erasing a bit costs a certain amount of energy. So there's effort that you need to put in, and if you're going to have a controller that's going to do something useful, you either have to learn — you have to train on data so that you become good at whatever the task is — or you're selected for it as part of a big population; somehow you evolved. But you have to have had history with a problem. To the extent that our conjecture holds — that language and some of these other things are generative and that you didn't have to put the work in — the prediction is that they can solve problems and navigate these spaces in ways that do not match either of those two standard criteria. They were never selected for it. They were never engineered for it. They didn't train on it. We have some stuff coming out in the next few months showing exactly that: you can use this kind of thing to solve problems even though you've had no contact with them — I've called it zero-shot transfer learning. It's transfer learning from where you've never had contact with the problem, at least in the conventional sense. I wonder, just to generalize this issue of getting more out than you put in, if we can do something like that with language.
I don't know exactly what the measurables are going to be, but that's where I'm looking for quantifiable evidence of this stuff.

[40:40] Elan Barenholtz: Would you see the actual language models, as they exist — and it's not our experiment — as an example of that inherently, in the sense that they end up doing all this computational work by virtue of the structure of language coming to the models for free? Let me be a little bit precise about this, because this is exactly where I'm at right now in terms of actual experiments I'm doing. There are two possible interpretations of what happened with these language models. You have this big corpus, and what the models could have been doing is: I need to come up with some sort of mathematical, topological structure that's going to allow me to do the task of next-token prediction. I'm going to use the data to do that, but I have to come up with this scheme. That's one way to think about what they've perhaps done. The alternative is more jello-like: they are simply reflecting the computational structure that's there, and that structure ends up doing the work — the language already has it. I've done some simple experiments showing that transformer models don't come up with their own schemes. They can't come up with their own schemes. If the data doesn't have this distributional structure already — for example, with simple rules like Fibonacci or Tribonacci sequences — the models collapse. They suck at doing it. That's because the mold isn't there. There's nothing to mold to. There's not this kind of distributional structure; there's some external rule, and they can't learn it. In this case, that's pretty strong evidence that language models are simply taking advantage of an existing distributional structure that's ready to do the work. So, as a clarification: is that an example of that? You've got the computational model itself: "I've got to do this task. There's this thing called language. I've got to do it. How am I going to do it?" If it had to really figure out how to do it, it would have taken probably far more computational resources — on the order of the age of the universe. There's no way you would come up with a scheme to do this if the data didn't already contain this capacity. So is that a free lunch?
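
A toy illustration of the contrast described here, assuming a bigram-level learner (this is not Barenholtz's actual experiment; the sequence length and modulus are invented, and real transformers see much longer contexts, so this only gestures at the point): in a rule-generated sequence, the continuation lives in arithmetic over the previous values rather than in token co-occurrence statistics, so the statistics alone underdetermine the next token.

```python
# Rule-generated data has no stable "distributional structure" at the
# pairwise level: the rule is external to the token statistics.
from collections import defaultdict

# Fibonacci values reduced mod 100, treated as a sequence of tokens.
fib = [1, 1]
while len(fib) < 60:
    fib.append((fib[-1] + fib[-2]) % 100)

# Build the same kind of successor table a next-token learner would.
successors = defaultdict(set)
for prev, nxt in zip(fib, fib[1:]):
    successors[prev].add(nxt)

# Recurring tokens have conflicting successors: pairwise statistics alone
# cannot tell you the continuation, because it depends on the arithmetic rule.
conflicts = {tok: sorted(nxts) for tok, nxts in successors.items() if len(nxts) > 1}
print(conflicts)
```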

[43:59] Michael Levin: In general, I think that the ability to interact with, to be colonized by, or to be cooperated with by patterns from this space does provide extra compute that you didn't pay for. I think measurably so, and we have some examples of that. But the thing is, much like living tissues and brains, language models are very complex and structured; there's a lot going on, there are lots of...

[44:35] Elan Barenholtz: But they're actually not complex, really.

[44:37] Michael Levin: Understood, but there are enough details that it's hard to have a clean argument, because people will say it's implicit in the structure of the data that was put in, and it's this and that. I make the same argument about some of our biological systems too. We know when you paid the computational cost to design a human or a frog: it was in the millions of years of bashing against the environment and selection. How about anthrobots and xenobots? When did you pay that cost? Never. There's never been any. But they're like biology; these things are complex enough that people say you just haven't found it yet — there's some mechanism under there somewhere that we'll get to; it's too complicated. So my suggestion is — and this is where we've gone — we started by looking at sorting algorithms, which are extremely simple and transparent and deterministic. And now we have something even simpler than that. This thing is one line of code. But with those very minimal systems, first of all, the shock value is bigger. When you see them doing things that are not in the algorithm, you can see that right away. That's even more surprising, which I like. It becomes immediately more obvious. When somebody tells you it has a trillion dimensions, you're like, who knows what it's going to do? The surprise value is almost lost.

[45:57] Elan Barenholtz: I got you. I know what you mean.

[46:01] Michael Levin: Something really small. I have a suggestion — see what you think of this, and probably there are better suggestions, but this is just one that I'm aware of. Remember the old Hofstadter and Mitchell "Copycat" system, the analogy-making thing? It's a soup of concepts and they're floating around and sometimes they attach and they make analogies that fit or don't fit. I think that is extremely minimal: it doesn't have an enormous amount of data going in, it doesn't have tons of code, it doesn't have a trillion parameters. Some kind of very simple system where we could quantify exactly how much extra you are getting and what's the surprise value. Chris and I are writing this paper called "Finding and Binding," and it's trying to span the range from molecules finding their binding partner to concepts binding to things out in the world or to each other, and trying to see and understand the symmetry of all that, but in really minimal systems. Maybe it's not "Copycat," maybe it's something else, but I feel like the measurability of it and the degree to which it becomes obvious to people that there's something more going on here than you would expect from a standard paradigm — we need something simple. Maybe there's some version of language.

[47:38] Elan Barenholtz: I wasn't a linguist or a psycholinguist or a computational linguist. Language is just the one that smacked me in the face. It smacked all of us in the face. And if it hasn't, then you just didn't notice the blow. So I'm not necessarily tied to language per se as the domain of research. It's just the one where I think, assuming you agree that it is ultimately an example, and you don't have to commit to that, I agree with you that massive models are doing all kinds of stuff we don't understand. It is a black box in the end, even though it's extremely simple fundamentally: it's just annealing in the parameter space, finding an optimum for this function. There's all kinds of crazy stuff happening in there. I agree that many people will never be convinced that it's necessarily an example of this. So I guess the question is, are there — maybe language. Language — we somehow ended up with this thing we call language that has these properties: it's able to self-assemble in this way. Could we think of sequences that have this kind of generative structure but are simpler? Maybe it is, and in some ways, some of the experiments I'm doing are trying to do that. What's the simplest rule for generating sequences? But those don't have this property. And that's the point. If you use a simple rule like Fibonacci, you don't get extra.

[49:46] Michael Levin: I don't know that that's true, because — I don't want to make too much of it until you see the paper and the data — what we've been finding is that even extremely simple, deterministic generative rules have these weird side quests that we didn't know about. We found them in the sorting algorithms. This thing is even simpler. I think these interesting things — I've been calling them aggressions, for lack of a better word — affect even very simple systems. I would not rule it out. I don't know about Fibonacci. I haven't tried Fibonacci. These are all empirical questions; we have to try it and see. The really simple thing that we tried has interesting properties that you would not expect. I don't know. If it's not Copycat, there are a couple of other things. Janet Wiles has done some amazing things. There's a talk by her on my channel. She's done this cool work on robots evolving languages to communicate with each other. We're doing stuff now on creating artificial interfaces for various types of systems to create composite entities. Take two things. We provide almost a corpus callosum; we provide the communication interface — xenobot and bacteria, plant and cellular automaton, whatever — and see if we can make a composite entity out of these things. The communication, once we get the tech off the ground, is a very rich data set for looking for kinds of language. Andy Adamatzky was saying that there's some evidence of language in the signals from the fungi. Of course, people are debating that. We can start looking for this stuff. I don't think it has to be incredibly complicated.

[51:53] Elan Barenholtz: Yes, exactly. Language is probably not this crazy special case. It's what led me to say that cognition more broadly works this way, but there are probably manifestations of similar kinds of things happening all over the place, maybe even cellular. And so: capturing something like that in a natural phenomenon. If there's a fork in the road here, a question for myself, for you, for potential collaboration, is: do we go after natural phenomena? I consider language an example of a natural phenomenon where we discovered this kind of principle. Do we look at other phenomena, or do we try to construct or wire together novel systems in a virtual setting? I'm still thinking about doing something like agents, communication, things like that, exactly to see — although my suspicion is they'll never come up with the language. I'm on record saying that, because I think language is too freaking good. If you give them a billion years, non-organisms would never come up with this, but maybe I'm wrong about that. Maybe nature came up with this stuff, but nature's really old and smart. You don't get it in a beaker, in a contrived environment — or you do. I don't know.

[53:30] Michael Levin: I suspect it's easier than we think — not because the physical stuff we build is so smart, but because the patterns are under positive pressure, I think. That's my conjecture: when you make an interface, they come through. They're pushing out as much as you're pulling; if you're trying to implement something in this language, you only have to meet it halfway — or I don't know where the balance point is, but there's stuff where you get more out than you put in. So I think these kinds of minimal constructed systems are probably good. What I'd love to know — I simply don't know the answer to this — is what people use as a criterion for language. In other words, here's a data set: whether it's SETI and I got a signal from outer space, or whether I'm recording espionage. How do you know that a set of exchanges back and forth has what's formally known as language? Are we testing hypotheses for specific grammars?

[54:28] Elan Barenholtz: We're hitting on a big issue: the field is not very serious. It's an engineering field, and its criteria are benchmarks. That's a very, very high-level goal that you can't apply to a minimal system: can it actually generate the right answers, or the right responses, to these questions — ones that are useful? I don't want to go on record saying it's not serious. It's serious engineering, very serious engineering, but it's not serious science. It's not science at all. People are not generally thinking about questions or criteria for being able to say when something is a proto-language. There's no such thing as a half language. You can get half as good on a benchmark. With large language models, there was a qualitative shift. It became language: it was almost useless on benchmarks, at least from a practical standpoint, and then it became an overnight phenomenon of real utility. But that's not what you and I are talking about exactly. That's really about functional utility for doing the stuff that people do. We're saying: what constitutes an example of this kind of structure imposing itself, doing work — and what kind of work? I don't think there is a clear set of criteria. The linguists were always doing their thing on human behavior, human language. The computational folks were trying to engineer. There really isn't a lot, at least in my reading of the literature, that tries to establish these kinds of criteria, so we might have to invent that along the way. But that's okay. Shannon invented information theory.

[56:30] Michael Levin: I apologize.

[56:35] Elan Barenholtz: Sorry, I didn't realize.

[56:36] Michael Levin: One thing that I do think we should invent is some kind of analog computing version of the Chomsky hierarchy. This is something that people say: they're discrete and there are different power levels to these formal systems, but they're all kind of digital. I wonder if we could do some sort of analog computing version. Does it end up also as a hierarchy of discrete power levels, or can we do this? Is there a physiological sort of analog version of these kinds of things that doesn't have to be the digital model?

[57:28] Elan Barenholtz: We ran out of time just when we were going to get our hands dirty.

[57:33] Michael Levin: Let's do this. Why don't we go off and think about this for a little bit, specifically about some of these potential models, and we'll have another meeting and we'll talk specifically about what to code up.

[57:51] Elan Barenholtz: Our own stuff, but also where the rubber actually meets the road.

