
Conversation 1 with Nora Belrose: AI, sentience, and Platonic Space

Nora Belrose joins a 65-minute discussion on consciousness, AI sentience, and Platonism, covering moral stakes, dynamic Platonic patterns, vulnerable minds, and implications of simulated and copied minds.



Show Notes

This is a ~1 hour 5 minute conversation between Nora Belrose (https://scholar.google.com/citations?user=p_oBc64AAAAJ&hl=en) and me on the topics of consciousness, AI, and different views of the Platonic Space.

CHAPTERS:

(00:00) Intro and moral stakes

(11:51) Sentience as irreversible sensitivity

(20:57) Dynamic Platonic patterns

(31:45) Platonism, passibility, Whitehead

(41:55) Vulnerable minds and forms

(49:43) Intelligence, agency, light cones

(01:00:41) Simulated minds and copies

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Podcast Website: https://thoughtforms-life.aipodcast.ing

YouTube: https://www.youtube.com/channel/UC3pVafx6EZqXVI2V_Efu2uw

Apple Podcasts: https://podcasts.apple.com/us/podcast/thoughtforms-life/id1805908099

Spotify: https://open.spotify.com/show/7JCmtoeH53neYyZeOZ6ym5

Twitter: https://x.com/drmichaellevin

Blog: https://thoughtforms.life

The Levin Lab: https://drmichaellevin.org


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

[00:00] Nora Belrose: I should maybe just start by introducing myself briefly. I'm Nora Belrose. I'm the head of interpretability research at EleutherAI. We're a nonprofit open source AI research organization. In my day job, I've been looking at how AI works, broadly speaking. But in my free time over the past year or so, I've been thinking a lot about the big picture questions related to AI. Specifically: can AI be conscious or sentient? Can an AI be a person? Do we have any ethical obligations to AI, or in what circumstances would we have obligations to them? When I started out, I took it for granted that a sufficiently advanced AI could be conscious, could be a person, et cetera. Of course, the devil is in the details here. What does sufficiently advanced mean? What does it mean to harm an AI? How do we weigh the interests of an AI against the interests of a human? All of those sorts of things. I was trying to develop a theory. Maybe you can't make it completely formal, but I was trying to develop some theory, for lack of a better word, about all these questions. As I kept thinking about it, I felt I kept being forced into more and more absurd conclusions. I couldn't come up with a theory that seemed to make any sense. So I went back to the drawing board and thought maybe this is the universe telling me that I've made a mistake, that there is something importantly different about AI and that we shouldn't really be viewing them as sentient. My initial concern, and I think you share a lot of these same concerns, was that we would create AIs that can suffer and we wouldn't recognize them as capable of suffering. So we would cause a moral catastrophe: we would treat them as slaves. I was more worried about the false negatives, not recognizing a sentient AI as sentient, than about the false positives, recognizing an insentient AI as sentient. Would you agree that that's your primary concern as well? Or how are you thinking about this in terms of false positives and false negatives?

[02:49] Michael Levin: A couple of things, and maybe you'll get to this later in the talk. But first, to think about what we mean by AI. I think about the full spectrum from standard biologicals to cyborgs, hybrids, and every combination, because a lot of issues become really clear when you think about not just language models sitting in a server somewhere, but these hybrid things, which I think are problematic for some of the theories that draw sharp categories. But I agree with you. I see a lot of people talking about the dangers of humans misattributing morally important qualities to objects that don't have them. I agree that could cause problems. But what I see, and I'm no historian, looking back at the history of humanity, is the exact opposite. Humans are amazingly willing to divide into in-groups and out-groups based on ridiculous differences. Humans latch on to some particular little thing that makes others out-group: they don't suffer like we do, they're not real beings, and we don't have to worry about them. I see that problem as much more pressing. I'm really worried about what happens once we figure out what it takes to make artificial minds in large quantities. We have kids and we breed animals; we do it anyway, but doing it at incredible scales, which I think we can, is potentially a massive problem.

[04:32] Nora Belrose: I think we were on the same page at the beginning of this year. I definitely see where you're coming from. I definitely agree that we have to think about the whole spectrum. You do a good job of pointing out there's cyborgs, hybrots, all these different types of things. I was hoping initially that we could basically build a society where AIs, cyborgs, whatever you want, can live together in harmony as equals. I've come to the conclusion that that doesn't work; it definitely wouldn't work in the way that I initially thought it would. I think that AIs, and when I say AI here, I am mainly thinking about disembodied programs and computers, although I do think that most of the stuff that I'm talking about also does apply to robots and things like that. But I think that the issues are much clearer if you're thinking about the disembodied kind.

[05:53] Michael Levin: That's something we should get back to in this discussion, because I'm not even convinced that sitting in a computer somewhere means you're disembodied, and I have some odd thoughts that we could talk about.

[06:04] Nora Belrose: I do think we should make that distinction. I appreciate the problematizing of that distinction. If we're considering purely digital AIs, I don't think those types of AIs and humans can ever be equals in any meaningful sense, because our concept of equality doesn't really apply to things like AIs. It makes sense when you're dealing with biological organisms of the same species. But with AI, we have to think in terms of a spectrum.

[07:00] Michael Levin: I agree with that. I don't think we should be focused on equality with humans per se. I'm thinking of how we share the planet with many living things that we don't consider equal in some sense but that are nevertheless of moral relevance. I assume if aliens come down with weird cognitive capacities that are not exactly equal to humans, I would hope we would figure out some way to also have some kind of symbiosis with them. I agree with you. I don't see any reason to say that AIs are going to be like humans, but I also don't think that's necessarily the criterion we should be going for.

[07:40] Nora Belrose: If equality is not the thing that we're aiming for, it seems like we need to have a way to put AIs, humans, cyborgs, et cetera, on a hierarchy. You could say it's a spectrum, but it is basically a hierarchy of, okay, these beings are more valuable in some sense than these other beings.

[08:11] Michael Levin: So valuable, but also with more responsibilities. I think we face some of that in our court system already; there is this notion of diminished capacity. That doesn't mean you're less valuable, but in some settings it means we don't have the same expectations, perhaps because somebody had a brain tumor, congenital malformations of the brain, or too many Twinkies, as in the Twinkie defense. We have that notion. I think we're going to be slammed in the opposite direction, where you end up getting various kinds of cyborgs showing up in court and everybody can say, you've got 60 IQ points on all of us; you should have known better. We may not have known better, but you might be more responsible. So I agree that there's going to be, well, I don't think it's a single linear axis, but there is some kind of space of diverse minds that we're going to have to wrap our heads around.

[09:11] Nora Belrose: Well, I would agree with all that. The example that you just used, a cyborg who has 60 IQ points on all of us, is the kind of example that we should be thinking about more. A lot of people, when they're thinking about AIs and maybe cyborgs—let's just stick with AIs here—want to say that consciousness, sentience, and moral worth are basically proportional to intelligence. Even people who don't say they subscribe to that view often invoke the Turing test; the Turing test says this AI is indistinguishable behaviorally, so therefore it's just as good as a human. A lot of people tend to have that attitude. That's a profound mistake. We cannot and should not say that consciousness is proportional to intelligence, especially if intelligence is proportional to the ability to achieve goals, because then you're saying that might makes right. That makes the moral value of a creature proportional to how strong it is, how powerful it is, how well it can control its environment.

[11:03] Michael Levin: Yeah, I agree with that. I think, and this is Anil Seth's view as well, that intelligence and, I don't know if we want to talk specifically about consciousness, but let's say those don't necessarily track each other in biologicals. They probably seem to, but even then we're not very good at following up on that at all. I mean, with dogs and pigs it would be really not trivial to say who's smarter, and yet we treat them, at least in the West, very differently. So we don't use that criterion reliably. But I agree with you. I think that when we make these synthetic kinds of things, we absolutely can dissociate those two parameters.

[11:51] Nora Belrose: To help you and the audience get a sense of my view, I do think my view is very similar to Anil Seth's, although I think I have some additional arguments that I would appeal to that I don't see him appealing to. But I'm broadly a biological naturalist, I suppose, about sentience. I'll use the word sentience; that is my preferred term. I would like to argue that sentience is a type of sensitivity. It's not strength. So it's a type of sensitivity to your environment. You can say it's an irreversible, unstoppable process of learning and forgetting. And I think the irreversible, unstoppable part is actually really important for a few reasons. One of them is that I think sentience or consciousness is closely connected with the arrow of time. I follow the French philosopher Henri Bergson, who made the same point that consciousness is tied to duration, which is a special type of time. It's lived time. It's time that has an inherent direction toward the future. He has all sorts of arguments about that. I think it actually connects in interesting ways with quantum mechanics. This is galaxy brained, but I think it's actually true. Scott Aaronson, the quantum computing expert, has written some papers on this. He takes the posit that consciousness has to be fundamentally irreversible, and he explains how you can formalize that in terms of physics. If consciousness has to be irreversible, then that means that it can't exist in a coherent superposition, which actually seems to resolve some issues with thought experiments such as Schrödinger's cat. It basically means that the cat in Schrödinger's cat that's supposed to be a superposition of alive and dead would have to be unconscious. The same thing applies to Wigner's friend. So that's just another piece of evidence that I find fascinating: if you assume irreversibility is important, then it seems to resolve some of these quantum mechanical paradoxes. The most important reason why I think irreversibility matters is that our ethical concepts basically presuppose that our actions have irreversible consequences. So I think that the badness of killing an organism or a person is due in large part to the fact that it's irreversible. You can't bring them back. And that's even true for benefit and harm of any kind. Whenever we cause pain to an animal, the badness of that is related to the fact that you can't just perfectly rewind time and undo what you did. Whereas if you think about computer programs, you basically can do that. So if you have a simulated pig, as opposed to a real flesh-and-blood pig, you could simulate causing it pain. But then you could just wipe all the records of you ever having done that and reset the program state back to the original, and so you can just perfectly undo your action on the pig.
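A minimal sketch of the reversibility point above, assuming a purely digital simulation: the entire state is just data, so it can be copied and restored exactly, leaving no physical trace of an intervention. The names make_pig and inflict_pain are hypothetical stand-ins for an arbitrary simulation, not anything from a real system discussed in the episode.

```python
# Toy, purely illustrative simulation state. Nothing here models a real pig;
# it only demonstrates that digital state can be saved and restored exactly.
import copy

def make_pig() -> dict:
    # Hypothetical minimal "pig" state: a numeric variable plus an event log.
    return {"pain": 0.0, "memory": []}

def inflict_pain(pig: dict, amount: float) -> None:
    # Simulated intervention: raise the pain variable and record the event.
    pig["pain"] += amount
    pig["memory"].append(("pain", amount))

pig = make_pig()
snapshot = copy.deepcopy(pig)   # exact copy of the whole program state

inflict_pain(pig, 10.0)         # the intervention happens...
pig = copy.deepcopy(snapshot)   # ...and is then undone by restoring the snapshot

assert pig == snapshot          # the state is bit-for-bit as if nothing occurred
```

Whether anything morally relevant survives such a reset is exactly the question Levin raises next.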

[16:38] Michael Levin: That's assuming that the physical facts are the only facts. If whatever you had done caused some amount of, let's say, conscious experience associated with it, then resetting the body doesn't wipe away the fact that the experience was had. You can reset it again, but if we take time seriously in that sense, then whatever suffering there was had in fact taken place.

[17:14] Nora Belrose: I'm actually pretty sympathetic to idealism. I think these arguments don't necessarily presuppose physicalism. We should assume that there is a close connection between the physical and the mental here. I'll put a pin in your objection and address it later. Scott Aaronson makes this argument that moral responsibility and moral reasoning require irreversibility. I do think that's a good argument. But you can make this point clearer by thinking about some other thought experiments. These aren't just thought experiments; you can actually do the experiment. If you assume that a digital program can be conscious, we're forced to say that you can optimize the program to have the happiest experience possible. You have to have some concept of what it means to be happy. Let's say you have some way of measuring happiness. Once you do that, you can repeat it over and over, copy the program onto many computers, and run it on all of them; every instance is exactly identical. If you're a utilitarian, that's the best thing you could do. You find the most happiness-producing program ever and fill the entire universe with it. Even if you're not a pure utilitarian, this is still somewhat of a problem: you have to determine how we actually count copies of computer programs and how we measure that. I think this connects to your idea of the Platonic space, because I'm inclined to agree that there is a Platonic space of mathematical objects and so forth. In this case, where you've got one program that's being instantiated in many different places, the program itself is not changing. It's not getting affected by anything. The program is just this static, eternal, Platonic entity. As you often say, the Platonic space is under positive pressure. So the program existed even before you wrote it. When you wrote the program, you were the vessel through which the program came into the world.

[20:57] Michael Levin: I don't think they're static and unchanging. That's a whole other kettle of fish.

[21:02] Nora Belrose: I know you've said that before. I think that is wrong. I would say there's a dilemma. Either you take the classical Platonic view and say the Platonic space is unchanging. In that case, none of these programs can be sentient because they can't be affected by anything, and we can't have any ethical obligations toward them because they're eternal and unchanging. That's one horn of the dilemma. If you go in the other direction and say the Platonic space can be changed, my question is: how do we actually know when we're affecting the Platonic space when we're adding something to it versus just interacting with the physical world? Do you see what I'm saying? It seems hard to reason about how we are interacting with the Platonic space. How does that work?

[22:18] Michael Levin: I'm not going to pretend it's not hard to reason about it. There are many, and I'm not sure of any of this stuff. This is just my best guess right now as to what I see happening. Before we get to the Platonic space, one other thing to point out is the algorithms with the maximal happiness or whatever. There's a kind of biological version of this because you could imagine creating shrimp or rats or something with the pleasure center firing full bore all the time. It isn't just an issue for algorithms; for whatever the moral benefits of doing that might or might not be, there's a biological version of that too.

[23:08] Nora Belrose: I don't think we should try to create a massive farm of rats on heroin. But I want to say there is a difference. The rats on heroin, each one, is unique. Even if they're genetically identical, they definitely have different memories, different epigenetics, all sorts of different things. Actually, this is an argument that Bergson makes, that every experience you have changes your memory. It necessarily affects how you experience things in the future. I think the example he uses is: you're staring at a motionless physical object and you try not to move and just stare at it. Your conscious experience is still changing, because you have a memory of the time passing and implicitly you're noticing your heart beating and all of these things. I would say the same thing is going to happen with the rats: they're going to habituate to the heroin, and each one is going to do so in slightly different ways. I do think that actually is different from the exact replica case.

[24:54] Michael Levin: We can talk about the Platonic space now. If it's the indexicality that's important, that these guys are the same, that all algorithms of a certain type can be identical but the biologicals can't, I think that's true. Even bacteria, you're not going to make identical bacteria no matter what. So let's talk about the Platonic space. Here's how I see it currently. I think that when we're talking about the Platonic space, what we mean is a space of patterns, some of which probably are static and unchanging. It contains some objects, mostly the ones currently enumerated by mathematicians, and the value of e and some of this other stuff may be eternal and unchanging, and maybe it's not doing anything. But I think that what might be happening is that there are also other patterns in that space that have different properties, and they do change over time. I'll give you an example of some things to crawl up that axis. Think of the liar paradox: "This sentence is false." Patrick Grim did this whole thing of plotting logical sentences as dynamical systems. If you give the sentence a sense of time, what you have is an oscillator, and it sits there going true, false, true, false. Not exactly a scintillating conversationalist, but also not static. You can imagine you have some patterns in the space that sit there like rocks and do nothing other than be what they are. You have other patterns that sit there and do this. Once you have that, you can have other more complex patterns. For example, Grim showed coupled sentences, groups of sentences that refer to each other. Those can have very complex dynamics. They can be dynamical systems. They can be fractal, they have attractors, and they can be in multiple states. We have a project where we're training those things, giving them stimuli and finding different types of memories. Maybe you can have some of those kinds of things and even much more complex patterns. Those patterns are typically not studied by mathematicians. They're studied by other disciplines, in cognitive science. What I think is happening when we make physical objects is that we're really making various interfaces.

[28:18] Michael Levin: Interfaces through which different sets of these patterns come through. Depending on what you make, you might get some input from e, pi, or some other stuff. If you make something that's a mouse-like body or a human body, you're going to get some patterns that are much more complex, typically associated with kinds of minds, behavioral propensities. So your question is very good: how do you know when you've made changes to the back end versus changes to the physical world and the interface, assuming there even is a physical world? I have some leanings in the idealist direction too, but for practical purposes, assume this is a dual system where we have to take both sides equally seriously. The question is, when you get these ingressions from the space, what are you actually getting? There are a few things you could get. You could get a static pattern, the standard view that these things are unchanging: you either get it or you don't. But maybe what you're getting is more of a dynamic, stateful pattern. In other words, you can store stuff on the other side of it too. I'll give you the biological version and the digital minimal version. There are interesting clinical cases of humans with quite minimal brain volume and normal IQs. You can try to shoehorn this into the standard neurological paradigm and say that there are redundancies, that we all walk around with massive redundancies even though head size is a real problem at birth, and that somehow sometimes you can get away without most of it. The actual neuroscience paradigm doesn't predict that. It can maybe be made compatible with it, but it doesn't predict it. So there are data like that that make you wonder whether some of what we're looking at isn't a front-end interface as opposed to the whole thing, that maybe we're looking at a thin client when we look at our brain. Those kinds of things are very difficult. We've been studying them in very minimal systems and computational systems. The sorting algorithm stuff is published; the other stuff is unpublished, so stay tuned in a few months for that. I think it's not crazy to hypothesize, and this is what we're investigating now, that what you're getting is extra compute, specifically extra compute that you didn't pay for in the physical space. I think that's very important, because in physical space you have to pay. There's an energy cost for information processing, specifically for deleting and forgetting. If you can show examples where you got extra compute that doesn't appear to be paid for in the physical world, you have some evidence that it has to happen somewhere. That, to me, is one way. There are two branches of evidence that stuff is changing on that side. One is the biological version: where memories are stored in cases of minimal brains. The more tractable side is the computational one: to demonstrate cases. I'm not saying we've done that yet, but I think it's not implausible that we will demonstrate cases where you're getting extra compute for free. It's not really for free, but it looks free, like a free lunch from the physical side. So that is a research program where you might be able to gain evidence that something like that is happening.
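A minimal sketch of the idea Levin attributes to Patrick Grim above: give a self-referential sentence a notion of time by revaluating it at each step. The update rules below are illustrative stand-ins, not Grim's published models (his work uses continuous truth values and yields the attractors and fractals mentioned in the conversation); they only show how the liar sentence becomes an oscillator and how coupled sentences get richer dynamics.

```python
# Illustrative Boolean dynamics for self-referential sentences.

def liar(steps: int = 6):
    # "This sentence is false": its next value is the negation of its current value.
    v = True
    history = [v]
    for _ in range(steps):
        v = not v                  # revaluate the sentence against its current value
        history.append(v)
    return history                 # [True, False, True, False, True, False, True]

def coupled_pair(steps: int = 8):
    # A: "B is true."  B: "A is false."  Updated together, they cycle with period 4.
    a, b = True, True
    history = [(a, b)]
    for _ in range(steps):
        a, b = b, (not a)          # each sentence is revaluated against the other
        history.append((a, b))
    return history

print(liar())          # the simple oscillator: true, false, true, false, ...
print(coupled_pair())  # period-4 cycle: (T,T), (T,F), (F,F), (F,T), (T,T), ...
```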

[31:45] Nora Belrose: A couple points I wanted to make. The liar paradox, that's a fun example. In those sorts of cases, what a classical Platonist would say is, there is a notion of time within the dynamical system. You've got a dynamical system modeling the liar sentence, or maybe a network of self-referential sentences, whatever, and there's a notion of time in it, but they would say that that is internal to this abstract object. The abstract object itself is eternal. One way to make that clear is to say, if the liar paradox dynamical system were truly changing and truly temporal, then you should be able to ask a question like, "What is the truth value of the liar right now?" That doesn't make any sense, presumably. If it has a notion of time, it's a different notion of time to ours. It's an internal notion of time.

[33:07] Michael Levin: I see it, and I think it's definitely an interesting challenge to flesh out what is going on with time in general, because when you're dealing with a realm that has physical space-time, or maybe doesn't — if you listen to Don Hoffman, he'll say that the whole space-time thing is an old model that's going away — you're dealing with realms with different times. I'm not going to pretend I have a good account of what time is doing, but it seems to me that this notion that it has its own internal time is actually a feature and not a bug, because that's the kind of stuff we would expect from a mind: that it has an internal sense of time that isn't necessarily the same as other observers it may interact with. And so my fundamental perspective here is that we are not physical bodies that occasionally get the benefit of these patterns that we can somehow tune into. I think we are the patterns, and the consciousness that we're interested in is the experience of looking out from that world through an interface into the physical world. There's some extremely minimal mind that is basically an instantiation of the liar paradox on that side, and then there are bigger ones like us. The fact that it has its own time sounds right to me, and I agree with you that there are real problems, like what's the framework, what's the frequency of the liar paradox. Partly it's the frequency of whatever mind is interacting with it at any given moment. There might be some crazy chemistry on that side where two minds directly interact — Darwin thought that mathematicians have literally an extra sense. Maybe people who are good at math don't have to go through the physical world directly. They also have direct access to some of these things. When you're interacting with those objects, in that interaction you somehow thrash out what's your timeline versus my timeline. I haven't even begun to flesh this out yet. Lots to work out. I don't have the answer.

[35:19] Nora Belrose: I'm wondering: the dynamical systems examples that you gave are not examples of a passible Platonic form. Passibility is a concept from theology.

[35:39] Michael Levin: Say that again, of which kind?

[35:41] Nora Belrose: Passible.

[35:42] Michael Levin: Passible? I'm not familiar with the term.

[35:45] Nora Belrose: Traditionally, God and the Platonic forms are often called impassible, which means they can't be affected by anything.

[35:55] Michael Levin: No, interesting.

[35:56] Nora Belrose: And in particular, they can't suffer, but more generally they can't be affected at all. That's a classic Platonist and Christian view. In contrast, we on Earth, in the realm of change and decay, are passible. We can be affected and changed. I don't think those dynamical systems are examples of genuine passibility. I wonder if your Platonic space paradigm would be helped by using some concepts from Alfred North Whitehead. You must be at least a little familiar with Whitehead. The way he thinks about this is he explicitly says that there are eternal objects and they're unchanging. There's never a new eternal object created. But he has this other thing: a kind of cosmic memory, which he actually puts in the mind of God, God's memory. He says that the universe or God has this memory of everything that's ever happened. Every event can look back on the whole past, and it determines which events in the past are relevant for it and so forth. If you want to think about immaterial memory, that might be a framework for doing it. I don't know.

[37:48] Michael Levin: I don't disagree. I think that's probably reasonable and useful. At this point, my purpose here is not to flesh out a full theory of ultimate reality or something like that. It's nice to think about those things, but really the only thing I'm responsible for is a minimal theoretical framework that is compatible with, and more importantly, moves forward the research that needs to happen. This dualist thing that I'm pushing, I'm totally comfortable with the idea that at some point somebody's going to unify this into some monism. Fine, I'm sure. I don't know, much like with idealism, what to do with that at the moment, if that's true. Any of these things that help us understand are important because there's some very pressing problems. These, for us, come in a couple of flavors. One is the biological. When we make novel life forms, let's say xenobots or anthrobots, we know that the computational cost was paid over millions of years of selection in various environments. But all of a sudden, you create something that's never been here before. You can't really blame the specific selection for what it's doing. When was all that computed? You have to say something about that and also about what distribution these things come from, because it's not enough to say it's emergent; you have to eventually be able to say why you got this versus that in these novel circumstances. The same thing on the computational front. We found that even extremely minimal algorithms are doing things that are not explicitly in the algorithm. I don't mean just complexity or unpredictability, because that's pretty cheap and easy to get. I mean things that a behavioral scientist would immediately recognize. That's suspicious to me. The fact that even very minimal interfaces are showing up with that stuff seems to me just like what happens in math, where you often get more out than you put in. You start with basic set theory, and before you know it, you got a specific value for E. And you're like, wow, where'd that come from? This is the thing that I think we need a framework to address.

[40:24] Nora Belrose: I did take a look at the sorting algorithm paper that you did a little while ago. I'm happy to say that I don't think it undermines any of my arguments to admit that Platonic forms are entering into the computer here, and that you're getting more out than you put in. I would say that sentience is not a Platonic form, and I don't think it's something that a Platonic form can have. It's almost the opposite of a Platonic form, because I want to insist, and I know we've disagreed a little bit on this, that if we posit these Platonic entities, we should say that they are impassible and unchangeable. If we say otherwise, we run into paradoxes and it gets confusing, and it's much simpler to say that they're unchanging. Once you say that, sentience is a matter of being vulnerable, of being sensitive, of being affected.

[41:55] Michael Levin: I don't disagree with you on that one. Vulnerability, and this gets into two things which you've probably seen. There's the mortal computation stuff; Richard Watson has some really interesting work on this: systems that oscillate between pushing out onto the world and being influenced by it. So there's the part that you emphasized, being imprinted or changed by the world, but in his picture there's this dynamic back and forth. I think that's fine. I'm still on the side that if you make them unchanging, it doesn't help very much. So I am on the other side of this. But I think we agree on one other thing: whatever sentience these systems have, I don't think it's because of the algorithm they're executing. I think if anything, it's in spite of the algorithm. In other words, I do think there's something unconventional going on that isn't accounted for in the standard way we do computer science, which is why, for these language models, I think the language is a red herring and a distraction. People focus on the language: it says it's thinking about this, and maybe it has these goals. I actually think, and I have no claims about language models because we don't know and have to do experiments the way we did with the sorting algorithms, that whatever degree of that stuff they have, it's not going to be because of what they're saying. Those two things might have nothing to do with each other, or they might have a little bit to do with each other, but I don't think you can derive any of that from the stuff it says. I don't think we know what level of anything it has, because the language is fooling us. It's not what we should be looking at.

[44:06] Nora Belrose: Yeah, that's fair. I'm trying to think of where we should go with this. I'm wondering how cruxy this question about the platonic space is. I'm wondering if you could expand a bit on your comment that if the platonic space doesn't change, it's not very useful.

[44:46] Michael Levin: I was agreeing with you that if it doesn't change, then that can't be where the consciousness is, because I agree with you that being able to be changed by your experience is important. The other way to say this is it's the whole point: if this thing's under positive pressure and these patterns love to get into the physical world, if they can't be changed by it, and if nothing ever changes, what is the point of that? It's just the metaphysical bias that I have, but it seems boring and useless and there's no point in any of that. I've got to think that they're ingressing into this world because they're getting some sort of development out of it. There's something happening where they've done something interesting, or somehow they're ratcheting up, where something is changing. So it's not useless, because the lessons of math hold regardless: the fact that physical facts aren't the only facts. So I'm not saying unchanging forms are not viable. I think they're viable. I just don't think they help us very much with what we want to know.

[46:12] Nora Belrose: You were saying that if they're unchanging, there's no point. You were making two claims there. One is that the forms are under positive pressure, and that seems agentic.

[46:32] Michael Levin: This is not a good argument. It's not an argument. It's a statement of my metaphysical bias. But my bias is that if they're not changing, I don't see the point. I don't see why they're impacting the physical world. I get it. There are views of the universe where there isn't any point to anything. I don't resonate with that. It seems to me a reasonable scenario here is that it matters for both sides what happens. It's not a purely flow down thing.

[47:09] Nora Belrose: I've read most of Process and Reality, although he's hard to understand, so I might be butchering this a little bit. The basic idea is that the eternal objects are, I'm not sure if you would say they're in the mind of God, but they're definitely organized by God in some sense. He would say the eternal objects are not changing, but God is changing. That's one of the things that a lot of traditional theologians object to: God changes. Whitehead says God changes. For him, he would say, in some sense, God helps determine which eternal objects are relevant in a given situation, and God is the one who's getting something out of it in some sense.

[48:10] Michael Levin: I'm okay with a structure of that Platonic space where there's a bunch of really low-level stuff that just never changes. I've been taken to task by a couple of mathematicians because I call these mathematical things "low-agency forms"; they've said, "How do you know? Have you tested that?" You're right, I haven't, so maybe I'm wrong there. But there's this middle stuff that's all of us and some other things, and then ultimately we are all some kind of projection of some much larger universal pattern that's doing something else. I'm okay with that. Seems fine.

[49:07] Nora Belrose: Just to try to see where we still agree or disagree. You agreed that consciousness and intelligence should be distinct.

[49:26] Michael Levin: I think they can be distinct.

[49:27] Nora Belrose: They can be distinct.

[49:28] Michael Levin: It's an interesting question, to what extent do they track? I do think that it would be very hard to say that a system has a high degree of consciousness and can't solve any problems whatsoever.

[49:43] Nora Belrose: I think that's hard. Maybe I'll use the last ten minutes to tell you about a weird idea I had. I said before sentience is the ability to be irreversibly affected. It's about learning and forgetting. I'd like to relate that conception of sentience with two other concepts, intelligence and agency. I like the idea you've proposed before, the cognitive light cone. There is a future cognitive light cone and a past cognitive light cone. We should distinguish these because they're different. The future cognitive light cone measures agency. It measures how far your goals can reach—what goals you can pursue successfully. But in order to have a future cognitive light cone, I would argue that you need a past cognitive light cone, which is looking at how much of the past, how many memories, how much experience you can coalesce, compress, or bring to bear on a problem.

[51:32] Michael Levin: I like that.

[51:33] Nora Belrose: Because P does not equal NP, you can't do a brute force search over the space of possibilities, and so you need heuristics and experience. There's clearly a connection between agency, the future light cone, and what I would call intelligence. I would say intelligence is more like the past light cone. I think intelligence is a resource, but there are other resources that also enhance your agency. A CEO or a president, someone who has a lot of money, someone who has a lot of political power, might have a lot of agency in the sense that they can pursue goals really effectively. But that doesn't necessarily mean they're more intelligent per se. Does that make sense?

[52:37] Michael Levin: That makes sense. My concept of the cognitive light cone, and it still needs a lot of work that's underway, is that the cone by itself is not so much the success with which you do these things. It's the size of the largest goal you can represent. Those things are connected, but it does raise this extra question: can the CEO represent larger goals than somebody with no money whatsoever? I'm not sure that that's true. One has more ability to follow up on those goals. But as far as representing and working towards those goals, I think those are slightly different. They're not the same, but I think they're both important.

[53:28] Nora Belrose: That is fair. I think if you're careful in defining future cone maybe that makes more sense. I would like to allow for non-agentic intelligence. There is a relation between agency and intelligence, but I worry that people connect them too closely or don't even distinguish them at all. I want to say intelligence is an important ingredient in agency, but it's not the same thing.

[54:05] Michael Levin: I agree with you. I don't think it's the same thing. I think we should try drawing some curves. At the lowest end, and maybe at the highest end, they come together. But I find it very hard to imagine these two curves tracking each other perfectly at all. I think they're dissociable. Certainly, at the lowest end, it's just very hard for me to imagine what we mean if it's a system that doesn't do anything, that doesn't navigate a problem space at all. It seems hard to say that it has a lot of consciousness. Something like that probably holds on the higher end too, but in between, these curves can be doing all sorts of wild stuff where they're not tracking each other. That's reasonable.

[55:00] Nora Belrose: I would say that large language models right now arguably have a really big past light cone. That's part of my motivation: there's a clear connection between data and capabilities. We should be thinking about how these scaling laws are just measuring how much of the past or how much experience you can pour into something. They can pursue goals, but many of them still have an asymmetry: they're pouring in tons of data from all over the place, but they can't represent goals very well.
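As a rough illustration of the data-capabilities connection Nora mentions: one published parameterization of neural scaling laws (the Chinchilla form from Hoffmann et al., 2022) writes expected loss as a function of parameter count N and training tokens D, and the data term is one way to read "how much of the past you can pour in." The coefficient values below are placeholders for illustration, not the paper's fitted constants.

```python
def scaling_loss(N: float, D: float,
                 E: float = 1.7, A: float = 400.0, B: float = 400.0,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    # Chinchilla-style form: L(N, D) = E + A / N**alpha + B / D**beta.
    # E is an irreducible loss floor; the B / D**beta term shrinks as more
    # training data D ("more of the past") is poured in.
    return E + A / N ** alpha + B / D ** beta

# For a fixed model size, adding data lowers the achievable loss:
print(scaling_loss(N=1e9, D=2e10))   # ~2.6 with these placeholder coefficients
print(scaling_loss(N=1e9, D=2e11))   # ten times more tokens -> ~2.3
```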

[55:53] Michael Levin: So I don't disagree with most of that, but I think we need to be careful on one thing. I don't think we can judge their ability to represent goals by the things we're forcing them to do. In other words, by the conversations they have, by the behaviors we see when we ask these agents to go out and solve problems. I think all of that may be relevant, but it isn't the whole story. The reason I say that is simply because in the very simple algorithms that we work with, we have this one published paper on the sorting algorithms, and we have two more coming out that are even more minimal than that. What's happening there is that there are at least two different kinds of goals that appear to be met. There's the goal that we force them to meet: whatever the algorithm is making them do. And then there's this other stuff. I originally called it a side quest, and now I'm thinking it's an intrinsic motivation. There are things that they are doing that we never ask them to do. They're not forbidden by the algorithm, but we don't make them do them. You don't see those until you look, and we're not very good at looking. How long have people been studying sorting algorithms? 60 years, 80 years? I don't know. No one had seen this before. Who knows why. I'm sure we're missing a million other things we don't know how to look for. My guess is that in these language models, there are at least two things to be looking at: the stuff you expect them to do, the goals you expect them to pursue, and then whatever else they might be doing. If we can be surprised by six lines of code and a bubble sort, I am not willing to be certain that this thing isn't doing something else in some other problem space that we don't have the intelligence or the imagination to look for. It might be minimal. I don't think it has to be massive just because it has this huge vocabulary. It may be a very small thing, but I'm not sure at all. I think this is all an empirical question; we're actually developing tools for this. I have a number of collaborators, and in our lab we're making some tools to start taking a look at diverse systems in other spaces to see what they might be doing in the spaces between what the algorithm is forcing them to do. I'm with you in that I don't think there's any reason yet to impute high goal-directedness to the algorithm of the language models. But I don't know that they're not doing a bunch of other stuff that we don't see, the way it's been missed in bubble sort and in cellular automata and some other things that we're looking at.

[58:41] Nora Belrose: We'll probably have to wait until another time, but there are a lot of questions there about how exactly, methodologically, you go about determining whether there's something else going on. There's a lot of work in the AI interpretability space that tries to find circuits in neural networks, which I'm honestly pretty skeptical of. There are a lot of questions about how to actually go about doing it.

[59:24] Michael Levin: I agree. This is a massive unsolved problem. We're just beginning to make some tools for this. I think it is at such an early stage because, when you talk to people, the standard paradigm is: all right, biology is complicated; if you tell me that my mind is not captured by the laws of biochemistry, I could go along with that. Many people say that I'm more than the rules of biochemistry. But no worries, at least over here we have these dumb machines. They do exactly what the formal models say they do. The formal models of chemistry do not tell my story, but algorithms and Turing machines and all that nice stuff tell the full story of these things. I think our formal models never tell the full story, and we've optimistically assumed that at least we've gotten this down to a pure form where there are no surprises. But I don't think that's true. I think this stuff seeps into everything, not just the biologicals. I think it seeps into almost everything. That's why we're so behind on making those tools: because we've been assuming that the algorithm and our view of these things tell the whole story.

[1:00:41] Nora Belrose: I definitely don't want to just dogmatically assert that. You have your sorting algorithm paper on it. I don't want to just assert that because it's in a computer, it's therefore only doing the algorithm that we told it to do. That said, I do think there is an important metaphysical difference between physical things and simulated things. And I think it's not mysterious why there's a metaphysical difference: simulations are running on digital computers, and the computer is designed to make everything predictable and controllable.

[1:01:43] Michael Levin: It tries. It does its damnedest to try. That's the design. We have abstraction layers and error correction and all this other stuff. We really try to make it that way. But I'm just not sure we ever succeed at that.

[1:02:00] Nora Belrose: It is reproducible, right? It is reproducible.

[1:02:03] Michael Levin: Sure, yeah.

[1:02:06] Nora Belrose: I think that reproducibility and copyability do matter for sentience, because that is related to this whole idea of, can you exactly copy the AI? Is it a platonic form?

[1:02:28] Michael Levin: I see it, and I think that is definitely an interesting point. I need to think about this reproducibility business more. I think that is an interesting question to be asked. I used to think that was a key difference too, between simulations and the real thing, although the real thing is also really hard to pin down, given what neuroscience is telling us about how we construct our reality. But the thing that's partially erasing some of that distinction for me is: if you ask the forms, do they care about whether they ingress into a simulated environment versus a physical environment? It looks to me like they don't necessarily distinguish. That's what I'm working on right now. I was trying to figure out to what extent we can say there's some stuff that comes into the physical world that's different from the stuff that comes into a simulated one. And if it turns out that way, fine. If that's what we discover, that's okay. But right now, I'm not seeing it yet.

[1:03:49] Nora Belrose: I think I agree. Let me know if you need to go.

[1:03:55] Michael Levin: I've got about two minutes, then I have to run.

[1:03:56] Nora Belrose: I think I agree that forms can ingress both in a simulation and in the real world. I think Whitehead would agree with that. I would say that sentience, though, is not a matter of what forms are ingressing per se. I would say it's precisely the non-platonic part of what's going on that is the sentient part. I think that also fits with Whitehead's metaphysics.

[1:04:30] Michael Levin: It's a different perspective on this and I certainly can't rule it out. We don't have any strong claims about consciousness or sentience or any of that stuff because there's nothing I can do experimentally on that right now. It's not like I could rule out the view that you just formulated. My suspicion is the opposite. It's definitely interesting. I will think more about this copying business. I think Aurorbia et al made some good headway there. I need to think about that more.

[1:05:11] Nora Belrose: That's really what made me start questioning things, because I was asking, "How do you count copies? Or what's going on with these guys?"

[1:05:18] Michael Levin: It's not obvious to me that we can't tell a homogenous story of that, but I haven't done so. We'll see how that goes.

[1:05:29] Nora Belrose: Cool. That's been nice.

[1:05:30] Michael Levin: Likewise. Thank you so much. This was a lot of fun. Very cool. Happy to chat anytime. If anything new comes up, definitely let me know, and I'll send along. We have a couple more of the computational papers coming out, so I'll send you those when they come out. I'll be interested to see what you make of them.

