
Conversation between Donald Hoffman and Richard Watson, #1

Cognitive scientist Donald Hoffman and evolutionary biologist/computer scientist Richard Watson discuss cognition-first evolution, resonant agency, Markov kernels, trace logic, and how these ideas relate to physics, time, and consciousness.

Show Notes

This is a ~1.5 hour discussion between cognitive scientist Don Hoffman (https://sites.socsci.uci.edu/~ddhoff/HoffmanPubs.html) and evolutionary biologist/computer scientist Richard Watson (https://www.richardawatson.com/), on evolution, physics, time, and the nature of reality.

CHAPTERS:

(00:00) Cognition-First Evolution Idea

(07:15) Resonant Agency In Biology

(23:37) Markov Kernels And Traces

(29:44) Trace Logic And Observers

(42:28) Songs, Harmonics, And Logic

(51:00) From Kernels To Physics

(59:30) Headset, Learning, And Evolution

(01:12:38) Mass, Sampling, And Consciousness

PRODUCED BY:

https://aipodcast.ing

SOCIAL LINKS:

Podcast Website: https://thoughtforms-life.aipodcast.ing

YouTube: https://www.youtube.com/channel/UC3pVafx6EZqXVI2V_Efu2uw

Apple Podcasts: https://podcasts.apple.com/us/podcast/thoughtforms-life/id1805908099

Spotify: https://open.spotify.com/show/7JCmtoeH53neYyZeOZ6ym5

Twitter: https://x.com/drmichaellevin

Blog: https://thoughtforms.life

The Levin Lab: https://drmichaellevin.org


Transcript

This transcript is automatically generated; we strive for accuracy, but errors in wording or speaker identification may occur. Please verify key details when needed.

[00:00] Richard Watson: I'm pitching for a cognition first theory of evolution.

[00:08] Donald Hoffman: Okay.

[00:09] Richard Watson: The usual orthodox understanding of the relationship between cognition and evolution is that cognition is a product of evolution: natural selection comes first, and it sometimes, albeit rarely, produces things which are cognitive. I'm considering the reverse, that cognition comes first and that in special cases, sometimes, perhaps rarely, it produces natural selection processes. That has some resonance with a consciousness-first theory of things. But I don't want to end up with a model which has a strong physical-nonphysical distinction. I would rather have a theory where there's a graded scale of things which are more physical and things which are less physical, with multiple different levels of causes, each of which is slightly real to the levels either side of it.

[01:46] Donald Hoffman: Hanging in there. I'm interested to see how this goes.

[01:49] Richard Watson: I'm trying to keep things tied to biology and answer questions that need answering in biology. In biological systems, the orthodox view is that genes control everything at the bottom and everything that organisms do above that — the regulatory activity, the cellular activity, the tissues and organs and organisms — are all just products of the genes, and none of the processes which are going on at those higher levels of organization matter except insofar as they are consequences of genes which affect genic selection. Everything at the bottom is the only level that matters. Whereas in reality, we know that there are self-sustaining causal processes going on at all of those levels of organization. Cells are, at the level of organization of cells, real things that interact with other cells through certain kinds of signaling, creating self-sustaining cycles that recreate the conditions for their own origination at that level of organization. It's great to be able to say something like that and have somebody nod. That's true in physical systems too, from quarks to cosmos, but the thing about biological systems, organisms, is the connections between those levels and the relationships between them. The levels below, smaller scales, create the entities that get involved in the relationships at the higher level. The levels above create the context or boundary conditions in which those entities move or interact with each other. So the level above and the level below are needed to define whatever is going on at the focal level that we're talking about. But what's the relationship between those levels? We're not satisfied with a story where the bottom level determines everything, because it doesn't determine the boundary conditions under which those entities move.

[04:30] Richard Watson: And we're a bit uncomfortable with starting at the other end, because it seems to imply that there was a plan into which everything should fit, a plan that created all of the parts that were necessary to make the high-level thing that was intended in some sense. I want to suggest that the way in which those levels interact with each other has a specific general form. That's to do with compression and expansion between levels. And that when levels are linked in that way, the interaction between levels is what agency is. The interaction between levels is what cognition is. And that kind of multi-level structure arises spontaneously and naturally without presupposing any of this particular biological machinery, that it's a natural property of physical systems, and that it's cognitive in nature, so that it's capable of holding memories, doing learning, and using learned knowledge to act in the world in a way which constitutes intelligent problem solving. I drew a little picture today, riffing off a picture from your work, Don. Let's find that document. These are hot off the press because I drew them with a pen and paper and took a picture of them. I haven't turned them into an electronic form yet. Can we see that?

[07:13] Donald Hoffman: Yes.

[07:15] Richard Watson: In figure A, we have an agent separated from the world. The detail of the world is sensed by the agent into percepts. Some sort of decision process, a cognitive process, happens, which turns that into a decision. And then the decision organizes the actions back into the world. That's inspired by your way of decomposing things, Don. Now I'm going to turn that into a slightly more computational way of thinking about those relationships. Out there in the world, we have some system of entities with interactions between one another, which is all sorts of complicated. That detail is compressed into a lower-dimensional model of what's going on in the world. You run that model a little bit; things happen. And the results of running that model are then expanded back into the world. So you have information integration, running the model, and then collective action transferred back to the world. From a computational cognitive science, cognitive intelligent agent way of thinking about things, we would say that a system that did that was doing something agential. What you mean by being able to do something agential is that you have an abstracted view of the world that you can run forward in time, that you can run forward in time faster than the real world, and then take actions in advance of the things which were going to happen in the real world, so you anticipate what happens in the world rather than reacting to it. So now I want all of that to happen spontaneously, for free. That's not something that requires you to build a neural network or a deep order correlator or something like that, or to evolve machinery that does it; it's just something that happens for free in the natural world. And the way I'm going to get there is through harmonic resonance. In figure C, we have one causal process at the top of the figure, a cycle, just a cyclic attractor. Something's happening in the world that's a cyclic attractor. And the thing that's happening in the world looks like there are different things. There are black particles and white particles and they're not in the same space at the same time. There's stuff happening in the world. But it turns out there are symmetries in the world. In one sense, the black particle and the white particle are different from each other; they're different particles at different points in space and time. But really, they're not so different from each other, because they're also both particles, both in the same space of entities. They're, in a sense, a reflection of one another. They may not be the same, but they're also not different in every respect. And that symmetry means that the cycle, the causal loop that they are in, can twist and collapse and fold in such a way that it creates a loop of half the size where everything is going around twice as fast, a doubling of the frequency. That's the model, where the symmetry between the black particle and the white particle has been collapsed into just particles. That's just a general model of particles. It doesn't care about whether it's a black particle or a white particle.

[11:21] Richard Watson: I can run that general model through time to see what will happen in the next time step faster, because it's a higher-frequency model. I go around that once; I've actually gone around in the real world twice. As that loop then unfolds to unpack the symmetry that was in it, to split that symmetry back out into an unfolding and connect it back to the world, I've made something which is influencing the real world in a way which you can think of as anticipating what would happen. You can also think of it as two symmetries in the world which coexist at the same time and influence one another in both directions simultaneously. Now I'm going to do that through multiple levels in figure D. You start with many different things in the world. You fold them up once. That creates a sort of lower-dimensional model. You fold them up again. That creates a lower-dimensional model. You fold them up again, all the way down. All the way down to what? At the bottom of everything, everything is the same thing. There's only one electron in the universe and time isn't even really a thing. Nothing really changes. Everything's the same. There's nothing actually happening. As you come back up the other side, unfolding, all of the symmetries are broken and all of the complexity and diversity of physics, the universe and everything, is recreated back again. All of those levels are there simultaneously and connected. They're connected in a specific way, which is to do with folding, removing one dimension from the space at a time, folding along a line of symmetry that retains as much as possible of the local integrity you had at one level as you fold it into the next level. In musical terms, each fold is a two-to-one octave relationship. I don't think that this is just a model of what happens in physics, the supersymmetry stuff that happens in quarks. I think the stuff that happens at that level of physics is the same as the stuff that we were talking about in figure A. That arises spontaneously, even when you do something simple like bow a violin string. The violin string is imbued with agency when you do that.

[15:27] Richard Watson: And its ability to act agentially on the world is not very much, but it's not nothing. I can tell you the story about how it does that if you're ready. So let's say that the violin string is not vibrating to start with and we start dragging the bow along the string. What's happening? There are microscopic disorganized interactions between the bow and the string. They're not in phase with one another. They're not at the natural frequency of the string. They're far away from the natural frequency of the string, but they're putting a little bit of energy into it, almost just heat at that stage. Some of that energy is going to start resonating in the string. The resonant frequencies that build up in the string, notice, involve top-down and bottom-up causation at the same time, because the frequencies that build up in the string depend on the macro-scale geometry of the string, its length, and they also depend on the micro-scale properties of the material that it's made of, the elastic bonds between the molecules at that particular tension. Those bottom-up and top-down constraints together determine which frequencies build up in the string and which frequencies don't. And the energy which is going into the string is converted into organized oscillations, which stack up and form simultaneously at all of those levels. What's interesting, though, is that in so doing, the string pushes back on the bow and changes the nature of the interface between the bow and the string. How does it do that? Well, to play the fundamental frequency, this massive long-wavelength frequency that you get when the stroke of the bow is in full flow, as it were, when the string is singing loudly, what you need is to convert the linear motion of the bow into a reciprocal motion of the string. How do you do that? The bow is just pushing it, but it needs to push the string when the string is going one way, and it needs to not push the string when the string is coming back. So the string is organizing the stick and slip dynamics with the bow. It's allowing the bow to push it when it's stuck, and once the tension builds up, it slips. It joins to a different part of the bow, a little tick backwards, which then drags it forward again in the right direction. So you have, in a sense, a percussive motion of the bow. The linear drawing of the bow is turned into a percussive motion of the bow, almost as though the string doesn't see the whole bow. It only sees intervals on the bow that are the right distances apart for it to drive the fundamental. So the string is organizing the interface that was providing the energy, so that it receives energy at the right intervals to sustain the fundamental. Once the fundamental is going, it's obvious that it does that, but it had to create the fundamental in order to push back in that specific organized way. The disorganized energy is converted into an organized energy that controls, in a quantized way, the interaction between the string and the bow in such a way that it sustains the fundamental, which wasn't even there to start with. And I think, though I haven't heard anybody say it, that the agency of the string, or rather of the stack of harmonics in the string, is non-zero, because it has the ability to compensate for small fluctuations in the stick and slip dynamics that are needed to drive the fundamental. So if one of the slips was a little bit too long or a little bit too short, the next slip would be just right to compensate for that, because the string will take energy from the higher-frequency dynamics and convert it between those levels to do the right amount of compensation for the next stick and slip.

[19:35] Donald Hoffman: Sounds good so far.

[19:39] Richard Watson: What I'm suggesting then is that living things are resonators like the string is, but they have better sustain. They have better sustain because they're more agential; that's what it means to be more agential: to have better sustain. It means that you have the ability to convert energy between different levels of organization. That converting energy between different levels of organization is the agency diagram that we started with in A. It's the ability to abstract the world, to take something happening at one frequency and convert it into another frequency, run that model of the world forward in time, and then act on the world based on the results of running that model. The reason that organisms have better sustain than strings is because you can set up multiple different harmonics in an organism and not just the octave relationships. When you skip levels with other intervals, like the third or the fourth, you're making bindings between different levels, which converts energies from one level, skipping a few levels, to another level more easily and quickly, which makes them more powerful, more agential. It also gives them more internal geometry to hold onto and more internal energy to deploy in maintaining that geometry.

[21:36] Donald Hoffman: Yes.

[21:37] Richard Watson: One living organism, which we notionally think of as being created from a genotype, is really an unfolding of compressed information into multiple different instantiations in a distribution, a population. Selection we can think of as the folding process, which collapses the information from that distribution back into genetic modifications. But in conventional biology we think of that expansion and compression as feed-forward up to the phenotype, and then feedback in just one step: the organism lives or dies and that's it. But it's not like that. Feed-forward and feedback are happening in the construction of the intracellular organelles from the genotype, and the organelles are already pushing back on the genotype to say which genes should be expressed. And in the construction of the cell from those organic molecules, the cell is already pushing back on the organic molecules, which are pushing back on the genes. From the cells into the tissues, which are differentiating, the organ is already controlling how the cells differentiate, which is controlling how the intracellular components, metabolites, and proteins are operating, which is controlling how the genes are being expressed, which is even modifying what the gene sequences are. All of those different levels are agents in the process of development, just as much as the organism as a whole is. The feedbacks are happening at multiple time scales and multiple levels of organization, not just in that conventional loop.

[23:37] Donald Hoffman: That's wonderful. I agree. I can make a few comments. At the very start, you talked about how we needed to have something that was going to be effectively computationally universal, so that it can do all this stuff. If we want to turn these ideas into a mathematically precise theory, we need some kind of precise but minimal mathematical structure to try to capture your ideas. The simplest and yet completely general, except for one little exception we can talk about, mathematical model for these kinds of probabilistic interactions that we're talking about would be Markov kernels. They are the minimal mathematical way to talk about these probabilistic interactions. It's trivial to show that Markov kernels are computationally universal, so networks of Markov kernels can do anything that neural networks can do. The reason I've been nodding is that we're writing a paper on this using Markov kernels. This is the paper I've been writing the last few weeks, and I'll probably be at it for another month or two. We're going to call it "Traces of Consciousness." There's a key idea that really is a mathematical concomitant of what you're saying here. What we've discovered is a new property of Markov kernels: they are partially ordered. There is a heterarchy of these kernels, just like you have a heterarchy of these interacting agents. The heterarchy has a very clean definition. Suppose I have a kernel that's a 10 by 10 matrix. To be a Markov kernel, each row consists of nonnegative numbers between zero and one, and the sum of the numbers in each row is 1. So it's a Markov kernel. Very simple: square matrix, each row sums to one. If I run one of these kernels on 10 states, I can see, if it's in state one, what the probability is that it goes to states two through ten, or stays at state one. I can look at its dynamics. Ultimately, if it's an ergodic kernel, I can find a stationary measure: the long-term probability of being in each of states one through ten.
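As a minimal sketch of these definitions (illustrative Python with NumPy, not the paper's notation): a Markov kernel is a row-stochastic matrix, and its stationary measure is a left eigenvector for eigenvalue 1.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_kernel(n: int) -> np.ndarray:
    """Random n-by-n Markov kernel: nonnegative rows that each sum to 1."""
    P = rng.random((n, n))
    return P / P.sum(axis=1, keepdims=True)

def stationary_measure(P: np.ndarray) -> np.ndarray:
    """Solve mu P = mu (assumes P is ergodic): left eigenvector for eigenvalue 1."""
    evals, evecs = np.linalg.eig(P.T)                 # left eigenvectors of P
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    return mu / mu.sum()                              # normalize to a probability measure

P = random_kernel(10)               # the 10-by-10 from the discussion
mu = stationary_measure(P)
assert np.allclose(mu @ P, mu)      # long-run probability of each state
```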

[26:40] Donald Hoffman: But I can also do something else. So I'm running the bigger kernel, but I'm only attending to states one, two, and three. If I only do that, what is the Markov kernel that I would see involving one, two, and three, which would be induced by the bigger one? In technical terms, it's called a trace chain. There's a formula for it. The mathematics involves looking essentially at infinitely long sequences of the big chain and seeing how often they leave this three-state set and then return. But it turns out you can give a closed-form solution that captures that infinite sequence, so you don't have to do anything infinite. So here's the partial order. It's defined by: a kernel M is less than or equal to another kernel N if and only if M is a trace chain of N. That's it. That simple. It gives you a non-Boolean logic on every possible dynamics. In other words, all possible dynamics are now given an order relationship and a logic. It has no global greatest element, and many elements are incomparable. But if you take one kernel and look at all of the kernels that are less than that one kernel — just that subset of dynamics — they form a Boolean logic. So you get a Boolean logic. So what this allows is when you take a kernel like this 10-by-10 and look at the 3-by-3 that comes from a trace, the transition probabilities on those three states are utterly different from what they were on the 10-by-10. If you were thinking about the probabilities as free will, the free-will decisions of the 3-by-3 have utterly different probabilities than the free-will decisions on the same states of the 10-by-10.
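The closed form can be sketched concretely, assuming the trace chain matches the standard censored (watched) chain, whose formula sums the excursions out of the attended set and back:

```python
import numpy as np

def trace_chain(P: np.ndarray, S: list[int]) -> np.ndarray:
    """Kernel induced on the states S by only attending to visits to S."""
    n = P.shape[0]
    T = [i for i in range(n) if i not in S]        # the unattended states
    A = P[np.ix_(S, S)]                            # S -> S directly
    B = P[np.ix_(S, T)]                            # S -> outside
    C = P[np.ix_(T, S)]                            # outside -> back into S
    D = P[np.ix_(T, T)]                            # wandering outside S
    # B (I + D + D^2 + ...) C = B (I - D)^{-1} C sums excursions of every length
    return A + B @ np.linalg.inv(np.eye(len(T)) - D) @ C

rng = np.random.default_rng(0)
P = rng.random((10, 10)); P /= P.sum(axis=1, keepdims=True)  # a 10-by-10 kernel
P3 = trace_chain(P, [0, 1, 2])              # the 3-by-3 trace on three states
assert np.allclose(P3.sum(axis=1), 1.0)     # rows still sum to one
```

The 3-by-3 probabilities that come out are in general nothing like the corresponding entries of the 10-by-10, which is exactly the point about the induced dynamics being utterly different.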

[29:44] Richard Watson: They can be; it's not that they have to be, right?

[29:47] Donald Hoffman: In fact, they have to be different in general.

[29:49] Richard Watson: All right.

[29:50] Donald Hoffman: In general, they will be different. I haven't proven it, but intuitively it seems like, with probability one, they will be different. So from one trivial definition, N less than M if N is a trace of M, that's it, this whole beautiful logic falls out. Now, of course, if you look at these kernels' eigenfunctions, you are going to find their vibration rates. In fact, in the paper that we're publishing, I'm looking at how you can take one of these kernels, a regular Markov kernel, and add one little feature to it, which is standard in Markov chain theory. You add a little counter that increments every time you take a step of the chain. You start off at zero and it goes one, two, three, four. It's called a space-time chain. You have the normal chain, but you then have this time parameter which you add to it. And you can now take the eigenfunctions of that space-time kernel. When you do, what you get is a function which is identical in form to the quantum mechanical wave function for a free particle: a momentum eigenstate. We're going to show this, and we'll give examples. If you want vibrations, you've got vibrations here. You've got all possible vibrations. You have an organization, a logic that ties all of them together in one beautiful symphony. And there's no single greatest at the top. This is a heterarchy. It's really quite interesting. If I were to go spiritual on it, instead of saying there is the one, one consciousness, I would have to say there is the whole, because there isn't just the one. There is no greatest element. There is something far richer than that. I was nodding as you were going through all this because I was ticking it off against this mathematics. I said there was one proviso. Here's the proviso. Markov kernels are universal for any probabilistic process that has a finite memory. Now that finite memory can be as big as you want. In other words, when we define a Turing machine, we say there's a tape and the tape is as long as you need it to be. There's no restriction here. It's the same thing with Markov chains. It's finite, but it's as big as you want it to be, just like the Turing tape can be as long as you want it to be. In practice, it's universal. We're working on this; it's called the trace order and the trace logic. And then there's a beautiful thing that comes out of it. There's been a big open problem in science: what is an observer? In Newtonian physics, the observer was aloof, didn't affect what it was observing, and you didn't need a model of it because you could ignore it. We now know that that's too simplistic.
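A toy illustration of the space-time chain idea (a standard construction, not taken from the paper): the cyclic shift kernel has discrete plane waves as eigenfunctions, with roots of unity as eigenvalues; attaching the step counter turns each one into f(x, n) = lam^n h(x), the same form as a free-particle momentum eigenstate.

```python
import numpy as np

N = 10
shift = np.roll(np.eye(N), 1, axis=1)   # 0/1 kernel: state x -> x+1 (mod N)

k = 3                                   # pick a "momentum" mode
h = np.exp(2j * np.pi * k * np.arange(N) / N)   # plane wave h(x) = e^{2*pi*i*k*x/N}
lam = np.exp(2j * np.pi * k / N)        # the matching root of unity

assert np.allclose(shift @ h, lam * h)  # h is an eigenfunction of the shift
f = lambda x, n: lam**n * h[x]          # space-time eigenfunction f(x, n) = lam^n h(x)
```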

[34:02] Donald Hoffman: In Einstein, there is the observer, you have to use the notion of an observer, but it's really just a reference frame. It's a system of clocks and coordinates, that's all. There's no real deep theory of an observer. In quantum theory, the observer comes front and center, because the evolution of state in quantum theory is linear when the system is not observed. It's the Schrödinger equation. It's linear. When you observe, the change of state is non-linear. You go from a superposition, a complex superposition of eigenstates to a single eigenstate. That's non-linear. That means you cannot use quantum theory to give you a model of the observer. Can't do it. The linear quantum apparatus cannot model it. Decoherence doesn't solve it at all, because all decoherence does is it maps from a complex superposition of eigenstates to a classical mixture of eigenstates, but it does not take you to a single eigenstate. So decoherence is a red herring. It doesn't do the job. There is no job that can be done using quantum theory. That's why they've been pulling their hair out about the observer. Wolfgang Pauli said in 1954 this is a big problem. We need a theory of the observer. Two years ago, Frank Wilczek said basically the same thing in an interview. Quantum theorists are still asking, what is an observer? I went to a physics conference a few years ago in Banff where it was all about the role of the observer in quantum physics. Saying "we won't talk about observers, we'll talk about measuring apparatuses," which is what Heisenberg and Asher Peres and a lot of people do, solves nothing because the measurement apparatus must embody a non-linear process. You have the same problem. Whether you call it an observer or not, you can't do it with quantum mechanics. A measurement apparatus has no reductive explanation in quantum theory. That's one of the big open problems in science. Here's what we propose. We've talked about these agents being represented by Markov kernels and so forth. One agent observes another if it's a trace. The trace operation: if one kernel is a trace of the other, it is observing the bigger kernel. Notice what this does. It's truly remarkable. We wanted to have observers that were independent, aloof. This is different. If you observe, you are an organic aspect of the very thing that you're observing. You are intimately and organically, not just looking at it, you are part of it. That's what this is saying. So this gives you a theory of observation. Now, what are the outcomes of observations?

[38:14] Donald Hoffman: I've said what an observer is and what an observation is, but now what are the possible outcomes, and how do I model that? One interesting thing about Markov kernels: if they're ergodic and aperiodic, you have a single communicating class. We can talk about the others, but let's just talk about ergodic ones, because they are the case where I'm on clean mathematical ground and can say precisely what the answer is. For ergodic ones you can talk about the stationary measure of the Markov chain. For example, for the 10 by 10, the stationary measure is a probability measure on the 10 states. It describes the long-term probability of being in state one, state two, all the way up through state ten. It solves the equation: if the kernel is P and the stationary measure is mu, then mu P = mu. That's the equation you have to solve. It gives you the long-term probability of what you're going to see, the outcome of your observation. We have this non-Boolean logic on observers, the trace logic. We propose there is a map from observers to the stationary probability measures. We need to ask whether there is a logic on these probability measures. We would like a logic homomorphism between the observers and the probabilistic beliefs that observers might have as a result of observation. There is such a logic. A couple of collaborators and I published it 30 years ago. It's called the Lebesgue logic of probability measures. One probability measure mu is less than or equal to a probability measure nu if mu is a normalized restriction of nu: mu lives on a subset of the states of nu, and when you restrict nu to that subset of states and renormalize, you get mu. That gives you an incredibly beautiful logic. It's non-Boolean. There is no greatest element. It's locally Boolean: if you take a particular probability measure and look at all the probability measures that are less than it, they form a Boolean logic. It's more general than the orthomodular complemented lattices of quantum logic theory. We show in this paper that the map from the trace logic of the dynamical systems into the Lebesgue logic of probability measures, which is the probabilistic beliefs that come out of it, is a homomorphism. The whole thing ties up unbelievably beautifully. There is a homomorphism between observation and probabilistic belief. Harmonic behavior is the core of it. When you look at these Markov kernels, you look at eigenfunctions of the space-time chains or of the regular kernels themselves; it's all about harmonics. That's why I was nodding the whole time. I could say, here's the piece of mathematics that models that. I'm completely on board with what you're saying. There's a wonderful dialogue to go for. Now, where are we trying to go with this?
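The normalized-restriction order is easy to check directly; a small sketch with illustrative names:

```python
import numpy as np

def is_restriction(mu: np.ndarray, nu: np.ndarray, tol: float = 1e-12) -> bool:
    """True if mu equals nu restricted to mu's support and renormalized."""
    support = mu > tol                     # states mu lives on
    if nu[support].sum() <= tol:
        return False                       # nu puts no mass there
    return bool(np.allclose(mu[support], nu[support] / nu[support].sum())
                and np.allclose(mu[~support], 0.0))

nu = np.array([0.5, 0.3, 0.1, 0.1])
mu = np.array([0.625, 0.375, 0.0, 0.0])    # nu restricted to {0, 1}, renormalized
assert is_restriction(mu, nu)              # so mu <= nu in this logic
assert not is_restriction(nu, mu)          # and not conversely
```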

[42:28] Richard Watson: Can I reflect some additional symmetries, which I noticed? The word that I'm using for a stack of frequencies in harmonic relations is a song.

[42:40] Donald Hoffman: Yes.

[42:41] Richard Watson: The finite memory that you're talking about means that the song is always eventually cyclic, that it's going to come around again?

[42:52] Donald Hoffman: Not necessarily. Chaotic stuff can happen out of finite systems.

[43:05] Richard Watson: Okay, we'll come back to that.

[43:07] Donald Hoffman: Yeah.

[43:08] Richard Watson: The trace in my mind relates to what it means to resonate one song with another. So when one song meets another song and they resonate, that means that the two songs have to share some frequencies in common, otherwise they can't resonate. Or have a harmonic relationship between those frequencies.

[43:32] Donald Hoffman: Absolutely.

[43:35] Richard Watson: And that means that in order to observe, in order for one song to be sensitive to another song, they both have to be drawn from the same super song. They have to be harmonically related to one another, otherwise they can't be observed. And the ability to observe the detail or agential nature of one song requires an agent, an observer, which is just as agential, which has the same harmonic relationships within it.

[44:06] Donald Hoffman: Yes, that's right.

[44:07] Richard Watson: And it's not just a question of saying, can you measure all the frequencies which are in this song, like a Fourier decomposition does, where you say it's got a lot of this frequency and a lot of that frequency, I measured it. No, you didn't really see it yet, because this pair of frequencies is not the same thing as that pair of frequencies shifted a little bit; that's a completely different thing. Why is it a completely different thing? Because these were in the octave relationship and those weren't. You're not just seeing the frequencies that are in it, you're seeing the relationships between the frequencies. You need to see that it's the octave, not just that the two frequencies are in it.

[44:48] Donald Hoffman: The way that we capture that kind of intuition in this trace logic is when we have a logic, there's what's called the meet and the join of two entities. The "and" and the "or." So the meet is the "and", the join is the "or", but they call it meet and join. Now, if you have two Markov kernels and they both have 10 states, but seven states are different, then they only share three states between them. Most of the time, if you give me two random kernels, they are incomparable because they do not agree on the states where they overlap.

[45:41] Richard Watson: Which means they can't observe each other.

[45:43] Donald Hoffman: They can't observe each other. They can't form a join. They can't form a meet. They are incomparable.

[45:48] Richard Watson: Apples and oranges.

[45:50] Donald Hoffman: The apples and oranges, and most of them are. The probability of that is very high.

[45:54] Richard Watson: Yeah.

[45:55] Donald Hoffman: But suppose the two 10 by 10s overlap in three states and they do have a meet, so they do agree: they both share the same trace chain on those three states. Then you can take their union. You can take their join. And what that does is create new transitions, because there were no transitions before: the seven extra states of the first guy and the extra states of the second guy never had any direct communication. But when you make the join, the new kernel has all the right connections between them to make the whole thing harmonious, to make the right song.

[46:40] Richard Watson: A chemistry that makes a new thing.

[46:42] Donald Hoffman: It's unique in general. There's a unique right song that melds the two original songs if they overlap. That's why I was nodding all the time. Your intuitions are really being captured.

[46:54] Richard Watson: So crudely, this meet and join, it's the inner and outer product. And if you do it properly, it's the Clifford algebra with the wedge product.

[47:06] Donald Hoffman: That would be; I would love to prove that. I've been thinking about that. I wrote down an order relationship on geometric algebra entities as well. I think that it may be homomorphic to these logics as well. There may be; I've been looking in that direction. A few months ago I gave a paper to Chetan, my mathematical collaborator, with a partial order on Clifford algebras, which I should go back and look at because it may be homomorphic to this, which would be really funny.

[47:51] Richard Watson: My intuition says that it is, because through the songs and the frequencies you can get to the Lissajous figures, and from the Lissajous figures you can get to the shapes and geometries. The interaction of two songs is the geometric algebra that converts one shape into another.

[48:12] Donald Hoffman: Yes, I'm completely on board. I should pull out that piece of paper again that I sent to Chetan and go over that logic. I might even want to include that in the paper we're doing right now, but just point that out.

[48:27] Richard Watson: The computational part, in my mind, is that when you strobe a song at a particular frequency, it looks discrete. If you've got the right frequency, it looks like a nice orbit or another Lissajous figure. It's stationary if you strobe it at the right combination of frequencies. That structure has a correspondence with the lambda calculus too. You can describe what a song is. The discrete stationary structure of a song when viewed at a particular frequency is a program, and the interaction of two songs can be described as the application of one song to another, the application of one lambda calculus expression to another to create an output.

[49:22] Donald Hoffman: I would be very interested to see that. That would be lovely and new to me. Anything that you could send me on that, I would be most interested.

[49:30] Richard Watson: Yeah, just intuitive at the moment.

[49:33] Donald Hoffman: But also intuitively: take a Markov kernel that is periodic, where every row is all zeros except for a single element that is a one. Then you cycle through the N states in some order, one through N. That has a specific clean frequency. Suppose you take another Markov chain that's also all zeros and ones, also on 10 states, say, but in a different order. You add the two and weight them, maybe 0.3 of one and 0.7 of the other. Now you have a Markov kernel which is ergodic, but it actually has two basic frequencies. If you keep doing this, you realize that all these ergodic kernels are really just weighted sums of these basic periodic kernels. Each one of these complicated kernels is a complex harmonic score of all possible frequencies that are going on. You could have sub-frequencies; they don't all have to have a period of ten. It's quite rich that way.
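A quick sketch of that construction: mix two permutation kernels with weights 0.3 and 0.7, and read the underlying frequencies off the phases of the mixture's complex eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)

def permutation_kernel(n: int) -> np.ndarray:
    """A 0/1 Markov kernel: exactly one 1 per row, a pure set of cycles."""
    P = np.zeros((n, n))
    P[np.arange(n), rng.permutation(n)] = 1.0
    return P

P = 0.3 * permutation_kernel(10) + 0.7 * permutation_kernel(10)
assert np.allclose(P.sum(axis=1), 1.0)          # still a Markov kernel
print(np.sort(np.angle(np.linalg.eigvals(P))))  # phases: the mixed frequencies
```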

[51:00] Richard Watson: Awesome. So the Markov kernels have this ability to interact or relate to one another, where interacting is really just seeing what the relationship is between them. But how do you speak to the origination of those kernels, and to how they come to have information that's intelligent, capable of producing intelligent action?

[51:43] Donald Hoffman: The first thing to note is that the Markovian kernels are computationally universal.

[51:50] Richard Watson: They could be anything. How do they get to be something specific?

[51:55] Donald Hoffman: My guess is that anything that's possible is actual. Why not? I think reality is no less complicated than this entire trace order and all the possible kernels on it. My own attitude about scientific theories is that everything we've thought of so far is trivial compared to reality, including the current theory that I put out there. But I would say that reality, whatever it is, is at least as complicated as the entire trace logic and all the possible kernels on it, and we're seeing just a little piece of it.

[52:44] Richard Watson: In our little corner of the universe where we have some shared history, and we can talk about the same space-time and the same entities and the particles in it, we're interested in agents that know stuff about the world and we're interested in the processes by which they came to know it. And what it means to be able to act in the world intelligently, right?

[53:16] Donald Hoffman: That is a very high priority in what I'm doing right now, to try to answer that question. Here's the direction I'm going. What I want to do is to solve that kind of problem by showing that if I start with only this logic, the trace logic of Markov kernels, I can build space-time and quantum physics and general relativity out of it, as a headset that certain of these conscious agents use to interact with others. To do that is a non-trivial thing. I want to use the architecture of these Markovian kernels as a computational architecture, a neural network, to actually build space-time as a user interface, as a way to answer your question. I am going to have to get a mapping from the Markovian dynamics that I've been talking about into a model of space-time with quantum field theory and the whole bit. Fortunately, we have some help in this from high-energy theoretical physicists in the last decade. They realized a few decades ago that space-time cannot be fundamental; it falls apart at the Planck scale. David Gross wrote a paper in 2005, the centennial of Einstein's discovery of special relativity, in honor of Einstein, saying in effect: thank you, Einstein, for giving us space-time, but space-time is doomed. It cannot be fundamental. In the intervening almost 20 years, they've gone at it. In the last 10 years, they've found new structures entirely outside of space-time. These are not structures curled up inside space-time as in string theory. These are structures utterly beyond space-time, utterly beyond quantum theory. There are no Hilbert spaces here. The new field is called the field of positive geometries. The European Research Council just a few weeks ago launched a 10 million euro multinational collaboration called Universe Plus, which had its first conference a couple of weeks ago, with over 100 mathematicians and mathematical physicists there. If you go online and look up ERC Universe Plus, you can read their very ambitious statement: we're going outside of space-time, beyond quantum theory; we're using positive geometries as a new foundation for physics. Physicists have realized space-time is not fundamental, and in the last decade they've stepped outside of space-time. The positive geometries are things like amplituhedra, associahedra, cosmological polytopes.

[56:34] Richard Watson: Those are just words I've heard, but yeah.

[56:36] Donald Hoffman: They're positive geometries. They're like polytopes. In some cases they are polytopes; the amplituhedron is not a polytope, but it's polytope-like. They've found combinatorial objects that they can use to classify these positive geometries, in particular decorated permutations. These are permutations with a little twist. If you're interested, I can tell you what the twist is.

[57:12] Richard Watson: I'm not going to say that's not a decorated permutation.

[57:17] Donald Hoffman: It's not decorated enough for me. Who ordered that and why? Here's what we're up to. We have these Markovian dynamics, we have this trace logic and the Lebesgue logic. We've already made a connection with the decorated permutations. We actually said: if decorated permutations classify these positive geometries, can they classify our Markovian dynamics? I can send you a paper. We published that. They do. When we did the classification, the things you want to look at are the recurrent classes. That one step already told us where to look in making this map from the Markovian dynamics to particle representation in space-time. These recurrent communicating classes correspond to particles.

[58:12] Richard Watson: So if you're going to build intelligent agents, you're going to have to build space-time particles and all the rest of it first, and then build intelligent agents out of that.

[58:24] Donald Hoffman: We want to do it in what looks like a physicalist manner. I think of these conscious agents as already intelligent agents. So it's building a space-time as a way of giving a physicalist instantiation of this intelligence. The reason we have to do that is that's where we can make experimental tests. It's only inside space-time that we can test our theories. So I have to project this theory of conscious agents into space-time. Otherwise, it's just airy-fairy mathematics, untestable. So our goal is to precisely predict the momentum distributions of quarks and gluons inside protons using this, to actually show how space-time arises so precisely in what a quark and a gluon are from our theory that we predict exactly to 10 decimal places the momentum distributions of quarks and gluons. Then the theory is probably still not right, but at least it should be taken seriously. So that's what we're up to.

[59:30] Richard Watson: I see. That does sound hard.

[59:34] Donald Hoffman: You've got to do it. Nothing less is science. You have to go big or go home. There's no reason for anybody to take this theory seriously if we can't make a prediction that you can test at the Large Hadron Collider, for example. And the reason we're going there is because those are the simplest predictions that we can make. Single quarks and gluons are small numbers. If I look at the brain, now I'm talking about quadrillions of quarks and gluons. Why should I start with quadrillions? Let me start with one or two and then work my way up to quadrillions.

[1:00:07] Richard Watson: Maybe it's just as easy to start from the other end.

[1:00:12] Donald Hoffman: I hope some people try to use the idea. I have a short life; you have to pick what you think is your best bet and go for it. My best bet, 3 or 4 gluons as opposed to a quadrillion, is where I'm going to get the testable predictions.

[1:00:33] Richard Watson: Rather than building it that way, it seems like you went from "there's only consciousness" to: there are special kinds of geometries and mathematical relationships out of which I can build space-time and particles. Then in space-time and particles, I could build complex assemblies of particles which eventually would look something like consciousness. It would look something like an intelligent agent. It's not the consciousness that created it all, but it's something that looks like it, that resembles agency and intelligence, right?

[1:01:17] Donald Hoffman: It's the headset we build, and the headset gives us more or less insight into the consciousness behind. So on this point of view, I would argue that the distinction we make between living and non-living is not principled.

[1:01:37] Richard Watson: Sure.

[1:01:38] Donald Hoffman: Right now we're on a Zoom, and I'm seeing you only through a screen. Some of the pixels are pixels of your face, and there are other pixels of a wall and a picture behind you. The pixels on the wall give me no insight into consciousness whatsoever. The pixels of your face give me quite a bit of insight into what you're thinking and your expressions and what you understand or don't understand or agree or disagree. If I were to say that means that there are some conscious pixels and some unconscious pixels, that's a really dumb mistake. It's the same mistake that we make when we distinguish between conscious physical objects and unconscious physical objects. It's exactly the same mistake. It's not a principled distinction we're making. So we're always interacting with consciousness, but a headset dumbs things down. That's what it does. That's what it's for. Sometimes it reveals less about the consciousness and sometimes more because it's dumbing things down. We then make a category error and say, oh, a rock is not conscious and the human body is. No, that's just the wrong way of thinking about it. No pixel in the headset is conscious or unconscious, living or non-living.

[1:02:49] Richard Watson: I could communicate with you as a conscious agent through a communication channel that only allowed Morse code, that only allowed a single bit at a time. Or I could communicate with you as a conscious agent through a rich, multi-dimensional interface in many frequencies simultaneously.

[1:03:05] Donald Hoffman: Exactly.

[1:03:08] Richard Watson: I think about learning and intelligence in ordinary physical Newtonian systems in a very simplistic way. I have this model that I call natural induction, where you have a system of particles connected by springs. The interactions of the particles with one another or with an external environment create tensions in the springs. If those springs are slightly plastic, the springs deform in a way that changes the energy function of the particles. And you can show that it changes in exactly the same way that you would expect Hebb's rule to modify the connections of a neural network.
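A minimal caricature of that picture (a Hopfield-style stand-in, not Watson's published natural induction model): the state relaxes downhill on the energy defined by the weights, while the weights, the slightly plastic springs, deform under the settled state via a Hebbian update.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
W = rng.normal(0.0, 0.1, (n, n))
W = (W + W.T) / 2.0                      # symmetric "spring" constraints
np.fill_diagonal(W, 0.0)
eta = 1e-3                               # springs are only slightly plastic

for episode in range(200):               # repeated pushes from outside
    s = rng.choice([-1.0, 1.0], size=n)  # random initial configuration
    for _ in range(5 * n):               # fast relaxation: energy descent
        i = rng.integers(n)
        s[i] = 1.0 if W[i] @ s >= 0 else -1.0
    W += eta * np.outer(s, s)            # slow Hebbian deformation of the springs
    np.fill_diagonal(W, 0.0)

# W is now a compressed record of the configurations the system settled into;
# future relaxations find those coordinated states more easily.
```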

[1:03:51] Donald Hoffman: Oh, wow, okay.

[1:03:52] Richard Watson: So the network becomes a model of its own history, in such a way that it can then anticipate the original energy function and find better solutions to the problem of constraints that were originally put into the weights. So if all that's really happening is that you're allowing the forcing on the system to deform its internal arrangement, how did it get smart? It was pushed. It was just pushed. Well, that's not smart. I can make an imprint in clay and it's a record, but it's not smart. The thing that makes it smart is the folding of the space: the compression of the symmetries in which it was pushed, whether over time or over space, is folded into an idealized, compressed representation of what happened. And then when that pushes back, it looks like it's doing something smart, because it's doing coordinated action which is informed by that past history. It's not really anticipating anything. I like to think of it as just reacting, not anticipating, but when everything is circular, when all activity is circular and periodic, being just the right amount of late is the same as anticipating. So smart entities, entities that are intelligent, are modified by their history and push back on the world as a reaction with just the right amount of lateness, so that they look like they're anticipating. You can't really tell whether time is going forwards or backwards. But I think you can do that with ordinary particles and springs at the macro scale, without any quantum funniness going on. And the violin string is actually intelligent in the same way, on the same scale but not in the same amount, as other kinds of, shall we say, intelligences we find more relatable.

[1:06:19] Donald Hoffman: Exactly right.

[1:06:22] Richard Watson: So that I don't have to build it from the quarks up. I can start at any level of organization. And the thing that makes it smart is the relationships between a few different levels connected together.

[1:06:35] Donald Hoffman: It absolutely does. There are a couple of levels I would think about. One is that I can model this with just the Markovian kernels outside of space-time. I could say I've got this big 10 by 10 again, and I have this other five by five, and they share three states. The 10 by 10 is more complicated than the five by five, but if they share three states and they're compatible, when I join them I can get a resonance: the big dynamics gets resonated into the smaller one, and it has a more compact representation. Within this trace logic, I can begin to do formally, with mathematical precision, the kind of thing that you're doing. That's one direction. But now, looking at things inside space-time, first I should say: outside of space-time, the dynamics of these Markovian kernels need not have increasing entropy. The entropy can be constant at each step, which means there need not be an arrow of time in the basic Markovian dynamics beyond space-time. But there's a theorem.

[1:08:02] Richard Watson: When you take a slice of them, there is an entropy.

[1:08:05] Donald Hoffman: When you lose any information in a projection, you get, as an artifact of that loss of information, increasing entropy. In evolution, the fundamental limited resource is time. If you don't mate in time, you don't reproduce. If you don't eat in time, you die. If you don't breathe in time, you die. My guess is that the arrow of time that we see inside space-time is not an insight into a deeper reality at all. It's entirely, one hundred percent, an artifact of loss of information. And that means something about our entire picture. I love Darwin's theory. I've done a lot of work on Darwin's theory. It's the best theory that we have of biological evolution. There's nothing close to it. But every scientific theory has its limits. My claim is that all of Darwin's theory is an artifact of the loss of information in the projection into space-time. That means that the distinction that we make between organisms and resources, competition, nature red in tooth and claw, all of it is not an insight into a deeper reality beyond space-time. Every bit of it is an artifact of the limitations of our headset. Nothing, no insight.

[1:09:42] Richard Watson: Yeah.

Donald Hoffman: That's why looking inside space-time for the evolution of intelligence may be the wrong thing.

[1:09:52] Richard Watson: But I'm not looking for the evolution of intelligence.

[1:09:55] Donald Hoffman: Good. We're on the same page.

[1:09:58] Richard Watson: Sorry, yeah. Evolution is a product of intelligence, not the process that creates it.

[1:10:06] Donald Hoffman: It's a product of the loss of information about the way intelligence really works.

[1:10:12] Richard Watson: When harmonic relationships are set up in a resonator, they're already cognitive. You didn't need any natural selection for that.

[1:10:22] Donald Hoffman: Exactly right.

[1:10:23] Richard Watson: When you view it at a particular time slice, a particular strobe, when you look at it with another song, it'll look like a discrete object that's reproducing. When you view one octave with another octave, what you see is, instead of a big loop that twists and folds on itself and then unfolds back into a big loop, it appears to do a big loop that folds and twists and divides and creates two.

[1:10:52] Donald Hoffman: Yes.

[1:10:54] Richard Watson: When you view one octave with another octave that's a little bit off, you get this continuous expansion, creating stuff out of stuff out of stuff. It has this weird property that it looks like you took something and you broke it in half, but the two halves that you have, they're not halves, they're wholes. How did that happen? It's because the whole is already folded inside. It feels pre-formationist, but that's what harmonics are: the whole is already folded into all of the parts.

[1:11:27] Donald Hoffman: I think those intuitions can be cashed out with this precise mathematics. I have not done that, but I would agree it is a very fascinating direction.

[1:11:44] Richard Watson: I wonder what's left for me to add if you've already done all the math; I'm working at an intuitive level. I'm wondering whether there's something in the notion of how a physical system can come to have knowledge of an environment just through an ordinary Newtonian deformation of its internal structure, a ball running downhill, local energy minimization, which puts knowledge into it if it has this folded structure. There are two levels of architecture happening at once. It's not just a language in which you can write intelligent things, but a description of the process that puts the intelligence into it as well.

[1:12:38] Donald Hoffman: First of all, I don't want to give the impression that we've solved everything. That's not the impression I want to give. I think that we've taken a first step in what's going to be, I think, a really interesting and long journey. But just a first baby step is the way I look at it. For example, I can't tell you yet how to model even a quark in our theory outside of space-time. All the fun work is still ahead on that. We have hints, and I can say what some of the hints are to try to make this kind of connection into space-time. One is we propose, and we do this in the paper that I'm writing right now, that the mass of a particle corresponds to the entropy rate of the recurrent communicating class of the Markovian kernel that it is a projection of. The entropy rate works like this: in a kernel, each row is a probability measure, so you can talk about the entropy of each row; each row has its own entropy. If it's an ergodic kernel, you also have a stationary measure. You can add up all the row entropies weighted by the stationary measure, and that's called the entropy rate. It's a very simple, clean notion of the entropy of the entire kernel. We propose that is what corresponds to mass in physics. The entropy rate of the system is telling you effectively how influential it is. If a row has a bunch of zeros and a single one, that row has zero entropy: from that state you only ever influence one thing. A kernel of all zeros and ones is going to have zero entropy rate, no influence in this sense, and so no mass.
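A small sketch of the definition as stated: average row entropy weighted by the stationary measure, so an all-zeros-and-ones kernel comes out at exactly zero, which on this proposal is what massless means.

```python
import numpy as np

def entropy_rate(P: np.ndarray, mu: np.ndarray) -> float:
    """H = sum_i mu_i * H(row i), with 0 * log 0 taken as 0."""
    logs = np.zeros_like(P)
    np.log2(P, out=logs, where=P > 0)
    return float(mu @ (-(P * logs)).sum(axis=1))

cycle = np.roll(np.eye(4), 1, axis=1)   # deterministic 0/1 kernel (a 4-cycle)
mu = np.full(4, 0.25)                   # its uniform stationary measure
print(entropy_rate(cycle, mu))          # 0.0: "massless" on this proposal
```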

[1:15:05] Richard Watson: You're only ever seeing the entropy of a particular projection.

[1:15:16] Donald Hoffman: Right.

[1:15:18] Richard Watson: You're not seeing the true entropy or the true mass, right?

[1:15:21] Donald Hoffman: And that's going to be really important when we do empirical tests of our theory, because we need to actually understand the statistics of this partial sampling process. The trace chains assume an infinite trace, but we will have finite traces, and what we plan to show is that this makes a difference in what we get in our physics. Take the quarks and gluons inside the proton. When you look at the coarse spatial and temporal scales inside the proton, what they call Bjorken x, which is the temporal scale, and Q-squared, which is the spatial scale, you see three valence quarks: two up quarks and a down quark. As you start to get finer and finer spatial and temporal resolution, you see a bunch of quark-antiquark pairs, what they call the quark sea, and an ocean of gluons. As you continue to go even further down, you get just an ocean of gluons. It's seething gluons, and that's all you see. How are we going to explain that kind of thing? Quarks are fermions, massive particles, and gluons are massless. They have no mass; they're traveling at the speed of light inside a proton. The inside of a proton is frenetic, because there are particles traveling back and forth at the speed of light inside this tiny, tiny little thing. This is truly a seething thing. What is a massless particle in our theory? It corresponds to a Markov matrix that has only zeros and ones. Given our definition of entropy rate as mass, we now know what massless particles are: the massless spin-one particles are particles that are periodic. You can start to see this really beautiful dictionary. Now, what happens when we start trying to build up a trace chain at really high temperature, with really small time samples and really fine resolution? The way you're going to do it is you're going to have to sample: I got this state, now I got this state, and now this state. I can start to figure out the probability of going from this state to that state, and build up the matrix. What am I going to see? My initial matrices are going to be zeros and ones, because I don't have enough data to do anything finer. But that's not an insight into the nature of reality. That's sampling error. That's what we're going to claim about the gluon sea: when they look at the proton in very, very high resolution, when we're getting closer and closer to reality, what you're seeing is entirely sampling error.
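The sampling-error point is easy to reproduce in a toy: estimate a smooth kernel from only a handful of observed transitions, and the empirical matrix comes out almost entirely zeros and ones.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)       # a smooth, far-from-deterministic kernel

counts = np.zeros((n, n))
x = 0
for _ in range(15):                     # far too few samples
    y = rng.choice(n, p=P[x])           # observe one transition
    counts[x, y] += 1.0
    x = y

rows = counts.sum(axis=1, keepdims=True)
P_hat = np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)
print(P_hat)                            # rows of 0s and 1s: pure sampling error
```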

[1:18:16] Richard Watson: Doesn't that gluon sea end up looking like a continuum, with waves of much lower frequency that connect it right back up to the top level?

[1:18:37] Donald Hoffman: There are all sorts of weird structures that you can see and unusual dynamics down at that level. It's not just noise; it's noise with some kind of unusual coherence to it, which is perhaps what you would expect when this is a sampling from something deeper that does have coherence.

[1:19:02] Richard Watson: Yeah.

[1:19:03] Donald Hoffman: A sampling error on something bigger.

[1:19:05] Richard Watson: That's like asking: the tiny vibrations between the bow and the string are just heat, just a sea of incoherent microscopic influences. How come, then, when you add them all up, they become this fundamental? They weren't tiny incoherent things. They actually had structure to them, because otherwise they wouldn't create the stick-slip dynamics of that particular structure. They can't be incoherent; they would all cancel out if they didn't have any structure to them.

[1:19:42] Donald Hoffman: That's right. If this is a sampling of a pre-existing kernel, there is a structure to that kernel which is going to show through: as you get more and more samples, you'll begin to see the harmonic structure of the underlying kernel. We have money to hire one or two postdocs who have recent PhDs in algebraic geometry and know these positive geometries. We're about to put the word out, but if you know any bright, young, new PhDs in mathematics who know algebraic geometry, let me know.

[1:20:20] Richard Watson: That's not the kind of PhD I know. I can barely do matrix multiplication myself. But if I come across any, I'll let you know.

[1:20:32] Donald Hoffman: Yes, thank you.

[1:20:33] Richard Watson: If you had people that could do that kind of maths but understood the biology, you would want them to stay with you and do that kind of maths. For me, I would like to explore those relationships between shape and form of the Clifford algebra, the geometric algebra, the lambda calculus, universal programming language, and the adaptive processes, processes of adaptation and learning that happen spontaneously as a system deforms under stress. That's the sort of calculus that I would like to be able to relate.

[1:21:25] Donald Hoffman: I'm very interested in going there, one step at a time. We would like to do that. The lambda calculus is quite fun. When I was a graduate student at MIT, we had these Lisp machines, and Lisp is basically the lambda calculus turned into a programming language. So I wrote all my dissertation programs in the lambda calculus on the Lisp machine. Keeping track of all the parentheses was, back then, quite a chore.

[1:22:01] Richard Watson: Does that make sense when you think about the difference between the program and the data in the lambda calculus? It's just which side of the application you write it on, right? When one song meets another song, if an observer takes the frame of reference of song A, then song A is a program that is operating on song B. But if the observer takes the frame of reference of song B, then song B is the program that is operating on song A. Taking a frame of reference is just turning around to phase-lock with the components that are in common between the observer and song A, or turning around to phase-lock with the components which are in common with song B.
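A toy of that program/data symmetry using two classic combinators (an illustration, not anything from the conversation): the same term is program or data depending only on which side of the application it stands.

```python
I = lambda x: x                # identity combinator
K = lambda x: lambda y: x      # constant-maker combinator

print(I(K) is K)               # True: here I is the program, K the data
print(K(I)(42) is I)           # True: swap the frame, K is the program, I the data
```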

[1:22:44] Donald Hoffman: I agree. One way to put it is how you decide to attend to the whole system. Which way are you attending to it? One way to think about this trace process is it's an attention. When I trace on these three states, what I'm doing is I want to attend to those three states. I'm attending to it. So it's another way of thinking about this: there is this one universal consciousness, in some sense, the whole, and different ways of attending to aspects of it.

[1:23:10] Richard Watson: Yeah.

Donald Hoffman: That's a good way of thinking about this. You see different music when you look at it from different points of attention.

[1:23:17] Richard Watson: Fantastic.

[1:23:19] Donald Hoffman: A lot of fun.

[1:23:20] Richard Watson: Fantastic. I'm so enthused. Thank you.

[1:23:23] Donald Hoffman: Thank you. It was very fun. The synergy was surprisingly good between the ideas, so I'm grateful. I hope Mike will have us talk again.

[1:23:34] Richard Watson: When he left, the recording stopped, but I pressed record again. We might be able to splice this together if we need to.

[1:23:41] Donald Hoffman: That would be good, because I think this conversation might be one that a lot of people would be interested in.

[1:23:51] Richard Watson: Thanks, Don. Nice to meet you.

[1:23:52] Donald Hoffman: You too.

