Humpty Dumpty sat on a wall, Humpty Dumpty had a great fall. Can all our computers and all our algorithms put Humpty’s brain parts back together again?
Reading and language, those most human of cognitive functions, have long been the domain of cognitive science. But new technologies for high-dimensional recording directly from the human brain are making it possible to study the neural coding of language and other human-specific cognitive functions. Making sense of the complex activity tied to these processes requires new theoretical approaches, together with insights from artificial intelligence, a combination that defines the new field of NeuroAI (Sejnowski, 2025).
Thinking about reading
As you read this article, your eyes make fast, saccadic movements across the page, taking in small groups of words with your fovea three times per second. Each fixation is a snapshot that must be integrated with all the previous words, building up a conceptual understanding of what is being conveyed. After reading this article, your brain will think about it in the context of experiences and thoughts previously stored in long-term memory. Thinking is generative. Thinking underlies planning future actions. Thinking is fleeting, constantly coming and going.
The time scale for these cognitive functions is minutes to hours, much longer than that of well-studied sensorimotor actions, which last seconds. Researchers have developed conceptual frameworks for interpreting neural activity during fast, automatized actions, correlating the firing rates of sensory and motor neurons with perception and behavior. This approach does not work for neural population activity during thinking, which unfolds over much longer time scales and is not directly related to behavior. As a field, we know much less about the fundamental neural mechanisms that underlie thinking, planning, and reasoning. We need a new conceptual framework for how globally distributed brain states are formed and maintained for hours.
Linking words read across many sentences, or words heard during a long lecture, requires temporal context (Muller et al., 2024) to relate a new word to previous words. How do brains encode temporal context over hours? Long-term working memory (Ericsson and Kintsch, 1995), which supports longer-term cognitive functions, likely plays a role. Long-term working memory receives and maintains sensory inputs and makes them available for cognitive processing over hours, a time scale intermediate between short-term and long-term memory (Atkinson and Shiffrin, 1968; Fig. 1). How does long-term working memory implement temporal context? How is it used to generate cognitive behaviors such as thinking and planning? How are fleeting thoughts and temporary plans converted to overt behaviors?

Hints from ChatGPT
Artificial intelligence may offer insight into temporal context. Like humans, large language models track information about sequences of words across many sentences and paragraphs and use temporal context to link those words semantically. I argue here that generative transformers, the architecture at the core of most large language models, demonstrate how neural networks can create temporal context, and that a similar process is at work in biological brains. I will make the case that the temporal context transformers implement in a spatially static way is implemented dynamically in brains.
Transformers are the engines under the hood of ChatGPT. They are feedforward neural networks in which an encoder receives the query and a decoder outputs words, one word at a time. Each output word is looped back to the input of the decoder: as each word is produced, it is appended to the long input vector, providing a comprehensive temporal context for predicting the next word. Transformers thereby convert a temporal sequence into a spatial one, giving the feedforward network access to all the words at the same time. Traveling waves of sparse cortical activity offer a candidate dynamical mechanism for temporal context in brains: a traveling wave recodes input sequences into spatial patterns and extends working memory across the cortex.
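To make this loop concrete, here is a minimal sketch in Python. It is not a real transformer: there is no attention, the weights are random stand-ins for a trained network, and the vocabulary and function names are my own. It shows only the key point, that every generated word is fed back and appended to the input, so the network sees the whole temporal sequence laid out spatially at once.

```python
import numpy as np

# Toy autoregressive loop. W is a random stand-in for trained weights,
# and there is no attention; the point is only the feedback loop:
# every output token is appended to the input context.
VOCAB = ["humpty", "dumpty", "sat", "on", "a", "wall", "."]
V = len(VOCAB)
rng = np.random.default_rng(42)
W = rng.standard_normal((V, V))  # placeholder for a trained network

def next_token_scores(context_ids):
    # The network sees the ENTIRE context at once: the temporal
    # sequence of words is laid out as one spatial input pattern.
    one_hots = np.eye(V)[context_ids]      # (context length, V)
    pooled = one_hots.mean(axis=0)         # pool over positions
    return pooled @ W                      # scores for the next token

def generate(prompt_ids, n_steps):
    context = list(prompt_ids)
    for _ in range(n_steps):
        nxt = int(np.argmax(next_token_scores(context)))
        context.append(nxt)                # loop output back to input:
                                           # the context grows each step
    return [VOCAB[i] for i in context]

print(generate([0, 1], 5))                 # start from "humpty dumpty"
```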
What is a cortical traveling wave?
Traveling waves were first observed 100 years ago and are especially prominent in the cerebral cortex; however, we know little about what they do (Muller et al., 2018). Scientists can visualize traveling waves with voltage-sensitive dyes and voltage imaging (Haziza et al., 2024; see Fig. 2), and sensory inputs can trigger them. Cortical traveling waves are sparse: only a few percent of cortical neurons are activated by a passing wave. Unlike simulations of dense traveling waves in small network models, in which every neuron spikes as the wave goes by, reproducing sparse traveling waves requires large recurrent network models with on the order of 100,000 spiking neurons and cortex-like connectivity (Davis et al., 2021).
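As a cartoon of what "sparse" means here, the following sketch (my own toy, not the 100,000-neuron cortical model) sweeps a depolarizing front across a one-dimensional sheet of leaky integrate-and-fire neurons with heterogeneous thresholds; only a few percent of them spike as the wave passes.

```python
import numpy as np

# Cartoon of a sparse traveling wave: a depolarizing front sweeps at
# fixed speed across a 1-D sheet of leaky integrate-and-fire neurons.
# Heterogeneous (log-normal) thresholds mean only a few percent of
# neurons spike as the wave passes, even though all feel the wave.
rng = np.random.default_rng(1)
N, T = 1000, 1000                 # neurons, time steps (1 step = 1 ms)
speed = 1.0                       # wavefront speed, neurons per ms
v = np.zeros(N)                   # membrane potentials
threshold = rng.lognormal(mean=2.5, sigma=0.5, size=N)
spike_count = np.zeros(N)

for t in range(T):
    front = int(speed * t)                        # wavefront position
    drive = np.zeros(N)
    if front < N:
        drive[max(0, front - 5):front + 1] = 1.0  # depolarize near front
    v = 0.95 * v + drive                          # leaky integration
    fired = v >= threshold
    spike_count += fired
    v[fired] = 0.0                                # reset after a spike

print(f"neurons recruited by the wave: {(spike_count > 0).mean():.1%}")
```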

Traveling-wave electrical activity in recurrent networks can extend working memory, with rehearsal, up to a minute. However, some other mechanism is needed to extend the time scale from minutes to hours. A promising candidate is spike-timing-dependent plasticity (STDP), a form of plasticity that requires repetitive, precisely timed pairing of pre- and postsynaptic spikes within 10 milliseconds of each other at frequencies above 10 Hz (Feldman, 2012). In cortical slice experiments, spike pairings repeated 50 or more times induce enduring changes in synaptic strength. With fewer pairings, the change fades away over minutes to hours, depending on the number and frequency of pairings. STDP is generally considered a synaptic mechanism for forming long-term memories; I propose instead that it mainly supports temporary working memories.
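A schematic version of this rule can be written down directly. The parameters below are illustrative placeholders, not fitted values, but they capture the claims above: pairings within a roughly 10-millisecond window change the weight, about 50 pairings consolidate the change, and fewer pairings leave a change that fades over minutes.

```python
import numpy as np

# Schematic STDP rule with a temporary component. All parameters are
# illustrative placeholders: pairings within ~10 ms potentiate the
# synapse; ~50 pairings make the change enduring; fewer pairings leave
# a change that decays back toward baseline over minutes.
TAU_STDP = 10.0        # ms, width of the pairing window
A_PLUS = 0.02          # potentiation per pre-before-post pairing
TAU_DECAY = 120.0      # s, decay time of the unconsolidated change
CONSOLIDATE_N = 50     # pairings needed for an enduring change

def stdp_change(n_pairings, dt_ms, elapsed_s):
    """Net weight change measured elapsed_s seconds after induction
    by n_pairings pre->post pairings with lag dt_ms."""
    if not 0 < dt_ms <= 3 * TAU_STDP:
        return 0.0                               # outside the window
    dw = n_pairings * A_PLUS * np.exp(-dt_ms / TAU_STDP)
    if n_pairings >= CONSOLIDATE_N:
        return dw                                # consolidated
    return dw * np.exp(-elapsed_s / TAU_DECAY)   # fades over minutes

for n in (10, 50):
    for t_s in (0, 60, 300):
        print(f"pairings={n:2d}  t={t_s:3d} s  "
              f"dw={stdp_change(n, 5.0, t_s):.4f}")
```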
STDP is a rapid but temporary weight change
STDP can rapidly change the strength of cortical synapses, the vast majority of which are small and labile (Loewenstein et al., 2011). These changes could support a temporary working memory complementary to the fewer but larger, more stable synapses that store long-term memories and are used for fast sensorimotor processing. Although synaptic plasticity in these small synapses may be temporary, their capacity is immense. The induction phase of synaptic plasticity triggered by waves traveling across the cortex is like a palimpsest that can be rewritten many times, and it could support a Global Workspace, a leading model of conscious awareness that posits that information is shared across cortical regions (Baars, 1988). Traveling waves could accomplish this by rapidly assembling a second tier of global connectivity on top of the first tier, which supports more automatized sensorimotor behaviors. The two tiers are complementary; their relationship is loosely analogous to that between classical and quantum mechanics, with the collapse of a quantum wave packet corresponding to a decision to recruit the motor system.
I suggest that together, these two aspects of cortical physiology—traveling waves, which are ubiquitous but lack a well-established function, and STDP, which is well-established but thought to be the basis of long-term memory—are responsible for long-term working memory and underlie some aspects of thinking and cognitive processing. These two neural mechanisms can interact because the frequencies of cortical traveling waves match the frequencies of pairing required for STDP. In this new conceptual framework, STDP is induced temporarily by precisely timed wavefronts. Spontaneous traveling waves are also ubiquitous and could be used to recall long-term memories and rehearse fading working memories (Fig. 1).
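The frequency-matching claim can be checked with back-of-envelope arithmetic. The numbers below are representative values (cortical wave speeds of roughly 0.1 to 0.8 m/s, millimeter-scale connections between neurons, alpha-band repetition rates), not measurements from any particular experiment:

```python
# Back-of-envelope check that wave timing is compatible with STDP.
# All values are representative, not measurements: cortical traveling
# waves move at roughly 0.1-0.8 m/s, connected neurons sit about a
# millimeter apart, and alpha-band waves repeat at ~10 Hz.
wave_speed = 0.2          # m/s, a typical cortical wave speed
separation = 1e-3         # m, distance between connected neurons
wave_rate_hz = 10.0       # alpha-band repetition rate

lag_ms = 1e3 * separation / wave_speed
print(f"pre->post lag as the wave passes: {lag_ms:.1f} ms")         # 5.0 ms
print(f"within the ~10 ms STDP window:    {lag_ms <= 10.0}")        # True
print(f"pairing rate meets the 10 Hz bar: {wave_rate_hz >= 10.0}")  # True
```

A wavefront passing two connected neurons a millimeter apart thus delivers pre-then-post spikes within the STDP window, and waves arriving at alpha rates repeat the pairing often enough to induce the temporary weight changes described above.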
The grid of large synapses that supports long-term memories in tier 1 provides high-speed highways, known mathematically as manifolds, for cortical activity in tier 2. The vehicles on these highways are not single passengers but trucks that carry bundles of associated items called schemas: structured plans, such as what to expect when you visit a restaurant (Rumelhart, 1980).
These are a few key pieces of the thinking puzzle. The Humpty Dumpty challenge is to put all the pieces together again, including pieces still left out. For example, we still need to incorporate the roles of inhibitory neurons and neuromodulators.
Transformers in ChatGPT have given us insights into how temporal context and long-term working memory might be implemented in brains, offering one example of the new conceptual frameworks that NeuroAI can bring to understanding cortical function (Zador et al., 2023).
Can ChatGPT think?
ChatGPT can answer a vast range of factual questions, not always accurately, but correctly often enough to be useful (Sejnowski, 2024). One area where ChatGPT stumbles is simple arithmetic, where humans have similar difficulties. With training, we get better, and so does ChatGPT when it is fine-tuned to go through the same thought process that children are taught, called a chain of reasoning: break the problem down into subparts that are easier to solve, and try out different approaches to each part.
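To see what that decomposition looks like, here is a toy rendition of multi-digit addition broken into one-digit subproblems with explicit carries, the kind of intermediate steps that chain-of-reasoning fine-tuning elicits; the code is my illustration of the strategy, not how ChatGPT actually computes.

```python
# Toy rendition of the taught decomposition: solve a multi-digit
# addition as a chain of one-digit subproblems with explicit carries.
# This is my illustration of the strategy, not how ChatGPT computes.
def add_step_by_step(a: int, b: int):
    steps, carry, result, place = [], 0, 0, 1
    while a or b or carry:
        da, db = a % 10, b % 10
        s = da + db + carry
        steps.append(f"{da} + {db} + carry {carry} = {s} "
                     f"-> write {s % 10}, carry {s // 10}")
        result += (s % 10) * place
        carry, place = s // 10, place * 10
        a, b = a // 10, b // 10
    return result, steps

total, steps = add_step_by_step(478, 356)
for line in steps:
    print(line)
print("total:", total)   # 834
```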
Mimicking mathematical reasoning has significantly improved ChatGPT's math performance. However, the results are not as good when it is asked to solve different kinds of math problems, especially ones that humans have not yet solved. What is missing is the human flexibility and agility to use analogies and to self-generate fresh workarounds. Can we get a hint from human brains to help ChatGPT think more like us?
What happens at the end of a dialog with ChatGPT? A human would continue thinking about the problem, but ChatGPT goes silent. There is no self-generated activity in ChatGPT; the inner monologue we carry on with ourselves is absent. Thinking about how transformers achieve temporal context has given us a fresh perspective on how human brains can self-generate thought. The next generation of ChatGPT might benefit from having a self-generative, long-term working memory similar to ours.
References
Atkinson, R. C., Shiffrin, R. M. (1968). Human memory: A proposed system and its control processes. The Psychology of Learning and Motivation, 2, 89-195.
Baars, B. J. (1988). A Cognitive Theory of Consciousness. Cambridge University Press.
Davis, Z. W., Benigno, G. B., Fletterman, C., Desbordes, T., Steward, C., Sejnowski, T. J., Reynolds, J. H., Muller, L. (2021). Spontaneous traveling waves naturally emerge from horizontal fiber time delays and travel through locally asynchronous-irregular states. Nature Communications, 12, 6057.
Ericsson, K. A., Kintsch, W. (1995). Long-term working memory. Psychological Review, 102(2), 211.
Feldman, D. E. (2012). The spike-timing dependence of plasticity. Neuron, 75(4), 556-571.
Haziza, S., Chrapkiewicz, R., Zhang, Y., Kruzhilin, V., Li, J., Li, J., Delamare, G., Swanson, R., Buzsáki, G., Kannan, M., Vasan, G. (2024). Imaging high-frequency voltage dynamics in multiple neuron classes of behaving mammals. bioRxiv 10.1101/2024.08.15.607428.
Loewenstein, Y., Kuras, A., Rumpel, S. (2011). Multiplicative dynamics underlie the emergence of the log-normal distribution of spine sizes in the neocortex in vivo. Journal of Neuroscience, 31(26), 9481-9488.
Muller, L., Chavane, F., Reynolds, J., Sejnowski, T. J. (2018). Cortical travelling waves: mechanisms and computational principles. Nature Reviews Neuroscience, 19(5), 255-268.
Muller, L., Churchland, P. S., Sejnowski, T. J. (2024). Transformers and cortical waves: encoders for pulling in context across time. Trends in Neurosciences, 47(10), 788-802.
Rumelhart, D. E. (1980). Schemata: The building blocks of cognition. In Spiro, R. J., Bruce, B. C., Brewer, W. F. (Eds.), Theoretical Issues in Reading Comprehension (pp. 33-58). Lawrence Erlbaum Associates, Hillsdale, New Jersey.
Sejnowski, T. J. (2024). ChatGPT and the Future of AI. MIT Press. For regular updates to this book, see my Substack blog, “Brains and AI.”
Sejnowski, T. J. (2025). Thinking about thinking. The Transmitter. https://www.thetransmitter.org/human-neurotechnology/thinking-about-thinking-ai-offers-theoretical-insights-into-human-memory/
Zador, A., Escola, S., Richards, B., Ölveczky, B., Bengio, Y., Boahen, K., Botvinick, M., Chklovskii, D., Churchland, A., Clopath, C., DiCarlo, J., Ganguli, S., Hawkins, J., Körding, K., Koulakov, A., LeCun, Y., Lillicrap, T., Marblestone, A., Olshausen, B., Pouget, A., Savin, C., Sejnowski, T., Simoncelli, E., Solla, S., Sussillo, D., Tolias, A. S., Tsao, D. (2023). Catalyzing next-generation artificial intelligence through NeuroAI. Nature Communications, 14, 1597.