Token Prediction Engine
Watch an LLM build text, one token at a time
In plain English: the model reads what it has so far, weighs the options for the next word, picks one, appends it, and repeats. One word at a time, every time. There is no grand plan for the whole sentence. Each step is a fresh prediction based solely on what came before.
Why this matters: this is the exact mechanism behind every response from GPT, Gemini, Claude, and any other large language model. When an AI writes a poem, answers a question, or translates a sentence, it is running this same loop: read, predict, append, repeat. Understanding this loop means understanding the core of how modern AI “thinks.”
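The read, predict, append, repeat loop can be sketched in a few lines. This is a toy illustration, not a real model: `next_token` here is a hypothetical stand-in, stubbed with a canned lookup table where a real system would run a neural network.

```python
def next_token(context):
    # Stand-in for a real model call: return a canned continuation.
    # A real LLM would compute this from learned weights.
    continuations = {
        "The capital of France": " is",
        "The capital of France is": " Paris",
        "The capital of France is Paris": ".",
    }
    return continuations.get(context, "")

def generate(prompt, max_tokens=10):
    context = prompt
    for _ in range(max_tokens):
        token = next_token(context)   # read, predict
        if not token:                 # stop when there is nothing to add
            break
        context += token              # append, then repeat
    return context

print(generate("The capital of France"))
# The capital of France is Paris.
```

Every real generation system is this loop with a vastly more sophisticated `next_token`.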
A token is not always a full word. It can be a word, part of a word, or even just punctuation. The word “understanding” might be split into “under” + “standing.” Models work with tokens, not words, because a fixed vocabulary of token pieces lets them handle any language or text pattern efficiently.
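Here is a toy version of that splitting, using a greedy longest-match over a tiny invented vocabulary. Real tokenizers use large learned vocabularies (e.g. byte-pair encoding), so this is only a sketch of the idea.

```python
# Invented mini-vocabulary for illustration; real vocabularies
# contain tens of thousands of learned subword pieces.
VOCAB = {"under", "standing", "stand", "ing", "the"}

def tokenize(word):
    # Greedily match the longest known piece at each position.
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append(word[i])  # unknown: fall back to a single char
            i += 1
    return tokens

print(tokenize("understanding"))  # ['under', 'standing']
```

Because unknown text falls back to smaller pieces, a tokenizer never fails outright; it just produces more tokens.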
The context window you see above is the model’s entire memory for this conversation. Everything outside it simply does not exist for the model.
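That memory limit is easy to picture as a sliding window over the token sequence. A minimal sketch, assuming a window measured in tokens and a pre-tokenized history:

```python
def visible_context(tokens, window_size):
    # The model only ever sees the most recent tokens that fit.
    return tokens[-window_size:]

history = ["My", "name", "is", "Ada", ".", "What", "is", "my", "name", "?"]
print(visible_context(history, 5))
# ['What', 'is', 'my', 'name', '?']
```

With a window of five tokens, the name “Ada” has already fallen out of view, so the model has no way to answer the question. That is what “does not exist for the model” means in practice.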
Attention is how the model decides which tokens matter most for the current prediction. When predicting the next word after “The capital of France is,” the model pays heavy attention to “France” and “capital” while mostly ignoring “The.”
The golden glow on each token shows its attention weight. Brighter glow means the model considers that token more important for deciding what comes next.
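Attention weights of this kind are produced by normalizing raw relevance scores with a softmax so they sum to 1. The scores below are invented for illustration; real models compute them from learned query/key projections.

```python
import math

def softmax(scores):
    # Exponentiate and normalize so the weights sum to 1.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["The", "capital", "of", "France", "is"]
raw_scores = [0.1, 2.0, 0.3, 3.0, 1.0]  # made-up relevance scores

for tok, w in zip(tokens, softmax(raw_scores)):
    print(f"{tok:>8}: {w:.2f}")
```

Run it and “France” and “capital” get the largest weights, matching the bright glow in the demo, while “The” gets almost none.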
Language is inherently ambiguous. After “I went to the,” the next word could be “store,” “park,” “doctor,” or thousands of other options. The model assigns a probability to each one, reflecting how likely it thinks each continuation is.
This is what makes AI creative: it does not always pick the most likely option. By sampling from the distribution, it can surprise you with unexpected but perfectly sensible completions.
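The difference between always picking the top option (greedy decoding) and sampling from the distribution looks like this. The probability table is made up for the prompt “I went to the”; real distributions cover the model’s entire vocabulary.

```python
import random

# Invented next-token probabilities for illustration.
next_token_probs = {
    "store": 0.40,
    "park": 0.25,
    "doctor": 0.20,
    "moon": 0.15,
}

def greedy(probs):
    # Deterministic: always the single most likely token.
    return max(probs, key=probs.get)

def sample(probs):
    # Stochastic: any token can win, in proportion to its probability.
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print(greedy(next_token_probs))   # store
print(sample(next_token_probs))   # varies: store, park, doctor, or moon
```

Greedy decoding gives the same answer every time; sampling is where the occasional surprising but sensible completion comes from.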
Every AI chatbot, every code assistant, every AI-generated article uses this exact loop. There is no secret intelligence hiding behind the scenes, just a model that has learned statistical patterns from enormous amounts of text, applied one token at a time.
What you see in this demo is real. Each step makes a live API call to Claude. The probabilities, the attention weights, and the chosen tokens all come from the same system that powers real AI applications.