AIHero

    Failure Modes

    Attention budget

    Each token has a finite amount of influence to distribute across the rest of the context. Per-token, doesn't grow when context does.

    Matt Pocock
    Matt Pocock

    Each token has a finite amount of influence to distribute across the rest of the context. Heavy influence on one relationship leaves less for others. The budget is per-token and doesn't grow when the context does, which is why long sessions dilute.

    Think of it as signal and noise. Your instruction is a signal at fixed volume; every other token in the context window is competing sound. The instruction never gets quieter — it's still there, character for character — but as the context grows, the room gets louder around it, and the signal-to-noise ratio drops. An instruction that was the loudest thing at 10k tokens of context is background hum at 150k. This is the mechanism behind attention degradation: the model doesn't forget; the signal gets lost in the noise.

    The symptom reads as disobedience — the agent agreed to a constraint early on and then drifts from it, and re-pasting the constraint helps only briefly. The cause isn't the instruction; it's everything else in the window competing with it.

    What you can control is what goes into the context. Content that doesn't serve the task isn't neutral — it's noise over everything that does. Keep the window small, clear when the accumulated context stops paying for itself, and restate the constraints that matter instead of trusting their early mention to hold.

    Usage:

    "Why does it keep ignoring the schema I pasted at the top?"

    "We're well into the dumb zone — every token's attention budget is fixed, but the context kept growing. The signal on the schema is now competing with thousands of newer tokens."

    Want more than vocabulary?

    Join AI Hero for practical skills, thinking on AI engineering, and resources that keep you ahead of the curve.

    I respect your privacy. Unsubscribe at any time.

    Share