Section 1 — The Model

    Cache tokens


Input tokens the provider has cached from a previous request so it doesn't have to reprocess them. When consecutive requests share a prefix, the provider reuses that work via its prefix cache and bills the cached portion at a much lower rate. Cache tokens are the lever that makes long sessions affordable: without them, every turn re-pays for the whole history.

    Usage:

    "Cost on long sessions is brutal — eight bucks for a refactor."

    "Check the cache tokens. If the harness is reordering the system prompt or files between turns, the prefix breaks and you re-pay full input rate every request."

