Section 1 — The Model
Cache tokens
Input tokens the provider has cached from a previous request so it doesn't have to reprocess them. When consecutive requests share a prefix, the provider reuses that work via its prefix cache and bills the cached portion at a much lower rate. This is the lever that makes long sessions affordable: without it, every turn re-pays for the whole history.
Usage:
"Cost on long sessions is brutal — eight bucks for a refactor."
"Check the cache tokens. If the harness is reordering the system prompt or files between turns, the prefix breaks and you re-pay full input rate every request."