Section 1 — The Model
Prefix cache
The providerside store that lets consecutive model provider requests skip reprocessing a shared prefix. When the start of a request matches the start of a recent one — same syst...
The provider-side store that lets consecutive model provider requests skip re-processing a shared prefix. When the start of a request matches the start of a recent one — same system prompt, same history up to some point — the provider reuses its prior work and bills those tokens as cache tokens at a much lower rate.
Anything that changes the prefix (reordering files, rewriting the system prompt mid-session, injecting a timestamp near the top) invalidates the cache from that point on, and the rest of the request bills at full input token rate.
Usage:
"Why did the bill spike halfway through the session?"
"Harness started injecting the current time into the system prompt every turn. Prefix cache breaks at the first changed token, so every request after that billed at full rate."