Tokens the harness sends on each model provider request — the system prompt, the conversation history, tool results, everything the model reads before it writes. Billed at a lower rate than output tokens, because they are less expensive to process than output tokens.
When doing AI coding, input tokens make up most of your bill. The model is stateless, so each turn re-sends the entire session as input: your first message, every response, every tool result since. The input for turn fifty contains the previous forty-nine turns. A single model provider request might produce a few hundred output tokens but re-send a hundred thousand input tokens of accumulated history.
The prefix cache reduces the cost: history that exactly matches a previous request is billed as cheap cache tokens rather than full-price input. When input costs still hurt, the fix is to shrink what gets re-sent — clearing or compacting between tasks.
Usage:
"Bill's high but the agent's barely writing anything."
"It's the input tokens — every turn re-sends the whole session. Without the prefix cache you re-pay for the history each request."