AIHero

    Model provider request

    One round-trip from the harness to the model provider. The harness sends context; the provider returns one response.

    Matt Pocock

    One round-trip from the harness to the model provider. The harness sends the current context; the provider returns one response (a tool call or a final answer). A single user message can spawn many model provider requests if the agent calls tools — each tool result triggers another request.
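    The loop described above can be sketched in a few lines. This is a minimal illustration, not any real provider's SDK: the `Message` and `ModelResponse` shapes, `fakeProvider`, and `runTurn` are all hypothetical names, and the "provider" is a local stand-in that asks for two tool calls before answering.

```typescript
// Hypothetical shapes for illustration only; real provider APIs differ.
type Message = { role: "system" | "user" | "assistant" | "tool"; content: string };
type ModelResponse =
  | { kind: "tool_call"; tool: string }
  | { kind: "final_answer"; text: string };

// Stand-in for the provider: requests two tool calls, then gives a final answer.
function fakeProvider(context: Message[]): ModelResponse {
  const toolResults = context.filter((m) => m.role === "tool").length;
  if (toolResults === 0) return { kind: "tool_call", tool: "run_tests" };
  if (toolResults === 1) return { kind: "tool_call", tool: "read_file" };
  return { kind: "final_answer", text: "fixed, tests pass" };
}

// One turn: keep making requests until the model returns a final answer.
function runTurn(userMessage: string): { answer: string; requests: number } {
  const context: Message[] = [
    { role: "system", content: "You are a coding agent." },
    { role: "user", content: userMessage },
  ];
  let requests = 0;
  while (true) {
    requests += 1; // one model provider request = one round-trip
    const response = fakeProvider(context);
    if (response.kind === "final_answer") {
      return { answer: response.text, requests };
    }
    // Record the tool call and its (pretend) result, then go around again.
    context.push({ role: "assistant", content: `call ${response.tool}` });
    context.push({ role: "tool", content: `result of ${response.tool}` });
  }
}

const turn = runTurn("fix the failing test");
// turn.requests is 3: two tool calls, then the final answer.
```

    One user message went in, and three requests came out of it. A real harness would execute the tool instead of fabricating a result string, but the shape of the loop is the same.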

    Each request carries everything: the system prompt, the full conversation so far, every tool call and its result. The model is stateless and the provider keeps no conversation state between requests, so request forty re-sends what request thirty-nine sent, plus one more tool call and its result. The prefix cache exists to make this repetition affordable.
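    To make the re-sending concrete, here is a rough sketch that builds the payload for each request in a short chain. It is simplified on purpose: context items are plain strings, only tool results are appended (a real harness also appends the model's own tool-call messages), and `buildPayloads` and `isPrefix` are hypothetical helpers.

```typescript
// Illustrative context items; sizes are counted in items, not tokens.
const systemPrompt = "system prompt";
const userMessage = "fix the failing test";
const toolResults = ["test failure output", "test file contents", "source file contents"];

// Build the payload for every request in the chain. Nothing is remembered
// between requests, so each payload is a full copy of the context so far.
function buildPayloads(results: string[]): string[][] {
  const payloads: string[][] = [];
  const context = [systemPrompt, userMessage];
  payloads.push([...context]); // request 1: no tool results yet
  for (const result of results) {
    context.push(result);
    payloads.push([...context]); // next request: everything so far plus one more result
  }
  return payloads;
}

// True when `longer` starts with every item of `shorter`, in order.
function isPrefix(shorter: string[], longer: string[]): boolean {
  return shorter.length <= longer.length && shorter.every((item, i) => item === longer[i]);
}

const payloads = buildPayloads(toolResults);
const payloadSizes = payloads.map((p) => p.length); // [2, 3, 4, 5]

// Check that each request's payload begins with the previous request's payload.
let everyRequestExtendsTheLast = true;
let previous: string[] = [];
for (const payload of payloads) {
  if (!isPrefix(previous, payload)) everyRequestExtendsTheLast = false;
  previous = payload;
}
```

    Because every payload starts with the previous one, the repeated part is always a prefix of the next request, and that shared prefix is what a prefix cache can reuse.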

    The request is also the unit of billing. Input tokens, output tokens, and cache discounts are all counted per request, which is why an innocuous-looking question can cost a surprising amount: the cost isn't proportional to your message, it's proportional to the number of requests times the size of the context each one carries.
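    A back-of-the-envelope version of that arithmetic, as a sketch. Every number here is an illustrative assumption, not a real provider price: the per-token prices are in made-up "units", the context is assumed to start at 2,000 tokens and grow by 500 per request, and the previous request's entire input is assumed to be served from cache at a discount.

```typescript
// Per-request usage, in tokens. cachedInputTokens is the part of the input
// that is assumed to be billed at the discounted cache rate.
type RequestUsage = { inputTokens: number; cachedInputTokens: number; outputTokens: number };

// Hypothetical prices in arbitrary units per token (integers keep the math exact).
const PRICE_PER_TOKEN = { input: 10, cachedInput: 1, output: 50 };

function requestCost(u: RequestUsage): number {
  const uncached = u.inputTokens - u.cachedInputTokens;
  return (
    uncached * PRICE_PER_TOKEN.input +
    u.cachedInputTokens * PRICE_PER_TOKEN.cachedInput +
    u.outputTokens * PRICE_PER_TOKEN.output
  );
}

// Simulate a chain of requests whose context grows by a fixed amount each time.
function simulateTurn(
  requestCount: number,
  baseContext: number,
  growthPerRequest: number,
  outputPerRequest: number,
): RequestUsage[] {
  const usages: RequestUsage[] = [];
  let previousInput = 0;
  for (let i = 0; i < requestCount; i++) {
    const inputTokens = baseContext + i * growthPerRequest;
    // Assumption: everything the previous request sent is a cache hit.
    usages.push({ inputTokens, cachedInputTokens: previousInput, outputTokens: outputPerRequest });
    previousInput = inputTokens;
  }
  return usages;
}

const usages = simulateTurn(6, 2000, 500, 100);
const totalInputTokens = usages.reduce((sum, u) => sum + u.inputTokens, 0); // 19500
const totalCost = usages.reduce((sum, u) => sum + requestCost(u), 0); // 90000 units
const costWithoutCache = usages.reduce(
  (sum, u) => sum + requestCost({ ...u, cachedInputTokens: 0 }),
  0,
); // 225000 units
```

    Under these assumptions a turn of six requests sends 19,500 input tokens in total, even though the context never exceeds 4,500 tokens, and the cache discount cuts the bill from 225,000 units to 90,000.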

    It's worth keeping the request distinct from the turn. A turn is one exchange with you, and a single turn — "fix the failing test" — plays out as a chain of requests:

    | Request | Model returns                     | Harness then                          |
    |---------|-----------------------------------|---------------------------------------|
    | 1       | Tool call: run the tests          | Runs them, appends the failure output |
    | 2       | Tool call: read the test file     | Appends the file contents             |
    | 3       | Tool call: read the source file   | Appends the file contents             |
    | 4       | Tool call: edit the source file   | Applies the edit, appends the result  |
    | 5       | Tool call: run the tests again    | Runs them, appends the pass output    |
    | 6       | Final answer: "fixed, tests pass" | Shows it to you                       |

    Six requests for one turn — each one re-sending the whole context. When you wonder where the tokens went, count the requests, not the turns.

    Usage:

    "One question burned forty thousand tokens?"

    "Look at the tool calls — twelve grep, eight read, four edits. Each tool result spawns another model provider request, and the whole session prefix re-sends every time."
