AIHero

    The Model

    Output tokens

    Tokens the model generates back. Billed at a higher rate than input tokens, since they cost more compute to produce.

    Matt Pocock
    Matt Pocock

    Tokens the model generates back. Billed at a higher rate than input tokens — commonly around five times the rate — since they cost more compute to produce.

    Everything the model writes counts: the prose you read, the code it emits, tool calls, and any extended thinking the model does before answering. That last one surprises people — reasoning tokens are billed as output even when the harness often doesn't show them to you.

    Output tokens also set the pace of a session. The model reads input quickly but generates output one token at a time, so when a turn feels slow, it's almost always the output being written, not the input being read. A long wait usually means a long answer is coming.

    Usage:

    "The refactor session is burning through credit even though the inputs are small."

    "Agent's rewriting whole files instead of patching. Output tokens cost roughly five times the input rate — get it emitting edits and the bill drops."

    Want more than vocabulary?

    Join AI Hero for practical skills, thinking on AI engineering, and resources that keep you ahead of the curve.

    I respect your privacy. Unsubscribe at any time.

    Share