Section 1 — The Model

    Inference

    Matt Pocock

    Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed; the model just does next-token prediction over the context it's given. Cheap relative to training, but billed per token and the dominant cost of using a model.

    Usage:

    "Why does the bill scale with usage instead of being a flat license?"

    "You're paying for inference — every model provider request runs the model on the provider's hardware. Training already happened, but inference costs accrue per request, and a single turn can expand into many requests when tools are called."
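To make the billing model concrete, here is a minimal sketch of how per-token costs accumulate across the requests in a single turn. The prices and token counts are hypothetical, chosen only for illustration; real provider rates vary by model.

```python
# Hypothetical per-token prices (assumed for illustration, not real rates).
INPUT_PRICE_PER_1K = 0.003   # dollars per 1,000 input tokens
OUTPUT_PRICE_PER_1K = 0.015  # dollars per 1,000 output tokens

def inference_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single model provider request, billed per token."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# One user turn that triggers two tool calls becomes three requests,
# and the input (context) grows with each one -- so cost compounds.
requests = [(1_200, 300), (1_800, 250), (2_400, 400)]  # (input, output) tokens
total = sum(inference_cost(i, o) for i, o in requests)
print(f"${total:.4f}")
```

Note that nothing in this loop touches the model's parameters: each request is a fresh forward pass over whatever context it is given, which is why the bill tracks usage rather than a flat license.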
