Section 1 — The Model
Inference
Running a trained model to generate output — what happens on every model provider request. Parameters stay fixed; the model just does next-token prediction over the context it's given. Cheap relative to training, but billed per token and the dominant cost of using a model.
Usage:
"Why does the bill scale with usage instead of being a flat license?"
"You're paying for inference — every model provider request runs the model on the provider's hardware. Training already happened, but inference costs accrue per request, and a single turn can expand into many requests when tools are called."
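The per-token billing described above can be sketched as a small cost model. This is a minimal illustration, not any provider's real pricing: the rates, function names, and the tool-call expansion are all assumptions for the example.

```python
# Hypothetical per-token rates — real provider pricing varies by model.
PRICE_PER_INPUT_TOKEN = 3.00 / 1_000_000    # assumed: $3 per million input tokens
PRICE_PER_OUTPUT_TOKEN = 15.00 / 1_000_000  # assumed: $15 per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one inference request: the model reads the context (input
    tokens) and generates a completion (output tokens), billed per token."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

def turn_cost(requests: list[tuple[int, int]]) -> float:
    """A single user turn can expand into several requests when tools are
    called; the bill is the sum over every request the turn triggered."""
    return sum(request_cost(inp, out) for inp, out in requests)

# One turn: an initial request plus two tool-call follow-ups, each of which
# re-sends the growing context — so input tokens compound across requests.
turn = [(1_000, 200), (1_400, 150), (1_700, 400)]
print(f"${turn_cost(turn):.4f}")
```

Note how the later requests carry larger input counts: each tool result is appended to the context and re-read on the next request, which is why agentic turns cost more than their visible output suggests.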