Okay, the solution here is to await output.usage and log it to the console.
console.log(await output.usage)
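For context, here's a minimal sketch of how that line fits into the surrounding code. The model and prompt are placeholders for illustration, not the exact code from this lesson; the key point is that streamText returns an output whose usage property is a promise that resolves once the stream has finished.

import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// The model and prompt here are placeholders.
const output = streamText({
  model: anthropic("claude-3-5-haiku-latest"),
  prompt: "Tell me a short story about a robot.",
});

// Stream the text to the terminal as it arrives.
for await (const chunk of output.textStream) {
  process.stdout.write(chunk);
}

// usage resolves once the stream has finished.
console.log(await output.usage);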
We can see that output.usage has several different properties on it.
/*
  cachedInputTokens?
  inputTokens
  outputTokens
  reasoningTokens?
  totalTokens
*/
We have the inputTokens: the number of tokens in our input, meaning the prompt, the system prompt, and so on.
Then we have the outputTokens, sometimes called completion tokens: the number of tokens the LLM generated.
Some models also report reasoningTokens. These come from "thinking" models, which may bill reasoning tokens at a different rate than input or output tokens.
Finally, you've got the totalTokens used.
When we run this, we see the streamed output first, followed by the usage logged at the end:
{
  inputTokens: 13,
  outputTokens: 127,
  totalTokens: 140,
  reasoningTokens: undefined,
  cachedInputTokens: undefined
}
By the way, for those intrigued, I'm going to talk about cachedInputTokens pretty soon.
Understanding how many tokens your application is using is obviously really important, and the AI SDK makes this really simple to observe.
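Once you have those numbers, you can turn them into a rough cost estimate. Here's a minimal sketch; the per-million-token prices are placeholder values, not real rates for any particular model.

const usage = await output.usage;

// Placeholder prices in dollars per million tokens - substitute your model's real rates.
const INPUT_PRICE_PER_MILLION = 0.8;
const OUTPUT_PRICE_PER_MILLION = 4;

// Optional fields may be undefined, so default them to 0.
const approximateCost =
  ((usage.inputTokens ?? 0) / 1_000_000) * INPUT_PRICE_PER_MILLION +
  ((usage.outputTokens ?? 0) / 1_000_000) * OUTPUT_PRICE_PER_MILLION;

console.log(`Approximate cost: $${approximateCost.toFixed(6)}`);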
Nice work and I will see you in the next one.