Okay, the solution here is to await output.usage
and log it to the console.
console.log(await output.usage)
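For context, here's a minimal sketch of the full setup, assuming an OpenAI model used through streamText. The model and prompt aren't shown in this lesson, so treat them as placeholders; the key detail is that usage is a promise that only resolves once the stream has finished.

```ts
import { streamText } from "ai";
import { openai } from "@ai-sdk/openai"; // assumed provider; any AI SDK model works

// Assumed setup: in this lesson the result of streamText is called `output`.
const output = streamText({
  model: openai("gpt-4o-mini"), // placeholder model
  prompt: "Tell me a short story about a robot.", // placeholder prompt
});

// Print the streamed text as it arrives.
for await (const chunk of output.textStream) {
  process.stdout.write(chunk);
}

// usage resolves once the stream has finished.
console.log(await output.usage);
```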
We can see that output.usage
has several different properties on it.
```ts
/*
  cachedInputTokens?
  inputTokens
  outputTokens
  reasoningTokens?
  totalTokens
*/
```
We have the inputTokens
. This is the number of tokens in our input: our prompt, our system prompt, and so on.
Then we have the outputTokens
. These are sometimes called completion tokens. This is the number of tokens that the LLM generated.
Some models also produce reasoningTokens
. This is for quote-unquote thinking models, and providers may bill you at different rates for reasoning, output, and input tokens.
Finally you've got the totalTokens
used here too.
When we run this, we can see the streaming output come through, and then the usage logged at the end.
{ inputTokens: 13, outputTokens: 127, totalTokens: 140, reasoningTokens: undefined, cachedInputTokens: undefined }
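As a quick sketch, you can also pull the individual fields out of the resolved usage object. Note that reasoningTokens and cachedInputTokens may be undefined, as in the output above, so handle them as optional:

```ts
const usage = await output.usage;

// inputTokens and outputTokens are the prompt and completion counts;
// reasoningTokens and cachedInputTokens are only reported by some models.
console.log(`input:     ${usage.inputTokens}`);
console.log(`output:    ${usage.outputTokens}`);
console.log(`reasoning: ${usage.reasoningTokens ?? "n/a"}`);
console.log(`cached:    ${usage.cachedInputTokens ?? "n/a"}`);
console.log(`total:     ${usage.totalTokens}`);
```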
By the way, for those intrigued, I'm going to talk about cachedInputTokens
pretty soon.
Understanding how many tokens your application is using is obviously really important, and the AI SDK makes it really, really simple to observe.
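If you want to turn those counts into a rough cost estimate, something like the sketch below works. The per-token prices here are placeholder assumptions, so substitute your provider's actual rates, and account for reasoning or cached tokens separately if your model reports them.

```ts
const usage = await output.usage;

// Placeholder prices in USD per token; replace with your provider's real rates.
const INPUT_PRICE = 0.15 / 1_000_000;
const OUTPUT_PRICE = 0.6 / 1_000_000;

const estimatedCost =
  (usage.inputTokens ?? 0) * INPUT_PRICE +
  (usage.outputTokens ?? 0) * OUTPUT_PRICE;

console.log(`Estimated cost: $${estimatedCost.toFixed(6)}`);
```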
Nice work and I will see you in the next one.