Sometimes you need more than text to prompt your LLM. You might need it to describe images or extract data from PDFs.
Fortunately, the AI SDK provides a way of passing files over the wire so that your backend, where you actually contact the LLM, can pick them up.
I've given our frontend a couple of upgrades, namely the ability to upload files. I've given you an image that you can upload, and the plan is to ask the LLM to describe the image.
Our `/api/chat` POST endpoint looks very similar to previous exercises:
```typescript
export const POST = async (req: Request): Promise<Response> => {
  const body = await req.json()
  const messages: UIMessage[] = body.messages

  const modelMessages: ModelMessage[] = convertToModelMessages(messages)

  const streamTextResult = streamText({
    model: google("gemini-2.0-flash"),
    messages: modelMessages,
  })

  const stream = streamTextResult.toUIMessageStream()

  return createUIMessageStreamResponse({
    stream,
  })
}
```
We're calling Gemini 2.0 Flash, which is a model that can handle images. Not all models can handle images, so check your provider details for more info.
We're converting the messages that we get from the body into ModelMessages, passing those messages directly into streamText, then turning that into a UIMessageStream, which we stream back to the frontend.
The only TODO inside this exercise is in the frontend.
The issue is really in the sendMessage function. Currently, we are only sending a message with the text:
```tsx
<ChatInput
  // ...
  onSubmit={async (e) => {
    e.preventDefault()
    const formData = new FormData(e.target as HTMLFormElement)
    const file = formData.get("file") as File

    // TODO: figure out how to pass the file
    // _as well as the text_ to the
    // /api/chat route!

    // NOTE: You have a helpful function below
    // called fileToDataURL that you can use to
    // convert the file to a data URL. This
    // will be useful!
    sendMessage({
      text: input,
    })
    // ...
  }}
/>
```
We're extracting the file off the form data that we get from the onSubmit event, but we're not actually passing it to the sendMessage function.
Your job is to look inside this sendMessage function, and especially at the parts array, whose types will reveal some useful options. Using parts, you should be able to send both a text part and a file part.
## fileToDataURL Function

There is also a fileToDataURL function down at the bottom that I've provided to you:
```typescript
const fileToDataURL = (file: File) => {
  return new Promise<string>((resolve, reject) => {
    const reader = new FileReader()
    reader.onload = () => resolve(reader.result as string)
    reader.onerror = reject
    reader.readAsDataURL(file)
  })
}
```
When we send the file part, we're going to need to turn it into a data URL so that we can send it.
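A data URL is just the file's bytes base64-encoded and prefixed with the media type. For illustration only — FileReader is browser-only, and this hypothetical Node helper is not part of the exercise — the same encoding looks like this:

```typescript
// Node-side illustration of what fileToDataURL produces:
// base64-encode the bytes and prefix them with the media type.
const bytesToDataURL = (bytes: Uint8Array, mediaType: string): string =>
  `data:${mediaType};base64,${Buffer.from(bytes).toString("base64")}`

// The bytes for "Hi" become data:text/plain;base64,SGk=
console.log(bytesToDataURL(new Uint8Array([72, 105]), "text/plain"))
```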
Once this is done, you should be able to upload any image and tell the LLM what you want done with it: describe it, run a bit of OCR, anything you like.
Good luck, and I'll see you in the solution.
- Modify the `sendMessage` call to include both text and file data
- Use the `parts` array to send both types of data; use autocompletion to find the right type
- Use the `fileToDataURL` function to convert the file to a data URL
- Test your implementation by uploading an image and running:

```bash
pnpm run exercise
```