Okay, let's see if it's working. We have an image being uploaded.
Could you describe this image please?
We get back a description. The way this works is we convert the File into a data URL and send it as a part alongside the user's text.
Convert a File to a data URL:
```ts
const fileToDataURL = (file: File) => {
  return new Promise<string>((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result as string);
    reader.onerror = reject;
    reader.readAsDataURL(file);
  });
};
```
We send two parts in sendMessage: a text part with the user's input, and a file part whose url is the data URL (a hosted URL also works) and whose mediaType is the IANA media type taken from the File:

```ts
sendMessage({
  parts: [
    {
      type: 'text',
      text: input,
    },
    {
      type: 'file',
      mediaType: file.type,
      url: await fileToDataURL(file),
    },
  ],
});
```
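To see where that call lives, here's a minimal sketch of a client component wiring it together. It assumes AI SDK v5's useChat hook from @ai-sdk/react with its default /api/chat endpoint, plus hypothetical input and file state; the component name and markup are illustrative, not from the lesson:

```tsx
'use client';

import { useChat } from '@ai-sdk/react';
import { useState, type FormEvent } from 'react';

// fileToDataURL is the helper defined above.

export function Chat() {
  const { sendMessage } = useChat(); // messages, status, etc. are also available
  const [input, setInput] = useState('');
  const [file, setFile] = useState<File | null>(null);

  const handleSubmit = async (e: FormEvent<HTMLFormElement>) => {
    e.preventDefault();
    if (!file) return;

    // Send the text and the image as two parts of one message.
    sendMessage({
      parts: [
        { type: 'text', text: input },
        {
          type: 'file',
          mediaType: file.type,
          url: await fileToDataURL(file),
        },
      ],
    });

    setInput('');
    setFile(null);
  };

  return (
    <form onSubmit={handleSubmit}>
      <input
        type="file"
        accept="image/*"
        onChange={(e) => setFile(e.target.files?.[0] ?? null)}
      />
      <input
        value={input}
        onChange={(e) => setInput(e.target.value)}
        placeholder="Ask about the image..."
      />
      <button type="submit">Send</button>
    </form>
  );
}
```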
On the server, we convert UI messages, call the model, and stream the response back to the UI:
```ts
const modelMessages: ModelMessage[] =
  convertToModelMessages(messages);

const streamTextResult = streamText({
  model: google('gemini-2.0-flash'),
  messages: modelMessages,
});

const stream = streamTextResult.toUIMessageStream();

return createUIMessageStreamResponse({
  stream,
});
```
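Putting that together, a rough sketch of the full route handler might look like this. The file location, request parsing, and handler shape are assumptions (a Next.js-style POST handler); the AI SDK calls are the ones shown above:

```ts
// app/api/chat/route.ts (assumed location; adjust to your setup)
import {
  convertToModelMessages,
  createUIMessageStreamResponse,
  streamText,
  type ModelMessage,
  type UIMessage,
} from 'ai';
import { google } from '@ai-sdk/google';

export async function POST(req: Request) {
  // UI messages arrive from the client, including the file part
  // carrying the data URL and media type.
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Convert UI messages (parts, data URLs) into model messages.
  const modelMessages: ModelMessage[] = convertToModelMessages(messages);

  // Call the model with the multimodal messages.
  const streamTextResult = streamText({
    model: google('gemini-2.0-flash'),
    messages: modelMessages,
  });

  // Stream the result back to the UI as a UI message stream.
  const stream = streamTextResult.toUIMessageStream();
  return createUIMessageStreamResponse({ stream });
}
```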
Just a little bit of front-end work lets us pass an image directly to the LLM. This multimodal flow is straightforward with the AI SDK.
If you're wondering about other modalities (transcribing audio, generating images, etc.), the AI SDK likely supports them too.
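For instance, image generation with the AI SDK looks roughly like this. This is only a sketch: experimental_generateImage is an experimental API, and the provider and model name here are assumptions, so check the current docs before relying on it:

```ts
import { experimental_generateImage as generateImage } from 'ai';
import { openai } from '@ai-sdk/openai';

// Generate an image from a text prompt (model choice is an assumption).
const { image } = await generateImage({
  model: openai.image('dall-e-3'),
  prompt: 'A watercolor painting of a lighthouse at dawn',
});

// `image` holds the generated image data (e.g. as base64).
```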
Nice work!