Vercel AI SDK

Guide to integrating Hume TTS into your web application with the Vercel AI SDK.

The Vercel AI SDK provides a unified interface for integrating AI capabilities, such as text-to-speech, into web applications built with frameworks like Next.js and SvelteKit. It abstracts away provider-specific details, making it simple to switch between AI models and providers without rewriting application code.

This guide walks you through how to use the AI SDK to integrate Hume’s expressive TTS into your web application.

Prefer code over docs? See our Next.js example using Hume TTS with the Vercel AI SDK.

Installation

To get started, install the required packages:

  1. ai: the core AI SDK package.
  2. @ai-sdk/hume: the Hume speech provider package.
bun add ai @ai-sdk/hume
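
If you use npm, pnpm, or yarn instead of bun, the equivalent commands are:

npm install ai @ai-sdk/hume
pnpm add ai @ai-sdk/hume
yarn add ai @ai-sdk/hume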

Authentication

The HumeProvider integrates Hume’s TTS API and requires a valid API key to authenticate requests. Follow these steps to obtain your credentials and configure your environment.

1. Get your Hume API key

Sign in to the Hume Platform and follow the getting your API key guide to retrieve your key.

2. Configure environment variables

Define your Hume API key in an environment file. Most frameworks support multiple .env file naming conventions depending on the environment (e.g., .env, .env.local, .env.development, etc.). Use the one that best fits your setup.

.env
HUME_API_KEY=your-api-key-here

3. Supply API key to HumeProvider

Pass your API key using the apiKey field when creating the HumeProvider. This ensures all requests made by the SDK to the Hume TTS API are properly authenticated.

HumeProvider
import { createHume } from '@ai-sdk/hume';

export const hume = createHume({
  apiKey: process.env.HUME_API_KEY ?? '',
});
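
If you prefer to fail fast when the key is missing rather than sending unauthenticated requests, a minimal variant (the error message is illustrative) throws at startup:

TypeScript
import { createHume } from '@ai-sdk/hume';

const apiKey = process.env.HUME_API_KEY;
if (!apiKey) {
  // Fail at module load instead of on the first TTS request.
  throw new Error('HUME_API_KEY is not set; see the Authentication steps above.');
}

export const hume = createHume({ apiKey });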

Usage

The AI SDK provides a unified generateSpeech function for converting text to speech across providers. Hume integrates into this interface via the HumeProvider, which exposes Hume’s expressive TTS model through the speech() factory method.

Basic implementation

To generate speech, call generateSpeech() with at least two arguments:

  • model: Use hume.speech() to specify Hume as the provider.
  • text: The input string to be synthesized into speech.
TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
});

Specify a voice

Use the voice argument to specify a voice from Hume’s Voice Library or one of your Custom Voices.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04', // Male English Actor
});

Add instructions

Guide tone and delivery by passing natural language instructions to the instructions argument. See our Acting instructions guide for examples and best practices.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
});

Provide context

Hume’s speech language model can reuse previously generated speech tokens to preserve emotion, cadence, and linguistic continuity. Provide context via providerOptions in one of two ways:

  1. Generation ID: The ID corresponding to a previous generation.
  2. Context utterances: One or more full TTS-input objects that the model synthesizes into speech tokens and then uses as context.

For more details on how context works, see our Continuation guide.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const providerOptions = {
  hume: {
    context: {
      generation_id: '795c949a-1510-4a80-9646-7d0863b023ab',
    },
  },
};

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
  providerOptions,
});
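
If you don’t have a previous generation’s ID, you can provide context utterances instead. The sketch below assumes the provider forwards context.utterances to Hume’s TTS API unchanged; the utterance text and description are illustrative.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

// Illustrative prior line of dialogue; the model synthesizes it into
// speech tokens and uses them as context for the new generation.
const providerOptions = {
  hume: {
    context: {
      utterances: [
        {
          text: 'How can I help you today?',
          description: 'Calm and welcoming, with a warm tone.',
        },
      ],
    },
  },
};

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  providerOptions,
});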

Process audio

The response from generateSpeech is a SpeechResult containing a GeneratedAudioFile with the following properties:

  • base64: The file as a base64-encoded string.
  • uint8Array: The file as a Uint8Array.
  • mimeType: The audio file’s MIME type (e.g., "audio/mpeg").
  • format: The file format (e.g., "mp3", "wav").

The code snippet below converts the returned audio into a Blob URL you can feed directly to an audio player for playback.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
});

const { uint8Array, mimeType } = result.audio;
const blob = new Blob([uint8Array], { type: mimeType });
const url = URL.createObjectURL(blob);

const audio = new Audio(url);
audio.play();
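
The Audio element is browser-only. In server-side code (for example, a script or a Next.js route handler), you might write the audio to disk instead; a minimal sketch using Node’s fs/promises, with an illustrative filename:

TypeScript
import { writeFile } from 'node:fs/promises';

// result.audio.format is "mp3" by default, so this writes speech.mp3.
await writeFile(`speech.${result.audio.format}`, result.audio.uint8Array);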

Constraints

  • Voice specification: Pass the voice’s ID rather than its display name—the provider looks up voices by ID.

  • Implicit default voice: If you omit voice, the SDK defaults to a voice from Hume’s Voice Library: Colton Rivers (d8ab67c6-953d-4bd8-9370-8fa53a0f1453).

  • Audio formats: Output can be WAV, MP3, or PCM. Defaults to MP3 if not specified.

  • Fixed sample rate: The Hume API outputs audio at a fixed sample rate of 48kHz. Ensure compatibility with your audio processing pipeline.

  • One utterance per request: The AI SDK sends a single utterance per call. Split multi-utterance text inputs into separate requests.

  • Config surface: Through the SDK you can adjust text, voice, instructions, speed, and outputFormat (see the sketch after this list). Additional Hume-specific TTS options aren’t exposed yet.

  • Audio-only response: The SDK calls the /v0/tts/file endpoint, which returns just the audio payload—metadata such as generation_id isn’t included in the response.

  • Streaming not yet integrated: Hume offers real-time streaming TTS, but the AI SDK hasn’t wired up those endpoints yet, so audio is returned only after synthesis completes.
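
As an example of that config surface, the sketch below requests WAV output at a slightly faster speaking rate via the AI SDK’s outputFormat and speed options (the values are illustrative):

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  outputFormat: 'wav', // "wav", "mp3", or "pcm" per the constraint above
  speed: 1.2,          // illustrative: modestly faster than the default rate
});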
