Vercel AI SDK

Guide to integrating Hume TTS into your web application with the Vercel AI SDK.

The Vercel AI SDK provides a unified interface for integrating AI capabilities, such as text-to-speech, into web applications built with frameworks like Next.js and SvelteKit. It abstracts away provider-specific details, making it simple to switch between AI models and providers without rewriting application code.

This guide walks you through how to use the AI SDK to integrate Hume’s expressive TTS into your web application.

Prefer code over docs? See our Next.js example using Hume TTS with the Vercel AI SDK.

Installation

To get started, install the required packages:

  1. ai: the core AI SDK package.
  2. @ai-sdk/hume: the Hume speech provider package.
bun add ai @ai-sdk/hume
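
If you use npm, pnpm, or yarn instead of bun, the equivalent commands are:

npm install ai @ai-sdk/hume
pnpm add ai @ai-sdk/hume
yarn add ai @ai-sdk/hume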

Authentication

The HumeProvider integrates Hume’s TTS API and requires a valid API key to authenticate requests. Follow these steps to obtain your credentials and configure your environment.

1. Get your Hume API key

Sign in to the Hume Platform and follow the getting your API key guide to retrieve your key.

2. Configure environment variables

Define your Hume API key in an environment file. Most frameworks support multiple .env file naming conventions depending on the environment (e.g., .env, .env.local, .env.development, etc.). Use the one that best fits your setup.

.env
HUME_API_KEY=your-api-key-here

3. Supply API key to HumeProvider

Pass your API key using the apiKey field when creating the HumeProvider. This ensures all requests made by the SDK to the Hume TTS API are properly authenticated.

HumeProvider
import { createHume } from '@ai-sdk/hume';

export const hume = createHume({
  apiKey: process.env.HUME_API_KEY ?? '',
});
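
If you prefer to fail fast when the key is missing rather than sending unauthenticated requests, a minimal variant (the error message is illustrative) throws at startup:

TypeScript
import { createHume } from '@ai-sdk/hume';

const apiKey = process.env.HUME_API_KEY;
if (!apiKey) {
  // Fail at module load instead of on the first TTS request.
  throw new Error('HUME_API_KEY is not set; see the Authentication steps above.');
}

export const hume = createHume({ apiKey });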

Usage

The AI SDK provides a unified generateSpeech function for converting text to speech across providers. Hume integrates into this interface via the HumeProvider, which exposes Hume’s expressive TTS model through the speech() factory method.

Basic implementation

To generate speech, call generateSpeech() with at least two arguments:

  • model: Use hume.speech() to specify Hume as the provider.
  • text: The input string to be synthesized into speech.
TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
});

Specify a voice

Use the voice argument to specify a voice from Hume’s Voice Library or one of your Custom Voices.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04', // Male English Actor
});

Add instructions

Guide tone and delivery by passing natural language instructions to the instructions argument. See our Acting instructions guide for examples and best practices.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
});

Provide context

Hume’s speech language model can reuse previously generated speech tokens to preserve emotion, cadence, and linguistic continuity. Provide context via providerOptions in one of two ways:

  1. Generation ID: The ID corresponding to a previous generation.
  2. Context utterances: One or more full TTS-input objects that the model synthesizes into speech tokens and then uses as context.

For more details on how context works, see our Continuation guide.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const providerOptions = {
  hume: {
    context: {
      generation_id: '795c949a-1510-4a80-9646-7d0863b023ab',
    },
  },
};

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
  providerOptions,
});
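
If you don’t have a previous generation’s ID, you can provide context utterances instead. The sketch below assumes the provider forwards context.utterances to Hume’s TTS API unchanged; the utterance text and description are illustrative.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

// Illustrative prior line of dialogue; the model synthesizes it into
// speech tokens and uses them as context for the new generation.
const providerOptions = {
  hume: {
    context: {
      utterances: [
        {
          text: 'How can I help you today?',
          description: 'Calm and welcoming, with a warm tone.',
        },
      ],
    },
  },
};

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  providerOptions,
});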

Process audio

The response from generateSpeech is a SpeechResult containing a GeneratedAudioFile with the following properties:

  • base64: The file as a base64-encoded string.
  • uint8Array: The file as a Uint8Array.
  • mimeType: The audio file’s MIME type (e.g., "audio/mpeg").
  • format: The file format (e.g., "mp3", "wav").

The code snippet below converts the returned audio into a Blob URL you can feed directly to an audio player for playback.

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  instructions: 'The voice has a happy and enthusiastic tone.',
});

const { uint8Array, mimeType } = result.audio;
const blob = new Blob([uint8Array], { type: mimeType });
const url = URL.createObjectURL(blob);

const audio = new Audio(url);
audio.play();
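
The Audio element is browser-only. In server-side code (for example, a script or a Next.js route handler), you might write the audio to disk instead; a minimal sketch using Node’s fs/promises, with an illustrative filename:

TypeScript
import { writeFile } from 'node:fs/promises';

// result.audio.format is "mp3" by default, so this writes speech.mp3.
await writeFile(`speech.${result.audio.format}`, result.audio.uint8Array);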

Constraints

  • Voice specification: Pass the voice’s ID rather than its display name—the provider looks up voices by ID.

  • Implicit default voice: If you omit voice, the SDK defaults to a voice from Hume’s Voice Library: Colton Rivers (d8ab67c6-953d-4bd8-9370-8fa53a0f1453).

  • Audio formats: Output can be WAV, MP3, or PCM. Defaults to MP3 if not specified.

  • Fixed sample rate: The Hume API outputs audio at a fixed sample rate of 48kHz. Ensure compatibility with your audio processing pipeline.

  • One utterance per request: The AI SDK sends a single utterance per call. Split multi-utterance text inputs into separate requests.

  • Config surface: Through the SDK you can adjust text, voice, instructions, speed, and outputFormat (see the sketch after this list). Additional Hume-specific TTS options aren’t exposed yet.

  • Audio-only response: The SDK calls the /v0/tts/file endpoint, which returns just the audio payload—metadata such as generation_id isn’t included in the response.

  • Streaming not yet integrated: Hume offers real-time streaming TTS, but the AI SDK hasn’t wired up those endpoints yet, so audio is returned only after synthesis completes.
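
As an example of that config surface, the sketch below requests WAV output at a slightly faster speaking rate via the AI SDK’s outputFormat and speed options (the values are illustrative):

TypeScript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

const result = await generateSpeech({
  model: hume.speech(),
  text: 'Hello, world!',
  voice: '9e068547-5ba4-4c8e-8e03-69282a008f04',
  outputFormat: 'wav', // "wav", "mp3", or "pcm" per the constraint above
  speed: 1.2,          // illustrative: modestly faster than the default rate
});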
