Vercel AI SDK
The Vercel AI SDK provides a unified interface for integrating AI capabilities, such as text-to-speech, into web applications built with frameworks like Next.js and SvelteKit. It abstracts away provider-specific details, making it simple to work with AI models from different providers.
This guide walks you through how to use the AI SDK to integrate Hume’s expressive TTS into your web application.
Prefer code over docs? See our Next.js example using Hume TTS with the Vercel AI SDK.
Installation
To get started, install the required packages:
- ai: the core AI SDK package.
- @ai-sdk/hume: the Hume speech provider package.
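For example, with npm: npm install ai @ai-sdk/hume (use the equivalent command for your package manager).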
Authentication
The HumeProvider integrates Hume’s TTS API and requires a valid API key to authenticate requests. Follow these steps to obtain your credentials and configure your environment.
Get your Hume API key
Sign in to the Hume Platform and follow the getting your API key guide to get your API key.
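As a minimal sketch, assuming the provider follows the usual AI SDK convention of a default hume instance that reads the HUME_API_KEY environment variable and a createHume factory for custom configuration:

```ts
import { createHume } from '@ai-sdk/hume';

// Assumption: createHume accepts an apiKey option, mirroring other AI SDK providers.
// If you set the HUME_API_KEY environment variable, you can import the default
// `hume` instance from '@ai-sdk/hume' instead and skip this step.
export const hume = createHume({
  apiKey: process.env.HUME_API_KEY,
});
```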
Usage
The AI SDK provides a unified generateSpeech function for converting text to speech across providers. Hume integrates into this interface via the HumeProvider, which exposes Hume’s expressive TTS model through the speech() factory method.
Basic implementation
To generate speech, call generateSpeech() with at least two arguments:
- model: Use hume.speech() to specify Hume as the provider.
- text: The input string to be synthesized into speech.
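A minimal sketch, assuming a recent AI SDK version in which generateSpeech is exported as experimental_generateSpeech:

```ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

// Synthesize a single utterance with Hume's TTS model.
const { audio } = await generateSpeech({
  model: hume.speech(),
  text: 'Welcome to the future of expressive speech.',
});
```

The returned audio object is described in the Process audio section below.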
Specify a voice
Use the voice argument to specify a voice from Hume’s Voice Library or one of your Custom Voices.
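For example (the voice ID below is the Voice Library voice noted in the Constraints section; substitute your own):

```ts
const { audio } = await generateSpeech({
  model: hume.speech(),
  text: 'Nice to meet you.',
  voice: 'd8ab67c6-953d-4bd8-9370-8fa53a0f1453', // a voice ID, not a display name
});
```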
Add instructions
Guide tone and delivery by passing natural language instructions to the instructions argument. See our Acting instructions guide for examples and best practices.
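A sketch of passing acting instructions alongside the text (the instruction wording is purely illustrative):

```ts
const { audio } = await generateSpeech({
  model: hume.speech(),
  text: 'I cannot believe we actually made it.',
  voice: 'd8ab67c6-953d-4bd8-9370-8fa53a0f1453',
  instructions: 'Whisper with quiet, awestruck disbelief.',
});
```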
Provide context
Hume’s speech language model can reuse previously generated speech tokens to preserve emotion, cadence, and linguistic continuity.
Provide context via providerOptions in one of two ways:
- Generation ID: The ID corresponding to a previous generation.
- Context utterances: One or more full TTS-input objects that the model synthesizes into speech tokens and then uses as context.
For more details on how context works, see our Continuation guide.
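As a rough sketch, a call with context might look like the following. The field names inside providerOptions.hume (context, generationId, utterances, description) are assumptions modeled on Hume’s TTS request body, so check the HumeProvider reference for the exact shape:

```ts
const { audio } = await generateSpeech({
  model: hume.speech(),
  text: 'And then everything went quiet.',
  providerOptions: {
    hume: {
      // Option 1 (assumed field name): continue from a previous generation by its ID.
      context: { generationId: 'PREVIOUS_GENERATION_ID' },
      // Option 2 (assumed shape): provide full utterances as context instead.
      // context: {
      //   utterances: [
      //     { text: 'The storm rolled in fast.', description: 'tense, hushed' },
      //   ],
      // },
    },
  },
});
```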
Process audio
The response from generateSpeech is a SpeechResult containing a GeneratedAudioFile with the following properties:
- base64: The file as a base64-encoded string.
- uint8Array: The file as a Uint8Array.
- mimeType: The audio file’s MIME type (e.g., "audio/mpeg").
- format: The file format (e.g., "mp3", "wav").
The code snippet below converts the returned audio into a Blob URL you can feed directly to an audio player for playback.
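A minimal sketch of that conversion (in a real app you would typically run generateSpeech on the server and send the bytes to the client; the call is inlined here to keep the sketch self-contained):

```ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { hume } from '@ai-sdk/hume';

// Browser-side playback: synthesize speech, wrap the bytes in a Blob,
// and play the resulting object URL through an HTMLAudioElement.
async function speakInBrowser(text: string) {
  const { audio } = await generateSpeech({
    model: hume.speech(),
    text,
  });

  const blob = new Blob([audio.uint8Array], { type: audio.mimeType });
  const url = URL.createObjectURL(blob);

  const player = new Audio(url);
  await player.play();
}
```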
Constraints
- Voice specification: Pass the voice’s ID rather than its display name; the provider looks up voices by ID.
- Implicit default voice: If you omit voice, the SDK defaults to a voice from Hume’s Voice Library: Colton Rivers (d8ab67c6-953d-4bd8-9370-8fa53a0f1453).
- Audio formats: Output can be WAV, MP3, or PCM. Defaults to MP3 if not specified.
- Fixed sample rate: The Hume API outputs audio at a fixed sample rate of 48 kHz. Ensure compatibility with your audio processing pipeline.
- One utterance per request: The AI SDK sends a single utterance per call. Split multi-utterance text inputs into separate requests.
- Config surface: Through the SDK you can adjust text, voice, instructions, speed, and outputFormat (see the sketch after this list). Additional Hume-specific TTS options aren’t exposed yet.
- Audio-only response: The SDK calls the /v0/tts/file endpoint, which returns just the audio payload; metadata such as generation_id isn’t included in the response.
- Streaming not yet integrated: Hume offers real-time streaming TTS, but the AI SDK hasn’t wired up those endpoints yet, so audio is returned only after synthesis completes.
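A sketch of adjusting the options the SDK currently exposes (the specific values are illustrative; accepted formats are those listed above):

```ts
const { audio } = await generateSpeech({
  model: hume.speech(),
  text: 'Here is a slightly slower take, rendered as WAV.',
  voice: 'd8ab67c6-953d-4bd8-9370-8fa53a0f1453',
  outputFormat: 'wav', // defaults to MP3 if omitted
  speed: 0.9,          // speaking-rate adjustment
});
```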
Resources
Explore the source code for the HumeProvider in the AI SDK on GitHub.
Reference the official AI SDK docs for a full list of options and configuration details.
Check out a working example to get started with integrating Hume TTS in your Next.js application.
Learn more about Hume’s Speech language model and the features of Hume’s TTS API.