Voice Guide
Hume’s text-to-speech (TTS) API lets you specify which voice to use when synthesizing speech. You can use a custom voice that you have saved or select one from Hume’s Voice Library. If you omit the voice field, the model will generate one dynamically based on the input text and optional description.
This guide explains how to specify a voice in both standard and streaming TTS requests.
To learn how to create or manage voices, see the Voice Design Guide and Voice Management Guide.
Voice reference options
You can specify a voice in your request using either its id or name. Each voice belongs to a provider, which indicates the source of the voice and who can access it.
The provider field accepts the following values:
CUSTOM_VOICE: a custom voice that you have saved.
HUME_AI: a voice from Hume’s Voice Library.
If you omit the provider field, it defaults to CUSTOM_VOICE. To use a voice from the Voice Library, you must explicitly set the provider to HUME_AI.
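As a minimal sketch of the two reference styles, the Python helper below builds a voice object from either an id or a name. The helper name `voice_ref`, its parameter names, and the placeholder id and name are hypothetical; only the id, name, and provider fields and the two provider values come from this guide.

```python
from typing import Optional

# Provider values named in this guide.
PROVIDERS = {"CUSTOM_VOICE", "HUME_AI"}


def voice_ref(*, voice_id: Optional[str] = None, voice_name: Optional[str] = None,
              provider: Optional[str] = None) -> dict:
    """Build a voice reference from exactly one of id or name (hypothetical helper)."""
    if (voice_id is None) == (voice_name is None):
        raise ValueError("pass exactly one of voice_id or voice_name")
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    ref = {"id": voice_id} if voice_id is not None else {"name": voice_name}
    # Leaving provider out of the reference keeps the default (CUSTOM_VOICE).
    if provider is not None:
        ref["provider"] = provider
    return ref


# A saved custom voice, referenced by a placeholder id; provider is omitted.
custom = voice_ref(voice_id="your-voice-id")
# A Voice Library voice, referenced by a placeholder name; provider is set explicitly.
library = voice_ref(voice_name="Example Library Voice", provider="HUME_AI")
```

Keeping the provider out of the custom-voice reference relies on the documented default, while the Voice Library reference always carries HUME_AI.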
You can find voice IDs and names using the List Voices endpoint or in the My Voices section of the Platform UI.
Specify a voice in your request
To specify a voice for speech synthesis, include the voice field in the first utterance of your request. That voice will be used for all subsequent utterances unless you override it in a later utterance.
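The request body below is one way this could look: a sketch, not a verbatim API example. The key names `utterances` and `text` are assumptions inferred from the wording of this guide, and the voice names are placeholders.

```python
import json

# Placeholder voice references; the names are hypothetical.
narrator = {"name": "My Narrator"}  # provider omitted, so CUSTOM_VOICE applies
guest = {"name": "Example Library Voice", "provider": "HUME_AI"}

payload = {
    "utterances": [
        # The first utterance sets the voice for the request.
        {"text": "Welcome to the show.", "voice": narrator},
        # No voice field here, so the voice set above is reused.
        {"text": "Today we have a special guest."},
        # A later utterance can override the voice.
        {"text": "Thanks for having me.", "voice": guest},
    ]
}

print(json.dumps(payload, indent=2))
```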
Both standard and streaming TTS endpoints support voice selection. The request body format is identical across both.
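To illustrate that the same body can be sent to either kind of endpoint, the sketch below prepares two requests from one payload without sending them. The endpoint URLs, the `X-Hume-Api-Key` header, and the `utterances` and `text` keys are assumptions that should be checked against the API reference; `build_request` is a hypothetical helper.

```python
import json
import urllib.request

# Assumed endpoint URLs; verify them against the API reference.
STANDARD_URL = "https://api.hume.ai/v0/tts"
STREAMING_URL = "https://api.hume.ai/v0/tts/stream/json"


def build_request(url: str, payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the JSON payload."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # The auth header name is an assumption, not taken from this guide.
        headers={"Content-Type": "application/json", "X-Hume-Api-Key": api_key},
        method="POST",
    )


payload = {
    "utterances": [
        {
            "text": "Hello there.",
            "voice": {"name": "Example Library Voice", "provider": "HUME_AI"},
        }
    ]
}

# The same payload is used for both requests; only the URL differs.
standard = build_request(STANDARD_URL, payload, api_key="YOUR_API_KEY")
streaming = build_request(STREAMING_URL, payload, api_key="YOUR_API_KEY")
```

Passing either prepared request to `urllib.request.urlopen` with a real API key would send it.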
Instant mode is enabled by default for streaming endpoints. This mode requires a voice to be specified. If you omit the voice, the request will return an error.
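A client can catch this case before making the call. The check below is a hypothetical pre-flight helper, not part of any SDK; it assumes the request body uses an `utterances` list and that the voice is given in the first utterance.

```python
def check_streaming_payload(payload: dict) -> None:
    """Raise ValueError if the first utterance lacks a voice id or name."""
    utterances = payload.get("utterances") or []
    if not utterances:
        raise ValueError("payload has no utterances")
    voice = utterances[0].get("voice")
    if not voice or not (voice.get("id") or voice.get("name")):
        raise ValueError("streaming with instant mode requires a voice id or name")


# Passes: the first utterance names a voice (placeholder name).
check_streaming_payload(
    {"utterances": [{"text": "Hi.", "voice": {"name": "My Narrator"}}]}
)

# Fails: no voice is given, so the check raises before any request is made.
try:
    check_streaming_payload({"utterances": [{"text": "Hi."}]})
except ValueError as err:
    print(f"rejected: {err}")
```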
Resources
See the Voice Design Guide for how to design and create custom voices.
Create a voice clone from a live recording or an audio file.
Control speech delivery using expressive performance cues.
Generate speech that leverages previous generations as context.