Hume’s text-to-speech (TTS) API lets you specify which voice to use when synthesizing speech. You can use a custom voice that you have saved or select one from Hume’s Voice Library.
This guide explains how to specify a voice across all of Hume’s TTS endpoints.
To learn how to create or manage voices, see the Voice Design Guide, Voice Cloning Guide, and Voice Management Guide.
You can specify a voice by name or id. When you specify a voice by name, you can also set a provider, which defaults to CUSTOM_VOICE if omitted. To reference a voice from Hume’s Voice Library by name, set the provider to HUME_AI.
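As a rough illustration of the rules above, the following sketch builds a voice reference as a plain dictionary. The helper `voice_ref` and the placeholder voice names are hypothetical, and the exact object shape (`name`, `id`, and `provider` as keys of one object) is an assumption based on the field names mentioned in this guide. Treating name and id as mutually exclusive is also a choice made for this sketch.

```python
from typing import Optional


def voice_ref(
    name: Optional[str] = None,
    voice_id: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict:
    """Build a voice reference by name or by id (hypothetical helper)."""
    # This sketch requires exactly one of name / voice_id (an assumption).
    if (name is None) == (voice_id is None):
        raise ValueError("specify exactly one of name or voice_id")
    if voice_id is not None:
        return {"id": voice_id}
    ref = {"name": name}
    # When provider is omitted, the guide says it defaults to CUSTOM_VOICE,
    # so the key is only included when set explicitly.
    if provider is not None:
        ref["provider"] = provider
    return ref


# Placeholder names, not real voices.
custom = voice_ref(name="my-custom-voice")
library = voice_ref(name="some-library-voice", provider="HUME_AI")
```

Leaving `provider` out of the custom voice reference relies on the documented default rather than restating it.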
Get voice IDs and names from /v0/tts/voices or from the Platform’s Voice Library page.
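A minimal sketch of preparing a request to the voices endpoint is shown here. Only the `/v0/tts/voices` path comes from this guide. The base URL, the `provider` query parameter, and the `X-Hume-Api-Key` header name are assumptions to verify against the API reference, and `list_voices_request` is a hypothetical helper. The code only constructs the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.hume.ai"  # assumed base URL


def list_voices_request(api_key: str, provider: str) -> Request:
    """Prepare (but do not send) a GET request for the voices list."""
    # The provider query parameter is an assumption, not confirmed by this guide.
    query = urlencode({"provider": provider})
    return Request(
        f"{BASE_URL}/v0/tts/voices?{query}",
        headers={"X-Hume-Api-Key": api_key},  # assumed header name
        method="GET",
    )


req = list_voices_request("YOUR_API_KEY", "HUME_AI")
```

Passing `req` to `urllib.request.urlopen` would send it. The response format is not described here, so parsing is left out.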
To set a voice, include the voice field in the first utterance of your request. That voice is used for all following utterances unless you override it later.
Voice specification works the same across streaming and non-streaming endpoints. The code snippets below demonstrate how to set the voice in your TTS request.
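The following sketch shows one way to assemble a request body in which the voice is set on the first utterance and optionally overridden on a later one. The `voice` field on an utterance comes from this guide. The top-level `utterances` key, the `text` field, the helper `build_tts_body`, and the placeholder voice names are assumptions for illustration.

```python
import json
from typing import Dict, List, Optional


def build_tts_body(
    texts: List[str],
    voice: dict,
    overrides: Optional[Dict[int, dict]] = None,
) -> dict:
    """Build a TTS request body (hypothetical helper, assumed field names)."""
    overrides = overrides or {}
    utterances = []
    for index, text in enumerate(texts):
        utterance = {"text": text}
        if index == 0:
            # The first utterance carries the voice for the request.
            utterance["voice"] = voice
        if index in overrides:
            # A later utterance may override the voice explicitly.
            utterance["voice"] = overrides[index]
        utterances.append(utterance)
    return {"utterances": utterances}


body = build_tts_body(
    ["Hello there.", "Same voice here.", "A different voice here."],
    voice={"name": "my-custom-voice"},  # placeholder name
    overrides={2: {"name": "some-library-voice", "provider": "HUME_AI"}},
)
payload = json.dumps(body)  # JSON string to send as the request body
```

The second utterance carries no `voice` field, so it relies on the voice set in the first utterance.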
Octave 1 voices are supported for both Octave 1 and Octave 2 requests, while Octave 2 voices are supported only for Octave 2 requests. A request that specifies an Octave 2 voice for Octave 1 returns an error.
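The compatibility rule can be expressed as a small client-side check, sketched below. The helper `voice_supported` and the use of integers 1 and 2 for Octave versions are hypothetical. The API itself enforces the rule by returning an error.

```python
def voice_supported(voice_octave: int, request_octave: int) -> bool:
    """Return True if a voice of the given Octave version can be used
    in a request of the given Octave version (hypothetical helper)."""
    for version in (voice_octave, request_octave):
        if version not in (1, 2):
            raise ValueError(f"unknown Octave version: {version}")
    # Octave 1 voices work with both versions; Octave 2 voices need Octave 2.
    return voice_octave <= request_octave


# Only an Octave 2 voice in an Octave 1 request is rejected.
combinations = {
    (voice, request): voice_supported(voice, request)
    for voice in (1, 2)
    for request in (1, 2)
}
```

A check like this could run before sending a request to avoid a round trip that would fail.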
Learn how to design and create custom voices.
Create a voice clone from a live recording or an audio file.
Control speech delivery using expressive performance cues.
Generate speech that leverages previous generations as context.