Voice Guide | Hume API

Hume’s text-to-speech (TTS) API lets you specify which voice to use when synthesizing speech. You can use a custom voice that you have saved or select one from Hume’s Voice Library. If you omit the voice field, the model will generate one dynamically based on the input text and optional description.

This guide explains how to specify a voice in both standard and streaming TTS requests.

To learn how to create or manage voices, see the Voice Design Guide and Voice Management Guide.

Voice reference options

You can specify a voice in your request using either its id or name. Each voice belongs to a provider, which indicates the source of the voice and who can access it.

The provider field accepts the following values:

Provider	Description
`CUSTOM_VOICE`	Select from designed or cloned voices you’ve saved to your account. These voices are private.
`HUME_AI`	Select from Hume’s shared Voice Library of predesigned voices. These voices are public.

If you omit the provider field, it defaults to CUSTOM_VOICE. To use a voice from the Voice Library, you must explicitly set the provider to HUME_AI.

CUSTOM_VOICE

Specify a saved voice by name

{
  "voice": {
    "name": "My Custom Voice"
    // Optional: "provider": "CUSTOM_VOICE"
  }
}

Specify a saved voice by ID

{
  "voice": {
    "id": "795c949a-1510-4a80-9646-7d0863b023ab"
    // Optional: "provider": "CUSTOM_VOICE"
  }
}

You can find voice IDs and names using the List Voices endpoint or in the My Voices section of the Platform UI.
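The reference rules above can be sketched as a small helper that builds the voice object. This is a minimal sketch, not part of any Hume SDK: the function name `voice_ref` is hypothetical, and requiring exactly one of id or name is an assumption drawn from the "either its id or name" wording.

```python
from typing import Optional

# The two provider values listed in the table above.
VALID_PROVIDERS = {"CUSTOM_VOICE", "HUME_AI"}


def voice_ref(*, id: Optional[str] = None, name: Optional[str] = None,
              provider: Optional[str] = None) -> dict:
    """Build the "voice" object for a TTS request (hypothetical helper)."""
    # Assumption: id and name are alternatives, so require exactly one.
    if (id is None) == (name is None):
        raise ValueError("pass exactly one of id or name")
    if provider is not None and provider not in VALID_PROVIDERS:
        raise ValueError(f"unknown provider: {provider!r}")
    ref = {"id": id} if id is not None else {"name": name}
    # An omitted provider defaults to CUSTOM_VOICE, so send it only when given.
    if provider is not None:
        ref["provider"] = provider
    return ref


print(voice_ref(name="My Custom Voice"))
print(voice_ref(id="9e068547-5ba4-4c8e-8e03-69282a008f04", provider="HUME_AI"))
```

Leaving the provider out for saved voices keeps the payload identical to the examples above, while Voice Library voices always carry an explicit `HUME_AI`.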

Specify a voice in your request

To specify a voice for speech synthesis, include the voice field in the first utterance of your request. That voice will be used for all subsequent utterances unless you override it in a later utterance.
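To make the inheritance rule concrete, the following sketch resolves which voice applies to each utterance. It is a local illustration only and calls no API: `effective_voices` and the voice name "Guest Voice" are hypothetical. This guide does not say whether an override carries forward to later utterances, so the sketch takes the literal reading, in which an utterance without a voice falls back to the first utterance's voice.

```python
from typing import Optional


def effective_voices(utterances: list[dict]) -> list[Optional[dict]]:
    """Return the voice that applies to each utterance, in order."""
    # The voice on the first utterance acts as the default for the request.
    default = utterances[0].get("voice") if utterances else None
    # An utterance with its own voice overrides the default for itself.
    return [u.get("voice", default) for u in utterances]


utterances = [
    {"text": "Welcome back.", "voice": {"name": "My Custom Voice"}},
    {"text": "Here is your summary."},  # no voice: inherits the first one
    {"text": "And now a word from our guest.", "voice": {"name": "Guest Voice"}},
]

for utterance, voice in zip(utterances, effective_voices(utterances)):
    print(f'{voice["name"]}: {utterance["text"]}')
```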

Both standard and streaming TTS endpoints support voice selection. The request body format is identical across both.

Standard

curl https://api.hume.ai/v0/tts \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [
      {
        "text": "Beauty is no quality in things themselves: It exists merely in the mind which contemplates them.",
        "voice": {
          "id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
          "provider": "HUME_AI"
        }
      }
    ]
  }'
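For readers working in Python, the same request can be assembled with the standard library alone. This is a sketch rather than an official client: `build_tts_request` is a hypothetical name, the URL and header are copied from the curl command, and the `Content-Type` and `Accept` headers mirror what curl's `--json` flag sets. The request is built but not sent, since the response format is outside the scope of this guide.

```python
import json
import os
import urllib.request


def build_tts_request(utterances: list[dict], api_key: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST to the standard TTS endpoint."""
    body = json.dumps({"utterances": utterances}).encode("utf-8")
    return urllib.request.Request(
        "https://api.hume.ai/v0/tts",
        data=body,
        headers={
            "X-Hume-Api-Key": api_key,
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


request = build_tts_request(
    [
        {
            "text": "Beauty is no quality in things themselves: "
                    "It exists merely in the mind which contemplates them.",
            "voice": {
                "id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
                "provider": "HUME_AI",
            },
        }
    ],
    api_key=os.environ.get("HUME_API_KEY", ""),
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
print(request.get_method(), request.full_url)
```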

Instant mode is enabled by default for streaming endpoints. This mode requires a voice to be specified. If you omit the voice, the request will return an error.
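Because a missing voice only surfaces as an error from the server, a client-side check before sending can save a round trip. The sketch below is an assumption-laden illustration: `check_instant_mode` is a hypothetical helper, its `instant_mode` argument is a local Python flag rather than a documented request field, and it checks only the first utterance, following the guidance above to place the voice there.

```python
def check_instant_mode(utterances: list[dict], instant_mode: bool = True) -> None:
    """Raise ValueError if a streaming request would lack a required voice."""
    if not utterances:
        raise ValueError("request has no utterances")
    # Assumption: a voice on the first utterance satisfies the requirement,
    # since later utterances inherit it.
    if instant_mode and "voice" not in utterances[0]:
        raise ValueError("instant mode requires a voice on the first utterance")


# Passes: the first utterance names a voice.
check_instant_mode([{"text": "Hello.", "voice": {"name": "My Custom Voice"}}])

# Fails fast locally instead of waiting for an API error.
try:
    check_instant_mode([{"text": "Hello."}])
except ValueError as err:
    print(f"rejected: {err}")
```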

Resources

Voice Design Guide

Learn how to design and create a custom voice.

Voice Cloning Guide

Create a voice clone from a live recording or an audio file.

Acting Instructions

Control speech delivery using expressive performance cues.

Continuation Guide

Generate speech that leverages previous generations as context.