Voice Design

A guide to designing expressive, natural-sounding voices using Octave, Hume’s speech-language model.

Octave enables you to design custom voices using intuitive, descriptive prompts. This guide explains how voice design works, shares best practices for writing effective prompts, and demonstrates how to create reusable voices in the Platform UI and API.

How voice design works

Designing a voice with Octave means telling the model both what kind of voice to generate and what that voice should say. These two inputs work together to produce expressive, character-consistent speech:

  1. Voice prompt (description): A natural-language prompt describing how the speaker should sound. This includes tone, personality, emotion, and context. The prompt sets the foundation for the voice’s identity.
  2. Input text (text): A sample line that fits naturally with the character’s voice and identity. It gives the model a reference for delivery, helping it match tone, pacing, and emotional nuance to the prompt.

Octave uses both inputs holistically. It doesn’t treat the prompt as a set of isolated traits—it interprets it in full context, just as a human would when imagining a speaker. The model then generates speech that reflects not just the words, but the personality behind them.

This allows for a wide range of voices: warm and professional, anxious and fast-paced, playful and sarcastic, even when the text stays the same. You can iterate quickly: revise your prompt, try alternate lines of text, and adjust tone, identity, and delivery together until the result fits.
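As a rough illustration of this kind of iteration, the sketch below pairs one line of text with several voice prompts and builds one request body per variation. The field names utterances, text, and description follow the API example later in this guide; the helper name build_variations and the sample prompts are hypothetical.

```python
import json


def build_variations(text: str, descriptions: list[str]) -> list[dict]:
    """Return one request body per voice prompt, all sharing the same text."""
    return [
        {"utterances": [{"text": text, "description": description}]}
        for description in descriptions
    ]


# Hypothetical prompts based on the tone pairs mentioned above.
prompts = [
    "A warm, professional voice, like a concierge greeting a guest.",
    "An anxious, fast-paced voice, like someone running late.",
    "A playful, sarcastic voice, like a friend teasing you.",
]
bodies = build_variations("Your table is ready.", prompts)

for body in bodies:
    print(json.dumps(body))
```

Each body can then be sent as its own request, so the resulting generations can be compared side by side.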

In the next section, we’ll explore practical techniques for crafting clear, expressive voice prompts that lead to more natural, accurate results.

Crafting voice prompts

Octave understands language in context. The more clearly you describe who’s speaking and how they should sound, the more naturally the model will bring your voice to life.

  1. Character and setting: Octave produces more expressive, natural speech when it understands:

    • Who is speaking: personality, tone, emotional quality
    • How they speak: pace, clarity, intensity
    • What context they’re in: setting, role, or intent
  2. Voice profile: When writing a prompt, consider including details like:

    • Tone: “serious”, “playful”, “melancholic”
    • Speaking style: “clear”, “fast-paced”, “informal”
    • Emotion or attitude: “cheerful”, “anxious”, “skeptical”
  3. Formatting: Use standard formatting conventions so Octave can interpret phrasing, structure, and delivery cues clearly:

    • Use standard punctuation to support your intended phrasing and tone.
    • Avoid non-speech markup or symbols, such as emojis, HTML tags, or Markdown formatting.
    • Keep formatting clean and readable, reflecting how the sentence would be spoken aloud.
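The formatting advice above could be applied automatically before text is sent to the model. The following is a minimal sketch, assuming a few simple regular-expression heuristics are acceptable; clean_input_text and its patterns are illustrative and are not part of Octave or any Hume SDK.

```python
import re

# Rough patterns for markup that is not meant to be spoken aloud.
# These are heuristics: they can also strip legitimate characters such as
# underscores in identifiers, so review the output for your own content.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")
MARKDOWN_LINK = re.compile(r"\[([^\]]+)\]\([^)]*\)")
MARKDOWN_MARKS = re.compile(r"[*_`#]+")
EMOJI = re.compile("[\U0001F000-\U0001FAFF\u2600-\u27BF\uFE0F\u200D]")


def clean_input_text(text: str) -> str:
    """Strip HTML tags, common Markdown marks, and emoji; keep punctuation."""
    text = HTML_TAG.sub("", text)
    text = MARKDOWN_LINK.sub(r"\1", text)  # keep the link text, drop the URL
    text = MARKDOWN_MARKS.sub("", text)
    text = EMOJI.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


print(clean_input_text("**Wow**, that's <b>great</b> news! 🎉"))
# prints: Wow, that's great news!
```

Standard punctuation such as commas, ellipses, and exclamation marks is left untouched, since it carries the phrasing cues described above.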

Here are a few examples created by the Hume team that show how a well-crafted prompt and natural input text combine to produce distinct, expressive voices. You can preview each one in the Voice Library.

Preview this voice in the Voice Library or try it out in the TTS Playground

Voice ID: ee96fb5f-ec1a-4f41-a9ba-6d119e64c8fd
Voice prompt: "The speaker has a reflective, Black American voice, reminiscent of a Harlem storyteller sharing nostalgic memories with a tone of resilience and quiet strength."
Text: "Well I remember back in the day, we'd sit and talk about dreams. You know... big dreams... uh... and even bigger realities. Um... It wasn't always easy, but those dreams kept us going."

Voice ID: 9e068547-5ba4-4c8e-8e03-69282a008f04
Voice prompt: "The speaker has a confident, charismatic tone, like a tech guru explaining a new technology with infectious enthusiasm, and the excitement of a viral storyteller."
Text: "Hume's AI voice generator is insane, because you can tell it exactly how you want the voice to sound."

Voice ID: 5bb7de05-c8fe-426a-8fcc-ba4fc4ce9f9c
Voice prompt: "Previously Blunt Female Voice. An Asian American woman speaking with a lot of sass and personality."
Text: "Uh... yeah, I'm always awake—I literally don't sleep."

Voice ID: d8ab67c6-953d-4bd8-9370-8fa53a0f1453
Voice prompt: "Previously Charming Cowboy. A grizzled old cowboy with a folksy Texan drawl Southern accent, speaking in a charismatic tone with a deep but relaxed vibe."
Text: "The real kicker? You can spin up any dang voice you dream of with just a prompt. 'Cause, y’see, it’s one of those LLM things — and that’s just what they do."

Voice ID: 96ee3964-5f3f-4a5a-be09-393e833aaf0e
Voice prompt: "A black woman with a confident, resonant voice with a subtle Louisiana accent, reflecting a blend of cultural pride and professional ambition."
Text: "From the bayou to the boardroom, I carry my heritage with pride and a fierce determination to succeed."
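The example prompts above each combine an identity, a tone, a speaking style, and some context. If your application collects those pieces separately (for example, from form fields), a small helper could assemble them into one description. This is only a sketch: VoiceProfile and to_prompt are hypothetical names, and because Octave reads the prompt as a whole, a hand-written description may well work better than a template.

```python
from dataclasses import dataclass


@dataclass
class VoiceProfile:
    """Hypothetical container for the prompt ingredients discussed above."""

    identity: str  # who is speaking, e.g. "a grizzled old cowboy"
    tone: str      # e.g. "charismatic"
    style: str     # e.g. "relaxed, unhurried"
    context: str   # e.g. "telling a story by the campfire"

    def to_prompt(self) -> str:
        """Join the parts into a single natural-language description."""
        return (
            f"The speaker is {self.identity} with a {self.tone} tone "
            f"and a {self.style} speaking style, {self.context}."
        )


profile = VoiceProfile(
    identity="a grizzled old cowboy",
    tone="charismatic",
    style="relaxed, unhurried",
    context="telling a story by the campfire",
)
print(profile.to_prompt())
```

Treat the generated sentence as a starting point and edit it by hand once you hear how the voice sounds.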

Create a custom voice

Once you’ve created a generation that captures the voice you want, you can save it as a custom voice. This stores both the speech and the prompt that shaped it, so the model can reliably reproduce the same vocal identity in future requests.

You can create and save voices using:

  • The Platform UI – great for interactive exploration and refinement.
  • The API – ideal for programmatic use cases, such as letting end users design and save voices in your application.

Using the UI

This section walks through the voice creation flow in the Platform UI, from generating samples to saving your voice.

1. Go to the Platform’s Voice Library to view available voices and create your own.

[Screenshot: Voice Library page]

2. Click “Create voice”

Click the Create voice button in the top right to open the voice design interface.

[Screenshot: Voice design modal, empty]

3. Enter Text and Voice prompt

Enter your Text and Voice prompt.

Use Enhance to improve your inputs, or Auto-generate to get help crafting them.

[Screenshot: Voice design modal, filled]

4. Generate samples

Click Generate samples to create three voice candidates based on your inputs.

Preview each and choose your favorite. You can keep generating new sets of samples until you find one you like.

[Screenshot: Voice design modal, samples]

5. Name your voice

Enter a Name for your voice.

Optionally provide a Description for your reference.

[Screenshot: Voice design modal, named voice]

6. Save your voice

Click Save voice to complete the creation flow. You’ll be redirected to the My Voices tab.

[Screenshot: My Voices page]

Using the API

This section walks through the voice creation API flow: generating speech in a new voice and saving that generation as a reusable voice.

1. Generate a voice

Generate a new voice by making a POST request to /v0/tts.

In each object of the utterances array, include both a description (the voice prompt) and text (the input text).

Optionally set num_generations to request multiple generations and explore variations.

# The --json flag requires curl 7.82.0 or later.
curl https://api.hume.ai/v0/tts \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [{
      "text": "Hume'"'"'s AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
      "description": "The speaker has a confident, charismatic tone, like a tech guru explaining a new technology with infectious enthusiasm, and the excitement of a viral storyteller."
    }],
    "num_generations": 1
  }'
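For applications that call the API from code rather than the shell, the same request might look like the sketch below. It uses only the Python standard library and mirrors the endpoint, header, and body fields of the curl example above; the helper names build_tts_request and send are hypothetical, and error handling is omitted.

```python
import json
import urllib.request

TTS_URL = "https://api.hume.ai/v0/tts"


def build_tts_request(api_key, text, description, num_generations=1):
    """Build (but do not send) the POST request from the curl example."""
    body = {
        "utterances": [{"text": text, "description": description}],
        "num_generations": num_generations,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Hume-Api-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


req = build_tts_request(
    "<your-api-key>",  # placeholder; read the real key from the environment
    text="Hume's AI voice generator is insane, because you can tell it "
         "exactly how you want the voice to sound.",
    description="The speaker has a confident, charismatic tone, like a "
                "tech guru explaining a new technology.",
    num_generations=2,
)
print(req.get_method(), req.full_url)
# To actually call the API: result = send(req)
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.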

The response includes one or more generations, each with a generation_id, audio, and additional metadata.

Listen to each generation and choose the one you want to save. Use its generation_id in the next step to save it as a voice.
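To listen to the candidates, one option is to write each generation's audio to a file named after its generation_id. The sketch below assumes the audio field holds base64-encoded audio, which the sample response in this step suggests but does not state, and that encoding.format is usable as a file extension. save_generations is a hypothetical helper, and the response used here is a stand-in rather than real API output.

```python
import base64
import tempfile
from pathlib import Path


def save_generations(response: dict, out_dir: str) -> dict:
    """Write each generation's audio to disk; return {generation_id: path}."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = {}
    for generation in response["generations"]:
        # Assumption: "audio" is base64 text and "encoding.format" is e.g. "mp3".
        extension = generation["encoding"]["format"]
        path = out / f"{generation['generation_id']}.{extension}"
        path.write_bytes(base64.b64decode(generation["audio"]))
        saved[generation["generation_id"]] = path
    return saved


# Stand-in response with placeholder bytes instead of real audio.
fake_response = {
    "generations": [
        {
            "generation_id": "example-id",
            "encoding": {"format": "mp3"},
            "audio": base64.b64encode(b"not real audio").decode("ascii"),
        }
    ]
}
paths = save_generations(fake_response, tempfile.mkdtemp())
print(paths["example-id"].name)  # prints: example-id.mp3
```

Keeping the generation_id in the filename makes it easy to pass the chosen candidate to the save step that follows.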

{
  "request_id": "553ce0cb-a958-48ce-befc-88fca6310a028583094",
  "generations": [
    {
      "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
      "duration": 5.88,
      "file_size": 94464,
      "encoding": {
        "format": "mp3",
        "sample_rate": 48000
      },
      "audio": "//uUxAAAEM1rHUewycq...",
      "snippets": [
        [
          {
            "id": "9295d4ab-3c1a-489f-9f12-c81ea6c8585c",
            "text": "Hume's AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
            "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
            "utterance_index": 0,
            "audio_format": "mp3",
            "transcribed_text": "Hume's AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
            "audio": "//uUxAAAAAAAAAAAAAA..."
          }
        ]
      ]
    }
  ]
}

2. Save the voice

Make a POST request to /v0/tts/voices to save a generation as a reusable voice.

Include the generation_id and a name for the new voice.

curl https://api.hume.ai/v0/tts/voices \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
    "name": "My Custom Voice"
  }'
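When several generations were requested, the two steps can be connected in code by taking the generation_id of the chosen candidate straight from the first response. The sketch below mirrors the curl request above; build_save_voice_request and the trimmed response are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

VOICES_URL = "https://api.hume.ai/v0/tts/voices"


def build_save_voice_request(api_key, tts_response, choice, name):
    """Build the save-voice POST for the generation at index `choice`."""
    generation_id = tts_response["generations"][choice]["generation_id"]
    body = {"generation_id": generation_id, "name": name}
    return urllib.request.Request(
        VOICES_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-Hume-Api-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Trimmed stand-in for a /v0/tts response with two generations.
tts_response = {
    "generations": [
        {"generation_id": "first-generation-id"},
        {"generation_id": "second-generation-id"},
    ]
}
request = build_save_voice_request(
    "<your-api-key>", tts_response, choice=1, name="My Custom Voice"
)
print(request.data.decode("utf-8"))
# To actually save the voice: urllib.request.urlopen(request)
```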

The response includes the name and id of your saved voice.

{
  "name": "My Custom Voice",
  "id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
  "provider": "CUSTOM_VOICE"
}

What’s next

You can use your custom voices across Hume products that support speech synthesis. Reference them by name or ID in TTS requests, or use them in EVI by specifying the voice in your configuration.
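A TTS utterance that references a saved voice might be built as in the sketch below. The shape of the voice field is an assumption not shown in this guide (an object with either an id, or a name plus the CUSTOM_VOICE provider seen in the save response), so check the TTS API reference before relying on it. utterance_with_voice is a hypothetical helper.

```python
import json


def utterance_with_voice(text, *, voice_id=None, voice_name=None):
    """Build an utterance that references a saved voice by ID or by name."""
    if (voice_id is None) == (voice_name is None):
        raise ValueError("pass exactly one of voice_id or voice_name")
    if voice_id is not None:
        voice = {"id": voice_id}  # assumed field shape
    else:
        # Assumed field shape; the provider value comes from the save response.
        voice = {"name": voice_name, "provider": "CUSTOM_VOICE"}
    return {"text": text, "voice": voice}


body = {
    "utterances": [
        utterance_with_voice("Welcome back.", voice_name="My Custom Voice")
    ]
}
print(json.dumps(body, indent=2))
```

Requiring exactly one of the two identifiers keeps a request from carrying a name and an ID that point at different voices.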

Use the playgrounds to preview how your saved voice sounds in different scenarios.

If you’re building an interface for others to create voices, you may also want to offer basic voice management, such as listing saved voices or deleting those no longer needed.

See the TTS and EVI guides for details on how to use your voice in your project or integration.
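The voice management mentioned above could be sketched as two more requests against the voices endpoint. The HTTP methods and query parameters here (GET with provider, DELETE with name) are assumptions that this guide does not confirm, so verify them against the API reference; both helper names are hypothetical, and the requests are only built, not sent.

```python
import urllib.parse
import urllib.request

VOICES_URL = "https://api.hume.ai/v0/tts/voices"


def build_list_voices_request(api_key, provider="CUSTOM_VOICE"):
    """Build a request that lists voices (assumed: GET with ?provider=)."""
    query = urllib.parse.urlencode({"provider": provider})
    return urllib.request.Request(
        f"{VOICES_URL}?{query}",
        headers={"X-Hume-Api-Key": api_key},
        method="GET",
    )


def build_delete_voice_request(api_key, name):
    """Build a request that deletes a saved voice (assumed: DELETE with ?name=)."""
    query = urllib.parse.urlencode({"name": name})
    return urllib.request.Request(
        f"{VOICES_URL}?{query}",
        headers={"X-Hume-Api-Key": api_key},
        method="DELETE",
    )


list_req = build_list_voices_request("<your-api-key>")
delete_req = build_delete_voice_request("<your-api-key>", "My Custom Voice")
print(list_req.get_method(), list_req.full_url)
print(delete_req.get_method(), delete_req.full_url)
```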