Voice Design

A guide to designing expressive, natural-sounding voices using Octave, Hume’s speech-language model.

Octave enables you to design custom voices using intuitive, descriptive prompts. This guide explains how voice design works, shares best practices for writing effective prompts, and demonstrates how to create reusable voices in the Platform UI and API.

How voice design works

Designing a voice with Octave involves guiding the model with both what kind of voice to generate and what that voice should say. These two inputs work together to produce expressive, character-consistent speech:

  1. Voice prompt (description): A natural-language prompt describing how the speaker should sound. This includes tone, personality, emotion, and context. The prompt sets the foundation for the voice’s identity.
  2. Input text: A sample line that fits naturally with the character’s voice and identity. It gives the model a reference for delivery—helping it match tone, pacing, and emotional nuance to the prompt.

Octave uses both inputs holistically. It doesn’t treat the prompt as a set of isolated traits—it interprets it in full context, just as a human would when imagining a speaker. The model then generates speech that reflects not just the words, but the personality behind them.

This allows for a wide range of voices: warm and professional, anxious and fast-paced, playful and sarcastic—even when the text stays the same. You can iterate quickly: revise your prompt, try alternate lines of text, and fine-tune the result by pairing tone, identity, and delivery.

In the next section, we’ll explore practical techniques for crafting clear, expressive voice prompts that lead to more natural, accurate results.

Crafting voice prompts

Octave understands language in context. The more clearly you describe who’s speaking and how they should sound, the more naturally the model will bring your voice to life.

  1. Character and setting: Octave produces more expressive, natural speech when it understands:

    • Voice identity: personality, tone, emotional quality
    • How they speak: pace, clarity, intensity
    • What context they’re in: setting, role, or intent
  2. Voice profile: When writing a prompt, consider including details like:

    • Tone: “serious”, “playful”, “melancholic”
    • Speaking style: “clear”, “fast-paced”, “informal”
    • Emotion or attitude: “cheerful”, “anxious”, “skeptical”
  3. Formatting: Use standard formatting conventions to help Octave interpret your input clearly. This improves how it handles phrasing, structure, and delivery cues:

    • Use standard punctuation to support your intended phrasing, structure, and tone.
    • Avoid non-speech markup or symbols, such as emojis, HTML tags, or Markdown formatting.
    • Keep formatting clean and readable, reflecting how the sentence would be spoken aloud.

Below are a few sample voice prompts crafted by the Hume team.

Valley Girl
The speaker has an expressive, totally disgusted Valley Girl voice, with
a heavy Californian accent, delivering each word with maximum disdain,
like a lifestyle influencer reacting to a truly tragic fashion faux pas.

Create a custom voice

Once you’ve created a generation that captures the voice you want, you can save it as a custom voice. This stores both the speech and the prompt that shaped it, so the model can reliably reproduce the same vocal identity in future requests.

You can create and save voices using:

  • The Platform UI – great for interactive exploration and refinement.
  • The API – ideal for programmatic use cases, such as letting end users design and save voices in your application.

Using the UI

This section walks through the voice creation flow in the Platform UI, from generating samples to saving your voice.

1

Visit the Platform’s Voice design page.

Voice design page
2

Input Text and Voice Prompt

Enter your Text and Voice prompt.

Use Enhance to improve your inputs, or Auto-generate to get help crafting them.

Voice design text and prompt
3

Generate samples

Click Generate samples to create three voice candidates based on your inputs.

Preview each and choose your favorite. You can keep generating new sets of samples until you find one you like.

Voice design samples
4

Name your voice

Enter a Name for your voice.

Optionally provide a Description for your reference.

Voice design modal named voice
5

Save your voice

Click Save voice to complete the creation flow. You’ll be redirected to the My Voices tab.

My voices page

Using the API

This section walks through the voice creation API flow: generating speech in a new voice and saving that generation as a reusable voice.

1

Generate a voice

Generate a new voice by making a POST request to /v0/tts.

In the utterances field, include both a description and text.

You can optionally request multiple generations to explore variations.

1curl https://api.hume.ai/v0/tts \
2 -H "X-Hume-Api-Key: $HUME_API_KEY" \
3 --json '{
4 "utterances": [{
5 "text": "Hume'"'"'s AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
6 "description": "The speaker has a confident, charismatic tone, like a tech guru explaining a new technology with infectious enthusiasm, and the excitement of a viral storyteller."
7 }],
8 "num_generations": 1
9 }'

The response includes one or more generations, each with a generation_id, audio, and additional metadata.

Listen to each generation and choose the one you want to save. Use its generation_id in the next step to save it as a voice.

JSON
1{
2 "request_id": "553ce0cb-a958-48ce-befc-88fca6310a028583094",
3 "generations": [
4 {
5 "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
6 "duration": 5.88,
7 "file_size": 94464,
8 "encoding": {
9 "format": "mp3",
10 "sample_rate": 48000
11 },
12 "audio": "//uUxAAAEM1rHUewycq...",
13 "snippets": [
14 [
15 {
16 "id": "9295d4ab-3c1a-489f-9f12-c81ea6c8585c",
17 "text": "Hume's AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
18 "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
19 "utterance_index": 0,
20 "audio_format": "mp3",
21 "transcribed_text": "Hume's AI voice generator is insane, because you can tell it exactly how you want the voice to sound.",
22 "audio": "//uUxAAAAAAAAAAAAAA..."
23 }
24 ]
25 ]
26 }
27 ]
28}
2

Save the voice

Make a POST request to /v0/tts/voices to save a generation as a reusable voice.

Include the generation_id and a name for the new voice.

1curl https://api.hume.ai/v0/tts/voices \
2 -H "X-Hume-Api-Key: $HUME_API_KEY" \
3 --json '{
4 "generation_id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
5 "name": "My Custom Voice"
6 }'

The response includes the name and id of your saved voice.

JSON
1{
2 "name": "My Custom Voice",
3 "id": "9e068547-5ba4-4c8e-8e03-69282a008f04",
4 "provider": "CUSTOM_VOICE"
5}

What’s next

You can use your custom voices across Hume products that support speech synthesis. Reference them by name or ID in TTS requests, or use them in EVI by specifying the voice in your configuration.

Use the playgrounds to preview how your saved voice sounds in different scenarios:

If you’re building an interface for others to create voices, you may also want to offer basic voice management—such as listing saved voices or deleting those no longer needed:

See guides below for details on how to use your voice in your project or integration.