Voice

Use Hume’s Voice Library or design custom voices tailored to your application.

Voice is foundational to any system that generates speech. It sets the tone, style, and pacing for how content is delivered. Whether it’s the friendly demeanor of a virtual assistant, the immersive narration of an audiobook, or the distinct personality of a character, the chosen voice shapes the listener’s experience.

Octave is Hume’s speech-language model for generating expressive speech with LLM intelligence. Unlike conventional TTS systems that rely on acoustic templates or phoneme-based pipelines, Octave understands what the text means and how it should be spoken.

Voices, whether selected from the Voice Library or created using prompts, are used in Hume’s two voice products: Empathic Voice Interface (EVI) and Text-to-Speech (TTS). If you’re getting started with either, selecting or designing a voice is often your first step.

Empathic Voice Interface (EVI)

Real-time, emotionally intelligent voice AI for conversational interfaces.

Text-to-Speech (TTS)

Synthesize expressive speech from text using Octave.

Try our free voice design demo to hear how Octave generates expressive speech from natural language descriptions, with no signup or code required.

Voice design

Octave models both language and speech patterns, which lets it generate new voices from natural language descriptions. A prompt can specify tone, emotion, accent, and other stylistic traits.
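As a rough illustration of prompt-based voice design, the sketch below builds a request body that pairs the text to speak with a natural language description of the voice. The endpoint URL and the `utterances`, `text`, and `description` field names are assumptions based on the shape of Hume's TTS API and should be checked against the API Reference; `build_design_request` is a hypothetical helper, not part of any SDK.

```python
import json

# Assumed endpoint; verify against the TTS API Reference before use.
TTS_URL = "https://api.hume.ai/v0/tts"


def build_design_request(text: str, description: str) -> dict:
    """Build a request body that designs a voice from a prompt.

    No voice is referenced, so the description alone is assumed to
    determine how the generated voice sounds.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    if not description.strip():
        raise ValueError("description must not be empty")
    # Field names ("utterances", "text", "description") are assumptions.
    return {"utterances": [{"text": text, "description": description}]}


if __name__ == "__main__":
    body = build_design_request(
        text="Welcome back. Let's pick up where we left off.",
        description="A warm, unhurried narrator with a soft British accent.",
    )
    # Print the JSON that would be sent to TTS_URL with an API key header.
    print(json.dumps(body, indent=2))
```

Keeping the payload construction separate from the network call makes it easy to iterate on the description text before spending any API credits.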

The Voice Library offers over 100 voices crafted by Hume with Octave, each reflecting a unique style, personality, or accent. These voices can be used directly or serve as inspiration for creating your own.

Voice Design Guide

See the Voice Design Guide for how to design and create a custom voice.

Voice Library

Visit the Voice Library to explore Hume’s predesigned voices.

Voice cloning

In addition to designing voices from natural language descriptions, Octave can create voices from audio samples, capturing the speaker’s tone, accent, cadence, and vocal identity.

Voice Cloning Guide

Create a voice clone from a live recording or an audio file.

Voice management

Manage your custom voices using the Platform UI or programmatically through the API. The guide below covers both workflows.

Voice Management Guide

View, rename, and delete custom voices via the Platform or API.
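For the API route, a minimal sketch of how listing and deleting custom voices might look with Python's standard library is shown here. The `/v0/tts/voices` path, the `provider` and `name` query parameters, and the `X-Hume-Api-Key` header are assumptions to confirm in the Voice Management Guide; both helper functions are hypothetical and only construct the requests without sending them.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; confirm in the API Reference.
VOICES_URL = "https://api.hume.ai/v0/tts/voices"


def list_custom_voices_request(api_key: str) -> Request:
    """Build (but do not send) a GET request listing custom voices."""
    # "provider=CUSTOM_VOICE" is an assumed filter for user-created voices.
    query = urlencode({"provider": "CUSTOM_VOICE"})
    return Request(
        f"{VOICES_URL}?{query}",
        headers={"X-Hume-Api-Key": api_key},  # assumed auth header
        method="GET",
    )


def delete_voice_request(api_key: str, name: str) -> Request:
    """Build (but do not send) a DELETE request for a voice by name."""
    # The "name" query parameter is an assumption; urlencode escapes it.
    query = urlencode({"name": name})
    return Request(
        f"{VOICES_URL}?{query}",
        headers={"X-Hume-Api-Key": api_key},
        method="DELETE",
    )


if __name__ == "__main__":
    req = delete_voice_request("YOUR_API_KEY", "My Narrator")
    print(req.get_method(), req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it; it is left out here so the sketch can be run without credentials.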

Voice integration

Voices you design or select from the Voice Library can be used across all Hume products that support speech synthesis. The guides below explain how to configure a voice for each API.

Empathic Voice Interface (EVI)

Configure EVI to use a specified voice.

Text-to-Speech (TTS)

Specify a voice in your TTS requests.
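To make the TTS case concrete, here is a small sketch of a request body that references an existing voice by name. The `voice` object with `name` and `provider` fields, and the `HUME_AI` and `CUSTOM_VOICE` provider values, are assumptions drawn from the general shape of Hume's API and should be verified in the TTS guide; `voice_ref` and `build_tts_request` are hypothetical helpers, and the voice name is a placeholder.

```python
import json
from typing import Optional


def voice_ref(name: str, custom: bool = False) -> dict:
    """Reference a voice by name.

    Assumption: Voice Library voices use provider "HUME_AI" and voices
    you created yourself use provider "CUSTOM_VOICE".
    """
    provider = "CUSTOM_VOICE" if custom else "HUME_AI"
    return {"name": name, "provider": provider}


def build_tts_request(
    text: str, voice: dict, description: Optional[str] = None
) -> dict:
    """Build a TTS request body that speaks `text` in the given voice."""
    utterance = {"text": text, "voice": voice}
    if description:
        # With a voice set, the description is assumed to act as
        # delivery guidance rather than a voice design prompt.
        utterance["description"] = description
    return {"utterances": [utterance]}


if __name__ == "__main__":
    # "Example Library Voice" is a placeholder, not a real voice name.
    body = build_tts_request(
        text="Your order has shipped and should arrive on Friday.",
        voice=voice_ref("Example Library Voice"),
    )
    print(json.dumps(body, indent=2))
```

The same `voice_ref` dictionary could plausibly be reused when configuring EVI, though the exact configuration fields should be taken from the EVI guide.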