
Voice Conversion

Guide to converting your audio recordings into different voices using Octave's voice conversion API.

Voice conversion transforms an existing audio recording by applying a different voice to it. Using Octave, Hume’s speech-language model, you can convert any speech audio so that it sounds as though it were spoken by a voice from the Voice Library or one of your custom voices, while preserving the original speech patterns, timing, and emotional expression.

Use the voice conversion playground to preview how your audio will sound with different voices:

Voice Conversion Playground

Upload audio files and preview how they sound with different voices from the Voice Library or your custom voices.

Using the voice conversion API

The voice conversion API accepts an audio file and a target voice, then returns the converted audio with the specified voice applied. You can use any voice from the Voice Library or your custom voices.

Audio file requirements

When uploading audio files for voice conversion, follow these guidelines:

  • Format: MP3, WAV, M4A, and OGG are supported
  • Duration: Audio files should be at least 12 seconds and less than 3 minutes long
  • Quality: For best results, use clear audio with minimal background noise
  • Content: Input audio should contain human speech
  • Sample rate: 44.1 kHz is recommended

Upload only audio files for which you have the necessary rights or consent to convert. Users must comply with Hume’s Terms of Use, Ethical Guidelines, Privacy Policy, and applicable laws.
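As a rough illustration of the guidelines above, the following Python sketch checks a file before upload. It is not part of Hume's SDK: the function names and constants are hypothetical, the limits are copied from the list above, and the duration check only handles WAV input because it relies on the standard-library `wave` module.

```python
import io
import wave
from pathlib import Path

# Values taken from the guidelines above; the names are illustrative only.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg"}
MIN_SECONDS = 12.0         # "at least 12 seconds"
MAX_SECONDS = 180.0        # "less than 3 minutes"
RECOMMENDED_RATE = 44100   # 44.1 kHz is recommended, not required


def has_supported_extension(filename):
    """Return True if the file extension is one of the supported formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def check_wav(source):
    """Inspect a WAV file and return (errors, warnings) as two lists of strings.

    `source` is a path string or a binary file-like object, as accepted
    by wave.open().
    """
    errors, warnings = [], []
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    if duration < MIN_SECONDS:
        errors.append(f"too short: {duration:.1f}s (minimum {MIN_SECONDS:.0f}s)")
    if duration >= MAX_SECONDS:
        errors.append(f"too long: {duration:.1f}s (must be under {MAX_SECONDS:.0f}s)")
    if rate != RECOMMENDED_RATE:
        warnings.append(f"sample rate is {rate} Hz; {RECOMMENDED_RATE} Hz is recommended")
    return errors, warnings
```

A call such as `check_wav("recording.wav")` returns two empty lists when the file meets the duration limits and uses the recommended sample rate. Checks for MP3, M4A, or OGG duration would need a third-party audio library and are left out here.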

Specifying a target voice

You can specify the target voice by name or id. When using name, include a provider to indicate whether you’re using a voice from the Voice Library (HUME_AI) or a custom voice (CUSTOM_VOICE).

By ID: specify either a custom voice or one from Hume's Voice Library by ID.

{
  "voice": {
    "id": "f898a92e-685f-43fa-985b-a46920f0650b",
    "provider": "HUME_AI"
  }
}

Get voice IDs and names from /v0/tts/voices or from the Platform’s Voice Library page.
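The voice specification described above can be built programmatically. The Python sketch below is a hypothetical helper, not part of any Hume SDK. It produces the JSON shape shown on this page, plus a flattened form-field version that follows the `voice[name]` and `voice[provider]` naming used in multipart requests. The `voice[id]` field name is an assumption made by analogy with `voice[name]`.

```python
VALID_PROVIDERS = {"HUME_AI", "CUSTOM_VOICE"}  # the two providers named above


def voice_spec(*, voice_id=None, name=None, provider="HUME_AI"):
    """Build a {"voice": {...}} dictionary from exactly one of voice_id or name."""
    if (voice_id is None) == (name is None):
        raise ValueError("pass exactly one of voice_id or name")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    key, value = ("id", voice_id) if voice_id is not None else ("name", name)
    return {"voice": {key: value, "provider": provider}}


def to_form_fields(spec):
    """Flatten a voice spec into bracketed form-field names such as voice[name].

    Assumption: an ID is sent as voice[id], mirroring voice[name].
    """
    return {f"voice[{key}]": value for key, value in spec["voice"].items()}
```

For example, `to_form_fields(voice_spec(name="Inspiring Man"))` yields the two voice fields that a multipart request would carry for a Voice Library voice selected by name.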

Converting audio

The following examples demonstrate how to convert an audio file using the voice conversion API. The endpoint accepts a multipart form request with the audio file and voice specification.

curl --location 'https://api.hume.ai/v0/tts/voice_conversion/file' \
  -H "X-Hume-Api-Key: YOUR_HUME_API_KEY" \
  --output hume-voice-conversion-response.wav \
  -F 'audio=@path/to/your/audio.wav' \
  -F 'voice[name]=Inspiring Man' \
  -F 'voice[provider]=HUME_AI'
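For readers who prefer Python, the sketch below mirrors the curl request using only the standard library. The endpoint, header name, and form-field names are taken from the curl example. The helper functions (`build_multipart`, `build_request`, `convert_file`) are hypothetical and not part of any Hume SDK, and error handling is omitted. It assumes the filename contains no double quotes.

```python
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.hume.ai/v0/tts/voice_conversion/file"  # from the curl example


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n".encode("utf-8")
    )
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, audio_name, audio_bytes, voice_name, provider="HUME_AI"):
    """Prepare (but do not send) the POST request for a voice selected by name."""
    fields = {"voice[name]": voice_name, "voice[provider]": provider}
    body, content_type = build_multipart(fields, "audio", audio_name, audio_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-Hume-Api-Key": api_key, "Content-Type": content_type},
    )


def convert_file(api_key, audio_path, voice_name, out_path, provider="HUME_AI"):
    """Send the request and write the returned audio bytes to out_path."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    req = build_request(
        api_key, os.path.basename(audio_path), audio_bytes, voice_name, provider
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as out:
        out.write(resp.read())
```

Calling `convert_file("YOUR_HUME_API_KEY", "path/to/your/audio.wav", "Inspiring Man", "hume-voice-conversion-response.wav")` would perform the same upload as the curl command. A third-party HTTP client would shorten the multipart handling considerably; the standard library is used here only to keep the sketch dependency-free.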

Response format

The voice conversion API returns the converted audio file in the same format as the input, or in a format you specify. The response includes the audio data that you can save to a file or stream directly to your application.
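Because the returned format can vary with the input, it may help to inspect the response bytes before choosing a file extension. The sketch below is a hypothetical helper that recognizes the four container formats listed under the audio file requirements by their well-known leading bytes. It is a heuristic only and says nothing about how the API itself reports the format.

```python
def guess_audio_extension(data):
    """Guess a file extension from the first bytes of an audio payload.

    Returns ".wav", ".ogg", ".m4a", ".mp3", or None if nothing matches.
    """
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"   # RIFF container with a WAVE form type
    if data[:4] == b"OggS":
        return ".ogg"   # Ogg page capture pattern
    if data[4:8] == b"ftyp":
        return ".m4a"   # ISO base media file (MP4 family); M4A assumed here
    if data[:3] == b"ID3":
        return ".mp3"   # MP3 with an ID3v2 tag
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return ".mp3"   # MPEG audio frame sync (11 set bits)
    return None


def save_audio(data, stem):
    """Write the payload to '<stem><ext>' and return the filename used."""
    filename = stem + (guess_audio_extension(data) or ".bin")
    with open(filename, "wb") as out:
        out.write(data)
    return filename
```

For instance, `save_audio(response_bytes, "converted")` writes `converted.wav` when the payload starts with a RIFF/WAVE header, and falls back to a `.bin` extension when the format is not recognized.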

Best practices

  • Use clear audio: Voice conversion works best with high-quality audio recordings that have minimal background noise and clear speech
  • Test with short clips first: Start with shorter audio files (12-30 seconds) to verify the conversion quality before processing longer files
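One way to follow the short-clip advice above is to cut a test excerpt from a longer recording before uploading it. The Python sketch below trims a WAV file to its first 30 seconds with the standard-library `wave` module. The function name and default length are illustrative, and only uncompressed WAV input is handled.

```python
import io
import wave


def trim_wav(src, dst, max_seconds=30.0):
    """Copy at most the first max_seconds of a WAV file from src to dst.

    src and dst are path strings or binary file-like objects, as accepted
    by wave.open(). Returns the duration of the written clip in seconds.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        wanted = int(max_seconds * reader.getframerate())
        frame_count = min(reader.getnframes(), wanted)
        frames = reader.readframes(frame_count)
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)     # same channels, sample width, and rate
        writer.writeframes(frames)   # header is patched to the real length
    return frame_count / params.framerate
```

Calling `trim_wav("long-recording.wav", "test-clip.wav")` writes a clip of up to 30 seconds. Since input audio should be at least 12 seconds long, a source shorter than that is not a useful test clip even after copying.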

Use cases

Voice conversion is useful for a variety of applications:

  • Vocal instruction: Instead of giving the model written instructions, speak the sentence exactly as it should be delivered
  • Voice consistency: Standardize audio recordings to use a consistent voice across different speakers
  • Creative projects: Experiment with different voices for audio content, podcasts, or narration