Voice Conversion | Hume API

Voice conversion allows you to transform existing audio recordings by applying a different voice to them. Using Octave, Hume’s speech-language model, you can convert any speech audio to sound like it was spoken by a voice from the Voice Library or one of your custom voices, while preserving the original speech patterns, timing, and emotional expression.

Use the Voice Conversion Playground to upload audio files and preview how they sound with different voices from the Voice Library or your custom voices.

Using the voice conversion API

The voice conversion API accepts an audio file and a target voice, then returns the converted audio with the specified voice applied. You can use any voice from the Voice Library or your custom voices.

Audio file requirements

When uploading audio files for voice conversion, follow these guidelines:

Format: Supported formats include MP3 and WAV
Duration: Audio files should be at least 12 seconds long
Quality: For best results, use clear audio with minimal background noise
Content: Input audio should contain human speech
Sample rate: 44.1 kHz is recommended
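
The duration and sample-rate guidelines above can be checked locally before uploading. The following is a minimal sketch using Python's standard `wave` module, so it only handles uncompressed WAV files; the function name `check_wav` and the constants are illustrative and not part of the Hume API.

```python
import wave

MIN_DURATION_SECONDS = 12.0      # minimum length stated in the guidelines
RECOMMENDED_SAMPLE_RATE = 44100  # 44.1 kHz is recommended, not required


def check_wav(path):
    """Return a list of human-readable problems found in a PCM WAV file."""
    problems = []
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate
    if duration < MIN_DURATION_SECONDS:
        problems.append(
            f"too short: {duration:.1f}s, need at least {MIN_DURATION_SECONDS:.0f}s"
        )
    if sample_rate != RECOMMENDED_SAMPLE_RATE:
        problems.append(
            f"sample rate is {sample_rate} Hz, {RECOMMENDED_SAMPLE_RATE} Hz is recommended"
        )
    return problems
```

An empty list means the file meets both checks. Audio quality and speech content still need to be judged by ear.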

Upload only audio files for which you have the necessary rights or consent to convert. Users must comply with Hume’s Terms of Use, Ethical Guidelines, Privacy Policy, and applicable laws.

Specifying a target voice

You can specify the target voice by name or id. When using name, include a provider to indicate whether you’re using a voice from the Voice Library (HUME_AI) or a custom voice (CUSTOM_VOICE).

Specify either a custom voice or one from Hume's Voice Library by ID

{
  "voice": {
    "id": "f898a92e-685f-43fa-985b-a46920f0650b",
    "provider": "HUME_AI"
  }
}

Get voice IDs and names from /v0/tts/voices or from the Platform’s Voice Library page.
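
The two ways of identifying a voice can be captured in a small helper that produces multipart form fields. This is a sketch, not an official client: `voice_form_fields` is a hypothetical name, and the `voice[id]` field is an assumption made by analogy with the `voice[name]` and `voice[provider]` fields used in the curl example on this page.

```python
VALID_PROVIDERS = ("HUME_AI", "CUSTOM_VOICE")


def voice_form_fields(*, voice_id=None, name=None, provider=None):
    """Build form fields that identify the target voice by ID or by name."""
    if (voice_id is None) == (name is None):
        raise ValueError("pass exactly one of voice_id or name")
    if provider is not None and provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {VALID_PROVIDERS}")
    if name is not None and provider is None:
        # The text above asks for a provider whenever a name is used.
        raise ValueError("a provider is required when selecting a voice by name")

    if voice_id is not None:
        fields = {"voice[id]": voice_id}  # assumed field name, see lead-in
    else:
        fields = {"voice[name]": name}
    if provider is not None:
        fields["voice[provider]"] = provider
    return fields
```

For example, `voice_form_fields(name="Inspiring Man", provider="HUME_AI")` yields the same two voice fields that the curl example sends.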

Converting audio

The following examples demonstrate how to convert an audio file using the voice conversion API. The endpoint accepts a multipart form request with the audio file and voice specification.

curl --location 'https://api.hume.ai/v0/tts/voice_conversion/file' \
  -H "X-Hume-Api-Key: YOUR_HUME_API_KEY" \
  --output hume-voice-conversion-response.wav \
  -F 'audio=@path/to/your/audio.wav' \
  -F 'voice[name]=Inspiring Man' \
  -F 'voice[provider]=HUME_AI'
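
The same request can be sketched in Python with only the standard library. The URL, header, and form field names are taken from the curl command; the helper names (`encode_multipart`, `build_request`, `convert`) are illustrative, and the `application/octet-stream` part type is an assumption about what the endpoint accepts.

```python
import os
import urllib.request
import uuid

API_URL = "https://api.hume.ai/v0/tts/voice_conversion/file"


def encode_multipart(fields, filename, audio_bytes):
    """Encode text fields plus one file part named "audio" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    lines = []
    for key, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{key}"'.encode(),
            b"",
            value.encode(),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",  # assumed part type
        b"",
        audio_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_request(api_key, audio_path, fields):
    """Build (but do not send) the POST request for one audio file."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    body, content_type = encode_multipart(
        fields, os.path.basename(audio_path), audio_bytes
    )
    headers = {"X-Hume-Api-Key": api_key, "Content-Type": content_type}
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def convert(api_key, audio_path, fields, output_path):
    """Send the request and write the returned audio bytes to output_path."""
    request = build_request(api_key, audio_path, fields)
    # urlopen raises urllib.error.HTTPError on non-2xx responses.
    with urllib.request.urlopen(request) as response, open(output_path, "wb") as out:
        out.write(response.read())
```

A call such as `convert(api_key, "audio.wav", {"voice[name]": "Inspiring Man", "voice[provider]": "HUME_AI"}, "converted.wav")` mirrors the curl command. Splitting out `build_request` makes it possible to inspect the request without touching the network.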

Response format

The voice conversion API returns the converted audio file in the same format as the input, or in a format you specify. The response includes the audio data that you can save to a file or stream directly to your application.
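
As a rough illustration of saving the response without holding the whole file in memory, the sketch below copies any binary file-like object (for instance, the object returned by `urllib.request.urlopen`) to disk in chunks. It assumes the response body is the raw audio, as the `--output` flag in the curl example suggests; `save_audio_stream` is a hypothetical helper name.

```python
import shutil


def save_audio_stream(response, output_path, chunk_size=64 * 1024):
    """Copy a binary file-like response to disk in chunks; return bytes written."""
    with open(output_path, "wb") as out:
        shutil.copyfileobj(response, out, chunk_size)
        return out.tell()
```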

Best practices

Use clear audio: Voice conversion works best with high-quality recordings that have clear speech and minimal background noise.
Test with short clips first: Start with shorter audio files (12 to 30 seconds) to verify the conversion quality before processing longer files.
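
One way to produce a short test clip from a longer recording is to keep only its first 30 seconds. This sketch uses the standard `wave` module and therefore applies to uncompressed WAV input only; `trim_wav` is an illustrative name, not part of the Hume API.

```python
import wave


def trim_wav(src_path, dst_path, max_seconds=30.0):
    """Write the first max_seconds of a PCM WAV file; return the clip length in seconds."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames_to_keep = min(src.getnframes(), int(max_seconds * src.getframerate()))
        frames = src.readframes(frames_to_keep)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # the frame count in the header is corrected on close
        dst.writeframes(frames)
    return frames_to_keep / params.framerate
```

If the returned length is under 12 seconds, the source recording is itself shorter than the recommended minimum.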

Use cases

Voice conversion is useful for a variety of applications:

Vocal instruction: Instead of giving the model written instructions, speak the sentence exactly as it should be delivered
Voice consistency: Standardize audio recordings to use a consistent voice across different speakers
Creative projects: Experiment with different voices for audio content, podcasts, or narration