Voice Conversion
Voice conversion allows you to transform existing audio recordings by applying a different voice to them. Using Octave, Hume’s speech-language model, you can convert any speech audio to sound like it was spoken by a voice from the Voice Library or one of your custom voices, while preserving the original speech patterns, timing, and emotional expression.
Use the voice conversion playground to preview how your audio will sound with different voices.
Using the voice conversion API
The voice conversion API accepts an audio file and a target voice, then returns the converted audio with the specified voice applied. You can use any voice from the Voice Library or your custom voices.
Audio file requirements
When uploading audio files for voice conversion, follow these guidelines:
- Format: Supported formats include MP3 and WAV
- Duration: Audio files should be at least 12 seconds long
- Quality: For best results, use clear audio with minimal background noise
- Content: Input audio should contain human speech
- Sample rate: 44.1kHz is recommended
Upload only audio files for which you have the necessary rights or consent to convert. Users must comply with Hume’s Terms of Use, Ethical Guidelines, Privacy Policy, and applicable laws.
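The guidelines above can be checked locally before uploading. The following is a minimal sketch using only the Python standard library. The function name `check_audio` and the warning messages are illustrative and not part of Hume's API. Duration and sample rate are only inspected for WAV files, since the standard library cannot decode MP3.

```python
import wave
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav"}   # formats listed in the guidelines
MIN_DURATION_SECONDS = 12.0               # minimum duration from the guidelines
RECOMMENDED_SAMPLE_RATE = 44_100          # 44.1kHz is recommended, not required


def check_audio(path: str) -> list[str]:
    """Return a list of warnings; an empty list means these local checks passed."""
    file_path = Path(path)
    extension = file_path.suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        # Stop early: nothing else can be checked for an unsupported format.
        return [f"unsupported format: {extension or '(none)'}"]

    warnings = []
    if extension == ".wav":
        with wave.open(str(file_path), "rb") as wav:
            sample_rate = wav.getframerate()
            duration = wav.getnframes() / sample_rate
        if duration < MIN_DURATION_SECONDS:
            warnings.append(f"too short: {duration:.1f}s (minimum {MIN_DURATION_SECONDS:.0f}s)")
        if sample_rate != RECOMMENDED_SAMPLE_RATE:
            warnings.append(f"sample rate is {sample_rate} Hz (44100 Hz recommended)")
    return warnings
```

A check like this cannot judge background noise or whether the file contains speech, so it complements listening to the clip rather than replacing it.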
Specifying a target voice
You can specify the target voice by name or id. When using name, include a provider to indicate whether you’re using a voice from the Voice Library (HUME_AI) or a custom voice (CUSTOM_VOICE).
Get voice IDs and names from /v0/tts/voices or from the Platform’s Voice Library page.
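As an illustration of the two ways to specify a voice, the sketch below builds the specification as a Python dictionary. The field names `id`, `name`, and `provider` and the provider values come from the description above; the helper `voice_spec` is hypothetical, and the exact request shape should be confirmed against the API reference.

```python
from typing import Optional

PROVIDERS = {"HUME_AI", "CUSTOM_VOICE"}  # Voice Library vs. custom voice


def voice_spec(
    *,
    voice_id: Optional[str] = None,
    name: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict:
    """Build a voice specification by ID, or by name plus provider."""
    if (voice_id is None) == (name is None):
        raise ValueError("pass exactly one of voice_id or name")
    if voice_id is not None:
        return {"id": voice_id}
    if provider not in PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(PROVIDERS)}")
    return {"name": name, "provider": provider}
```

For example, `voice_spec(voice_id="<your-voice-id>")` targets a voice by ID, while `voice_spec(name="<voice-name>", provider="CUSTOM_VOICE")` targets one of your custom voices by name.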
Converting audio
To convert an audio file, send a multipart form request to the voice conversion endpoint containing the audio file and the voice specification.
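The sketch below shows one way to assemble such a multipart request with the Python standard library. Several details are assumptions, not confirmed by this page: the form field names `audio` and `voice`, the `X-Hume-Api-Key` header, and the JSON encoding of the voice field. The endpoint URL is deliberately left as a parameter; take it from the API reference.

```python
import json
import urllib.request
import uuid


def build_conversion_request(
    endpoint_url: str,   # voice conversion endpoint, taken from the API reference
    api_key: str,
    audio_bytes: bytes,
    filename: str,
    voice: dict,         # e.g. {"id": "..."} or {"name": "...", "provider": "HUME_AI"}
) -> urllib.request.Request:
    """Assemble a multipart/form-data POST without sending it."""
    boundary = uuid.uuid4().hex
    mime = "audio/mpeg" if filename.lower().endswith(".mp3") else "audio/wav"

    # Field names "voice" and "audio" are assumptions for illustration.
    voice_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="voice"\r\n'
        "Content-Type: application/json\r\n\r\n"
        f"{json.dumps(voice)}\r\n"
    ).encode("utf-8")
    audio_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")

    return urllib.request.Request(
        endpoint_url,
        data=voice_part + audio_header + audio_bytes + closing,
        method="POST",
        headers={
            "X-Hume-Api-Key": api_key,  # assumed authentication header
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. Building the request separately from sending it makes the body easy to inspect while the field names are still being verified.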
Response format
The voice conversion API returns the converted audio file in the same format as the input, or in a format you specify. The response includes the audio data that you can save to a file or stream directly to your application.
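Saving the returned audio can be sketched as follows, assuming the response is available as a binary file-like object (for instance, the object returned by `urllib.request.urlopen`). The helper name `save_audio` is illustrative.

```python
import os
import shutil
from typing import BinaryIO


def save_audio(response: BinaryIO, out_path: str, chunk_size: int = 64 * 1024) -> int:
    """Copy the response body to out_path in chunks and return the bytes written."""
    with open(out_path, "wb") as out_file:
        # Copying in chunks avoids holding the whole file in memory.
        shutil.copyfileobj(response, out_file, chunk_size)
    return os.path.getsize(out_path)
```

The same chunked read loop can feed an audio player or a network socket instead of a file when streaming to an application.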
Best practices
- Use clear audio: Voice conversion works best with high-quality audio recordings that have minimal background noise and clear speech
- Test with short clips first: Start with shorter audio files (12-30 seconds) to verify the conversion quality before processing longer files
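To produce a short test clip from a longer recording, one option is to trim the file locally first. The sketch below does this for WAV input using Python's standard `wave` module; the function name `trim_wav` and the 30-second default are illustrative choices based on the range suggested above.

```python
import wave


def trim_wav(src_path: str, dst_path: str, seconds: float = 30.0) -> float:
    """Write the first `seconds` of a WAV file to dst_path; return the clip length."""
    with wave.open(src_path, "rb") as reader:
        params = reader.getparams()
        frames_to_keep = min(reader.getnframes(), int(seconds * params.framerate))
        frames = reader.readframes(frames_to_keep)

    with wave.open(dst_path, "wb") as writer:
        # Reuse the source channel count, sample width, and sample rate.
        writer.setparams(params)
        # writeframes corrects the frame count in the header for the shorter clip.
        writer.writeframes(frames)

    return frames_to_keep / params.framerate
```

Keep the trimmed clip at 12 seconds or longer so it still meets the minimum duration for conversion.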
Use cases
Voice conversion is useful for a variety of applications:
- Vocal instruction: Instead of giving the model written instructions, record yourself speaking the sentence exactly as it should be delivered
- Voice consistency: Standardize audio recordings to use a consistent voice across different speakers
- Creative projects: Experiment with different voices for audio content, podcasts, or narration

