Voice conversion allows you to transform existing audio recordings by applying a different voice to them. Using Octave, Hume’s speech-language model, you can convert any speech audio to sound like it was spoken by a voice from the Voice Library or one of your custom voices, while preserving the original speech patterns, timing, and emotional expression.
Use the voice conversion playground to preview how your audio will sound with different voices.
The voice conversion API accepts an audio file and a target voice, then returns the converted audio with the specified voice applied. You can use any voice from the Voice Library or your custom voices.
When uploading audio files for voice conversion, follow these guidelines:
Supported formats: MP3, WAV, M4A, and OGG.
Upload only audio files for which you have the necessary rights or consent to convert. Users must comply with Hume’s Terms of Use, Ethical Guidelines, Privacy Policy, and applicable laws.
You can specify the target voice by name or id. When using name, include a provider to indicate whether you’re using a voice from the Voice Library (HUME_AI) or a custom voice (CUSTOM_VOICE).
Get voice IDs and names from /v0/tts/voices or from the Platform’s Voice Library page.
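As a rough illustration of the two ways to specify a voice, the sketch below builds a voice specification as a plain dictionary. The keys name, id, and provider and the provider values HUME_AI and CUSTOM_VOICE come from the text above; the helper build_voice_spec and the exact dictionary shape are assumptions made for this example, so check the API reference for the real schema.

```python
from typing import Optional

# Provider values named in this guide.
VALID_PROVIDERS = {"HUME_AI", "CUSTOM_VOICE"}


def build_voice_spec(
    *,
    name: Optional[str] = None,
    voice_id: Optional[str] = None,
    provider: Optional[str] = None,
) -> dict:
    """Hypothetical helper: return a voice specification keyed by id or by name.

    Exactly one of `name` or `voice_id` must be given. A name must be
    accompanied by a provider, as described in the guide.
    """
    if (name is None) == (voice_id is None):
        raise ValueError("specify exactly one of name or voice_id")
    if voice_id is not None:
        return {"id": voice_id}
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {sorted(VALID_PROVIDERS)}")
    return {"name": name, "provider": provider}
```

For example, build_voice_spec(name="My Narrator", provider="CUSTOM_VOICE") selects a custom voice by name (the voice name here is made up), while build_voice_spec(voice_id="...") selects a voice by its ID.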
The following examples demonstrate how to convert an audio file using the voice conversion API. The endpoint accepts a multipart form request with the audio file and voice specification.
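A minimal sketch of such a multipart form request, using only the Python standard library, might look like the following. Several details are assumptions rather than documented facts: the form field names audio and voice, sending the voice specification as JSON, the X-Hume-Api-Key header, and the endpoint URL, which is passed in as a parameter because this guide does not state it. Treat the helper build_conversion_request as illustrative only.

```python
import json
import urllib.request
import uuid


def build_conversion_request(
    url: str,
    api_key: str,
    audio_bytes: bytes,
    filename: str,
    voice: dict,
) -> urllib.request.Request:
    """Hypothetical helper: assemble a multipart/form-data POST request.

    The field names ("voice", "audio") and the auth header are assumptions.
    """
    boundary = uuid.uuid4().hex
    parts = [
        # Voice specification, sent as a JSON-encoded form field.
        (
            f"--{boundary}\r\n"
            'Content-Disposition: form-data; name="voice"\r\n'
            "Content-Type: application/json\r\n\r\n"
            f"{json.dumps(voice)}\r\n"
        ).encode("utf-8"),
        # Audio file part: headers first, then the raw bytes.
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="audio"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8"),
        audio_bytes,
        # Closing boundary.
        f"\r\n--{boundary}--\r\n".encode("utf-8"),
    ]
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "X-Hume-Api-Key": api_key,  # assumed header name
        },
    )
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body as bytes. Building the request separately from sending it makes it easy to inspect the headers and body before any network call is made.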
The voice conversion API returns the converted audio file in the same format as the input, or in a format you specify. The response includes the audio data that you can save to a file or stream directly to your application.
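One way to handle the returned audio is to write it next to the original file, keeping the input's extension unless a different output format was requested. The sketch below assumes the response body is available as raw bytes; the helpers converted_path and save_audio and the _converted suffix are hypothetical names chosen for this example.

```python
from pathlib import Path
from typing import Optional


def converted_path(
    input_path: str,
    output_format: Optional[str] = None,
    suffix: str = "_converted",
) -> Path:
    """Derive an output path; keep the input extension unless a format is given."""
    src = Path(input_path)
    ext = f".{output_format.lower()}" if output_format else src.suffix
    return src.with_name(f"{src.stem}{suffix}{ext}")


def save_audio(data: bytes, path: Path) -> int:
    """Write the converted audio bytes to disk and return the byte count."""
    path.write_bytes(data)
    return len(data)
```

For a streaming use case, the same bytes could instead be forwarded to the application in chunks rather than written to disk.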
Voice conversion is useful for a variety of applications: