Text-to-speech API FAQ
We’ve compiled a list of frequently asked questions from our developer community. If you don’t see your question here, join the discussion on our Discord.
What languages are supported by TTS?
Hume’s OCTAVE text-to-speech currently supports English and Spanish, but more languages are coming soon.
What accents are supported by TTS?
Right now, the accents that work best are standard English-language accents, e.g. all American variations (such as African American, New York, Boston, and Texan), Indian, Canadian, Australian, Scottish, New Zealand, and South African. Support for more foreign-language accents is coming soon!
To design a voice with an accent, just describe that accent in the voice prompt and then use the generate feature in the voice creation flow of the Platform UI to create some text that matches the description. You can also try generating foreign language accents by creating a voice using non-English text and then applying that voice to English text.
How is Hume’s speech-language model different from other text-to-speech models?
Hume’s TTS product is based on OCTAVE, the first LLM for text-to-speech. OCTAVE stands out from traditional TTS models because its LLM foundation allows it to understand the meaning of the words it’s saying at a much deeper level.
This fundamental difference not only enables OCTAVE to sound more natural, but also gives it the ability to generate voices from descriptive prompts and to change its output based on instructions, because it understands what these descriptions mean. The result is creative control through natural language instructions: voice generation that responds to context, emotion, and nuanced descriptions rather than just converting text to phonemes.
How is Hume’s OCTAVE TTS different from ElevenLabs, or other text-to-speech providers?
ElevenLabs, Speechify, PlayHT, and other text-to-speech providers use more traditional text-to-speech models, which focus on how the text being read is pronounced rather than on what it means.
By contrast, Hume’s OCTAVE TTS with its LLM backbone can understand context, emotion, and descriptions, similar to how ChatGPT can understand context and knowledge about the world as it answers your questions.
What’s the best way to prompt Hume’s model for voice design?
See our prompting guide for tips on prompting OCTAVE.
What are acting instructions? How do they help my voice output?
Acting instructions let you give additional guidance on how Hume’s TTS delivers a line, after you’ve already created the voice. For example, you can add stage directions such as “slow down”, “act angry”, or “speak in an extremely exaggerated prosodic tone”!
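As a rough illustration, here is a minimal Python sketch of how an acting instruction might be attached to a single utterance in a request body. The field names (`utterances`, `text`, `voice`, `description`) and the voice name `my-narrator` are assumptions made for this example, not confirmed by this FAQ, so check the API reference for the exact schema. The sketch only builds and prints the body; it does not call the API.

```python
import json


def build_tts_request(text, voice_name, acting_instruction=None):
    """Build a JSON-serializable request body for one utterance.

    Field names are assumptions for illustration; verify them
    against the API reference before use.
    """
    utterance = {"text": text, "voice": {"name": voice_name}}
    if acting_instruction:
        # Assumed: the acting instruction travels alongside the text.
        utterance["description"] = acting_instruction
    return {"utterances": [utterance]}


# "my-narrator" is a hypothetical, previously created voice.
body = build_tts_request(
    "I told you not to open that door.",
    "my-narrator",
    acting_instruction="act angry",
)
print(json.dumps(body, indent=2))
```

Leaving `acting_instruction` unset produces a body without the extra field, so the same helper covers both plain and directed delivery.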
What happens if I exceed my usage limit?
If you are on the Creator, Pro, Scale, or Business plan, you can pay a set amount per 1,000 additional characters. If you are on the Free or Starter plan, additional usage requires an upgrade.
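To estimate what overage might cost on a paid plan, the arithmetic can be sketched as follows. The quota and per-1,000-character price used here are hypothetical placeholders, not Hume’s actual rates, and the sketch assumes overage is prorated rather than billed in whole 1,000-character blocks; consult the pricing page for the real figures and billing rules.

```python
def overage_cost(chars_used, chars_included, price_per_1000):
    """Estimate the overage charge for characters beyond the plan quota.

    Assumes prorated billing per character (an assumption, not
    confirmed by the FAQ).
    """
    extra_chars = max(0, chars_used - chars_included)
    return extra_chars / 1000 * price_per_1000


# Hypothetical numbers for illustration only.
estimate = overage_cost(
    chars_used=125_000,
    chars_included=100_000,
    price_per_1000=0.20,
)
print(f"Estimated overage: ${estimate:.2f}")
```

Usage at or below the included quota yields a cost of zero, since the extra character count is clamped at 0.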
Can I generate audiobooks or podcasts with Hume’s OCTAVE TTS?
Using Hume’s Projects interface, you can create audiobooks or podcasts. You can even set different texts to be read by different voices, to create a fully immersive experience.
Can I adjust the speed, pitch, or emotions of a voice output?
You can specify pitch and emotions in the prompt when designing your AI voice, and in acting instructions when using an existing voice. Speed can likewise be guided with an acting instruction such as “slow down”.
Can I use your TTS voices for commercial projects (e.g., YouTube, games, voice assistants)?
Yes, paid users can use the services for commercial purposes. Free users are limited to non-commercial use only.
Are there any restrictions on using TTS-generated voices in monetized content?
No, there are no specific restrictions on monetization itself. However, all use must comply with Hume’s prohibited use policy, including its rules against harmful or illegal applications.
Do I own the rights to my custom voices?
Yes, you retain rights to your output, but you grant Hume a perpetual license to use your voice recordings and voice models to provide and improve its services and to develop new products. Hume won’t share your AI voices or speech samples without permission.
Is attribution required when using TTS output?
No, no attribution is required - but a shoutout to Hume is always appreciated!
Can I clone my voice or the voices of others?
Voice cloning isn’t supported at launch, but it will be supported soon!