Text-to-speech API FAQ

We’ve compiled a list of frequently asked questions from our developer community. If you don’t see your question here, join the discussion on our Discord.

Hume’s OCTAVE text-to-speech currently supports English and Spanish, but more languages are coming soon.

Right now, the accents that work best are standard English-language accents, e.g. all American variations (like African American, New York, Boston, Texan, etc.), as well as Indian, Canadian, Australian, Scottish, New Zealand, and South African. Support for more foreign-language accents is coming soon!

To design a voice with an accent, describe that accent in the voice prompt, then use the generate feature in the voice creation flow of the Platform UI to create text that matches the description. You can also try generating foreign-language accents by creating a voice using non-English text and then applying that voice to English text.
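If you’d rather script voice design than use the Platform UI, the same idea can be expressed as an API call. The sketch below is illustrative only: the endpoint, the `X-Hume-Api-Key` header, and the `utterances`/`description`/`generations` field names are assumptions drawn from Hume’s API reference, so double-check them against the current docs.

```python
# Minimal sketch of designing an accented voice via the TTS API.
# Endpoint, header, and field names are assumptions; verify in the API reference.
import base64
import requests

HUME_API_KEY = "your-api-key-here"  # placeholder

response = requests.post(
    "https://api.hume.ai/v0/tts",  # assumed endpoint
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "utterances": [
            {
                # The description is the voice prompt, including the accent.
                "description": "A warm, upbeat narrator with a broad Australian accent",
                "text": "Good on ya! Let's get started with today's episode.",
            }
        ]
    },
)
response.raise_for_status()

# Assumed response shape: base64-encoded audio under generations[0]["audio"].
audio_b64 = response.json()["generations"][0]["audio"]
with open("accented_narrator.mp3", "wb") as f:
    f.write(base64.b64decode(audio_b64))
```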

Hume’s TTS product is based on OCTAVE, the first LLM for text-to-speech. OCTAVE stands out from traditional TTS models because its LLM foundation allows it to understand the meaning of the words it’s saying at a much deeper level.

This fundamental difference not only enables OCTAVE to sound more natural, but also gives it the ability to generate voices from descriptive prompts or change its output based on instructions, because it understands what these descriptions mean. The result is unprecedented creative control through natural-language instructions: voice generation that responds intelligently to context, emotion, and nuanced descriptions rather than just converting text to phonemes.

ElevenLabs, Speechify, PlayHT, and other text-to-speech providers use more traditional text-to-speech models that focus on the pronunciation of the characters being read rather than their meaning.

By contrast, Hume’s OCTAVE TTS with its LLM backbone can understand context, emotion, and descriptions, similar to how ChatGPT can understand context and knowledge about the world as it answers your questions.

See our prompting guide for tips on prompting OCTAVE.

Acting instructions are a way to provide additional guidance on the delivery of Hume’s TTS after you’ve already created the voice. For example, you can add stage directions to “slow down”, “act angry”, or “speak in an extremely exaggerated prosodic tone”!
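As a rough sketch of what this looks like in a request, the example below applies acting instructions to a previously saved voice. Treat the endpoint, the voice selector, and the use of the description field as acting instructions as assumptions to verify against the current API reference.

```python
# Minimal sketch of acting instructions applied to an existing voice.
# Endpoint and field names are assumptions; verify in the API reference.
import requests

HUME_API_KEY = "your-api-key-here"  # placeholder

response = requests.post(
    "https://api.hume.ai/v0/tts",  # assumed endpoint
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "utterances": [
            {
                "voice": {"name": "My Saved Narrator"},  # hypothetical saved voice
                # With a voice already selected, the description is treated as
                # acting instructions rather than a voice design prompt.
                "description": "speak slowly, sounding exhausted and defeated",
                "text": "I just don't know how much longer I can keep this up.",
            }
        ]
    },
)
response.raise_for_status()
```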

If you have the Creator, Pro, Scale, or Business plan, you can pay a set amount per 1,000 additional characters. If you are on the Free or Starter plan, additional usage requires an upgrade.
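As a back-of-the-envelope illustration of how per-character overage adds up, here is a small calculation with made-up numbers; the quota and per-1,000-character rate are placeholders, not Hume’s actual pricing.

```python
# Hypothetical overage estimate; both constants are placeholders, not real pricing.
PLAN_INCLUDED_CHARACTERS = 1_000_000  # hypothetical monthly character quota
OVERAGE_RATE_PER_1000 = 0.20          # hypothetical dollars per 1,000 extra characters

def estimate_overage(characters_used: int) -> float:
    """Estimate the overage charge for a month's usage."""
    extra_characters = max(0, characters_used - PLAN_INCLUDED_CHARACTERS)
    return (extra_characters / 1000) * OVERAGE_RATE_PER_1000

print(estimate_overage(1_250_000))  # 250,000 extra characters -> 50.0 with these placeholders
```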

Using Hume’s Projects interface, you can create audiobooks or podcasts. You can even set different texts to be read by different voices, to create a fully immersive experience.
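The Projects interface handles the multi-voice setup for you. If you are scripting a similar result against the API instead, one rough equivalent is sending a sequence of utterances, each assigned its own saved voice; as above, the endpoint and field names are assumptions to confirm in the API reference.

```python
# Minimal sketch: one utterance per speaker, each with its own saved voice.
# Endpoint and field names are assumptions; verify in the API reference.
import requests

HUME_API_KEY = "your-api-key-here"  # placeholder

dialogue = [
    ("Narrator Voice", "The storm rolled in just after midnight."),  # hypothetical voices
    ("Hero Voice", "We need to move. Now."),
    ("Villain Voice", "You're already too late."),
]

response = requests.post(
    "https://api.hume.ai/v0/tts",  # assumed endpoint
    headers={"X-Hume-Api-Key": HUME_API_KEY},
    json={
        "utterances": [
            {"voice": {"name": voice_name}, "text": line}
            for voice_name, line in dialogue
        ]
    },
)
response.raise_for_status()
```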

You can specify pitch and emotion in the prompt when designing your AI voice, and in acting instructions when using an existing voice.

Yes, paid users can use the services for commercial purposes. Free users are limited to non-commercial use.

No, but your use must comply with Hume’s prohibited use policy. There are no specific restrictions on monetization itself, but all use must adhere to Hume’s rules against harmful or illegal applications.

Yes, you retain rights to your output, but you grant Hume a perpetual license to use your voice recordings and voice models to provide/improve services and develop new products. Hume won’t share your AI voices or speech samples without permission.

No, no attribution is required - but a shoutout to Hume is always appreciated!

Voice cloning isn’t supported at launch, but it will be supported soon!

