TTS Python Quickstart Guide
Step-by-step guide for integrating the TTS API using Hume’s Python SDK.
This guide shows how to use Hume's Text-to-Speech API with Hume's Python SDK. It assumes you are running on a system with speakers for playback and with the PortAudio library installed.
It demonstrates:
- Using an existing voice.
- Creating a new voice via a prompt.
- Continuing from previous speech.
- Providing “acting instructions” to modulate the voice.
- Generating speech from live input.
The complete code for the example in this guide is available on GitHub.
Environment Setup
Set up a Python virtual environment and install the required packages. We recommend uv or Poetry for managing your environment and dependencies, but you can also use venv and pip.
Authenticating the HumeClient
You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.
This example uses python-dotenv. Place your API key in a .env file at the root of your project.
First, use your API key to instantiate the AsyncHumeClient, importing as necessary.
Using a pre-existing voice
Use this method if you want to synthesize speech with a high-quality voice from Hume's Voice Library, or specify provider='CUSTOM_VOICE' to use a voice that you created previously via the Hume Platform or the API.
Create a new voice via a prompt
The Voice Creation API allows you to create custom voices programmatically, via prompting. There are two steps to creating a voice:
- Send a description of the voice, along with sample text that is characteristic of the voice, to the standard tts endpoint without specifying a voice.
- Take the generation_id from one of the resulting audio samples, and use it to create a new voice with the Voice Creation API.
Continuing previous speech
You can make new speech sound like a natural continuation of previous speech by providing the generation_id of the previous audio in the context parameter. This helps maintain consistency in tone, pacing, and emotional state.
Providing acting instructions
You can provide "acting instructions" using the description field alongside an existing voice. When you specify both a voice and a description, the description modulates the voice's tone, emotion, and delivery style while maintaining the core voice characteristics.
Generating speech from live input
If you need to generate speech from text that is being produced in real-time, you can use the bidirectional streaming WebSocket endpoint at /v0/tts/stream/input.
Support for connecting to the WebSocket directly is coming soon to the Python SDK. For the time being, this example shows how you can implement a simple WebSocket client yourself.
First, create a streaming.py file with the StreamingTtsClient:
You can use the client as follows: