TTS Python Quickstart Guide

This guide shows how to use Hume’s Text-to-Speech (TTS) API with Hume’s Python SDK. It assumes you are running on a system that has speakers for playback and the PortAudio library installed.

It demonstrates:

Using an existing voice.
Creating a new voice via a prompt.
Continuing from previous speech.
Providing “acting instructions” to modulate the voice.
Generating speech from live input.

The complete code for the example in this guide is available on GitHub.

Environment Setup

Set up a Python virtual environment and install the required packages. We recommend uv or Poetry for managing your environment and dependencies, but you can also use venv and pip.

uv

$ uv init
$ uv add hume[microphone] python-dotenv

Authenticating the HumeClient

You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.

This example uses python-dotenv. Place your API key in a .env file at the root of your project.

.env

$ echo "HUME_API_KEY=your_api_key_here" > .env

First, use your API key to instantiate the AsyncHumeClient, importing as necessary.

# app.py
import os
from hume import AsyncHumeClient
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("HUME_API_KEY")
if not api_key:
    raise EnvironmentError("HUME_API_KEY not found in environment variables.")

hume = AsyncHumeClient(api_key=api_key)

Using a pre-existing voice

Use this method if you want to synthesize speech with a high-quality voice from Hume’s Voice Library, or specify provider='CUSTOM_VOICE' to use a voice that you created previously via the Hume Platform or the API.

import base64
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming
from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName

utterance = PostedUtterance(
    text="Dogs became domesticated between 23,000 and 30,000 years ago.",
    voice=PostedUtteranceVoiceWithName(name='Ava Song', provider='HUME_AI')
)

stream = hume.tts.synthesize_json_streaming(
    utterances=[utterance],
    strip_headers=True,
    version="1"
)

await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)
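The snippets in this guide use a bare `await`, which Python only accepts inside an `async def` function (or an async-aware REPL or notebook). In a script such as app.py, the calls would typically be wrapped in a coroutine and started with `asyncio.run`. The sketch below illustrates that wrapper together with the async generator expression used above. It does not call the Hume API: `FakeChunk`, `fake_stream`, and `collect_audio` are hypothetical stand-ins for the SDK's chunk type, `synthesize_json_streaming`, and `play_audio_streaming`.

```python
import asyncio
import base64
from dataclasses import dataclass
from typing import AsyncIterator, List


@dataclass
class FakeChunk:
    """Hypothetical stand-in for a streamed chunk; only `audio` is modeled."""
    audio: str  # base64-encoded audio bytes


async def fake_stream() -> AsyncIterator[FakeChunk]:
    """Stand-in for the SDK's streaming call: yields two encoded chunks."""
    for payload in (b"first-", b"second"):
        await asyncio.sleep(0)  # give control back to the loop, as a network read would
        yield FakeChunk(audio=base64.b64encode(payload).decode("ascii"))


async def collect_audio(chunks: AsyncIterator[bytes]) -> bytes:
    """Stand-in for play_audio_streaming: joins the bytes instead of playing them."""
    parts: List[bytes] = []
    async for part in chunks:
        parts.append(part)
    return b"".join(parts)


async def main() -> bytes:
    stream = fake_stream()
    # Same shape as the guide's call: each chunk is decoded lazily as it arrives.
    return await collect_audio(
        base64.b64decode(chunk.audio) async for chunk in stream
    )


# asyncio.run creates the event loop and drives main() to completion.
result = asyncio.run(main())
print(result)
```

With the real SDK, the body of `main()` would hold the `hume.tts.synthesize_json_streaming(...)` call and the `await play_audio_streaming(...)` line from the example above.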

Create a new voice via a prompt

The Voice Creation API allows you to create custom voices programmatically, via prompting. There are two steps to creating a voice:

1. Send a description of the voice, along with sample text that is characteristic of the voice, to the standard TTS endpoint without specifying a voice.
2. Take the generation_id from one of the resulting audio samples, and use it to create a new voice with the Voice Creation API.

import base64
import time
from hume.empathic_voice.chat.audio.audio_utilities import play_audio
from hume.tts import PostedUtterance

result1 = await hume.tts.synthesize_json(
    utterances=[PostedUtterance(
        description="Crisp, upper-class British accent with impeccably articulated consonants and perfectly placed vowels. Authoritative and theatrical, as if giving a lecture.",
        text="The science of speech. That's my profession; also my hobby. Happy is the man who can make a living by his hobby!"
    )],
    num_generations=2,
)

sample_number = 1
for generation in result1.generations:
    print(f'Playing option {sample_number}...')
    audio_data = base64.b64decode(generation.audio)
    await play_audio(audio_data)
    sample_number += 1

# Prompt user to select which voice they prefer
print('\nWhich voice did you prefer?')
print('1. First voice (generation ID:', result1.generations[0].generation_id, ')')
print('2. Second voice (generation ID:', result1.generations[1].generation_id, ')')

try:
    user_choice = input('Enter your choice (1 or 2): ').strip()
except EOFError:
    user_choice = '1'
    print('No input available, selecting option 1')

selected_index = int(user_choice) - 1

if selected_index not in [0, 1]:
    raise ValueError('Invalid choice. Please select 1 or 2.')

selected_generation_id = result1.generations[selected_index].generation_id
print(f'Selected voice option {selected_index + 1} (generation ID: {selected_generation_id})')

# Save the selected voice
voice_name = f'higgins-{int(time.time() * 1000)}'
await hume.tts.voices.create(
    name=voice_name,
    generation_id=selected_generation_id,
)

print(f'Created voice: {voice_name}')
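In the example above, `int(user_choice)` raises a bare `ValueError` if the user types something that is not a number. One possible way to tidy the selection step is a small helper like the one below. `parse_choice` is a hypothetical name, not part of the Hume SDK, and treating empty input as "pick the default" is a design assumption of this sketch rather than behavior taken from the guide.

```python
from typing import Optional


def parse_choice(raw: Optional[str], option_count: int, default: int = 1) -> int:
    """Convert a 1-based menu choice into a 0-based index.

    Empty or missing input falls back to `default` (an assumption of this
    sketch). Non-numeric or out-of-range input raises ValueError with a
    readable message.
    """
    text = (raw or "").strip()
    if not text:
        return default - 1
    try:
        choice = int(text)
    except ValueError:
        raise ValueError(
            f"Expected a number from 1 to {option_count}, got {text!r}."
        ) from None
    if not 1 <= choice <= option_count:
        raise ValueError(
            f"Choice {choice} is out of range; pick 1 to {option_count}."
        )
    return choice - 1


# Example: two generations were returned, and the user typed " 2 ".
selected = parse_choice(" 2 ", option_count=2)
print(selected)
```

The returned index could then be used in place of `selected_index`, for example `result1.generations[selected].generation_id`.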

Continuing previous speech

You can make new speech sound like a natural continuation from previous speech by providing the generation_id of the previous audio in the context parameter. This helps maintain consistency in tone, pacing, and emotional state.

Additionally, you can provide “acting instructions” using the description field alongside an existing voice. When you specify both a voice and a description, the description modulates the voice’s tone, emotion, and delivery style while maintaining the core voice characteristics.

import base64
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming
from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName, PostedContextWithGenerationId

stream = hume.tts.synthesize_json_streaming(
    utterances=[PostedUtterance(
        voice=PostedUtteranceVoiceWithName(name=voice_name),
        text="YOU can spot an Irishman or a Yorkshireman by his brogue. I can place any man within six miles. I can place him within two miles in London. Sometimes within two streets.",
        description="Bragging about his abilities"
    )],
    context=PostedContextWithGenerationId(
        generation_id=selected_generation_id
    ),
    strip_headers=True
)

await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)

Generating speech from live input

If you need to generate speech from text that is being produced in real-time, you can use the bidirectional streaming WebSocket endpoint at /v0/tts/stream/input.

Support for connecting to the WebSocket directly is coming soon to the Python SDK. For the time being, this example shows how you can implement a simple WebSocket client yourself.

First, create a streaming.py file with the StreamingTtsClient:

streaming.py

import asyncio
import json
from typing import AsyncGenerator, Dict, Any
import websockets
from hume.tts import PublishTts, SnippetAudioChunk

class StreamingTtsClient:
    def __init__(self, websocket: websockets.WebSocketClientProtocol):
        self._websocket: websockets.WebSocketClientProtocol = websocket
        self._message_queue = asyncio.Queue()

    @classmethod
    async def connect(cls, api_key: str) -> "StreamingTtsClient":
        try:
            # The handshake happens here, so this is the call that can raise.
            client = await websockets.connect(
                f"wss://api.hume.ai/v0/tts/stream/input?api_key={api_key}&instant_mode=true&strip_headers=true&no_binary=true"
            )
        except (websockets.exceptions.InvalidURI, websockets.exceptions.InvalidHandshake) as e:
            raise RuntimeError(f"Failed to connect to WebSocket: {e}") from e
        ret = cls(client)
        # Keep a reference so the background reader task is not garbage-collected.
        ret._reader_task = asyncio.create_task(ret._message_handler())
        return ret

    async def _message_handler(self):
        # Reads messages until the socket closes, then signals end of stream.
        try:
            while True:
                message = await self._websocket.recv()
                try:
                    parsed_json = json.loads(message)
                    chunk = SnippetAudioChunk.model_validate(parsed_json)
                    await self._message_queue.put(chunk)
                except Exception as parse_error:
                    print(f"Error parsing message: {parse_error}")
                    print(f"Raw message was: {message}")
        except websockets.exceptions.ConnectionClosed:
            print("WebSocket connection closed")
            await self._message_queue.put(None)  # Signal end of stream
        except Exception as e:
            print(f"Error in message handler: {e}")
            await self._message_queue.put(None)

    async def __aiter__(self) -> AsyncGenerator[SnippetAudioChunk, None]:
        while True:
            message = await self._message_queue.get()
            if message is None:
                break
            yield message

    def send(self, tts: PublishTts):
        message = tts.json()
        print(f"Sending TTS message: {message}")
        asyncio.create_task(self._websocket.send(message))

    async def _send_dict(self, message: Dict[str, Any]):
        await self._websocket.send(json.dumps(message))

    async def close(self):
        if self._websocket and not self._websocket.closed:
            await self._websocket.close()
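The client above hands audio chunks from its background reader to `__aiter__` through an `asyncio.Queue`, using `None` as an end-of-stream marker. The following sketch isolates that pattern so it can be run without a network connection. `QueueStream`, `producer`, and `consumer` are illustrative names, and plain strings stand in for `SnippetAudioChunk` objects.

```python
import asyncio
from typing import AsyncIterator, List, Optional


class QueueStream:
    """Async-iterable wrapper around a queue, ended by a None sentinel."""

    def __init__(self) -> None:
        self._queue: "asyncio.Queue[Optional[str]]" = asyncio.Queue()

    async def put(self, item: Optional[str]) -> None:
        await self._queue.put(item)

    async def __aiter__(self) -> AsyncIterator[str]:
        # An async generator: yields items until the sentinel arrives.
        while True:
            item = await self._queue.get()
            if item is None:
                break
            yield item


async def producer(stream: QueueStream, items: List[str]) -> None:
    """Plays the role of the reader task: enqueue items, then the sentinel."""
    for item in items:
        await stream.put(item)
        await asyncio.sleep(0)  # let the consumer run between items
    await stream.put(None)  # signal end of stream


async def consumer(stream: QueueStream) -> List[str]:
    """Plays the role of the playback loop: drain the stream until it ends."""
    return [item async for item in stream]


async def main() -> List[str]:
    # Build the queue inside the running loop, as connect() does above.
    stream = QueueStream()
    received, _ = await asyncio.gather(
        consumer(stream), producer(stream, ["a", "b", "c"])
    )
    return received


received = asyncio.run(main())
print(received)
```

Without the sentinel, the consumer would wait on `get()` indefinitely after the last item, which is why the reader in `StreamingTtsClient` enqueues `None` on every exit path.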

You can use the client as follows:

import asyncio
import base64
from streaming import StreamingTtsClient
from hume.tts import PublishTts
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming

stream = await StreamingTtsClient.connect(api_key)

# Helper functions for flushing and closing the stream
def send_flush():
    asyncio.create_task(stream._send_dict({"flush": True}))

def send_close():
    asyncio.create_task(stream._send_dict({"close": True}))

async def send_input():
    print("Sending TTS messages...")
    stream.send(PublishTts(text="Hello world."))
    send_flush()
    print('Waiting 8 seconds...')
    await asyncio.sleep(8)
    stream.send(PublishTts(text="Goodbye, world."))
    send_flush()
    print("Closing stream...")
    send_close()

async def handle_messages():
    await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)

await asyncio.gather(handle_messages(), send_input())
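To make the message flow easier to follow, the sketch below builds the sequence of JSON frames that `send_input` would put on the WebSocket, without opening a connection. The `{"flush": true}` and `{"close": true}` frames are taken from the helper functions above. The `{"text": ...}` shape is an assumption about how `PublishTts(text=...)` serializes when no other fields are set, and `build_session` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Iterable, List

# Control frames, as sent by send_flush() and send_close() in the guide.
FLUSH: Dict[str, Any] = {"flush": True}
CLOSE: Dict[str, Any] = {"close": True}


def text_message(text: str) -> Dict[str, Any]:
    """Assumed wire shape of PublishTts(text=...) with no other fields set."""
    return {"text": text}


def build_session(sentences: Iterable[str]) -> List[str]:
    """Return the JSON frames for: text, flush, ..., text, flush, close."""
    frames: List[str] = []
    for sentence in sentences:
        frames.append(json.dumps(text_message(sentence)))
        frames.append(json.dumps(FLUSH))  # flush after each piece of text
    frames.append(json.dumps(CLOSE))  # tell the server no more input is coming
    return frames


frames = build_session(["Hello world.", "Goodbye, world."])
for frame in frames:
    print(frame)
```

Each frame corresponds to one `websocket.send(...)` call in the client: the flush frames follow each piece of text, and the close frame ends the input side of the stream.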

Running the Example

uv

$ uv run app.py