TTS Python Quickstart Guide

Step-by-step guide for integrating the TTS API using Hume’s Python SDK.

This guide shows how to use Hume’s Text-to-Speech API using Hume’s Python SDK. It assumes you are running on a system with access to speakers for playback and with the PortAudio library installed.

It demonstrates:

  1. Using an existing voice.
  2. Creating a new voice via a prompt.
  3. Continuing from previous speech.
  4. Providing “acting instructions” to modulate the voice.
  5. Generating speech from live input.

The complete code for the example in this guide is available on GitHub.

Environment Setup

Set up a Python virtual environment and install the required packages. We recommend uv or Poetry for managing your environment and dependencies, but you can also use venv and pip.

uv
$uv init
$uv add "hume[microphone]" python-dotenv
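
If you would rather use venv and pip, an equivalent setup might look like this (quoting `hume[microphone]` prevents shell glob expansion):

```shell
python -m venv .venv
source .venv/bin/activate
pip install "hume[microphone]" python-dotenv
```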

Authenticating the HumeClient

You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.

This example uses python-dotenv. Place your API key in a .env file at the root of your project.

.env
$echo "HUME_API_KEY=your_api_key_here" > .env

First, use your API key to instantiate the AsyncHumeClient, importing as necessary.

# app.py
import os
from hume import AsyncHumeClient
from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("HUME_API_KEY")
if not api_key:
    raise EnvironmentError("HUME_API_KEY not found in environment variables.")

hume = AsyncHumeClient(api_key=api_key)

Using a pre-existing voice

Use this method if you want to synthesize speech with a high-quality voice from Hume’s Voice Library, or specify provider='CUSTOM_VOICE' to use a voice that you created previously via the Hume Platform or the API.

import base64
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming
from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName

utterance = PostedUtterance(
    text="Dogs became domesticated between 23,000 and 30,000 years ago.",
    voice=PostedUtteranceVoiceWithName(name='Ava Song', provider='HUME_AI')
)

stream = hume.tts.synthesize_json_streaming(
    utterances=[utterance],
    strip_headers=True
)

await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)

Create a new voice via a prompt

The Voice Creation API allows you to create custom voices programmatically via prompting. There are two steps to creating a voice:

  1. Send a description of the voice, along with sample text that is characteristic of the voice, to the standard tts endpoint without specifying a voice.
  2. Take the generation_id from one of the resulting audio samples, and use it to create a new voice with the Voice Creation API.

import base64
import time
from hume.empathic_voice.chat.audio.audio_utilities import play_audio
from hume.tts import PostedUtterance

result1 = await hume.tts.synthesize_json(
    utterances=[PostedUtterance(
        description="Crisp, upper-class British accent with impeccably articulated consonants and perfectly placed vowels. Authoritative and theatrical, as if giving a lecture.",
        text="The science of speech. That's my profession; also my hobby. Happy is the man who can make a living by his hobby!"
    )],
    num_generations=2,
)

sample_number = 1
for generation in result1.generations:
    print(f'Playing option {sample_number}...')
    audio_data = base64.b64decode(generation.audio)
    await play_audio(audio_data)
    sample_number += 1

# Prompt user to select which voice they prefer
print('\nWhich voice did you prefer?')
print('1. First voice (generation ID:', result1.generations[0].generation_id, ')')
print('2. Second voice (generation ID:', result1.generations[1].generation_id, ')')

try:
    user_choice = input('Enter your choice (1 or 2): ').strip()
except EOFError:
    user_choice = '1'
    print('No input available, selecting option 1')

if user_choice not in ('1', '2'):
    raise ValueError('Invalid choice. Please select 1 or 2.')

selected_index = int(user_choice) - 1
selected_generation_id = result1.generations[selected_index].generation_id
print(f'Selected voice option {selected_index + 1} (generation ID: {selected_generation_id})')

# Save the selected voice
voice_name = f'higgins-{int(time.time() * 1000)}'
await hume.tts.voices.create(
    name=voice_name,
    generation_id=selected_generation_id,
)

print(f'Created voice: {voice_name}')

Continuing previous speech

You can make new speech sound like a natural continuation from previous speech by providing the generation_id of the previous audio in the context parameter. This helps maintain consistency in tone, pacing, and emotional state.

Additionally, you can provide “acting instructions” using the description field alongside an existing voice. When you specify both a voice and a description, the description modulates the voice’s tone, emotion, and delivery style while maintaining the core voice characteristics.

import base64
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming
from hume.tts import PostedUtterance, PostedUtteranceVoiceWithName, PostedContextWithGenerationId

stream = hume.tts.synthesize_json_streaming(
    utterances=[PostedUtterance(
        voice=PostedUtteranceVoiceWithName(name=voice_name),
        text="YOU can spot an Irishman or a Yorkshireman by his brogue. I can place any man within six miles. I can place him within two miles in London. Sometimes within two streets.",
        description="Bragging about his abilities"
    )],
    context=PostedContextWithGenerationId(
        generation_id=selected_generation_id
    ),
    strip_headers=True
)

await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)

Generating speech from live input

If you need to generate speech from text that is being produced in real-time, you can use the bidirectional streaming WebSocket endpoint at /v0/tts/stream/input.

Support for connecting to the WebSocket directly is coming soon to the Python SDK. For the time being, this example shows how you can implement a simple WebSocket client yourself.

First, create a streaming.py file with the StreamingTtsClient:

streaming.py
import asyncio
import json
from typing import AsyncGenerator, Dict, Any
import websockets
from hume.tts import PublishTts, SnippetAudioChunk

class StreamingTtsClient:
    def __init__(self, websocket: websockets.WebSocketClientProtocol):
        self._websocket: websockets.WebSocketClientProtocol = websocket
        self._message_queue = asyncio.Queue()

    @classmethod
    async def connect(cls, api_key: str) -> "StreamingTtsClient":
        try:
            websocket = await websockets.connect(
                f"wss://api.hume.ai/v0/tts/stream/input?api_key={api_key}&instant_mode=true&strip_headers=true&no_binary=true"
            )
        except (websockets.exceptions.InvalidURI, websockets.exceptions.InvalidHandshake) as e:
            raise RuntimeError(f"Failed to connect to WebSocket: {e}") from e
        client = cls(websocket)
        asyncio.create_task(client._message_handler())
        return client

    async def _message_handler(self):
        try:
            while True:
                message = await self._websocket.recv()
                try:
                    parsed_json = json.loads(message)
                    chunk = SnippetAudioChunk.model_validate(parsed_json)
                    await self._message_queue.put(chunk)
                except Exception as parse_error:
                    print(f"Error parsing message: {parse_error}")
                    print(f"Raw message was: {message}")
        except websockets.exceptions.ConnectionClosed:
            print("WebSocket connection closed")
            await self._message_queue.put(None)  # Signal end of stream
        except Exception as e:
            print(f"Error in message handler: {e}")
            await self._message_queue.put(None)

    async def __aiter__(self) -> AsyncGenerator[SnippetAudioChunk, None]:
        while True:
            message = await self._message_queue.get()
            if message is None:
                break
            yield message

    def send(self, tts: PublishTts):
        message = tts.json()
        print(f"Sending TTS message: {message}")
        asyncio.create_task(self._websocket.send(message))

    async def _send_dict(self, message: Dict[str, Any]):
        await self._websocket.send(json.dumps(message))

    async def close(self):
        if self._websocket and not self._websocket.closed:
            await self._websocket.close()

You can use the client as follows:

import asyncio
import base64
from streaming import StreamingTtsClient
from hume.tts import PublishTts
from hume.empathic_voice.chat.audio.audio_utilities import play_audio_streaming

stream = await StreamingTtsClient.connect(api_key)

# Helper functions for flushing and closing the stream
def send_flush():
    asyncio.create_task(stream._send_dict({"flush": True}))

def send_close():
    asyncio.create_task(stream._send_dict({"close": True}))

async def send_input():
    print("Sending TTS messages...")
    stream.send(PublishTts(text="Hello world."))
    send_flush()
    print('Waiting 8 seconds...')
    await asyncio.sleep(8)
    stream.send(PublishTts(text="Goodbye, world."))
    send_flush()
    print("Closing stream...")
    send_close()

async def handle_messages():
    await play_audio_streaming(base64.b64decode(chunk.audio) async for chunk in stream)

await asyncio.gather(handle_messages(), send_input())

Running the Example

$uv run app.py