TTS Python Quickstart Guide

Step-by-step guide for integrating the TTS API using Hume’s Python SDK.

This guide shows how to get started with Hume’s Text-to-Speech capabilities using Hume’s Python SDK. It demonstrates:

  1. Converting text to speech with a new voice.
  2. Saving a voice to your voice library for future use.
  3. Giving “acting instructions” to modulate the voice.
  4. Generating multiple variations of the same text at once.
  5. Providing context to maintain consistency across multiple generations.

The complete code for the example in this guide is available on GitHub.

Environment Setup

Set up a Python virtual environment and install the required packages:

$ uv init
$ uv add hume python-dotenv aiofiles

Authenticating the HumeClient

You must authenticate to use the Hume TTS API. Your API key can be retrieved from the Hume AI platform.

This example uses python-dotenv to load the key. Place your API key in a .env file at the root of your project.

$ echo "HUME_API_KEY=your_api_key_here" > .env
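Before wiring up the client, it can help to confirm that the .env file really defines the key. The following is a minimal, standard-library sketch: `read_env_file` and `has_api_key` are hypothetical helper names, and the parser is a deliberately simplified stand-in for python-dotenv (plain KEY=VALUE lines only, no `export` prefixes or multi-line values).

```python
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; a simplified stand-in for python-dotenv."""
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and one layer of matching-style quotes.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(path: Path, name: str = "HUME_API_KEY") -> bool:
    """Return True if the file exists and defines a non-empty value for name."""
    if not path.is_file():
        return False
    return bool(read_env_file(path).get(name))
```

Calling `has_api_key(Path(".env"))` from the project root reports whether the key is present without printing its value.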

Then create a new file app.py and use your API key to instantiate the AsyncHumeClient.

from dotenv import load_dotenv
import os
from hume import AsyncHumeClient
import asyncio

load_dotenv()
api_key = os.getenv("HUME_API_KEY")
if not api_key:
    raise EnvironmentError("HUME_API_KEY not found in environment variables")

hume = AsyncHumeClient(api_key=api_key)

Helper function

Define a function to aid in writing generated audio to a temporary file:

import base64
import tempfile
import time
from pathlib import Path

import aiofiles

# Create an output directory in the temporary folder.
timestamp = int(time.time() * 1000)  # milliseconds, similar to Date.now() in JavaScript
output_dir = Path(tempfile.gettempdir()) / f"hume-audio-{timestamp}"


async def write_result_to_file(base64_encoded_audio: str, filename: str) -> None:
    # Decode the base64 audio from the response and write it as a .wav file.
    file_path = output_dir / f"{filename}.wav"
    audio_data = base64.b64decode(base64_encoded_audio)
    async with aiofiles.open(file_path, "wb") as f:
        await f.write(audio_data)
    print("Wrote", file_path)


async def main() -> None:
    output_dir.mkdir(parents=True, exist_ok=True)
    print("Results will be written to", output_dir)

    # All the code examples in the remainder of the guide
    # belong within this main function.


if __name__ == "__main__":
    asyncio.run(main())
    print("Done")
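The helper above relies on the audio arriving as a base64 string that decodes to WAV bytes. To see that round trip without calling the API, here is a standard-library sketch. `make_silent_wav` and `looks_like_wav` are hypothetical names, and the silent clip is locally generated test data, not Hume output.

```python
import base64
import io
import wave


def make_silent_wav(duration_s: float = 0.1, sample_rate: int = 16000) -> bytes:
    """Build a short mono 16-bit WAV in memory (test data, not Hume output)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(duration_s * sample_rate))
    return buffer.getvalue()


def looks_like_wav(base64_encoded_audio: str) -> bool:
    """Decode base64 audio and check for the RIFF/WAVE container header."""
    data = base64.b64decode(base64_encoded_audio)
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"


# Encode the clip the same way an API response would carry it, then verify it.
encoded = base64.b64encode(make_silent_wav()).decode("ascii")
print(looks_like_wav(encoded))
```

A check like `looks_like_wav` can be handy when debugging, for example to confirm that a response was not requested in a different audio format before saving it with a .wav extension.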

Calling Text-to-Speech

To use Hume TTS, call hume.tts.synthesize_json with a list of utterances. In each utterance, provide the text to speak and, optionally, a description of how the voice speaking the text should sound. If you don’t provide a description, Hume will examine the text and attempt to determine an appropriate voice.

The base64-encoded bytes of an audio file with your speech will be present at .generations[0].audio in the returned object. By default, there will only be a single variation in the .generations array, and the audio will be in wav format.

The .generations[0].generation_id field will contain an ID you can use to refer to this specific generation of speech in future requests.

from hume.tts import PostedUtterance

speech1 = await hume.tts.synthesize_json(
    utterances=[
        PostedUtterance(
            description="A refined, British aristocrat",
            text="Take an arrow from the quiver.",
        )
    ]
)
await write_result_to_file(speech1.generations[0].audio, "speech1_0")

Saving voices

Use hume.tts.voices.create to save the voice of a generated piece of audio to your voice library for future use:

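generation_id = speech1.generations[0].generation_id
await hume.tts.voices.create(
    # The later examples look this voice up by the name "aristocrat",
    # so save it under exactly that name. If that name is already taken
    # in your library, choose another one here and in the later examples.
    name="aristocrat",
    generation_id=generation_id,
)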
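Because a saved voice is later looked up by name, the name passed to hume.tts.voices.create has to match the name given when the voice is used in an utterance. If you want a fresh name on every run, one option is to compute it once and reuse the variable everywhere. The sketch below assumes a hypothetical helper, `unique_voice_name`, which is not part of the Hume SDK.

```python
import time
from typing import Optional


def unique_voice_name(base: str, now: Optional[float] = None) -> str:
    """Append a whole-second Unix timestamp to base, e.g. 'aristocrat-1700000000'."""
    seconds = int(time.time() if now is None else now)
    return f"{base}-{seconds}"


# Build the name once, then pass this same variable both when saving the voice
# and whenever the voice is referenced in a later utterance.
voice_name = unique_voice_name("aristocrat")
```

The optional `now` argument exists only to make the helper deterministic in tests; in normal use, call it with the base name alone.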

Continuity

Inside an utterance, specify the name or ID of a voice to generate more speech from that voice.

To generate speech that is meant to follow previously generated speech, specify context with the generation_id of that speech.

You can specify a number up to 5 in num_generations to generate multiple variations of the same speech at the same time.

from hume.tts import PostedContextWithGenerationId, PostedUtteranceVoiceWithName

speech2 = await hume.tts.synthesize_json(
    utterances=[
        PostedUtterance(
            voice=PostedUtteranceVoiceWithName(name="aristocrat"),
            text="Now take a bow.",
        )
    ],
    context=PostedContextWithGenerationId(generation_id=generation_id),
    num_generations=2,
)

await write_result_to_file(speech2.generations[0].audio, "speech2_0")
await write_result_to_file(speech2.generations[1].audio, "speech2_1")
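When num_generations is larger than 1, indexing each variation by hand gets repetitive. A loop over the generations list is one way to handle any count. The sketch below is synchronous and self-contained so it can run without the API: `FakeGeneration` is a hypothetical stand-in that mirrors only the two fields this guide uses, and `write_generations` is an illustrative helper, not an SDK function.

```python
import base64
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FakeGeneration:
    """Hypothetical stand-in exposing only the fields this guide uses."""

    generation_id: str
    audio: str  # base64-encoded audio bytes


def write_generations(generations, prefix: str, output_dir: Path) -> list[Path]:
    """Write every variation to '<prefix>_<index>.wav' and return the paths."""
    output_dir.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    for index, generation in enumerate(generations):
        path = output_dir / f"{prefix}_{index}.wav"
        path.write_bytes(base64.b64decode(generation.audio))
        paths.append(path)
    return paths
```

Inside the guide's async main function, the same loop would call `await write_result_to_file(generation.audio, f"speech2_{index}")` for each generation instead of writing the bytes directly.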

Acting Instructions

If you specify both voice and description, the description field behaves as “acting instructions”. The generated speech keeps the character of the specified voice, modulated to match the description.

speech3 = await hume.tts.synthesize_json(
    utterances=[
        PostedUtterance(
            voice=PostedUtteranceVoiceWithName(name="aristocrat"),
            description="Murmured softly, with a heavy dose of sarcasm and contempt",
            text="Does he even know how to use that thing?",
        )
    ],
    context=PostedContextWithGenerationId(
        generation_id=speech2.generations[0].generation_id
    ),
    num_generations=1,
)
await write_result_to_file(speech3.generations[0].audio, "speech3_0")
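The three requests in this guide differ only in which optional fields they set: a description alone, a voice alone, or both, plus an optional context and generation count. The sketch below collects those rules in one place using plain dictionaries. `build_synthesis_request` is a hypothetical helper for illustration only; it does not use the SDK's model classes and sends nothing, and the limit of 5 is taken from the text above.

```python
from typing import Any, Optional

MAX_GENERATIONS = 5  # upper bound stated in this guide


def build_synthesis_request(
    text: str,
    voice_name: Optional[str] = None,
    description: Optional[str] = None,
    context_generation_id: Optional[str] = None,
    num_generations: int = 1,
) -> dict[str, Any]:
    """Assemble request fields as plain dicts (illustrative only, not SDK types)."""
    if not 1 <= num_generations <= MAX_GENERATIONS:
        raise ValueError(f"num_generations must be between 1 and {MAX_GENERATIONS}")

    utterance: dict[str, Any] = {"text": text}
    if voice_name is not None:
        utterance["voice"] = {"name": voice_name}
    if description is not None:
        # With a voice set, the description acts as acting instructions;
        # without one, it describes the new voice to generate.
        utterance["description"] = description

    request: dict[str, Any] = {
        "utterances": [utterance],
        "num_generations": num_generations,
    }
    if context_generation_id is not None:
        # Continue from previously generated speech.
        request["context"] = {"generation_id": context_generation_id}
    return request
```

For example, `build_synthesis_request("Now take a bow.", voice_name="aristocrat", context_generation_id="some-id", num_generations=2)` mirrors the shape of the second request in this guide, with "some-id" as a placeholder.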

Running the Example

$ uv run app.py