LiveKit | Hume API

LiveKit is an open-source platform for low-latency, bi-directional audio streaming and real-time media orchestration. With LiveKit Agents, developers can compose voice pipelines using modular components like speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).

Hume’s expressive TTS can be integrated into your LiveKit Agents pipelines using the Hume LiveKit Agents TTS plugin. This guide covers setup instructions, integration modes, and configuration best practices.

Want to get right to the code? See our complete LiveKit example project on GitHub.

Authentication

To use the Hume TTS plugin with LiveKit Agents, you’ll need both Hume and LiveKit credentials. Follow these steps to obtain your credentials and set up environment variables.

Get your Hume API key

To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.

Copy your LiveKit credentials

Deploy a LiveKit server or use LiveKit Cloud. In your project dashboard, copy the following:

LIVEKIT_URL – your server URL
LIVEKIT_API_KEY – your project’s API key
LIVEKIT_API_SECRET – your API secret

Configure environment variables

Create a .env file in your project and define the required environment variables. The plugin reads your Hume API key from the HUME_API_KEY variable.

.env

HUME_API_KEY=...
LIVEKIT_URL=...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
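
Before starting a worker, it can help to fail fast when one of these variables is missing. The following is a minimal sketch using only the Python standard library. The helper name `missing_env_vars` is hypothetical and is not part of the plugin or of LiveKit, and the sketch assumes the variables have already been loaded into the process environment (for example, by whatever tool you use to read your .env file).

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = (
    "HUME_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; pass a dict to check
    other sources (hypothetical helper, for illustration only).
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` at startup and exiting with a clear message when the returned list is non-empty surfaces configuration mistakes before any connection is attempted.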

Usage

The Hume TTS plugin for LiveKit Agents supports two modes: AgentSession and Standalone TTS.

Mode	Use case	Voice Control
AgentSession	Real-time conversational agents using STT → LLM → TTS pipelines.	Set once at session start; consistent throughout the session.
Standalone TTS	Direct text-to-speech without STT or LLM components.	Set per request; customize voice and expressiveness as needed.

AgentSession

When using the Hume TTS plugin within an AgentSession, follow these guidelines to ensure responsive performance and consistent voice behavior throughout the session:

Enable instant_mode: Reduces latency significantly. Note that enabling instant_mode requires explicitly specifying a voice.
Specify a voice: Select from Hume’s extensive Voice Library or your own custom voices for voice consistency.
Omit crafting parameters: Parameters for crafting TTS output like description, speed, trailing_silence, and context are globally applied to all session responses. These parameters are best suited to standalone TTS.

Example implementation:

For a complete AgentSession implementation, see our LiveKit Agents example project.

AgentSession

from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    WorkerOptions,
    cli,
)
from livekit.plugins.hume import TTS, VoiceByName, VoiceProvider

# Placeholder prompt for the opening reply; replace with your own.
GREETING_INSTRUCTIONS = "Greet the user and offer your assistance."

class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(instructions="Your system prompt...")

async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()

    # 1. Configure the Hume TTS plugin
    #    (instant_mode requires an explicitly specified voice)
    tts = TTS(
        voice=VoiceByName(
            name="Male English Actor",
            provider=VoiceProvider.hume,
        ),
        instant_mode=True,
    )

    # 2. Create your AgentSession with STT/LLM as needed
    session = AgentSession(
        stt=..., # specify your STT config
        llm=..., # specify your LLM config
        tts=tts,
    )

    # 3. Start the session with a greeting
    await session.start(agent=VoiceAssistant(), room=ctx.room)
    await session.generate_reply(instructions=GREETING_INSTRUCTIONS)

if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=entrypoint)
    )

Standalone TTS

The Hume TTS plugin can be used independently for direct text-to-speech synthesis, outside of an AgentSession. This is useful when you don’t need STT or LLM components and want full control over voice selection, expressiveness, and timing on a per-request basis.

When using the plugin this way, keep the following in mind:

One utterance per request: LiveKit processes only a single utterance per request. Split multi-part dialogue into separate requests to control delivery for each line.
Supply acting instructions: Use utterance options like description, speed, and trailing_silence to shape the delivery. See our acting instructions guide for best practices.
Provide context: Use the context field to maintain continuity across requests. For more on how context influences output, see our continuation guide.
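
Because each request carries a single utterance, multi-part dialogue has to be split before synthesis. The following is a minimal sketch, in plain Python, of one way to prepare per-line requests with their own acting instructions. `Line` and `plan_requests` are hypothetical names, not part of the plugin; the sketch only organizes the data and leaves the actual synthesis call to your code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Line:
    """One utterance plus optional acting instructions (hypothetical helper type)."""
    text: str
    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None


def plan_requests(lines):
    """Turn a script into one request per non-empty line, dropping unset options."""
    requests = []
    for line in lines:
        text = line.text.strip()
        if not text:
            continue  # nothing to synthesize for blank lines
        options = {
            "description": line.description,
            "speed": line.speed,
            "trailing_silence": line.trailing_silence,
        }
        requests.append({
            "text": text,
            "options": {k: v for k, v in options.items() if v is not None},
        })
    return requests


script = [
    Line("Let us begin by taking a deep breath.", description="calm, pedagogical", speed=0.65),
    Line("   "),
    Line("Now, slowly exhale.", trailing_silence=4),
]
```

Each entry returned by `plan_requests(script)` can then be sent as its own synthesis request, so that delivery is controlled line by line.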

Example implementation:

For a working Standalone TTS implementation, see our LiveKit Agents example project.

Standalone TTS

import asyncio
from aiohttp import ClientSession
from livekit.plugins.hume import TTS, VoiceByName, VoiceProvider
from simpleaudio import play_buffer

async def standalone_tts(text: str):
    async with ClientSession() as session:
        # 1. Configure the Hume TTS plugin
        tts = TTS(
            voice=VoiceByName(
                name="Male English Actor",
                provider=VoiceProvider.hume,
            ),
            description="calm, pedagogical",
            speed=0.65,
            trailing_silence=4,
            instant_mode=True,
            http_session=session,
        )

        # 2. Collect PCM data
        pcm_buffer = bytearray()
        async for chunk in tts.synthesize(text):
            pcm_buffer.extend(chunk.frame.data)

    # 3. Play back the audio
    play_buffer(
        pcm_buffer,
        num_channels=1,     # mono
        bytes_per_sample=2, # 16-bit
        sample_rate=48000,  # 48000 Hz
    ).wait_done()

if __name__ == "__main__":
    asyncio.run(
        standalone_tts("Let us begin by taking a deep breath...")
    )
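
If local playback is not needed, the collected PCM bytes can instead be written to a WAV file. The sketch below uses only Python's standard `wave` module and assumes the buffer holds 16-bit mono PCM at 48 kHz, matching the parameters passed to play_buffer above. `save_pcm_as_wav` is a hypothetical helper name, not part of the plugin.

```python
import wave


def save_pcm_as_wav(pcm: bytes, path: str, sample_rate: int = 48000) -> float:
    """Write 16-bit mono PCM to a WAV file and return its duration in seconds."""
    bytes_per_sample = 2  # 16-bit samples (assumed)
    num_channels = 1      # mono (assumed)
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(num_channels)
        wav_file.setsampwidth(bytes_per_sample)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    # Duration follows directly from the byte count and the fixed format.
    return len(pcm) / (sample_rate * bytes_per_sample * num_channels)
```

The returned duration is a quick sanity check: at 48 kHz, 16-bit mono, one second of audio is 96,000 bytes.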

Constraints

Audio format support: The Hume TTS plugin supports WAV, MP3, and PCM audio formats. If no audio format is specified, it will default to MP3.
Fixed sample rate: The Hume TTS API outputs audio at a fixed sample rate of 48 kHz. Ensure that your audio processing pipeline is compatible with this rate.
Output data limitations: The plugin returns audio data with encoding details only. Additional Hume API response data (such as generation_id) is not included.
Normalized TTS interface: Due to the LiveKit Agents SDK’s normalized interface, each request must contain a single utterance. Split multi-utterance text into separate requests.
Plugin options persist: The configuration set when initializing the plugin is applied to each TTS request. To update these options during a session, use the plugin’s update_options method. See the LiveKit documentation for usage details.
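
As a rough illustration of changing options between requests, the sketch below applies a set of overrides through `update_options` before the next synthesis call. It is written against a small stand-in class so that it runs without LiveKit installed. `FakeTTS` and `apply_overrides` are hypothetical, and the keyword names (`description`, `speed`) are assumed to mirror the constructor options shown earlier; check the LiveKit documentation for the exact signature before relying on it.

```python
class FakeTTS:
    """Stand-in for the plugin's TTS object (hypothetical, for illustration only)."""

    def __init__(self, **options):
        self.options = dict(options)

    def update_options(self, **overrides):
        # Assumption: the real method accepts keyword overrides for existing options.
        self.options.update(overrides)


def apply_overrides(tts, overrides):
    """Apply only the overrides that are set, leaving other options untouched."""
    changed = {key: value for key, value in overrides.items() if value is not None}
    if changed:
        tts.update_options(**changed)
    return changed


tts = FakeTTS(description="calm, pedagogical", speed=0.65)
apply_overrides(tts, {"description": "excited, upbeat", "speed": None})
```

Because plugin options persist, any override applied this way stays in effect for later requests until it is changed again.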

Resources

LiveKit Source Code

Explore the source code or contribute to the Hume TTS plugin for LiveKit Agents on GitHub.

LiveKit Documentation

Reference the official LiveKit docs for a full list of plugin options and additional configuration details.

LiveKit Example Project

Use a working example to get started with Hume TTS and LiveKit Agents in Python.

Hume TTS Documentation

Learn more about Hume’s speech-language model and the features of Hume’s TTS API.