LiveKit

Guide to integrating Hume TTS via the Hume LiveKit Agents plugin.

LiveKit is an open-source platform for low-latency, bi-directional audio streaming and real-time media orchestration. With LiveKit Agents, developers can compose voice pipelines using modular components like speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).

Hume’s expressive TTS can be integrated into your LiveKit Agents pipelines using the Hume LiveKit Agents TTS plugin. This guide covers setup instructions, integration modes, and configuration best practices.

Want to get right to the code? See our complete LiveKit example project on GitHub.

Authentication

To use the Hume TTS plugin with LiveKit Agents, you’ll need both Hume and LiveKit credentials. Follow these steps to obtain your credentials and set up environment variables.

1. Get your Hume API key

To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.

2. Copy your LiveKit credentials

Deploy a LiveKit server or use LiveKit Cloud. In your project dashboard, copy the following:

  • LIVEKIT_URL – your server URL
  • LIVEKIT_API_KEY – your project’s API key
  • LIVEKIT_API_SECRET – your API secret

3. Configure environment variables

Create a .env file in your project and define the required environment variables. The plugin reads your Hume API key from the HUME_API_KEY variable.

.env
HUME_API_KEY=...
LIVEKIT_URL=...
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
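
If you load configuration in code rather than relying on your process manager, a minimal sketch using python-dotenv (an assumed dependency; any environment loader works) is:

import os

from dotenv import load_dotenv  # assumes the python-dotenv package is installed

load_dotenv()  # reads the .env file into the process environment

# The Hume plugin picks up HUME_API_KEY; the LiveKit worker uses the LIVEKIT_* variables.
assert os.environ.get("HUME_API_KEY"), "HUME_API_KEY is not set"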

Usage

The Hume TTS plugin for LiveKit Agents supports two usage modes: AgentSession and standalone TTS.

  • AgentSession – Real-time conversational agents using STT → LLM → TTS pipelines. Voice control is set once at session start and stays consistent throughout the session.
  • Standalone TTS – Direct text-to-speech without STT or LLM components. Voice control is set per request; customize voice and expressiveness as needed.

AgentSession

When using the Hume TTS plugin within an AgentSession, follow these guidelines to ensure responsive performance and consistent voice behavior throughout the session:

  • Enable instant_mode: Reduces latency significantly. Note that enabling instant_mode requires explicitly specifying a voice.

  • Specify a voice: Select from Hume’s extensive Voice Library or your own custom voices to keep the voice consistent.

  • Omit crafting parameters: Parameters for crafting TTS output, such as description, speed, trailing_silence, and context, apply globally to every response in the session. These parameters are better suited to standalone TTS.

Example implementation:

For a complete AgentSession implementation, see our LiveKit Agents example project.

AgentSession
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    WorkerOptions,
    cli,
)
from livekit.plugins.hume import TTS, VoiceByName, VoiceProvider


class VoiceAssistant(Agent):
    def __init__(self):
        super().__init__(instructions="Your system prompt...")


async def entrypoint(ctx: JobContext) -> None:
    await ctx.connect()

    # 1. Configure the Hume TTS plugin
    tts = TTS(
        voice=VoiceByName(
            name="Male English Actor",
            provider=VoiceProvider.hume,
        ),
        instant_mode=True,
    )

    # 2. Create your AgentSession with STT/LLM as needed
    session = AgentSession(
        stt=...,  # specify your STT config
        llm=...,  # specify your LLM config
        tts=tts,
    )

    # 3. Start the session with a greeting
    await session.start(agent=VoiceAssistant(), room=ctx.room)
    await session.generate_reply(instructions=GREETING_INSTRUCTIONS)


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(entrypoint_fnc=entrypoint)
    )
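
Once this file is saved (for example as agent.py, an assumed filename), the worker can typically be launched locally in development mode with the LiveKit Agents CLI:

python agent.py dev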

Standalone TTS

The Hume TTS plugin can be used independently for direct text-to-speech synthesis, outside of an AgentSession. This is useful when you don’t need STT or LLM components and want full control over voice selection, expressiveness, and timing on a per-request basis.

When using the plugin this way, keep the following in mind:

  • One utterance per request: The LiveKit Agents TTS interface accepts only a single utterance per request. Split multi-part dialogue into separate requests to control delivery for each line (see the sketch after the example below).

  • Supply acting instructions: Use utterance options like description, speed, and trailing_silence to shape the delivery. See our acting instructions guide for best practices.

  • Provide context: Use the context field to maintain continuity across requests. For more on how context influences output, see our continuation guide.

Example implementation:

For a working Standalone TTS implementation, see our LiveKit Agents example project.

Standalone TTS
import asyncio

from aiohttp import ClientSession
from livekit.plugins.hume import TTS, VoiceByName, VoiceProvider
from simpleaudio import play_buffer


async def standalone_tts(text: str):
    async with ClientSession() as session:
        # 1. Configure the Hume TTS plugin
        tts = TTS(
            voice=VoiceByName(
                name="Male English Actor",
                provider=VoiceProvider.hume,
            ),
            description="calm, pedagogical",
            speed=0.65,
            trailing_silence=4,
            instant_mode=True,
            http_session=session,
        )

        # 2. Collect PCM data
        pcm_buffer = bytearray()
        async for chunk in tts.synthesize(text):
            pcm_buffer.extend(chunk.frame.data)

        # 3. Play back the audio
        play_buffer(
            pcm_buffer,
            num_channels=1,       # mono
            bytes_per_sample=2,   # 16-bit
            sample_rate=48000,    # 48 kHz
        ).wait_done()


if __name__ == "__main__":
    asyncio.run(
        standalone_tts("Let us begin by taking a deep breath...")
    )
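
To illustrate the one-utterance-per-request guidance above, here is a minimal sketch that splits a short multi-part script into separate requests so each line gets its own delivery. It reuses only the constructor options shown above; the SCRIPT data and the synthesize_lines helper are hypothetical.

import asyncio

from aiohttp import ClientSession
from livekit.plugins.hume import TTS, VoiceByName, VoiceProvider

# Hypothetical script: one entry per utterance, each with its own acting instructions.
SCRIPT = [
    ("Welcome back. Settle in and get comfortable.", "warm, unhurried"),
    ("Now, slowly close your eyes.", "hushed, soothing"),
]


async def synthesize_lines() -> list[bytearray]:
    buffers: list[bytearray] = []
    async with ClientSession() as session:
        for text, description in SCRIPT:
            # One request per utterance, with per-line acting instructions
            tts = TTS(
                voice=VoiceByName(
                    name="Male English Actor",
                    provider=VoiceProvider.hume,
                ),
                description=description,
                instant_mode=True,
                http_session=session,
            )
            pcm = bytearray()
            async for chunk in tts.synthesize(text):
                pcm.extend(chunk.frame.data)
            buffers.append(pcm)
    return buffers


if __name__ == "__main__":
    asyncio.run(synthesize_lines())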

Constraints

  • Audio format support: The Hume TTS plugin supports WAV, MP3, and PCM audio formats. If no audio format is specified, it will default to MP3.

  • Fixed sample rate: The Hume TTS API outputs audio at a fixed sample rate of 48kHz. Ensure compatibility with your audio processing pipeline.

  • Output data limitations: The plugin returns audio data with encoding details only. Additional Hume API response data (such as generation_id) is not included.

  • Normalized TTS interface: Due to the LiveKit Agents SDK’s normalized interface, each request must contain a single utterance. Split multi-utterance text into separate requests.

  • Plugin options persist: The configuration set when initializing the plugin is applied to each TTS request. To update these options during a session, use the plugin’s update_options method. See the LiveKit documentation for usage details.
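
As a sketch of that last point: the update_options call below assumes the method accepts the same keyword options as the TTS constructor (for example description and speed); check the LiveKit plugin reference for the exact signature.

# Assumption: update_options mirrors the constructor's keyword options.
tts.update_options(
    description="brighter, more energetic",
    speed=1.1,
)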

Resources