LiveKit
LiveKit is an open-source platform for low-latency, bi-directional audio streaming and real-time media orchestration. With LiveKit Agents, developers can compose voice pipelines using modular components like speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).
Hume’s expressive TTS can be integrated into your LiveKit Agents pipelines using the Hume LiveKit Agents TTS plugin. This guide covers setup instructions, integration modes, and configuration best practices.
Wanna get right to the code? See our complete LiveKit example project on GitHub.
Authentication
To use the Hume TTS plugin with LiveKit Agents, you’ll need both Hume and LiveKit credentials. Follow these steps to obtain your credentials and set up environment variables.
Get your Hume API key
To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.
Copy your LiveKit credentials
Deploy a LiveKit server or use LiveKit Cloud. In your project dashboard, copy the following:
- LIVEKIT_URL: your server URL
- LIVEKIT_API_KEY: your project’s API key
- LIVEKIT_API_SECRET: your API secret
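Before starting an agent, it can help to confirm that all credentials are actually present in the environment. The following is a minimal sketch using only the Python standard library. The three LIVEKIT_* names come from the list above; HUME_API_KEY is an assumed variable name for the Hume API key, and missing_credentials is a hypothetical helper, not part of either SDK.

```python
import os

# The LIVEKIT_* names come from the dashboard list in this guide.
# HUME_API_KEY is an assumed name for the Hume API key variable.
REQUIRED_VARS = (
    "HUME_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or empty.

    Pass a dict to check something other than the process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling missing_credentials() at startup and stopping when the returned list is non-empty surfaces a missing key before any request is made.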
Usage
The Hume TTS plugin for LiveKit Agents supports two use cases: AgentSession and Standalone TTS.
AgentSession
When using the Hume TTS plugin within an AgentSession, follow these guidelines to ensure responsive performance and consistent voice behavior throughout the session:
- Enable instant_mode: Reduces latency significantly. Note that enabling instant_mode requires explicitly specifying a voice.
- Specify a voice: Select from Hume’s extensive Voice Library or your own custom voices for voice consistency.
- Omit crafting parameters: Parameters for crafting TTS output, like description, speed, trailing_silence, and context, are applied globally to all session responses. These parameters are best suited to standalone TTS.
For a complete AgentSession implementation, see our LiveKit Agents example project.
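The AgentSession guidelines can also be expressed as a small pre-flight check. This is only a sketch: check_session_tts_options is a hypothetical helper that inspects a plain dict of intended options, and it does not call the plugin. The keys mirror the parameter names used in this guide; how they map onto the plugin's constructor arguments is an assumption to verify against the plugin reference.

```python
# Crafting parameters named in this guide; in an AgentSession they
# would apply to every response, so the check flags them.
CRAFTING_PARAMS = ("description", "speed", "trailing_silence", "context")


def check_session_tts_options(options):
    """Return a list of guideline violations for a dict of TTS options."""
    problems = []

    # instant_mode requires an explicitly specified voice.
    if options.get("instant_mode") and not options.get("voice"):
        problems.append("instant_mode is enabled but no voice is specified")

    # Crafting parameters are better suited to standalone TTS.
    for name in CRAFTING_PARAMS:
        if options.get(name) is not None:
            problems.append(name + " would apply to all session responses")

    return problems


# Follows the guidelines: instant mode on, voice set, no crafting parameters.
# "my-voice" is a placeholder, not a real voice name.
session_options = {"instant_mode": True, "voice": "my-voice"}
problems = check_session_tts_options(session_options)
```

An empty result means the options follow the three guidelines; any entries describe which guideline was missed.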
Standalone TTS
The Hume TTS plugin can be used independently for direct text-to-speech synthesis, outside of an AgentSession. This is useful when you don’t need STT or LLM components and want full control over voice selection, expressiveness, and timing on a per-request basis.
When using the plugin this way, keep the following in mind:
- One utterance per request: LiveKit processes only a single utterance per request. Split multi-part dialogue into separate requests to control delivery for each line.
- Supply acting instructions: Use utterance options like description, speed, and trailing_silence to shape the delivery. See our acting instructions guide for best practices.
- Provide context: Use the context field to maintain continuity across requests. For more on how context influences output, see our continuation guide.
For a working Standalone TTS implementation, see our LiveKit Agents example project.
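To illustrate the one-utterance-per-request rule, the sketch below splits a multi-line script into separate request objects, each of which can carry its own acting instructions. UtteranceRequest and split_dialogue are hypothetical names introduced for illustration; they are not part of the plugin, and each resulting item would still need to be passed to the plugin in its own request.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UtteranceRequest:
    """One utterance plus the per-request options named in this guide."""
    text: str
    description: Optional[str] = None
    speed: Optional[float] = None
    trailing_silence: Optional[float] = None


def split_dialogue(script: str) -> List[UtteranceRequest]:
    """Create one request per non-empty line of a multi-line script."""
    return [
        UtteranceRequest(text=line.strip())
        for line in script.splitlines()
        if line.strip()
    ]


# Each line becomes its own request, so delivery can differ per line.
requests = split_dialogue("Welcome back.\n\nYour order has shipped.")
requests[0].description = "warm and upbeat"
requests[1].speed = 0.9
```

Keeping the options on each request object, rather than on the plugin configuration, matches the per-request control described above.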
Constraints
- Audio format support: The Hume TTS plugin supports WAV, MP3, and PCM audio formats. If no audio format is specified, it will default to MP3.
- Fixed sample rate: The Hume TTS API outputs audio at a fixed sample rate of 48 kHz. Ensure compatibility with your audio processing pipeline.
- Output data limitations: The plugin returns audio data with encoding details only. Additional Hume API response data (such as generation_id) is not included.
- Normalized TTS interface: Due to the LiveKit Agents SDK’s normalized interface, each request must contain a single utterance. Split multi-utterance text into separate requests.
- Plugin options persist: The configuration set when initializing the plugin is applied to each TTS request. To update these options during a session, use the plugin’s update_options method. See the LiveKit documentation for usage details.
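One way to check the fixed sample rate against your pipeline is to read the rate from a WAV header. The sketch below uses Python's standard wave module on in-memory bytes. It assumes the audio is a standard PCM WAV file that the wave module can parse; wav_sample_rate and matches_expected_rate are hypothetical helper names, not plugin functions.

```python
import io
import wave

EXPECTED_SAMPLE_RATE = 48_000  # 48 kHz, per the fixed sample rate constraint


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from the header of an in-memory WAV file."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav_file:
        return wav_file.getframerate()


def matches_expected_rate(wav_bytes: bytes) -> bool:
    """Return True when the WAV data is sampled at 48 kHz."""
    return wav_sample_rate(wav_bytes) == EXPECTED_SAMPLE_RATE
```

If a downstream component expects a different rate, a check like this makes the mismatch visible early so that resampling can be added deliberately.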
Resources
Explore the source code or contribute to the Hume TTS plugin for LiveKit Agents on GitHub.
Reference the official LiveKit docs for a full list of plugin options and additional configuration details.
Use a working example to get started with Hume TTS and LiveKit Agents in Python.
Learn more about Hume’s Speech language model, and features of Hume’s TTS API.