LiveKit is an open-source platform for low-latency, bi-directional audio streaming and real-time media orchestration. With LiveKit Agents, developers can compose voice pipelines using modular components like speech-to-text (STT), large language models (LLMs), and text-to-speech (TTS).
Hume’s expressive TTS can be integrated into your LiveKit Agents pipelines using the Hume LiveKit Agents TTS plugin. This guide covers setup instructions, integration modes, and configuration best practices.
Wanna get right to the code? See our complete LiveKit example project on GitHub.
To use the Hume TTS plugin with LiveKit Agents, you’ll need both Hume and LiveKit credentials. Follow these steps to obtain your credentials and set up environment variables.
To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.
Deploy a LiveKit server or use LiveKit Cloud. In your project dashboard, copy the following:
LIVEKIT_URL – your server URL
LIVEKIT_API_KEY – your project’s API key
LIVEKIT_API_SECRET – your API secret
The Hume TTS plugin for LiveKit Agents can be used for two use cases: Agent Sessions and Standalone TTS.
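Before wiring up either use case, it can help to confirm that the credentials described above are actually present in the environment. The following is a minimal sketch using only the Python standard library. The three LIVEKIT_* names come from this guide; HUME_API_KEY is an assumed name for the Hume key, and missing_env_vars is a hypothetical helper, not part of either SDK.

```python
import os

# LIVEKIT_* names are taken from this guide; HUME_API_KEY is an assumed
# variable name for the Hume API key.
REQUIRED_ENV_VARS = (
    "HUME_API_KEY",
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; pass a dict to check something
    other than the real process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All credentials are set.")
```

Running a check like this at startup tends to surface a missing key earlier than a failed request would.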
When using the Hume TTS plugin within an AgentSession, follow these guidelines to ensure responsive performance and consistent voice behavior throughout the session:
Enable instant_mode: Reduces latency significantly. Note that enabling instant_mode requires explicitly specifying a voice.
Specify a voice: Select from Hume’s extensive Voice Library or a custom voice for voice consistency.
Omit crafting parameters: Parameters for crafting TTS output like description, speed, trailing_silence, and context are globally applied to all session responses. These parameters are best suited for standalone TTS.
Example implementation:
For a complete AgentSession implementation, see our LiveKit Agents example project.
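The AgentSession guidelines above can also be expressed as a small pre-flight check on whatever options you intend to pass to the plugin. This is a sketch under stated assumptions: the option names (instant_mode, voice, description, speed, trailing_silence, context) come from this guide, but the plain dict, the review_session_tts_options helper, and the "Example Voice" placeholder are hypothetical and do not reflect the plugin's actual constructor or types.

```python
# Crafting parameters named in this guide; in an AgentSession they would
# apply to every response, so the guidelines suggest omitting them.
CRAFTING_PARAMS = ("description", "speed", "trailing_silence", "context")


def review_session_tts_options(options):
    """Return guideline notes for a dict of intended TTS options.

    Hypothetical helper: `options` is a plain dict for illustration,
    not an object from the Hume or LiveKit SDKs.
    """
    notes = []
    instant = bool(options.get("instant_mode"))
    has_voice = bool(options.get("voice"))

    if instant and not has_voice:
        notes.append("instant_mode requires an explicit voice")
    elif not has_voice:
        notes.append("no voice specified; voice may vary across responses")
    if not instant:
        notes.append("instant_mode is off; expect higher latency")

    used = [name for name in CRAFTING_PARAMS if options.get(name) is not None]
    if used:
        notes.append("crafting parameters apply to every response: " + ", ".join(used))
    return notes


# "Example Voice" is a placeholder, not a real entry in the Voice Library.
print(review_session_tts_options({"instant_mode": True, "voice": "Example Voice"}))
print(review_session_tts_options({"instant_mode": True, "speed": 1.2}))
```

An empty list means the options follow all three guidelines; anything else is a prompt to revisit the configuration before starting the session.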
The Hume TTS plugin can be used independently for direct text-to-speech synthesis, outside of an AgentSession. This is useful when you don’t need STT or LLM components and want full control over voice selection, expressiveness, and timing on a per-request basis.
When using the plugin this way, keep the following in mind:
One utterance per request: LiveKit processes only a single utterance per request. Split multi-part dialogue into separate requests to control delivery for each line.
Supply acting instructions: Use utterance options like description, speed, and trailing_silence to shape the delivery. See our acting instructions guide for best practices.
Provide context: Use the context field to maintain continuity across requests. For more on how context influences output, see our continuation guide.
Example implementation:
For a working Standalone TTS implementation, see our LiveKit Agents example project.
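Because each request carries a single utterance, multi-part dialogue has to be split before synthesis. Here is one way that preparation step might look, assuming one utterance per non-empty line of a script. UtteranceRequest and split_dialogue are hypothetical names for illustration and are not classes or functions from the plugin; the field names mirror the utterance options mentioned in this guide, and the values shown are arbitrary.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class UtteranceRequest:
    """Hypothetical container for one request; not a plugin class."""

    text: str
    description: str | None = None
    speed: float | None = None
    trailing_silence: float | None = None


def split_dialogue(script: str) -> list[UtteranceRequest]:
    """Create one request per non-empty line of the script."""
    return [
        UtteranceRequest(text=line.strip())
        for line in script.splitlines()
        if line.strip()
    ]


script = """
Welcome back.

Your order shipped this morning.
"""

requests = split_dialogue(script)
# Shape delivery per line; these values are illustrative only.
requests[0].description = "warm, upbeat greeting"
requests[1].trailing_silence = 0.5

for request in requests:
    # Each item would be sent as its own synthesis request.
    print(request)
```

Keeping the per-line options on the request object makes it straightforward to vary delivery from one line to the next.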
Audio format support: The Hume TTS plugin supports WAV, MP3, and PCM audio formats. If no audio format is specified, it will default to MP3.
Fixed sample rate: The Hume TTS API outputs audio at a fixed sample rate of 48 kHz. Make sure your audio processing pipeline accepts this rate or resamples accordingly.
Output data limitations: The plugin returns audio data with encoding details only. Additional Hume API response data (such as generation_id) is not included.
Normalized TTS interface: Due to the LiveKit Agents SDK’s normalized interface, each request must contain a single utterance. Split multi-utterance text into separate requests.
Plugin options persist: The configuration set when initializing the plugin is applied to each TTS request. To update these options during a session, use the plugin’s update_options method. See the LiveKit documentation for usage details.
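For the fixed 48 kHz sample rate noted above, a little arithmetic helps when sizing buffers or checking clip lengths downstream. The sketch below assumes raw 16-bit mono PCM; that sample width and channel count are assumptions for illustration, not something this guide specifies, and both helper names are hypothetical.

```python
HUME_TTS_SAMPLE_RATE_HZ = 48_000  # fixed output rate stated in this guide


def pcm_duration_seconds(num_bytes, channels=1, bytes_per_sample=2):
    """Duration of a raw PCM buffer at 48 kHz.

    Defaults assume 16-bit mono audio, which is an assumption here.
    """
    bytes_per_frame = channels * bytes_per_sample
    if num_bytes % bytes_per_frame != 0:
        raise ValueError("buffer length is not a whole number of frames")
    frames = num_bytes // bytes_per_frame
    return frames / HUME_TTS_SAMPLE_RATE_HZ


def samples_per_chunk(chunk_ms):
    """Samples per channel in a chunk of the given length in milliseconds."""
    return HUME_TTS_SAMPLE_RATE_HZ * chunk_ms // 1000


print(pcm_duration_seconds(96_000))  # 96,000 bytes of 16-bit mono -> 1.0
print(samples_per_chunk(20))         # 20 ms chunk -> 960
```

If another component in your pipeline expects a different rate, these numbers indicate how much resampling work sits between the TTS output and that component.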
Explore the source code or contribute to the Hume TTS plugin for LiveKit Agents on GitHub.
Reference the official LiveKit docs for a full list of plugin options and additional configuration details.
Use a working example to get started with Hume TTS and LiveKit Agents in Python.
Learn more about Hume’s speech-language model, and features of Hume’s TTS API.