Pipecat
Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. With Pipecat, developers can orchestrate audio and video, AI services, different transports, and conversation pipelines using a modular, frame-based architecture.
Hume’s expressive TTS can be integrated into your Pipecat pipelines using the HumeTTSService. This guide covers setup instructions, integration patterns, and configuration best practices.
Want to get right to the code? See our complete Pipecat example project on GitHub.
Authentication
To use the Hume TTS service with Pipecat, you’ll need your Hume API credentials. Follow these steps to obtain your credentials and set up environment variables.
Get your Hume API key
To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.
Get your Hume voice ID
Browse the Hume Voice Library to select a voice for your agent. Copy the voice ID for use in your configuration.
Usage
The HumeTTSService in Pipecat can be used for conversational agents with STT → LLM → TTS pipelines. It supports word-level timestamps for precise audio-text synchronization and dynamic updates of voice and synthesis parameters at runtime.
Basic Pipeline Integration
When using HumeTTSService within a Pipecat pipeline, follow these guidelines to ensure responsive performance and proper voice configuration:
- Specify a voice: Select from Hume's extensive Voice Library or use a custom voice ID for voice consistency.
- Configure audio sample rate: Hume TTS streams at 48kHz. Ensure your pipeline's audio_out_sample_rate matches this for optimal performance.
- Enable word timestamps: The service supports word-level timestamps by default, which are useful for synchronizing audio with text display.
Example implementation:
For a complete Pipecat implementation, see our Pipecat example project.
Advanced Configuration
The HumeTTSService supports advanced configuration options, including acting instructions (supported only in Octave 1; specifying one switches your model from Octave 2 to Octave 1), speed control, and trailing silence:
Runtime Configuration Updates
You can update voice and synthesis parameters at runtime using TTSUpdateSettingsFrame:
Word Timestamps
The HumeTTSService supports word-level timestamps for precise audio-text synchronization. Use observers like DebugLogObserver to log timestamps or RTVIObserver to display them in your UI:
Constraints
- Audio format support: The HumeTTSService streams PCM audio at 48kHz. Downstream processors can resample if needed.
- Frame-based architecture: Pipecat uses a frame-based pipeline system. The service emits TTSAudioRawFrame frames suitable for Pipecat transports.
- Word timestamps: Word-level timestamps are enabled by default and provide precise timing information for each word in the generated speech.
- Instant mode: The service always uses instant mode for low-latency streaming. This is not user-configurable.
Resources
Explore the source code or contribute to Pipecat on GitHub.
Reference the official Pipecat docs for framework architecture and additional configuration details.
Use a working example to get started with Hume TTS and Pipecat in Python.
Learn more about Hume’s speech-language model, and features of Hume’s TTS API.

