Pipecat

Guide to integrating Hume TTS via the Pipecat framework.

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. With Pipecat, developers can orchestrate audio and video, AI services, different transports, and conversation pipelines using a modular, frame-based architecture.

Hume’s expressive TTS can be integrated into your Pipecat pipelines using the HumeTTSService. This guide covers setup instructions, integration patterns, and configuration best practices.

Want to jump straight to the code? See our complete Pipecat example project on GitHub.

Authentication

To use the Hume TTS service with Pipecat, you’ll need your Hume API credentials. Follow these steps to obtain your credentials and set up environment variables.

1. Get your Hume API key

To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.

2. Get your Hume voice ID

Browse the Hume Voice Library to select a voice for your agent. Copy the voice ID for use in your configuration.

3. Configure environment variables

Create a .env file in your project and define the required environment variables. The service reads your Hume API key from the HUME_API_KEY variable.

.env
HUME_API_KEY=...
HUME_VOICE_ID=...
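
As a quick sanity check before starting a pipeline, you can verify that both variables are set. This is a minimal sketch (`missing_hume_vars` is a hypothetical helper, not part of Pipecat or the Hume SDK), intended to run after `load_dotenv` has read your `.env` file:

```python
import os

REQUIRED_VARS = ("HUME_API_KEY", "HUME_VOICE_ID")

def missing_hume_vars() -> list[str]:
    """Return the names of required Hume variables not set in the environment."""
    return [name for name in REQUIRED_VARS if not os.getenv(name)]
```

Failing fast on missing credentials gives a clearer error than a mid-pipeline authentication failure.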

Usage

The HumeTTSService in Pipecat can be used for conversational agents with STT → LLM → TTS pipelines. It supports word-level timestamps for precise audio-text synchronization and dynamic updates of voice and synthesis parameters at runtime.

Basic Pipeline Integration

When using HumeTTSService within a Pipecat pipeline, follow these guidelines to ensure responsive performance and proper voice configuration:

  • Specify a voice: Select from Hume’s extensive Voice Library or use a custom voice ID for voice consistency.

  • Configure audio sample rate: Hume TTS streams at 48kHz. Ensure your pipeline’s audio_out_sample_rate matches this for optimal performance.

  • Enable word timestamps: The service supports word-level timestamps by default, which are useful for synchronizing audio with text display.

Example implementation:

For a complete Pipecat implementation, see our Pipecat example project.

Basic Pipeline
import os

from dotenv import load_dotenv
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.hume.tts import HUME_SAMPLE_RATE, HumeTTSService
from pipecat.services.openai.llm import OpenAILLMService

load_dotenv(override=True)

async def run_bot(transport, runner_args):
    # 1. Configure the Hume TTS service
    tts = HumeTTSService(
        api_key=os.getenv("HUME_API_KEY"),
        voice_id=os.getenv("HUME_VOICE_ID"),
    )

    # 2. Configure STT and LLM services (Deepgram and OpenAI shown as
    # examples; swap in any supported providers)
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    # 3. Aggregate conversation context for the LLM
    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful voice assistant."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    # 4. Create your pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,  # Hume TTS with word timestamps
        transport.output(),
        context_aggregator.assistant(),
    ])

    # 5. Configure task with matching sample rate
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
            audio_out_sample_rate=HUME_SAMPLE_RATE,  # 48000 Hz
        ),
    )

    # 6. Run the pipeline
    runner = PipelineRunner()
    await runner.run(task)

Advanced Configuration

The HumeTTSService supports advanced configuration options, including acting instructions, speed control, and trailing silence. Note that acting instructions are currently supported only in Octave 1, so setting a description switches your model from Octave 2 to Octave 1:

Advanced Configuration
import os

from pipecat.services.hume.tts import HumeTTSService

tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    voice_id=os.getenv("HUME_VOICE_ID"),
    params=HumeTTSService.InputParams(
        description="calm, pedagogical",  # Acting instructions (Octave 1 only)
        speed=0.8,                        # Speaking-rate multiplier (0.5-2.0)
        trailing_silence=2.0,             # Seconds of silence to append (0-5)
    ),
)

Runtime Configuration Updates

You can update voice and synthesis parameters at runtime using TTSUpdateSettingsFrame:

Runtime Updates
from pipecat.frames.frames import TTSUpdateSettingsFrame

# Update voice
await task.queue_frames([
    TTSUpdateSettingsFrame(settings={"voice_id": "new-voice-id"})
])

# Update synthesis parameters
await task.queue_frames([
    TTSUpdateSettingsFrame(settings={
        "description": "excited, enthusiastic",
        "speed": 1.2,
    })
])

Word Timestamps

The HumeTTSService supports word-level timestamps for precise audio-text synchronization. Use observers like DebugLogObserver to log timestamps or RTVIObserver to display them in your UI:

Word Timestamps
from pipecat.observers.loggers.debug_log_observer import (
    DebugLogObserver,
    FrameEndpoint,
)
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.frames.frames import TTSTextFrame

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
        audio_out_sample_rate=HUME_SAMPLE_RATE,
    ),
    observers=[
        DebugLogObserver(
            frame_types={
                TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
            }
        ),
    ],
)
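
Once timestamps are flowing, you can do more than log them. As a framework-agnostic illustration (`build_caption_cues` is a hypothetical helper, not a Pipecat API), here is one way to group (word, start-time) pairs into caption cues for a UI:

```python
def build_caption_cues(words, window_s=2.0):
    """Group (word, start_s) pairs into caption cues no longer than window_s.

    Returns a list of (cue_start_s, text) tuples.
    """
    cues = []
    current, cue_start = [], None
    for word, start in words:
        if cue_start is None:
            cue_start = start
        # Flush the current cue when the next word falls outside the window
        if start - cue_start > window_s and current:
            cues.append((cue_start, " ".join(current)))
            current, cue_start = [], start
        current.append(word)
    if current:
        cues.append((cue_start, " ".join(current)))
    return cues
```

With two-second windows, `[("a", 0.0), ("b", 1.0), ("c", 2.5)]` yields two cues: one starting at 0.0 with "a b", and one starting at 2.5 with "c".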

Constraints

  • Audio format support: The HumeTTSService streams PCM audio at 48kHz. Downstream processors can resample if needed.

  • Frame-based architecture: Pipecat uses a frame-based pipeline system. The service emits TTSAudioRawFrame frames suitable for Pipecat transports.

  • Word timestamps: Word-level timestamps are enabled by default and provide precise timing information for each word in the generated speech.

  • Instant mode: The service always uses instant mode for low-latency streaming. This is not user-configurable.
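
On the first constraint: if a downstream component needs a lower rate (say 16kHz for a recording or analysis stage), resampling is straightforward in principle. The sketch below (`resample_pcm16` is a hypothetical helper, not part of Pipecat or Hume) illustrates naive linear-interpolation resampling of mono 16-bit PCM; in practice, prefer a dedicated resampler or a Pipecat audio processor:

```python
import array

def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Naively resample mono 16-bit little-endian PCM via linear interpolation."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if not samples or src_rate == dst_rate:
        return pcm
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    step = (len(samples) - 1) / max(1, n_out - 1)
    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Interpolate between the two nearest source samples
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out.tobytes()
```

For example, one millisecond of 48kHz audio (48 samples) resampled to 16kHz yields 16 samples.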

Resources