Pipecat

Guide to integrating Hume TTS via the Pipecat framework.

Pipecat is an open-source Python framework for building real-time voice and multimodal conversational agents. With Pipecat, developers can orchestrate audio and video, AI services, different transports, and conversation pipelines using a modular, frame-based architecture.

Hume’s expressive TTS can be integrated into your Pipecat pipelines using the HumeTTSService. This guide covers setup instructions, integration patterns, and configuration best practices.

Want to jump straight to the code? See our complete Pipecat example project on GitHub.

Authentication

To use the Hume TTS service with Pipecat, you’ll need your Hume API credentials. Follow these steps to obtain your credentials and set up environment variables.

1. Get your Hume API key

To get your Hume API key, sign in to the Hume Platform and follow the Getting your API key guide.

2. Get your Hume voice ID

Browse the Hume Voice Library to select a voice for your agent. Copy the voice ID for use in your configuration.

3. Configure environment variables

Create a .env file in your project and define the required environment variables. The service reads your Hume API key from the HUME_API_KEY variable.

.env
HUME_API_KEY=...
HUME_VOICE_ID=...
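
As a quick sanity check before starting a pipeline, you can verify that both variables are set. This is a minimal sketch (`missing_hume_vars` is a hypothetical helper, not part of Pipecat or the Hume SDK), intended to run after `load_dotenv` has read your `.env` file:

```python
import os

REQUIRED_VARS = ("HUME_API_KEY", "HUME_VOICE_ID")

def missing_hume_vars() -> list[str]:
    """Return the names of required Hume variables not set in the environment."""
    return [name for name in REQUIRED_VARS if not os.getenv(name)]
```

Failing fast on missing credentials gives a clearer error than a mid-pipeline authentication failure.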

Usage

The HumeTTSService in Pipecat can be used for conversational agents with STT → LLM → TTS pipelines. It supports word-level timestamps for precise audio-text synchronization and dynamic updates of voice and synthesis parameters at runtime.

Basic Pipeline Integration

When using HumeTTSService within a Pipecat pipeline, follow these guidelines to ensure responsive performance and proper voice configuration:

  • Specify a voice: Select from Hume’s extensive Voice Library or use a custom voice ID for voice consistency.

  • Configure audio sample rate: Hume TTS streams at 48kHz. Ensure your pipeline’s audio_out_sample_rate matches this for optimal performance.

  • Enable word timestamps: The service supports word-level timestamps by default, which are useful for synchronizing audio with text display.

Example implementation:

For a complete Pipecat implementation, see our Pipecat example project.

Basic Pipeline
import os

from dotenv import load_dotenv
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.openai_llm_context import OpenAILLMContext
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.services.hume.tts import HUME_SAMPLE_RATE, HumeTTSService
from pipecat.services.openai.llm import OpenAILLMService

load_dotenv(override=True)

async def run_bot(transport, runner_args):
    # 1. Configure the Hume TTS service
    tts = HumeTTSService(
        api_key=os.getenv("HUME_API_KEY"),
        voice_id=os.getenv("HUME_VOICE_ID"),
    )

    # 2. Configure STT and LLM services (Deepgram and OpenAI shown as
    # examples; swap in any supported providers)
    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))

    # 3. Aggregate conversation context for the LLM
    context = OpenAILLMContext(
        [{"role": "system", "content": "You are a helpful voice assistant."}]
    )
    context_aggregator = llm.create_context_aggregator(context)

    # 4. Create your pipeline
    pipeline = Pipeline([
        transport.input(),
        stt,
        context_aggregator.user(),
        llm,
        tts,  # Hume TTS with word timestamps
        transport.output(),
        context_aggregator.assistant(),
    ])

    # 5. Configure task with matching sample rate
    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
            audio_out_sample_rate=HUME_SAMPLE_RATE,  # 48000 Hz
        ),
    )

    # 6. Run the pipeline
    runner = PipelineRunner()
    await runner.run(task)

Advanced Configuration

The HumeTTSService supports advanced configuration options, including acting instructions, speed control, and trailing silence. Note that acting instructions are currently supported only in Octave 1, so setting a description switches your model from Octave 2 to Octave 1:

Advanced Configuration
import os

from pipecat.services.hume.tts import HumeTTSService

tts = HumeTTSService(
    api_key=os.getenv("HUME_API_KEY"),
    voice_id=os.getenv("HUME_VOICE_ID"),
    params=HumeTTSService.InputParams(
        description="calm, pedagogical",  # Acting instructions (Octave 1 only)
        speed=0.8,                        # Speaking-rate multiplier (0.5-2.0)
        trailing_silence=2.0,             # Seconds of silence to append (0-5)
    ),
)

Runtime Configuration Updates

You can update voice and synthesis parameters at runtime using TTSUpdateSettingsFrame:

Runtime Updates
from pipecat.frames.frames import TTSUpdateSettingsFrame

# Update voice
await task.queue_frames([
    TTSUpdateSettingsFrame(settings={"voice_id": "new-voice-id"})
])

# Update synthesis parameters
await task.queue_frames([
    TTSUpdateSettingsFrame(settings={
        "description": "excited, enthusiastic",
        "speed": 1.2,
    })
])

Word Timestamps

The HumeTTSService supports word-level timestamps for precise audio-text synchronization. Use observers like DebugLogObserver to log timestamps or RTVIObserver to display them in your UI:

Word Timestamps
from pipecat.observers.loggers.debug_log_observer import (
    DebugLogObserver,
    FrameEndpoint,
)
from pipecat.transports.base_output import BaseOutputTransport
from pipecat.frames.frames import TTSTextFrame

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_metrics=True,
        enable_usage_metrics=True,
        audio_out_sample_rate=HUME_SAMPLE_RATE,
    ),
    observers=[
        DebugLogObserver(
            frame_types={
                TTSTextFrame: (BaseOutputTransport, FrameEndpoint.SOURCE),
            }
        ),
    ],
)
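
Once timestamps are flowing, you can do more than log them. As a framework-agnostic illustration (`build_caption_cues` is a hypothetical helper, not a Pipecat API), here is one way to group (word, start-time) pairs into caption cues for a UI:

```python
def build_caption_cues(words, window_s=2.0):
    """Group (word, start_s) pairs into caption cues no longer than window_s.

    Returns a list of (cue_start_s, text) tuples.
    """
    cues = []
    current, cue_start = [], None
    for word, start in words:
        if cue_start is None:
            cue_start = start
        # Flush the current cue when the next word falls outside the window
        if start - cue_start > window_s and current:
            cues.append((cue_start, " ".join(current)))
            current, cue_start = [], start
        current.append(word)
    if current:
        cues.append((cue_start, " ".join(current)))
    return cues
```

With two-second windows, `[("a", 0.0), ("b", 1.0), ("c", 2.5)]` yields two cues: one starting at 0.0 with "a b", and one starting at 2.5 with "c".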

Constraints

  • Audio format support: The HumeTTSService streams PCM audio at 48kHz. Downstream processors can resample if needed.

  • Frame-based architecture: Pipecat uses a frame-based pipeline system. The service emits TTSAudioRawFrame frames suitable for Pipecat transports.

  • Word timestamps: Word-level timestamps are enabled by default and provide precise timing information for each word in the generated speech.

  • Instant mode: The service always uses instant mode for low-latency streaming. This is not user-configurable.
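
On the first constraint: if a downstream component needs a lower rate (say 16kHz for a recording or analysis stage), resampling is straightforward in principle. The sketch below (`resample_pcm16` is a hypothetical helper, not part of Pipecat or Hume) illustrates naive linear-interpolation resampling of mono 16-bit PCM; in practice, prefer a dedicated resampler or a Pipecat audio processor:

```python
import array

def resample_pcm16(pcm: bytes, src_rate: int, dst_rate: int) -> bytes:
    """Naively resample mono 16-bit little-endian PCM via linear interpolation."""
    samples = array.array("h")
    samples.frombytes(pcm)
    if not samples or src_rate == dst_rate:
        return pcm
    n_out = max(1, int(len(samples) * dst_rate / src_rate))
    step = (len(samples) - 1) / max(1, n_out - 1)
    out = array.array("h")
    for i in range(n_out):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Interpolate between the two nearest source samples
        out.append(int(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out.tobytes()
```

For example, one millisecond of 48kHz audio (48 samples) resampled to 16kHz yields 16 samples.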

Resources