Hume MCP Server

Use Hume AI’s Octave TTS with your favorite MCP clients like Claude Desktop, Cursor, and Windsurf.

The Hume MCP Server implements the Model Context Protocol (MCP) for Hume AI’s Octave Text-To-Speech, allowing you to use MCP clients like Claude Desktop, Cursor, and Windsurf to collaborate with AI assistants on your voice projects.

What for?

If you hope to narrate a large source text, such as a book, play, or long-form video, there’s a lot more to the project than just converting the text to speech. You have to

  • Design voices
  • Break the text into pieces
  • Assign each line of dialogue to a voice
  • Separate acting instructions from spoken text

LLMs can perform some of these tasks and help you keep these efforts organized. MCP is an industry protocol that lets you easily give an AI assistant the ability to use tools like Octave TTS on your behalf.

Available Tools

The Hume MCP Server exposes the following tools to compatible MCP clients:

  • tts: Synthesize (and play) speech from text. This is the primary tool for generating speech, with optional voice selection, acting instructions, and playback control.
  • play_previous_audio: Replay previously generated audio by referencing its generation ID. Useful for comparing different versions or revisiting earlier speech samples.
  • list_voices: List all available voices in your account’s library, including both custom voices and Hume-provided preset voices.
  • save_voice: Save a generated voice to your library for reuse in future TTS requests, allowing you to build a collection of customized voices.
  • delete_voice: Remove a voice from your custom voice library when it’s no longer needed.
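
Under the hood, MCP clients invoke these tools with ordinary JSON-RPC tools/call requests. As a rough sketch (the argument names below are illustrative guesses based on parameters mentioned elsewhere on this page, not the server’s exact schema), a tts call might look like:

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "tts",
    "arguments": {
      "text": "Once upon a time, in a quiet village by the sea...",
      "voiceName": "Narrator",
      "description": "Warm and unhurried, as if reading to a child."
    }
  }
}

Your MCP client constructs requests like this for you; you only describe what you want in natural language.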

Quickstart

To get started with the Hume MCP Server, you’ll need to configure your MCP client application to use it:

Add the following to the mcpServers property in your claude_desktop_config.json configuration file.

Claude Desktop Configuration

{
  "mcpServers": {
    "hume": {
      "command": "npx",
      "args": [
        "@humeai/mcp-server"
      ],
      "env": {
        "HUME_API_KEY": "<your_hume_api_key>"
      }
    }
  }
}

Prerequisites

Before you can use the Hume MCP Server, you’ll need:

  • An account and API Key from Hume AI
  • Node.js installed on your system
  • (optional) A command-line audio player
    • ffplay from FFmpeg is recommended, but the server will attempt to detect and use any of several common players

The MCP server calls Hume APIs on your behalf and will use credits from your account, incurring costs just as if you were making the API calls directly or doing Text-to-Speech through the web interface.

Source Code

The Hume MCP Server is open source. You can view and contribute to the source code in the GitHub repository.

Prompt Examples

Here are some example prompts to help you get started with the Hume MCP Server.

These examples assume that the assistant has the ability to read and write files. This is usually already the case for MCP clients like Cursor that are attached to an editor. For standalone chat apps like Claude Desktop, you can give the assistant filesystem access through the Filesystem MCP Server (https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem).
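
For example, a Claude Desktop configuration that exposes both servers might look like the sketch below. The filesystem package name and the allowed directory are illustrative; check the repository linked above for the exact invocation.

{
  "mcpServers": {
    "hume": {
      "command": "npx",
      "args": ["@humeai/mcp-server"],
      "env": { "HUME_API_KEY": "<your_hume_api_key>" }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/path/to/your/projects"]
    }
  }
}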

Ask the assistant to create a voice with specific characteristics:

Create a warm, friendly female voice with a slight Irish accent that
would be good for narrating a children's story.
Produce a good voice description and sample text by asking
me questions about my desired voice qualities.
Then, give me several options and iterate based on
my feedback.

Have the assistant read content to you:

I have the text of a blog post that I'd like to listen to in my
Downloads folder. Can you read it to me in an appropriate voice?

This comprehensive prompt helps the assistant break down an audiobook chapter into segments and design appropriate voices:

<Goal>
Narrate the audiobook chapter in my text with high quality AI-generated
speech according to my artistic vision.
</Goal>
<Steps>
1. Break the text down into segments
2. Design and save a base voice for the narrator.
3. Design *variants* of the narrator voice for each character.
4. Convert the text of each segment to speech.
</Steps>
<Segmentation>
* Every line of quoted dialogue should be its own segment
* Quotation marks should be removed from segments that are solely
dialogue.
* Use the following formatting for segments
## Segment 1
voice_name: ...
text: ...
description: ...
## Segment 2
voice_name: ...
text: ...
(no description)
</Segmentation>
<ToolCalls>
ALWAYS stop to collect feedback and ask for confirmation before performing
a 'tts' tool call.
</ToolCalls>
<VoiceDesign>
* Descriptions for a new voice should be 2 sentences MAX. Sample text
should be 2 sentences MAX.
* Don't use source text for the sample text -- invent new text that is
stylized to reflect the character and emotion of the desired voice.
* To generate a variant, ALWAYS specify the base voice as `voiceName`.
* Descriptions should be VERY short and describe one or two voice
qualities (masculinity, pitch, pace) that should vary from the base
voice.
</VoiceDesign>
<Narration>
* ALWAYS use continuation and voiceName.
* Never send acting instructions "description" unless it is provided in
the script.
</Narration>
Let's get started with step 1!

This prompt explains how to create distinct character voices through a technique called “variant chaining”:

To make it sound like the narrator is "doing a voice" you have to create
a voice with more distance from the base narrator voice than you can get
by generating a single iteration of providing acting instructions to
modulate the voice. You can do this through "variant chaining".
* Start with the base voice.
* Pick one or two qualities of the voice that are different than the base
voice to emphasize in the acting instructions and source text.
* Create and save {variant_voice}_0.
* Create new acting instructions and source text, then use them to create
and save {variant_voice}_1 using {variant_voice}_0 as a base.
* Repeat until the results are satisfactory.
Oftentimes 2 variants are enough for a character of the same gender. You
might need 3 or more variants emphasizing masculinity for a character of
the opposite gender.
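
In tool-call terms, one link in that chain is a tts generation followed by a save_voice. The sketch below is only an approximation: the argument names (voiceName, description, text, generationId, name) and the voice names are illustrative, and the generation ID placeholder stands in for whatever the preceding tts call returned.

[
  {
    "name": "tts",
    "arguments": {
      "voiceName": "narrator",
      "description": "Higher pitch, brighter, noticeably faster pace.",
      "text": "Oh, you will not believe what I found behind the old mill!"
    }
  },
  {
    "name": "save_voice",
    "arguments": {
      "generationId": "<generation id returned by the tts call above>",
      "name": "village_girl_0"
    }
  }
]

Repeating the same pair with village_girl_0 as the base produces village_girl_1, and so on until the voice is distinct enough.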

Command Line Options

The Hume MCP Server accepts several command line options to customize its behavior:

Options:
--workdir, -w <path> Set working directory for audio files (default: system temp)
--(no-)embedded-audio-mode Enable/disable embedded audio mode (default: false)
--(no-)instant-mode Enable/disable instant mode (default: false) (incurs 10% additional cost)
--help, -h Show help message
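
These flags are passed to the server through the "args" array of your MCP client configuration. For example (the working directory path below is just a placeholder):

{
  "mcpServers": {
    "hume": {
      "command": "npx",
      "args": [
        "@humeai/mcp-server",
        "--workdir", "/path/to/audio-workdir"
      ],
      "env": {
        "HUME_API_KEY": "<your_hume_api_key>"
      }
    }
  }
}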

Environment Variables

You can configure the behavior of the Hume MCP Server using these environment variables:

  • HUME_API_KEY: Your Hume AI API key (required). You can obtain this from the Hume AI Platform.
  • WORKDIR: Working directory for audio files (default: OS temp directory + “/hume-tts”). This is where generated audio files will be stored.
  • EMBEDDED_AUDIO_MODE: Enable/disable embedded audio mode (default: false; set to ‘true’ to enable). Embedded audio files are a new addition to the MCP specification and most MCP client applications do not yet support them. This option can be useful if you are designing an MCP client specifically to work with Hume.
  • INSTANT_MODE: Enable/disable instant mode (default: false; set to ‘true’ to enable). Instant mode allows for faster TTS generation but incurs a 10% additional cost. This setting overrides the default instant_mode parameter sent to the TTS API.
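
Like the command-line flags, these variables can be set in the "env" block of the server entry in your MCP client configuration, for example (the working directory path is a placeholder):

"env": {
  "HUME_API_KEY": "<your_hume_api_key>",
  "WORKDIR": "/path/to/audio-workdir",
  "INSTANT_MODE": "true"
}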

Default API Parameters

The MCP Server applies several default parameters to API requests for convenience:

tts

  • strip_headers: true
    Headers and non-speech text are automatically removed from the input.
  • format.type: "wav"
    All audio is generated in WAV format for best compatibility with audio players.
  • instant_mode: true
    Instant mode is enabled by default for the TTS API (the API default is false) for faster synthesis. This default can be overridden by setting the global instant mode option through the command line flag or environment variable.

list_voices

  • page_size: 100
    Returns up to 100 voices per request (the API default is 10) to minimize pagination needs.
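
Putting these defaults together, the request body the server sends to the Hume TTS API for a plain tts call is roughly equivalent to the following. The utterances wrapper reflects the general shape of the public TTS API and is shown here as an approximation; only strip_headers, format.type, and instant_mode are the documented defaults above.

{
  "utterances": [
    { "text": "Hello from the Hume MCP Server." }
  ],
  "format": { "type": "wav" },
  "strip_headers": true,
  "instant_mode": true
}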