Hume MCP Server

Use Hume AI's Octave TTS with your favorite MCP clients like Claude Desktop, Cursor, and Windsurf.

The Hume MCP Server implements the Model Context Protocol (MCP) for Hume AI’s TTS API, allowing you to use MCP-compatible clients like Claude Desktop, Cursor, and Windsurf to collaborate with AI assistants on your voice projects.

What for?

If you hope to narrate a large source text, such as a book, play, or long-form video, there’s a lot more to the project than just converting the text to speech. You have to:

  • Design voices
  • Break the text into pieces
  • Assign each line of dialogue to a voice
  • Separate acting instructions from spoken text

LLMs can perform some of these tasks and help you keep these efforts organized. MCP is an open protocol that lets you give an AI assistant the ability to use tools like Octave TTS on your behalf.

Available tools

The Hume MCP Server exposes the following tools to compatible MCP clients:

  • tts: Synthesize (and play) speech from text. This is the primary tool for generating speech, with optional voice selection, acting instructions, and playback control.
  • play_previous_audio: Replay previously generated audio by referencing its generation ID. Useful for comparing different versions or revisiting earlier speech samples.
  • list_voices: List all available voices in your account’s library, including both custom voices and Hume-provided preset voices.
  • save_voice: Save a generated voice to your library for reuse in future TTS requests, allowing you to build a collection of customized voices.
  • delete_voice: Remove a voice from your custom voice library when it’s no longer needed.

Prerequisites

Before using the Hume MCP Server, make sure you have the following:

  1. A Hume account and API key.
  2. Node.js installed on your machine.
  3. (Optional) A command-line audio player.
    • We recommend ffplay from FFmpeg.
    • The server will try to auto-detect and use any of several common players.
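
If you’re not sure whether a suitable player is installed, a quick shell check along these lines can help. The candidate list below is illustrative; the server’s actual detection list and order are implementation details.

```shell
# Look for the first available command-line audio player.
# This candidate list is only a sketch; the server's real
# detection logic may check a different set of players.
player=""
for candidate in ffplay mpv mplayer afplay aplay; do
  if command -v "$candidate" >/dev/null 2>&1; then
    player="$candidate"
    break
  fi
done

if [ -n "$player" ]; then
  echo "audio player: $player"
else
  echo "no audio player found; install ffplay (FFmpeg) for playback"
fi
```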

The MCP server calls Hume APIs on your behalf and will use credits from your account, incurring costs just as if you were making the API calls directly or using Hume’s TTS through the web interface.

Configure your MCP client

To get started with the Hume MCP Server, you’ll need to configure your MCP client application to use it:

Add the following to the mcpServers property in your claude_desktop_config.json configuration file.

Claude Desktop Configuration
{
  "mcpServers": {
    "hume": {
      "command": "npx",
      "args": [
        "@humeai/mcp-server"
      ],
      "env": {
        "HUME_API_KEY": "<your_hume_api_key>"
      }
    }
  }
}

Source code

The Hume MCP Server is open source. You can view and contribute to the source code in the GitHub repository.

Prompt examples

Here are some example prompts to help you get started with the Hume MCP Server.

These examples assume that the assistant has the ability to read and write from a filesystem. This is usually already the case for MCP clients like Cursor that are attached to an editor. For standalone chat apps like Claude Desktop, you can give the assistant filesystem access through the Filesystem MCP Server.

Ask the assistant to create a voice with specific characteristics:

Create a warm, friendly female voice with a slight Irish accent
that would be good for narrating a children's story.

Produce a good voice description and sample text by asking
me questions about my desired voice qualities.

Then, give me several options and iterate based on my feedback.

Have the assistant read you content.

I have the text of a blog post in my Downloads folder that I'd
like to listen to. Can you read it to me in an appropriate voice?

This comprehensive prompt helps the assistant break down an audiobook chapter into segments and design appropriate voices:

<Goal>
  Narrate the audiobook chapter in my text with high quality
  AI-generated speech according to my artistic vision.
</Goal>

<Steps>
  1. Break the text down into segments.
  2. Design and save a base voice for the narrator.
  3. Design *variants* of the narrator voice for each character.
  4. Convert the text of each segment to speech.
</Steps>

<Segmentation>
  * Every line of quoted dialogue should be its own segment.
  * Quotation marks should be removed from segments that are
    solely dialogue.
  * Use the following formatting for segments:

  ## Segment 1
  voice_name: ...
  text: ...
  description: ...
  ## Segment 2
  voice_name: ...
  text: ...
  (no description)
</Segmentation>

<ToolCalls>
  ALWAYS stop to collect feedback and ask for confirmation before
  performing a 'tts' tool call.
</ToolCalls>

<VoiceDesign>
  * Descriptions for a new voice should be 2 sentences MAX.
    Sample text should be 2 sentences MAX.
  * Don't use source text for the sample text -- invent new
    text that is stylized to reflect the character and emotion
    of the desired voice.
  * To generate a variant, ALWAYS specify the base voice as
    `voiceName`.
  * Descriptions should be VERY short and describe one or two
    voice qualities (masculinity, pitch, pace) that should vary
    from the base voice.
</VoiceDesign>

<Narration>
  * ALWAYS use continuation and voiceName.
  * Never send acting instructions ("description") unless they are
    provided in the script.
</Narration>

Let's get started with step 1!

This prompt explains how to create distinct character voices through a technique called “variant chaining”:

To make it sound like the narrator is "doing a voice", you have to create
a voice with more distance from the base narrator voice than you can get
from a single iteration of acting instructions that modulate the voice.
You can do this through "variant chaining".

* Start with the base voice.
* Pick one or two qualities of the voice that differ from the base
  voice to emphasize in the acting instructions and source text.
* Create and save {variant_voice}_0.
* Create new acting instructions and source text, and use them to create
  and save {variant_voice}_1 using {variant_voice}_0 as a base.
* Repeat until the results are satisfactory.

Often, 2 variants are enough for a character of the same gender. You
might need 3 or more variants emphasizing masculinity for a character of
the opposite gender.

Command line options

The Hume MCP Server accepts several command line options to customize its behavior:

  • --workdir, -w <path>: Set the working directory for audio files (default: system temp directory)
  • --(no-)embedded-audio-mode: Enable/disable embedded audio mode (default: false)
  • --(no-)instant-mode: Enable/disable instant mode (default: false; incurs a 10% additional cost)
  • --help, -h: Show the help message
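
For example, to pass these flags when an MCP client launches the server, add them to the args array of your configuration. The --workdir path below is a placeholder:

```json
{
  "mcpServers": {
    "hume": {
      "command": "npx",
      "args": [
        "@humeai/mcp-server",
        "--workdir", "/path/to/hume-audio",
        "--instant-mode"
      ],
      "env": {
        "HUME_API_KEY": "<your_hume_api_key>"
      }
    }
  }
}
```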

Environment variables

You can configure the behavior of the Hume MCP Server using these environment variables:

  • HUME_API_KEY: Your Hume AI API key (required). You can obtain this from the Hume AI Platform.
  • WORKDIR: Working directory for audio files (default: OS temp directory + “/hume-tts”). This is where generated audio files are stored.
  • EMBEDDED_AUDIO_MODE: Enable/disable embedded audio mode (default: false; set to ‘true’ to enable). Embedded audio files are a recent addition to the MCP specification, and most MCP client applications do not yet support them. Enabling this can be useful if you are designing an MCP client specifically to work with Hume.
  • INSTANT_MODE: Enable/disable instant mode (default: false; set to ‘true’ to enable). Instant mode allows faster TTS generation but incurs a 10% additional cost. This setting overrides the default instant_mode parameter sent to the TTS API.
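
When launching the server through an MCP client, these variables go in the env block of the server entry in the client configuration. The WORKDIR path below is a placeholder:

```json
"env": {
  "HUME_API_KEY": "<your_hume_api_key>",
  "WORKDIR": "/path/to/hume-audio",
  "INSTANT_MODE": "true"
}
```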

Default API parameters

The MCP Server applies several default parameters to API requests for convenience:

  • tts: strip_headers = true. Headers and non-speech text are automatically removed from the input.
  • tts: format.type = "wav". All audio is generated in WAV format for best compatibility with audio players.
  • tts: instant_mode = true. Instant mode is enabled by default (the TTS API default is false) for faster synthesis. This default can be overridden by setting the global instant mode option through the command-line flag or environment variable.
  • list_voices: page_size = 100. Returns up to 100 voices per request (the API default is 10) to minimize pagination needs.
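
Taken together, a tts tool invocation might be forwarded to the TTS API with roughly the following request shape once these defaults are applied. The exact field names and placement are an assumption here; consult the Hume TTS API reference for the authoritative schema:

```json
{
  "utterances": [
    { "text": "Hello from the Hume MCP Server." }
  ],
  "format": { "type": "wav" },
  "strip_headers": true,
  "instant_mode": true
}
```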