TTS CLI Quickstart Guide
Step-by-step guide for integrating the TTS API using Hume’s CLI.
The Hume CLI provides a simple interface for generating speech, saving voices, and exploring the features of the Hume TTS API. This guide shows how to get started with Hume’s Text-to-Speech capabilities from the command line. It demonstrates:
- Converting text to speech with a new voice.
- Saving a voice to your voice library for future use.
- Giving “acting instructions” to modulate the voice.
- Generating multiple variations of the same text at once.
- Providing context to maintain consistency across multiple generations.
Installation
Install the Hume CLI using npm:
See usage information by running hume tts --help.
Authentication
Authenticate using the CLI:
This opens a browser window to the Hume AI platform, where you can retrieve your API key, and then prompts you to enter that key.
Calling Text-to-Speech
To use Hume TTS:
- Provide the text you want to speak as a positional argument.
- Optionally, provide the --description flag to control how the voice sounds. If you don’t provide a description, Hume examines the text and attempts to choose an appropriate voice.
By default, the CLI will:
- save the audio to the output directory (./hume-tts-output by default),
- attempt to play it automatically, and
- display the generation_id for the speech, for future reference.
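If you drive the CLI from a script instead of typing commands by hand, it helps to build the argument list explicitly. The following is a minimal Python sketch that assembles a hume tts invocation from the two inputs described above: the positional text and the optional --description flag. The helper names tts_command and run_cli and the sample text are hypothetical; the command only runs when you call run_cli, which assumes the Hume CLI is installed and authenticated.

```python
import shlex
import subprocess
from typing import List, Optional


def tts_command(text: str, description: Optional[str] = None) -> List[str]:
    """Build the argv for `hume tts` (hypothetical helper, not part of the CLI)."""
    cmd = ["hume", "tts", text]  # the text to speak is the positional argument
    if description is not None:
        # Omitting --description lets Hume choose a voice based on the text alone.
        cmd += ["--description", description]
    return cmd


def run_cli(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if the CLI exits non-zero."""
    subprocess.run(cmd, check=True)


cmd = tts_command(
    "The lighthouse keeper had not seen a ship in forty years.",
    description="An elderly sailor, slow and weathered",
)
print(shlex.join(cmd))  # show the shell-quoted command without running it
# run_cli(cmd)  # uncomment to generate audio (requires the Hume CLI and an API key)
```

Passing a list to subprocess.run avoids shell-quoting problems when the text contains apostrophes or quotation marks, while shlex.join prints a version you can paste into a terminal.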
Saving voices
When you find a voice you like, use the hume voices create command to give it a name and save it to your voice library for future use. You can specify the generation ID:
or, alternatively, use the --last flag to save the most recent generation.
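The two ways of choosing what to save (a specific generation ID, or --last) are mutually exclusive, which a script can enforce before calling the CLI. This Python sketch builds a hume voices create command under that rule. The --name and --generation-id flag names are assumptions not confirmed by this guide (check hume voices create --help for your CLI version); voices_create_command is a hypothetical helper and YOUR_GENERATION_ID is a placeholder.

```python
import shlex
from typing import List, Optional


def voices_create_command(
    name: str, generation_id: Optional[str] = None, last: bool = False
) -> List[str]:
    """Build argv for `hume voices create` (hypothetical helper).

    Exactly one source must be given: a generation ID, or last=True to save
    the most recent generation. --name and --generation-id are ASSUMED flag names.
    """
    if (generation_id is not None) == last:
        raise ValueError("pass either generation_id or last=True, not both or neither")
    cmd = ["hume", "voices", "create", "--name", name]
    if generation_id is not None:
        cmd += ["--generation-id", generation_id]
    else:
        cmd.append("--last")
    return cmd


by_id = voices_create_command("old-sailor", generation_id="YOUR_GENERATION_ID")
latest = voices_create_command("old-sailor", last=True)
print(shlex.join(by_id))
print(shlex.join(latest))
```

To execute either command, pass the list to subprocess.run(cmd, check=True).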
Continuity
To use a voice from your library, specify its name.
If the speech should sound like it follows from previous speech, you can provide the --context-generation-id flag with the generation_id of the previous speech.
Alternatively, use the --last flag to continue from the most recent generation.
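A common use of continuity is narrating a multi-line script in one saved voice, where each line should follow naturally from the one before. The Python sketch below builds one hume tts command per line: the first line has no context, and each later line uses --last, assuming the commands run one after another so that "most recent generation" is always the preceding line. The --voice-name flag for selecting a saved voice is an assumption (this guide only says to specify the voice's name); continue_command and the "old-sailor" voice are hypothetical.

```python
import shlex
from typing import List, Optional


def continue_command(
    text: str,
    voice_name: str,
    context_generation_id: Optional[str] = None,
    last: bool = False,
) -> List[str]:
    """Build argv for a `hume tts` call in a saved voice (hypothetical helper).

    --voice-name is an ASSUMED flag name; --context-generation-id and --last
    are the continuity flags described in this guide.
    """
    if context_generation_id is not None and last:
        raise ValueError("use context_generation_id or last=True, not both")
    cmd = ["hume", "tts", text, "--voice-name", voice_name]
    if context_generation_id is not None:
        cmd += ["--context-generation-id", context_generation_id]
    elif last:
        cmd.append("--last")
    return cmd


script = [
    "The storm rolled in just after midnight.",
    "By dawn, the harbor was empty.",
    "Only the lighthouse was still standing.",
]
# The first line starts fresh; every later line continues from the one before it.
commands = [continue_command(script[0], "old-sailor")] + [
    continue_command(line, "old-sailor", last=True) for line in script[1:]
]
for cmd in commands:
    print(shlex.join(cmd))
```

If other generations might happen in between (for example, another script using the same account), passing the explicit generation_id via context_generation_id is the safer choice than relying on --last.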
Acting Instructions
If you specify both a voice and a description, the description acts as “acting instructions”: the output keeps the character of the specified voice, modulated to match the description.
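Because the voice stays fixed while the description changes, acting instructions are a convenient way to audition several deliveries of the same line. This Python sketch builds one command per direction, all in the same saved voice. As elsewhere, --voice-name is an assumed flag name for selecting a saved voice, and acting_commands, the voice name, and the directions are hypothetical.

```python
import shlex
from typing import List


def acting_commands(text: str, voice_name: str, directions: List[str]) -> List[List[str]]:
    """One `hume tts` argv per acting direction, all using the same saved voice.

    Pairing a voice with --description makes the description act as acting
    instructions. --voice-name is an ASSUMED flag name (hypothetical helper).
    """
    return [
        ["hume", "tts", text, "--voice-name", voice_name, "--description", direction]
        for direction in directions
    ]


takes = acting_commands(
    "I told you this would happen.",
    "old-sailor",
    ["quietly, almost to himself", "shouting over the wind", "with a tired laugh"],
)
for cmd in takes:
    print(shlex.join(cmd))
```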
Generating multiple variations
To generate multiple variations of the same text at once, use the --num-generations flag.
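A small Python sketch of a hume tts command with --num-generations, optionally combined with --description. It rejects counts below 1 before calling the CLI; the CLI or API may also enforce an upper limit, which is not stated here, so check hume tts --help. The helper name variations_command and the sample text are hypothetical.

```python
import shlex
from typing import List, Optional


def variations_command(
    text: str, num_generations: int, description: Optional[str] = None
) -> List[str]:
    """Build argv for `hume tts` producing several variations (hypothetical helper)."""
    if num_generations < 1:
        raise ValueError("num_generations must be at least 1")
    cmd = ["hume", "tts", text, "--num-generations", str(num_generations)]
    if description is not None:
        cmd += ["--description", description]
    return cmd


cmd = variations_command("Welcome aboard.", 3, description="A cheerful ship's captain")
print(shlex.join(cmd))
```

Assuming the CLI reports a generation_id for each variation, you can listen to them, pick the one you like, and save it with hume voices create.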