TTS CLI Quickstart Guide

The Hume CLI provides a simple interface for generating speech, saving voices, and exploring the features of the Hume TTS API. This guide shows how to get started with Hume’s Text-to-Speech capabilities through the CLI. It demonstrates:

Converting text to speech with a new voice.
Saving a voice to your voice library for future use.
Giving “acting instructions” to modulate the voice.
Generating multiple variations of the same text at once.
Providing context to maintain consistency across multiple generations.

Installation

Install the Hume CLI using npm:

$ npm install -g @humeai/cli

See usage information by running hume tts --help.

Authentication

Authenticate using the CLI:

$ hume login

This opens a browser window to the Hume AI platform, where you can retrieve your API key. The CLI then prompts you to enter the key.

Calling Text-to-Speech

To use Hume TTS:

Provide the text you want to speak as a positional argument.
Provide the optional --description flag to control how the voice sounds. If you don’t provide a description, Hume will examine the text and attempt to determine an appropriate voice.

$ hume tts "Take an arrow from the quiver." \
>   --description "A refined, British aristocrat"

By default, the CLI will:

Save the audio to the output directory (defaults to ./hume-tts-output).
Attempt to play it automatically.
Display the generation_id for the speech, for future reference.
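
Because the audio is written to disk, a small shell helper can locate the newest file for post-processing. The sketch below is illustrative only: latest_output is a hypothetical helper name, not part of the Hume CLI. It assumes nothing about how the CLI names its files and simply picks the most recently modified entry in the output directory.

```shell
# Hypothetical helper (not part of the Hume CLI).
# Prints the path of the most recently modified file in the output directory.
# Defaults to ./hume-tts-output, the default directory named in this guide.
latest_output() {
  dir=${1:-./hume-tts-output}
  # ls -t sorts by modification time, newest first.
  newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
  # Print nothing (and return non-zero) if the directory is empty or missing.
  [ -n "$newest" ] && printf '%s\n' "$dir/$newest"
}
```

After a generation, running latest_output with no arguments prints the newest file under ./hume-tts-output; pass a different directory as the first argument if you have changed the output location.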

Saving voices

When you find a voice you like, use the hume voices create command to give it a name and save it to your voice library for future use. You can specify the generation ID:

$ hume voices create \
>   --name aristocrat \
>   --generation-id GENERATION_ID

or, alternatively, use the --last flag to save the most recent generation.

$ hume voices create --name aristocrat --last

Continuity

To use a voice from your library, specify its name.

$ hume tts "Now take a bow." --voice-name aristocrat

If the speech should sound like it follows from previous speech, you can provide the --context-generation-id flag with the generation_id of the previous speech.

$ # For example, if PREVIOUS_GENERATION_ID refers to speech
> # about archery, 'bow' will be pronounced to rhyme with
> # 'toe' and not 'cow'.
> hume tts "Now take a bow." \
>   --voice-name aristocrat \
>   --context-generation-id PREVIOUS_GENERATION_ID

Alternatively, use the --last flag to continue from the most recent generation.

$ hume tts "Now take a bow." --voice-name aristocrat --last
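
The --last flag lends itself to narrating a longer text line by line, with each line continuing from the one before. The following is a minimal sketch under stated assumptions: narrate_lines and the RUN variable are hypothetical names introduced here, and the script assumes that --last picks up the generation made by the previous command in the same script. By default it only prints the commands it would run, without shell quoting, so you can review them first.

```shell
# Hypothetical helper (not part of the Hume CLI).
# Usage: narrate_lines VOICE_NAME FILE
# Builds one `hume tts` command per non-empty line of FILE. Every line after
# the first adds --last so it continues from the previous generation.
# Dry run by default: commands are printed (unquoted). Set RUN=1 to execute.
narrate_lines() (
  voice=$1
  file=$2
  first=1
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines.
    [ -z "$line" ] && continue
    set -- hume tts "$line" --voice-name "$voice"
    [ "$first" -eq 1 ] || set -- "$@" --last
    first=0
    if [ "${RUN:-0}" = 1 ]; then
      # Detach stdin so the command cannot consume the rest of FILE.
      "$@" < /dev/null
    else
      printf '%s\n' "$*"
    fi
  done < "$file"
)
```

For example, narrate_lines aristocrat script.txt prints one command per line of script.txt; prefix the call with RUN=1 once the commands look right.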

Acting Instructions

If you specify both a voice and a description, the description acts as “acting instructions”. The speech keeps the character of the specified voice, but is modulated to match the description.

$ hume tts "Does he even know how to use that thing?" \
>   --voice-name aristocrat \
>   --description "Murmured softly, with a heavy dose of sarcasm and contempt"
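
To compare several deliveries of the same line, you can loop over a list of descriptions. This is a rough sketch rather than a CLI feature: try_deliveries and the RUN variable are hypothetical names, and only the --voice-name and --description flags shown above are used. As written it prints the commands; set RUN=1 to execute them.

```shell
# Hypothetical helper (not part of the Hume CLI).
# Usage: try_deliveries VOICE_NAME TEXT DESCRIPTION...
# Produces one `hume tts` command per description, all with the same voice
# and text. Dry run by default; set RUN=1 to execute the commands.
try_deliveries() (
  voice=$1
  text=$2
  shift 2
  for description in "$@"; do
    if [ "${RUN:-0}" = 1 ]; then
      hume tts "$text" --voice-name "$voice" --description "$description"
    else
      printf 'hume tts "%s" --voice-name %s --description "%s"\n' \
        "$text" "$voice" "$description"
    fi
  done
)
```

A call such as try_deliveries aristocrat "Now take a bow." "Whispered, conspiratorial" "Booming and theatrical" yields two commands that differ only in their acting instructions.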

Generating multiple variations

To generate multiple variations of the same text at once, use the --num-generations flag.

$ hume tts "Now aim at the bullseye, nock your arrow, draw, and..." \
>   --voice-name aristocrat \
>   --num-generations 3

Other features

$ # Read from stdin
> cat poem.txt | hume tts -
> 
> # Machine-readable output
> hume tts "Hello" --reporter-mode json
> 
> # Session settings last for the duration of the terminal session
> hume session set tts.voiceName aristocrat
> hume session set tts.outputDir ~/audio
> 
> # Global settings persist until changed
> hume config set tts.play none
> hume config set reporterMode json
> # Clear global settings like this (this also logs you out).
> hume config reset