Acting Instructions Guide

Guide to controlling voice expression in Octave TTS through acting instructions, speed settings, and silence parameters.

The description field for acting instructions is available for Octave 1 only. Support for description with Octave 2 is coming soon. The speed and trailing_silence fields are supported in all models.

Octave accepts acting instructions that guide several aspects of speech delivery:

  • Emotional tone: happiness, sadness, excitement, nervousness, etc.
  • Delivery style: whispering, shouting, rushed speaking, measured pace, etc.
  • Performance context: speaking to a crowd, intimate conversation, etc.
  • Speaking rate: the rate at which the speech is delivered, faster or slower.
  • Trailing silence: a pause of a specified duration (in seconds) added after an utterance.

See our Prompting Guide for more detailed information and best practices for prompting Octave.

In the following section, we’ll explore the ways in which you can provide acting instructions to Octave through the API.

Providing acting instructions

The TTS API offers parameters which allow you to control how an individual utterance is performed. These parameters can be used individually or combined for precise control over speech output:

  • description: provide acting instructions in natural language.
  • speed: adjust the relative speaking rate on a non-linear scale from 0.5 (much slower) to 2.0 (much faster), where 1.0 is the normal speaking pace. Changes are not proportional to the value provided: for example, setting speed to 2.0 makes speech faster, but not exactly twice as fast as the default.
  • trailing_silence: specify a duration of trailing silence (in seconds) to add to an utterance.
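
As a rough illustration of how these three parameters sit together in a single utterance, here is a minimal Python sketch. The helper name `make_utterance` is hypothetical and not part of any Hume SDK. The speed bounds come from the list above, while rejecting negative `trailing_silence` values is an assumption made for this sketch.

```python
def make_utterance(text, description=None, speed=None,
                   trailing_silence=None, voice_name=None):
    """Build one utterance dict, including only the fields that were set.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    utterance = {"text": text}
    if description is not None:
        utterance["description"] = description
    if speed is not None:
        # Range taken from the parameter list: 0.5 (slower) to 2.0 (faster).
        if not 0.5 <= speed <= 2.0:
            raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
        utterance["speed"] = speed
    if trailing_silence is not None:
        # Assumption: a negative silence duration is never meaningful.
        if trailing_silence < 0:
            raise ValueError("trailing_silence must not be negative")
        utterance["trailing_silence"] = trailing_silence
    if voice_name is not None:
        # Voice shape copied from the curl examples in this guide.
        utterance["voice"] = {"name": voice_name, "provider": "HUME_AI"}
    return utterance
```

Leaving a parameter unset simply omits the field, so the same helper covers requests with and without acting instructions.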

In this section, we’ll use acting instructions to shape Octave’s delivery of a guided meditation.

Before we apply acting instructions, let’s first take a look at a request that does not contain any acting instructions:

curl "https://api.hume.ai/v0/tts/stream/json" \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [
      {
        "text": "Let us begin by taking a deep breath in.",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        }
      },
      {
        "text": "Now, slowly exhale.",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        }
      }
    ]
  }'

Without acting instructions, Octave will infer how to deliver the speech from the base voice’s description and the provided text input.
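
For readers calling the API from Python rather than curl, the same request could be assembled along these lines. This is a sketch using only the standard library: the function name `build_tts_request` is hypothetical, the URL and the `X-Hume-Api-Key` header are taken from the curl example, and the `Content-Type` and `Accept` headers mirror what curl's `--json` option sets. The sketch builds the request without sending it.

```python
import json
import urllib.request

# Endpoint copied from the curl example in this guide.
TTS_URL = "https://api.hume.ai/v0/tts/stream/json"


def build_tts_request(utterances, api_key):
    """Return an unsent POST request carrying the utterances as JSON.

    Hypothetical helper for illustration; not part of an official SDK.
    """
    body = json.dumps({"utterances": utterances}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=body,
        headers={
            "X-Hume-Api-Key": api_key,
            # curl's --json flag sets these two headers as well.
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the call. Handling of the response is left out here because its shape is not described in this guide.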

In the following steps, we’ll iteratively improve Octave’s delivery by specifying different types of acting instructions to better simulate guided meditation.

1. Guide delivery with natural language

Let’s begin by providing a description to guide the delivery of these utterances to be calmer and more instructive:

curl "https://api.hume.ai/v0/tts/stream/json" \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [
      {
        "text": "Let us begin by taking a deep breath in.",
        "description": "calm, pedagogical",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        }
      },
      {
        "text": "Now, slowly exhale.",
        "description": "calm, serene",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        }
      }
    ]
  }'

When you don’t specify a voice, the description field serves as a voice prompt for creating a new voice. See our Voice Design guide for details.
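
Because a description without a voice is treated as a voice prompt, it can be worth checking a payload for utterances that would take that path unintentionally. The following sketch shows one way to do so; `utterances_without_voice` is a hypothetical helper, not an API feature.

```python
def utterances_without_voice(utterances):
    """Return the indexes of utterances that have a description but no voice.

    Per the note above, such a description acts as a voice prompt
    rather than as an acting instruction.
    """
    return [
        index
        for index, utterance in enumerate(utterances)
        if "description" in utterance and "voice" not in utterance
    ]


example = [
    {
        "text": "Let us begin by taking a deep breath in.",
        "description": "calm, pedagogical",
        "voice": {"name": "Ava Song", "provider": "HUME_AI"},
    },
    {"text": "Now, slowly exhale.", "description": "calm, serene"},
]
print(utterances_without_voice(example))  # prints [1]
```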

2. Control speed of delivery

The descriptions make the voice sound more appropriate for our use case. Next, let’s slow the delivery to create an atmosphere better suited to meditation:

curl "https://api.hume.ai/v0/tts/stream/json" \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [
      {
        "text": "Let us begin by taking a deep breath in.",
        "description": "calm, pedagogical",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        },
        "speed": 0.65
      },
      {
        "text": "Now, slowly exhale.",
        "description": "calm, serene",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        },
        "speed": 0.65
      }
    ]
  }'

3. Inject pauses

Finally, in this guided meditation, it would be helpful to give the participants some time to actually take a breath! To achieve this we can introduce a pause between utterances by specifying a trailing silence duration for the first utterance.

To inject natural breaks within an utterance, try using [pause] or [long pause] in your text. Example: “Haha [pause] I didn’t realize this was going to be a formal event.”

curl "https://api.hume.ai/v0/tts" \
  -H "X-Hume-Api-Key: $HUME_API_KEY" \
  --json '{
    "utterances": [
      {
        "text": "Let us begin by taking a deep breath in.",
        "description": "calm, pedagogical",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        },
        "speed": 0.65,
        "trailing_silence": 4
      },
      {
        "text": "Now, slowly exhale.",
        "description": "calm, serene",
        "voice": {
          "name": "Ava Song",
          "provider": "HUME_AI"
        },
        "speed": 0.65
      }
    ]
  }'

Combine natural language descriptions, speed adjustments, and pauses to control Octave’s delivery. In the meditation example, these settings turn a simple line into naturally paced speech. Tune these controls together to match your intended delivery.
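
When a script has many lines, repeating the voice and speed in every utterance gets tedious. One possible way to combine the three controls is sketched below, under the assumption that every line shares a voice and a speed. The function `build_meditation_payload` and its tuple layout are hypothetical choices for this example.

```python
def build_meditation_payload(steps, voice_name, speed):
    """Turn (text, description, pause_seconds) tuples into a request payload.

    Every utterance shares the same voice and speed; trailing_silence is
    added only for steps with a positive pause.
    """
    utterances = []
    for text, description, pause_seconds in steps:
        utterance = {
            "text": text,
            "description": description,
            # Voice shape copied from the curl examples in this guide.
            "voice": {"name": voice_name, "provider": "HUME_AI"},
            "speed": speed,
        }
        if pause_seconds > 0:
            utterance["trailing_silence"] = pause_seconds
        utterances.append(utterance)
    return {"utterances": utterances}


payload = build_meditation_payload(
    [
        ("Let us begin by taking a deep breath in.", "calm, pedagogical", 4),
        ("Now, slowly exhale.", "calm, serene", 0),
    ],
    voice_name="Ava Song",
    speed=0.65,
)
```

Serializing `payload` with `json.dumps` gives a request body with the same structure as the final curl example.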

Best practices

  • Keep it concise: Short instructions work best; aim for no more than 100 characters. Instead of a long phrase like “The speaker is scared and in a hurry to leave”, write “frightened, rushed”.
  • Use precise emotions: Instead of broad terms like “sad”, use specific emotions like “melancholy” or “frustrated”.
  • Combine for nuance: Pair emotions with delivery styles, e.g., “excited but whispering” or “confident, professional tone”.
  • Indicate pacing: Use terms like “rushed”, “measured”, “deliberate pause” to adjust speech rhythm.
  • Specify the audience: Instructions like “speaking to a child” or “addressing a large crowd” help shape delivery.
  • Use speed for adjusting speech rate: Rather than using the description field to instruct slower or faster speech, leverage the speed parameter.
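
Two of these practices are mechanical enough to check automatically: the 100-character guideline, and the advice to use speed rather than the description for speech rate. The sketch below is one hypothetical way to flag them before sending a request. The word list in `RATE_WORDS` is an illustrative assumption, not an official list.

```python
import re

MAX_DESCRIPTION_LENGTH = 100  # guideline from the best practices above

# Illustrative assumption: words suggesting the speed parameter fits better.
RATE_WORDS = {"slow", "slower", "slowly", "fast", "faster", "quickly"}


def check_description(description):
    """Return a list of warnings for an acting instruction string."""
    warnings = []
    if len(description) > MAX_DESCRIPTION_LENGTH:
        warnings.append(
            f"description is {len(description)} characters; "
            f"aim for at most {MAX_DESCRIPTION_LENGTH}"
        )
    words = set(re.findall(r"[a-z]+", description.lower()))
    rate_hits = sorted(words & RATE_WORDS)
    if rate_hits:
        warnings.append(
            "consider the speed parameter instead of: " + ", ".join(rate_hits)
        )
    return warnings
```

An instruction such as “frightened, rushed” passes with no warnings, while “speak slower” is flagged with a suggestion to use speed.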

Examples

The table below demonstrates how acting instructions can transform the same text into different delivery styles:

Text Input | Acting Instruction | Output Style
--- | --- | ---
“Are you serious?” | whispering, hushed | Soft, secretive tone
“We need to move, now!” | urgent, panicked | Fast, tense delivery
“Welcome, everyone.” | warm, inviting | Friendly, engaging tone
“I can’t believe this…” | sarcastic | Dry, exaggerated inflection