Continuation Guide
Octave supports continuation across generations. It carries context from earlier output into the next generation, keeping long-form speech coherent across multiple utterances so delivery stays natural, consistent, and emotionally continuous.
Ways to continue
1. Chain utterances in one request
- Put multiple items in the utterances array.
- Each utterance continues only from the immediately preceding utterance in the same request.
2. Continue from a previous call
Pass context in the context field using one of:
- generation_id: continue from the generation you specify.
- Context utterances: supply reference utterances that guide delivery.
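Both ways of continuing can be sketched as request bodies. This is a minimal illustration in Python, not the official client: only the utterances array, the context field, and generation_id come from this guide. The text key on each item, the nesting of context utterances under an utterances key, the build_request helper, the sample sentences, and the placeholder ID are assumptions, so check the API reference for the exact schema.

```python
import json


def build_request(texts, generation_id=None, context_texts=None):
    """Build a request body; context is optional and takes one form at a time."""
    if not texts:
        raise ValueError("at least one utterance is required")
    if generation_id is not None and context_texts:
        raise ValueError("pass generation_id or context_texts, not both")
    # Items in "utterances" are synthesized in order; each one continues
    # only from the item immediately before it. The "text" key is assumed.
    body = {"utterances": [{"text": t} for t in texts]}
    if generation_id is not None:
        body["context"] = {"generation_id": generation_id}
    elif context_texts:
        # Assumed shape: reference-only utterances nested under "utterances".
        body["context"] = {"utterances": [{"text": t} for t in context_texts]}
    return body


# 1. Chain utterances in one request.
chained = build_request([
    "The rain had finally stopped.",
    "She stepped outside and took a deep breath.",
])

# 2a. Continue from a previous call by generation_id (placeholder value).
from_generation = build_request(
    ["Then the clouds rolled back in."],
    generation_id="<generation_id from an earlier response>",
)

# 2b. Continue from context utterances, which are not synthesized themselves.
from_context = build_request(
    ["Then the clouds rolled back in."],
    context_texts=["She stepped outside and took a deep breath."],
)

print(json.dumps(from_context, indent=2))
```

Keeping the two context forms mutually exclusive in the helper mirrors the "one of" wording above.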
Aspects of continuation
Narrative coherence
For long-form audio such as audiobooks, continuation keeps the narrative cohesive across utterances. It prevents abrupt shifts in delivery, pacing, and emotion, carries energy and emotional progression forward, and lets each segment build naturally on the last for a more authentic listen.
In the examples below, the same line is delivered with different emotions based on the context set by the preceding utterance.
With positive context (excited interpretation)
With negative context (disappointed interpretation)
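To make the contrast concrete, here is a hedged sketch of two request bodies that synthesize the same line after different context utterances. The sample sentences, the text key, the with_context helper, and the nesting of context utterances under an utterances key are illustrative assumptions rather than details taken from this guide.

```python
def with_context(context_text, line):
    """Return a body that synthesizes `line` after one reference-only context utterance."""
    return {
        "utterances": [{"text": line}],
        # Assumed shape for context utterances.
        "context": {"utterances": [{"text": context_text}]},
    }


LINE = "I can't believe this is happening."

# Positive context: the line would be expected to sound excited.
positive = with_context("We just won the championship!", LINE)

# Negative context: the same line would be expected to sound disappointed.
negative = with_context("They cancelled the trip we planned for months.", LINE)

# Only the context differs; the text to synthesize is identical.
print(positive["utterances"] == negative["utterances"])
print(positive["context"] == negative["context"])
```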
Linguistic context
Continuation also provides linguistic context for proper pronunciation, particularly with homographs—words that are spelled the same but pronounced differently based on meaning. For example, Octave can correctly differentiate between:
- “Take a bow.” (/bau/) vs. “Take a bow and arrow.” (/bō/)
- “Play the bass guitar.” (/bās/) vs. “Go bass fishing.” (/bas/)
- “I read the book yesterday.” (/red/) vs. “I will read the book tomorrow.” (/rēd/)
Try these examples to see how Octave intelligently distinguishes between different pronunciations of the word “bow” based on contextual understanding:
With /bau/ pronunciation
With /bō/ pronunciation
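One way to try the two readings is to chain a scene-setting utterance with the target sentence in a single request. The sketch below only builds the request bodies. The scene-setting sentences, the text key, and the chain helper are assumptions for illustration, and both utterances in each body would be synthesized because they sit in the utterances array.

```python
def chain(*texts):
    """Build a body whose utterances are synthesized in order."""
    return {"utterances": [{"text": t} for t in texts]}


# Expected /bau/ reading: the preceding utterance points to a stage gesture.
bow_gesture = chain("The audience is on its feet.", "Take a bow.")

# Expected /bō/ reading: the preceding utterance points to archery.
bow_weapon = chain("We are going hunting today.", "Take a bow and arrow.")

for body in (bow_gesture, bow_weapon):
    print(" -> ".join(u["text"] for u in body["utterances"]))
```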
Consistent voice
When continuing from an utterance, Octave intelligently handles voice consistency:
- Octave automatically continues using the same voice from the previous utterance.
- You only need to specify a voice when you want to change from the currently established one.
For more information on specifying a voice in your request, see our voices guide.
The sample requests below show how to continue with the same voice and how to change it:
Multiple utterances in a single request
Continuing from previous generation using context
Changing voices mid-conversation
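The three scenarios can be sketched as request bodies. Treat this as an illustration under stated assumptions: the voice field shaped as {"name": ...}, the voice names "Narrator" and "Villain", the text key, the sample sentences, and the placeholder generation_id are all hypothetical and should be checked against the voices guide and the API reference.

```python
# Hypothetical voices; the {"name": ...} shape is an assumption.
NARRATOR = {"name": "Narrator"}
VILLAIN = {"name": "Villain"}

# 1. Multiple utterances in a single request:
#    set the voice once, and later utterances keep it.
single_request = {
    "utterances": [
        {"text": "Once upon a time, a storm rolled in.", "voice": NARRATOR},
        {"text": "Nobody in the village saw it coming."},
    ]
}

# 2. Continuing from a previous generation using context:
#    no voice is given, so the established voice carries over.
continued = {
    "utterances": [{"text": "By morning, the river had risen."}],
    "context": {"generation_id": "<generation_id from an earlier response>"},
}

# 3. Changing voices mid-conversation:
#    specify a voice only where the speaker changes.
voice_change = {
    "utterances": [
        {"text": "The stranger stepped out of the shadows.", "voice": NARRATOR},
        {"text": "You should not have come here.", "voice": VILLAIN},
    ]
}


def explicit_voices(body):
    """List the voice name set on each utterance, or None where it is omitted."""
    return [u.get("voice", {}).get("name") for u in body["utterances"]]


print(explicit_voices(single_request))
print(explicit_voices(continued))
print(explicit_voices(voice_change))
```

A None entry marks an utterance that relies on the voice carried over from the utterance or generation it continues from.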
Because the voice carries over automatically, you avoid redundant voice specifications and listeners hear a seamless result, which makes dynamic, multi-character narratives easier to build.
Notes and constraints
- Continuation is scoped to the immediately preceding utterance only. It does not skip back to earlier utterances or generations.
- Only items in utterances are synthesized. Items in context are reference-only.
- Context utterances add latency because Octave must first generate the speech tokens it will continue from.
- Octave supports multi-speaker continuation. You can keep the current voice or continue from speech generated with a different voice.