Controls how audio output is segmented in the response.
- When enabled (true), input utterances are automatically split into natural-sounding speech segments.
- When disabled (false), the response maintains a strict one-to-one mapping between input utterances and output snippets.
This setting affects how the snippets array is structured in the response, which may be important for applications that need to track the relationship between input text and generated audio segments. When set to false, avoid utterances with long text, as this can result in distorted output.
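The following is a minimal sketch of how a client might work with this setting. It is not the actual API. The field name split_utterances, the request shape, the flat list of snippets, and the 200-character limit are all assumptions made for illustration; the description above names none of them and gives no numeric threshold for "long text".

```python
from typing import Any

# Hypothetical limit: the description only says "long text" and gives no number.
MAX_CHARS_WITHOUT_SPLITTING = 200


def build_request(texts: list[str], split: bool) -> dict[str, Any]:
    """Build a request body. 'split_utterances' is an assumed field name."""
    if not split:
        # With splitting disabled, long utterances may produce distorted
        # output, so reject them before sending the request.
        too_long = [t for t in texts if len(t) > MAX_CHARS_WITHOUT_SPLITTING]
        if too_long:
            raise ValueError(
                f"{len(too_long)} utterance(s) exceed "
                f"{MAX_CHARS_WITHOUT_SPLITTING} characters; "
                "shorten them or enable splitting"
            )
    return {
        "utterances": [{"text": t} for t in texts],
        "split_utterances": split,
    }


def pair_with_snippets(texts: list[str], snippets: list[Any]) -> list[tuple[str, Any]]:
    """Pair each input utterance with its snippet.

    Only meaningful when splitting is disabled, because that is the case
    in which the response keeps a one-to-one mapping. The snippets are
    assumed to arrive as a flat list in input order.
    """
    if len(texts) != len(snippets):
        raise ValueError(
            "expected one snippet per utterance when splitting is disabled"
        )
    return list(zip(texts, snippets))
```

With splitting enabled, a single utterance may yield several snippets, so a positional pairing like pair_with_snippets no longer applies and the client would need some other way to relate snippets back to their source utterance.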