Turn detection
Tune voice activity detection and end-of-turn behavior.
Turn detection controls how EVI detects speech and decides when a user has finished their turn. Configure these settings through the optional turn_detection field on an EVI config. Any field you omit falls back to its default.
Tuning turn detection lets you shape the pacing of a conversation. You can make EVI wait longer before responding, react more quickly when the user finishes speaking, or filter out background noise more aggressively depending on your application’s needs.
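As a rough sketch of the turn_detection field described above, the snippet below builds a config payload with each field set explicitly to its documented default. Only the turn_detection field names and default values come from this page. The "name" key, the variable names, and the overall payload shape are assumptions for illustration, not a confirmed request schema.

```python
import json

# Documented defaults for each turn_detection field (taken from this page).
DEFAULT_TURN_DETECTION = {
    "end_of_turn_silence_ms": 800,
    "speech_detection_threshold": 0.5,
    "prefix_padding_ms": 300,
}

# Hypothetical config payload: "name" and the surrounding shape are
# assumptions for illustration only.
config_payload = {
    "name": "example-config",
    "turn_detection": dict(DEFAULT_TURN_DETECTION),
}

print(json.dumps(config_payload, indent=2))
```

Because omitted fields fall back to their defaults, leaving turn_detection out entirely should behave the same as this payload.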
Supported parameters
1. End of turn silence
end_of_turn_silence_ms: How long the user must be silent before EVI considers their turn complete and begins
generating a response.
- Lower values lead to faster responses but increase the chance of EVI responding during a mid-thought pause.
- Higher values give users more room to pause without triggering a response.
Default: 800ms | Range: 500ms to 3,000ms
2. Speech detection threshold
speech_detection_threshold: How confident the system must be that audio contains speech before it begins processing.
- Lower values increase sensitivity, capturing softer speech at the cost of more noise-triggered processing.
- Higher values require clearer audio to register as speech, reducing false activations from background noise but potentially missing quieter speech.
Default: 0.5 | Range: 0.0 to 1.0
3. Prefix padding
prefix_padding_ms: How much audio before the detected start of speech is captured, so that the beginning of an utterance is not clipped.
- Higher values preserve more of the lead-in to speech, which can improve transcription accuracy for words that start abruptly.
Default: 300ms | Range: 0ms to 1,000ms
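The ranges listed for the three parameters can be checked on the client before a config is submitted. The helper below is a hypothetical sketch, not part of any SDK; it assumes the documented range endpoints are inclusive, which this page does not state explicitly.

```python
# Documented (min, max) ranges for each turn_detection field, from this page.
# Treating both endpoints as inclusive is an assumption.
TURN_DETECTION_RANGES = {
    "end_of_turn_silence_ms": (500, 3000),
    "speech_detection_threshold": (0.0, 1.0),
    "prefix_padding_ms": (0, 1000),
}


def validate_turn_detection(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid.

    Hypothetical client-side helper for illustration only.
    """
    problems = []
    for key, value in settings.items():
        if key not in TURN_DETECTION_RANGES:
            problems.append(f"unknown field: {key}")
            continue
        low, high = TURN_DETECTION_RANGES[key]
        if not low <= value <= high:
            problems.append(f"{key}={value} is outside {low} to {high}")
    return problems


# 400 ms is below the documented 500 ms minimum, so one problem is reported.
print(validate_turn_detection({"end_of_turn_silence_ms": 400}))
```

An empty settings dict passes validation, which matches the rule that omitted fields fall back to defaults.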
Example configurations
Patient turn-taking
Give users more time to pause mid-thought without EVI treating the silence as a completed turn. This is useful when users need to think before finishing a sentence, such as in language learning or interview preparation.
For a fully patient interaction, pair this with a higher min_interruption_ms in the interruption config so that brief hesitations from the user don't cut off EVI mid-response.
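A patient setup might look like the sketch below. The 1500 ms value is an illustrative assumption inside the documented 500 ms to 3,000 ms range, not a recommended setting. The min_interruption_ms setting is left out because its exact location in the interruption config is not described on this page.

```python
import json

# Illustrative value only: 1500 ms is an assumed choice within the documented
# 500 ms to 3,000 ms range.
patient_config = {
    "turn_detection": {
        "end_of_turn_silence_ms": 1500,
    }
}

print(json.dumps(patient_config, indent=2))
```

The other turn_detection fields are omitted, so they keep their defaults.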
Responsive turn-taking
Make EVI begin responding sooner after the user stops speaking. This creates a snappier conversational feel suited to fast-paced interactions like scheduling or quick Q&A.
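A responsive setup might look like the sketch below. Using 500 ms, the documented minimum, is an illustrative assumption rather than a recommended value. Shorter silence windows raise the chance of EVI responding during a mid-thought pause.

```python
import json

# Illustrative value only: 500 ms is the documented minimum for this field,
# chosen here as an assumed example of a fast-responding setup.
responsive_config = {
    "turn_detection": {
        "end_of_turn_silence_ms": 500,
    }
}

print(json.dumps(responsive_config, indent=2))
```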
Noise-tolerant detection
Require clearer audio before treating input as speech and capture more lead-in audio to prevent clipping in noisy conditions.
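A noise-tolerant setup might look like the sketch below. Both values are illustrative assumptions within the documented ranges (0.0 to 1.0 for the threshold, 0 ms to 1,000 ms for the padding), not recommended settings. Appropriate values depend on the actual noise conditions.

```python
import json

# Illustrative values only: both are assumed choices above the documented
# defaults (0.5 and 300 ms) and within the documented ranges.
noise_tolerant_config = {
    "turn_detection": {
        "speech_detection_threshold": 0.7,
        "prefix_padding_ms": 500,
    }
}

print(json.dumps(noise_tolerant_config, indent=2))
```

Raising the threshold can cause quieter speech to be missed, so it is worth testing with representative audio before settling on a value.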

