Turn detection

Tune voice activity detection and end-of-turn behavior.

Turn detection controls how EVI detects speech and decides when a user has finished their turn. Configure these settings through the optional turn_detection field on an EVI config. Any field you omit falls back to its default.
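To make the fallback behavior concrete, here is a minimal Python sketch that merges overrides onto the defaults listed under Supported parameters. The helper name build_turn_detection is hypothetical and not part of any SDK. EVI applies defaults itself when a field is omitted, so this client-side merge is only a way to see the effective settings.

```python
import json

# Defaults as documented on this page.
TURN_DETECTION_DEFAULTS = {
    "end_of_turn_silence_ms": 800,
    "speech_detection_threshold": 0.5,
    "prefix_padding_ms": 300,
}


def build_turn_detection(**overrides):
    """Return documented defaults with any overrides applied.

    Hypothetical helper for illustration; rejects field names
    that this page does not document.
    """
    unknown = set(overrides) - set(TURN_DETECTION_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown turn_detection fields: {sorted(unknown)}")
    return {**TURN_DETECTION_DEFAULTS, **overrides}


# Override one field; the other two keep their documented defaults.
config = {"turn_detection": build_turn_detection(end_of_turn_silence_ms=2000)}
print(json.dumps(config, indent=2))
```

In practice you would send only the fields you want to change; the merged view is useful mainly for logging or debugging.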

Tuning turn detection lets you shape the pacing of a conversation. You can make EVI wait longer before responding, react more quickly when the user finishes speaking, or filter out background noise more aggressively depending on your application’s needs.

Supported parameters

1. End of turn silence

end_of_turn_silence_ms: How long the user must be silent before EVI considers their turn complete and begins generating a response.

  • Lower values lead to faster responses but increase the chance of EVI responding during a mid-thought pause.
  • Higher values give users more room to pause without triggering a response.

Default: 800ms | Range: 500ms to 3,000ms
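The trade-off above can be sketched with a simplified model in which a pause ends the turn once it reaches the configured silence duration. The function pause_ends_turn is a hypothetical illustration; EVI's actual end-of-turn logic may take more into account than a fixed timer.

```python
def pause_ends_turn(pause_ms: int, end_of_turn_silence_ms: int = 800) -> bool:
    """Simplified model: a pause ends the turn once it reaches the threshold."""
    return pause_ms >= end_of_turn_silence_ms


# A 1.2-second mid-thought pause under three settings.
for setting in (500, 800, 2000):
    print(setting, pause_ends_turn(1200, setting))
```

Under this model the 1,200 ms pause ends the turn at the 500 ms and 800 ms settings but not at 2,000 ms.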

2. Speech detection threshold

speech_detection_threshold: How confident the system must be that audio contains speech before it begins processing.

  • Lower values increase sensitivity, capturing softer speech at the cost of more noise-triggered processing.
  • Higher values require clearer audio to register as speech, reducing false activations from background noise but potentially missing quieter speech.

Default: 0.5 | Range: 0.0 to 1.0
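As a rough way to picture the sensitivity trade-off, the sketch below filters a list of per-frame speech confidence scores against different thresholds. Both the scores and the function frames_treated_as_speech are made up for illustration, and the inclusive comparison is an assumption; the page does not describe how EVI computes or compares confidence internally.

```python
def frames_treated_as_speech(confidences, speech_detection_threshold=0.5):
    """Simplified model: keep frames whose confidence meets the threshold."""
    return [c for c in confidences if c >= speech_detection_threshold]


# Made-up per-frame confidence scores, for illustration only.
scores = [0.2, 0.45, 0.6, 0.9]
for threshold in (0.3, 0.5, 0.7):
    print(threshold, frames_treated_as_speech(scores, threshold))
```

Raising the threshold from 0.3 to 0.7 drops the two mid-confidence frames, which could be either background noise or quiet speech.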

3. Prefix padding

prefix_padding_ms: The duration of audio captured before the detected start of speech. This ensures the beginning of an utterance is not clipped.

  • Higher values preserve more of the lead-in to speech, which can improve transcription accuracy for words that start abruptly.

Default: 300ms | Range: 0ms to 1,000ms
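Since each parameter has a documented range, it can help to check values before sending a config. The following is a minimal Python sketch; validate_turn_detection is a hypothetical helper, and treating the range endpoints as inclusive is an assumption.

```python
# (min, max) ranges as documented on this page; endpoints assumed inclusive.
TURN_DETECTION_RANGES = {
    "end_of_turn_silence_ms": (500, 3000),
    "speech_detection_threshold": (0.0, 1.0),
    "prefix_padding_ms": (0, 1000),
}


def validate_turn_detection(settings: dict) -> list:
    """Return a list of problems; an empty list means all values are in range."""
    errors = []
    for name, value in settings.items():
        if name not in TURN_DETECTION_RANGES:
            errors.append(f"{name}: unknown field")
            continue
        low, high = TURN_DETECTION_RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}: {value} is outside {low} to {high}")
    return errors


print(validate_turn_detection({"end_of_turn_silence_ms": 400}))
print(validate_turn_detection({"speech_detection_threshold": 0.7}))
```

Returning a list of messages rather than raising on the first problem lets a caller report every out-of-range field at once.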

Example configurations

Patient turn-taking

Give users more time to pause mid-thought without EVI treating the silence as a completed turn. This is useful when users need to think before finishing a sentence, such as in language learning or interview preparation.

{
  "turn_detection": {
    "end_of_turn_silence_ms": 2000
  }
}

For a fully patient interaction, pair this with a higher min_interruption_ms in the interruption config so hesitations don’t cut off EVI mid-response.

Responsive turn-taking

Make EVI begin responding sooner after the user stops speaking. This creates a snappier conversational feel suited to fast-paced interactions like scheduling or quick Q&A.

{
  "turn_detection": {
    "end_of_turn_silence_ms": 500
  }
}

Noise-tolerant detection

Require clearer audio before treating input as speech and capture more lead-in audio to prevent clipping in noisy conditions.

{
  "turn_detection": {
    "speech_detection_threshold": 0.7,
    "prefix_padding_ms": 500
  }
}