Conversational Controls

Guide to managing the dynamics of a chat session with the Empathic Voice Interface (EVI) API.

This guide covers some key tools for creating more interactive and contextually aware EVI chat experiences: dynamic variables, context injection, pausing responses, and resuming chats.

Dynamic variables

Dynamic variables are placeholders in the system prompt that you fill with specific values at the beginning of the chat and can update as the chat progresses. They are especially useful for giving EVI context that changes with the user or the session, such as the date, the user’s name, role, or account balance, or any other session-specific information.

Using variables in the prompt

To set up dynamic variables, first include placeholders for them in your system prompt. Use double curly braces ({{variable_name}}) to mark where each variable should appear in the text. This allows EVI to replace these placeholders dynamically with the specified values.

Sample prompt with dynamic variables
Address the user by their name, {{name}}.
If relevant, reference their age: {{age}}.
It is {{is_philosopher}} that this user is a philosopher.

Visit our prompting guide for more details on adding dynamic variables to your prompt.

Assigning values in session settings

After adding placeholders for dynamic variables in your prompt, set their values by sending a Session Settings message over the WebSocket within an active Chat session. This message includes a variables parameter, with each key matching a placeholder in your prompt and each value specifying the text EVI will use.

Session settings
{
  "type": "session_settings",
  "variables": {
    "name": "David Hume",
    "age": 65,
    "is_philosopher": true
  }
}

Variable values can be strings, numbers, or booleans; however, each value is ultimately converted to a string when injected into your system prompt.
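
To preview how a prompt might read once values are filled in, the substitution can be approximated locally. The sketch below is not EVI's implementation: `renderPrompt` is a hypothetical helper, and the real replacement happens on Hume's side. It only illustrates that numbers and booleans end up as text.

```typescript
// Hypothetical helper: approximates locally how {{placeholders}} are filled.
// EVI performs the real substitution; this is only a preview aid.
type VariableValue = string | number | boolean;

function renderPrompt(
  template: string,
  variables: Record<string, VariableValue>,
): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) =>
    Object.prototype.hasOwnProperty.call(variables, key)
      ? String(variables[key]) // numbers and booleans become text
      : match, // leave placeholders without a value untouched
  );
}

const samplePrompt = [
  "Address the user by their name, {{name}}.",
  "If relevant, reference their age: {{age}}.",
  "It is {{is_philosopher}} that this user is a philosopher.",
].join("\n");

const preview = renderPrompt(samplePrompt, {
  name: "David Hume",
  age: 65,
  is_philosopher: true,
});
// preview's last line reads: "It is true that this user is a philosopher."
```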

To ensure dynamic variables are recognized correctly, follow these guidelines:

  • Only assign values to referenced variables: If a variable is given a value in the “variables” field but is not referenced in the system prompt, EVI will not use it in the conversation.
  • Define all referenced variables: If a variable is referenced in the system prompt but lacks a value in the variables field, warning W0106 can be expected: "No values have been specified for the variables [variable_name], which can lead to incorrect text formatting. Please assign them values." This warning is also expected if there are spelling inconsistencies between the variable names in variables and those in the prompt.
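
Both guidelines can be checked on the client before a Session Settings message is sent. The following is a minimal sketch, assuming placeholders always take the `{{variable_name}}` form; `checkVariables` and `BUILT_IN_VARIABLES` are hypothetical names, not part of any Hume SDK.

```typescript
// Variables EVI populates automatically (see "Default dynamic variables").
const BUILT_IN_VARIABLES = ["now"];

interface VariableCheck {
  missing: string[]; // referenced in the prompt but given no value
  unused: string[]; // given a value but never referenced in the prompt
}

function checkVariables(
  promptText: string,
  variables: Record<string, string | number | boolean>,
): VariableCheck {
  // Collect each distinct placeholder name from the prompt.
  const referenced: string[] = [];
  const pattern = /\{\{(\w+)\}\}/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(promptText)) !== null) {
    if (referenced.indexOf(match[1]) === -1) referenced.push(match[1]);
  }

  const provided = Object.keys(variables);
  return {
    missing: referenced.filter(
      (v) => provided.indexOf(v) === -1 && BUILT_IN_VARIABLES.indexOf(v) === -1,
    ),
    unused: provided.filter((v) => referenced.indexOf(v) === -1),
  };
}

const checkResult = checkVariables(
  "Address the user by their name, {{name}}. It is {{now}}.",
  { nmae: "David Hume" }, // misspelled key
);
// checkResult.missing is ["name"]; checkResult.unused is ["nmae"]
```

A non-empty `missing` list corresponds to the situation in which warning W0106 is expected, including the spelling-mismatch case shown here.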

Default dynamic variables

Hume provides built-in dynamic variables that are automatically populated and can be referenced in system prompts without needing to set their values in SessionSettings. The currently supported default variable is:

  • now: The current UTC datetime (e.g., "Nov 08, 2024 09:25 PM UTC")

    You can reference now in your system prompt to dynamically include the current UTC date and time, as shown below.

    Time-aware prompt example
    The current datetime is {{now}}. Mention this time at the start of
    the call, or if the user asks what time it is. Convert this UTC
    datetime to other time zones if requested.

If you set a custom value for a default variable in SessionSettings, it will override the default value. For example, specifying a value for now in SessionSettings will replace the automatic UTC datetime with your custom value, offering flexibility when needed.
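
One possible use of this override is supplying a local time instead of UTC. The sketch below builds such a message; `formatNow` and `nowOverrideMessage` are hypothetical helpers, the time zone is an arbitrary example, and the exact text produced by `toLocaleString` depends on the runtime's locale data, so it may not match the default format exactly.

```typescript
// Hypothetical helper: formats a Date for a given IANA time zone.
function formatNow(date: Date, timeZone: string): string {
  return date.toLocaleString("en-US", {
    timeZone,
    year: "numeric",
    month: "short",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    timeZoneName: "short",
  });
}

// Builds a Session Settings message that overrides the default `now` value.
function nowOverrideMessage(date: Date, timeZone: string): string {
  return JSON.stringify({
    type: "session_settings",
    variables: { now: formatNow(date, timeZone) },
  });
}

const overrideMessage = nowOverrideMessage(
  new Date(Date.UTC(2024, 10, 8, 21, 25)), // Nov 8, 2024, 21:25 UTC
  "America/New_York",
);
```

The resulting string would be sent over the WebSocket like any other Session Settings message.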

Context injection

EVI supports context injection via Session Settings messages. The context field in a Session Settings message allows you to silently add information to the conversation, guiding EVI without triggering a response. This context is appended to the end of each user_message, ensuring that it is consistently referenced throughout the session.

Injected context can be used to remind EVI of its role, keep important details active in the conversation, or add relevant updates as needed. This method is ideal for adapting EVI’s tone or focus based on real-time changes, helping it respond more accurately without requiring repetitive input from the user.

Injected context is only active within the current session. If a chat is resumed, any previously injected context will not be carried over and must be re-injected if necessary.

Setting up context

To inject context, send a Session Settings message with a context object that includes two fields:

  • text: The content you want to inject, providing specific guidance for EVI. For example, if the user expresses frustration, you might set the context to encourage an empathetic response.
  • type: Defines how long the context remains active. Options include:
    • persistent: Appended to all user messages throughout the session, ideal for consistent guidance.
    • temporary: Applies only to the next user message, suitable for one-time adjustments.
    • editable: Allows updates to the context over time, useful for evolving needs.

    If type is not specified, it defaults to temporary.

Example: Supporting travel planning context

To tailor EVI’s responses for a travel planning scenario, you can inject context at different persistence levels based on user actions and session needs. The persistent context shown here gives EVI a consistent focus on vacation planning, helping it make relevant suggestions or ask guiding questions throughout the session.

Session settings
{
  "type": "session_settings",
  "context": {
    "text": "The user is trying to find a destination for their next vacation.",
    "type": "persistent"
  }
}

Managing context during a session

  • Clearing context: Send a Session Settings message with “context”: null to remove the injected context when it’s no longer needed.
  • Updating context dynamically: Use editable context if you need to adjust context over time, allowing for real-time updates without additional messages.
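
The injection and clearing steps can be wrapped in small message builders. This is a sketch under the field names described in this guide; `injectContext` and `clearContext` are hypothetical helper names, and the resulting JSON would still need to be sent over your open WebSocket.

```typescript
type ContextType = "persistent" | "temporary" | "editable";

interface SessionSettingsContext {
  type: "session_settings";
  context: { text: string; type?: ContextType } | null;
}

// Builds a message that injects context. When `type` is omitted, the field is
// left out so that the documented default (temporary) applies.
function injectContext(text: string, type?: ContextType): SessionSettingsContext {
  const context: { text: string; type?: ContextType } = { text };
  if (type !== undefined) context.type = type;
  return { type: "session_settings", context };
}

// Builds a message that removes previously injected context.
function clearContext(): SessionSettingsContext {
  return { type: "session_settings", context: null };
}

// One-time nudge for the next user message.
const oneTimeHint = JSON.stringify(
  injectContext("The user sounds frustrated. Respond with extra empathy."),
);
// Guidance that stays active for the whole session.
const persistentFocus = JSON.stringify(
  injectContext(
    "The user is trying to find a destination for their next vacation.",
    "persistent",
  ),
);
// Remove the context once it is no longer needed.
const cleared = JSON.stringify(clearContext());
```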

Handling interruption

Interruptibility is a key feature of EVI, allowing seamless, real-time interactions even when the user interjects mid-response. EVI handles interruptions on the backend (stopping response generation) and supports interruption on the frontend (managing audio playback) to maintain a natural conversation flow.

EVI stops generating audio when interrupted, but you are responsible for stopping playback of any audio already received on the client side to ensure a seamless, responsive experience.

How interruption works

EVI sends responses in chunks as assistant_message messages, each accompanied by a corresponding audio_output message. The assistant_message messages contain both the content and expression measurement predictions, while the audio_output messages contain the generated audio. Once EVI completes generating a response, it sends an assistant_end message to indicate that the response is finished.

When a user message is detected during response generation, EVI stops generating the current response and sends a user_interruption message to signal this event. This user_interruption message instructs the client to halt audio playback, clear any remaining audio in the queue, and prepare for new input from the user.

Handling interruptions on the client side

While backend interruptions are managed by EVI, frontend interruptions—specifically stopping audio playback—require client-side handling. Both user_interruption messages (during response generation) and user_message events (after the response is complete) should trigger the client to stop audio playback for the previous response.

To handle interruptions consistently, the client should perform the following actions whenever a user_interruption or user_message is received:

  • Stop audio playback: Immediately halt playback of any ongoing audio from the previous response.
  • Clear queued audio: Remove any remaining audio segments in the queue to prevent overlap with new responses.

This approach ensures that any user interaction interrupts audio playback as expected, maintaining a natural flow by promptly responding to new user input.
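
The two actions above can be sketched as a small client-side queue. This is an illustration rather than the SDK's implementation: `Player` is a hypothetical playback abstraction, and `data` is an assumed name for the audio payload of an audio_output message. Only the message type names come from this guide.

```typescript
// Hypothetical playback abstraction; a real client would wrap its audio API.
interface Player {
  stop(): void;
}

// `type` follows this guide; `data` is an assumed field for the audio payload.
interface IncomingMessage {
  type: string;
  data?: string;
}

class AudioQueue {
  private queue: string[] = [];
  private player: Player;

  constructor(player: Player) {
    this.player = player;
  }

  get length(): number {
    return this.queue.length;
  }

  handle(message: IncomingMessage): void {
    switch (message.type) {
      case "audio_output":
        // Queue each audio chunk for playback.
        if (message.data !== undefined) this.queue.push(message.data);
        break;
      case "user_interruption":
      case "user_message":
        // Any new user input ends playback of the previous response.
        this.player.stop();
        this.queue.length = 0; // drop audio that has not played yet
        break;
    }
  }
}

const calls = { stop: 0 };
const audio = new AudioQueue({ stop: () => { calls.stop += 1; } });
audio.handle({ type: "audio_output", data: "chunk-1" });
audio.handle({ type: "audio_output", data: "chunk-2" });
audio.handle({ type: "user_interruption" });
// The queue is now empty and stop() has been called once.
```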

If you’re using our React SDK, interruption handling is built-in. To see how it’s implemented, you can review the source code here.

Pausing responses

The pausing feature allows you to halt EVI’s audio output while keeping the session active, which is useful for managing conversation flow. For instance, a developer might create a button that lets users pause EVI’s responses if they need time to brainstorm or reflect without interruption. During this pause, EVI continues to listen and transcribe, allowing the user to interject or resume the conversation without disrupting the session. When the user is ready, they can resume EVI’s response to continue the interaction seamlessly.

How to pause responses

To pause EVI’s responses, send a pause_assistant_message, which holds all Assistant messages until a resume_assistant_message is received. When resumed, EVI responds with consideration of any user input received during the pause.

import React from 'react';
import { useVoice } from "@humeai/voice-react";

export default function Controls() {
  const { sendPauseAssistantMessage, sendResumeAssistantMessage } = useVoice();

  return (
    <div>
      <button onClick={sendPauseAssistantMessage}>Pause EVI</button>
      <button onClick={sendResumeAssistantMessage}>Resume EVI</button>
    </div>
  );
}
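
Outside the React SDK, the same behavior could be driven by sending the two messages over the WebSocket directly. The sketch below assumes each message is a JSON object carrying only a `type` field, by analogy with the session_settings examples in this guide; `PauseController` and `Sender` are hypothetical names.

```typescript
// Minimal shape of anything that can send text frames, such as a WebSocket.
interface Sender {
  send(data: string): void;
}

class PauseController {
  private paused = false;
  private socket: Sender;

  constructor(socket: Sender) {
    this.socket = socket;
  }

  get isPaused(): boolean {
    return this.paused;
  }

  // Sends a pause or resume message depending on the current state.
  toggle(): void {
    const type = this.paused
      ? "resume_assistant_message"
      : "pause_assistant_message";
    this.socket.send(JSON.stringify({ type })); // assumed message shape
    this.paused = !this.paused;
  }
}

// A stand-in sender that records messages instead of using a real socket.
const sent: string[] = [];
const controller = new PauseController({ send: (data) => { sent.push(data); } });
controller.toggle(); // pause
controller.toggle(); // resume
```

Tracking the paused state on the client keeps a single toggle button in sync with what was last sent.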

EVI while paused

  • Response generation stops: EVI stops the generation and sending of new responses. (assistant_message and audio_output messages will not be received while paused.)
  • Tool use is disabled: Any response involving tool use will also be disabled while paused. (tool_call_message, tool_response_message, and tool_error_message messages will not be received while paused.)
  • Queued messages sent: Messages and audio queued before the pause_assistant_message are still processed and sent.
  • Continued listening: EVI continues to “listen” and transcribe user input during the pause. Transcriptions of user audio are saved and sent to the LLM as User messages.

Charges will continue to accrue while EVI is paused. If you wish to completely pause both input and output, disconnect instead and resume the chat when ready.

EVI when resumed

When EVI receives a resume_assistant_message, it generates a response that takes into account all user input received during the pause.

  • Pausing vs. muting: Pausing EVI’s responses is distinct from muting user input. With muted input, EVI does not “hear” the user’s audio and therefore cannot respond to it. While paused, however, EVI continues to process user input and can respond when resumed.
  • Response to paused input: Upon resuming, EVI may respond to multiple points or questions raised during the pause. However, by default, EVI prioritizes the latest user input rather than attempting to address all earlier points. For instance, if the user asks two questions while EVI is paused, EVI will generally respond to the second question, unless instructed to address each item.

Resuming chats

The resumability feature allows you to reconnect to an ongoing chat session, preserving all prior conversation context. This is especially useful in cases of unexpected network failures or when a user wishes to pick up the conversation at a later time, enabling continuity without losing progress.

Implementing resumability

To resume a chat, follow these steps:

  1. Establish initial connection: Make the initial handshake request to establish the WebSocket connection. Upon successful connection, you will receive a ChatMetadata message:

    Chat metadata
    {
      "type": "chat_metadata",
      "chat_group_id": "8859a139-d98a-4e2f-af54-9dd66d8c96e1",
      "chat_id": "2c3a8636-2dde-47f1-8f9e-cea27791fd2e"
    }
  2. Store the ChatGroup reference: Save the chat_group_id from the ChatMetadata message for future use.

  3. Resume chat: To resume a chat, include the stored chat_group_id in the resumed_chat_group_id query parameter of subsequent handshake requests.

    "use client";
    import { VoiceProvider } from "@humeai/voice-react";

    export default function ClientComponent({
      accessToken,
    }: {
      accessToken: string;
    }) {
      return (
        <VoiceProvider
          auth={{ type: "accessToken", value: accessToken }}
          resumedChatGroupId='<YOUR_CHAT_GROUP_ID>'
        >
          {/* ...etc. */}
        </VoiceProvider>
      );
    }

    When resuming a chat, you can specify a different EVI configuration than the one used in the previous session. However, changing the system prompt or supplemental LLM may result in unexpected behavior from EVI.

    Additionally, if data retention is disabled, the ability to resume chats will not be supported.
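
Without the React SDK, steps 2 and 3 could be sketched as follows. The helper names are hypothetical, and `HANDSHAKE_URL` is a placeholder rather than a real endpoint. Only the chat_metadata fields and the resumed_chat_group_id query parameter come from this guide.

```typescript
// Extracts chat_group_id from a raw chat_metadata message, or returns null.
function chatGroupIdFrom(raw: string): string | null {
  const message = JSON.parse(raw);
  if (
    message &&
    message.type === "chat_metadata" &&
    typeof message.chat_group_id === "string"
  ) {
    return message.chat_group_id;
  }
  return null;
}

// Appends the resumed_chat_group_id query parameter to a handshake URL.
function resumeUrl(handshakeUrl: string, chatGroupId: string): string {
  const separator = handshakeUrl.indexOf("?") === -1 ? "?" : "&";
  return (
    handshakeUrl +
    separator +
    "resumed_chat_group_id=" +
    encodeURIComponent(chatGroupId)
  );
}

// Placeholder for illustration only; substitute your actual handshake URL.
const HANDSHAKE_URL = "wss://example.invalid/chat";

const metadata =
  '{"type":"chat_metadata",' +
  '"chat_group_id":"8859a139-d98a-4e2f-af54-9dd66d8c96e1",' +
  '"chat_id":"2c3a8636-2dde-47f1-8f9e-cea27791fd2e"}';

// Step 2: store the ChatGroup reference.
const storedId = chatGroupIdFrom(metadata);
// Step 3: use it on the next handshake, falling back to a fresh chat.
const nextUrl =
  storedId === null ? HANDSHAKE_URL : resumeUrl(HANDSHAKE_URL, storedId);
```

In practice the stored ID would be persisted somewhere that survives a reconnect, such as local storage or your own backend.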