Conversational Controls
Guide to managing the dynamics of a chat session with the Empathic Voice Interface (EVI) API.
This guide covers key tools for creating more interactive and contextually aware EVI chat experiences: dynamic variables, context injection, interruption handling, pausing responses, and resuming chats.
Dynamic variables
Dynamic variables are placeholders in the system prompt that you can fill with specific values at the beginning of the chat and update as the chat progresses. They are especially useful for giving EVI context that might change depending on the user or the session, such as the date, the user’s name, role, or account balance, or any other session-specific information.
Using variables in the prompt
To set up dynamic variables, first include placeholders for them in your system prompt. Use double curly braces ({{variable_name}}) to mark where each variable should appear in the text. This allows EVI to replace these placeholders dynamically with the specified values.
Visit our prompting guide for more details on adding dynamic variables to your prompt.
Assigning values in session settings
After adding placeholders for dynamic variables in your prompt, set their values by sending a Session Settings message over the WebSocket within an active Chat session. This message includes a variables parameter, with each key matching a placeholder in your prompt and each value specifying the text EVI will use.
Variable values can be strings, numbers, or booleans; however, each value is ultimately converted to a string when injected into your system prompt.
To ensure dynamic variables are recognized correctly, follow these guidelines:
- Only assign values to referenced variables: If a variable is given a value in the “variables” field but is not referenced in the system prompt, EVI will not use it in the conversation.
- Define all referenced variables: If a variable is referenced in the system prompt but lacks a value in the variables field, warning W0106 can be expected: "No values have been specified for the variables [variable_name], which can lead to incorrect text formatting. Please assign them values." This warning is also expected if there are spelling inconsistencies between the variable names in variables and those in the prompt.
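The two guidelines above can be checked on the client before a message is sent. The following is a minimal sketch, not an official helper: the function names, the sample prompt, and the "session_settings" type string are assumptions made for illustration.

```python
import json
import re

# Matches {{variable_name}} placeholders as described in this guide.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")

# Variables populated automatically (only `now`, per this guide).
DEFAULT_VARIABLES = {"now"}


def check_variables(prompt: str, variables: dict) -> tuple:
    """Return (missing, unused) variable names for a prompt."""
    referenced = set(PLACEHOLDER.findall(prompt))
    # Referenced in the prompt but given no value: would trigger W0106.
    missing = referenced - set(variables) - DEFAULT_VARIABLES
    # Given a value but never referenced: EVI will not use it.
    unused = set(variables) - referenced
    return missing, unused


def build_variables_message(variables: dict) -> str:
    """Serialize a Session Settings message carrying dynamic variables."""
    # The "session_settings" type string is an assumption.
    return json.dumps({"type": "session_settings", "variables": variables})


# Hypothetical prompt and values, for illustration only.
prompt = "You are helping {{name}}, a {{role}}. Today is {{now}}."
variables = {"name": "Ada", "balance": 42}

missing, unused = check_variables(prompt, variables)
print(sorted(missing))  # ['role']
print(sorted(unused))   # ['balance']
print(build_variables_message({"name": "Ada", "role": "admin"}))
```

Running a check like this before sending catches misspelled keys early, since a misspelled key shows up as both a missing and an unused variable.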
Default dynamic variables
Hume provides built-in dynamic variables that are automatically populated and can be referenced in system prompts without needing to set their values in SessionSettings. The currently supported default variable is:
- now: The current UTC datetime (e.g., "Nov 08, 2024 09:25 PM UTC").
You can reference now in your system prompt (as {{now}}) to dynamically include the current UTC date and time.
If you set a custom value for a default variable in SessionSettings, it will override the default value. For example, specifying a value for now in SessionSettings will replace the automatic UTC datetime with your custom value, offering flexibility when needed.
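As a rough illustration of overriding now, the sketch below formats a fixed timestamp in the same style as the example value and places it in the variables field. The prompt text, the helper name format_now, and the "session_settings" type string are assumptions, not part of the documented API.

```python
import json
from datetime import datetime, timezone


def format_now(dt: datetime) -> str:
    """Format a datetime in the style "Nov 08, 2024 09:25 PM UTC"."""
    return dt.astimezone(timezone.utc).strftime("%b %d, %Y %I:%M %p UTC")


# Hypothetical time-aware prompt that references the default variable.
SYSTEM_PROMPT = "The current time is {{now}}. Greet the user appropriately."

# Override the automatic value with a fixed timestamp (e.g. for testing).
custom_now = format_now(datetime(2024, 11, 8, 21, 25, tzinfo=timezone.utc))
message = json.dumps({
    "type": "session_settings",  # assumed type string
    "variables": {"now": custom_now},
})
print(message)
```

Pinning now to a known value can make time-dependent behavior reproducible while testing a prompt.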
Context injection
EVI supports context injection via Session Settings messages. The context field in a Session Settings message allows you to silently add information to the conversation, guiding EVI without triggering a response. This context is appended to the end of each user_message, ensuring that it is consistently referenced throughout the session.
Injected context can be used to remind EVI of its role, keep important details active in the conversation, or add relevant updates as needed. This method is ideal for adapting EVI’s tone or focus based on real-time changes, helping it respond more accurately without requiring repetitive input from the user.
Injected context is only active within the current session. If a chat is resumed, any previously injected context will not be carried over and must be re-injected if necessary.
Setting up context
To inject context, send a Session Settings message with a context object that includes two fields:
- text: The content you want to inject, providing specific guidance for EVI. For example, if the user expresses frustration, you might set the context to encourage an empathetic response.
- type: Defines how long the context remains active. Options include:
  - persistent: Appended to all user messages throughout the session, ideal for consistent guidance.
  - temporary: Applies only to the next user message, suitable for one-time adjustments.
  - editable: Allows updates to the context over time, useful for evolving needs.
If type is not specified, it defaults to temporary.
Example: Supporting travel planning context
To tailor EVI’s responses for a travel planning scenario, you can inject context at different persistence levels (persistent, editable, or temporary) based on user actions and session needs. Persistent context, for example, gives EVI a consistent focus on vacation planning, helping it make relevant suggestions or ask guiding questions throughout the session.
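A minimal sketch of what the three context types might look like for the travel planning scenario follows. The helper build_context_message, the example texts, and the "session_settings" type string are assumptions for illustration; only the context object with its text and type fields comes from this guide.

```python
import json

VALID_TYPES = {"persistent", "temporary", "editable"}


def build_context_message(text: str, context_type: str = "temporary") -> str:
    """Serialize a Session Settings message that injects context."""
    if context_type not in VALID_TYPES:
        raise ValueError(f"unknown context type: {context_type!r}")
    return json.dumps({
        "type": "session_settings",  # assumed type string
        "context": {"text": text, "type": context_type},
    })


# Hypothetical travel planning examples, one per persistence level.
persistent = build_context_message(
    "The user is planning a vacation. Keep suggestions focused on travel.",
    "persistent",
)
editable = build_context_message(
    "The user's current budget is $2,000.",
    "editable",
)
# Omitting the type mirrors the documented default of temporary.
temporary = build_context_message(
    "The user sounded frustrated about flight prices; respond with empathy."
)

for message in (persistent, editable, temporary):
    print(message)
```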
Managing context during a session
- Clearing context: Send a Session Settings message with "context": null to remove the injected context when it’s no longer needed.
- Updating context dynamically: Use editable context if you need to adjust context over time, allowing for real-time updates without additional messages.
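Clearing context can be sketched as follows. In Python, None serializes to JSON null, which is what the "context": null instruction calls for. The "session_settings" type string and the function name are assumptions.

```python
import json


def build_clear_context_message() -> str:
    """Serialize a Session Settings message that removes injected context."""
    # None becomes JSON null when serialized.
    return json.dumps({"type": "session_settings", "context": None})


print(build_clear_context_message())
# {"type": "session_settings", "context": null}
```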
Handling interruption
Interruptibility is a key feature of EVI, allowing seamless, real-time interactions even when the user interjects mid-response. EVI handles interruptions on the backend (stopping response generation) and supports interruption on the frontend (managing audio playback) to maintain a natural conversation flow.
EVI stops generating audio when interrupted, but you are responsible for stopping playback of any audio already received on the client side to ensure a seamless, responsive experience.
How interruption works
EVI sends responses in chunks as assistant_message messages, each accompanied by corresponding audio_output messages. The assistant_message messages contain both the content and expression measurement predictions, while the audio_output messages contain the generated audio. Once EVI completes generating a response, it sends an assistant_end message to indicate that the response is finished.
When a user message is detected during response generation, EVI stops generating the current response and sends a user_interruption message to signal this event. This user_interruption message instructs the client to halt audio playback, clear any remaining audio in the queue, and prepare for new input from the user.
Handling interruptions on the client side
While backend interruptions are managed by EVI, frontend interruptions (specifically, stopping audio playback) require client-side handling. Both user_interruption messages (during response generation) and user_message events (after the response is complete) should trigger the client to stop audio playback for the previous response.
To handle interruptions consistently, the client should perform the following actions whenever a user_interruption or user_message is received:
- Stop audio playback: Immediately halt playback of any ongoing audio from the previous response.
- Clear queued audio: Remove any remaining audio segments in the queue to prevent overlap with new responses.
This approach ensures that any user interaction interrupts audio playback as expected, maintaining a natural flow by promptly responding to new user input.
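The client-side handling described above can be sketched with an in-memory queue standing in for a real audio player. PlaybackQueue and handle_message are hypothetical names, and the "data" field on audio_output messages is an assumption; a real client would stop its actual audio device in stop_and_clear.

```python
from collections import deque


class PlaybackQueue:
    """Stand-in for an audio player: one playing chunk plus a queue."""

    def __init__(self):
        self.queue = deque()
        self.playing = None

    def enqueue(self, chunk):
        self.queue.append(chunk)

    def play_next(self):
        # Take the next queued chunk, or go idle if the queue is empty.
        self.playing = self.queue.popleft() if self.queue else None
        return self.playing

    def stop_and_clear(self):
        # Stop the current chunk and drop everything still queued.
        self.playing = None
        self.queue.clear()


# Both message types should stop playback of the previous response.
INTERRUPTING_TYPES = {"user_interruption", "user_message"}


def handle_message(message: dict, playback: PlaybackQueue) -> None:
    msg_type = message.get("type")
    if msg_type in INTERRUPTING_TYPES:
        playback.stop_and_clear()
    elif msg_type == "audio_output":
        playback.enqueue(message["data"])  # "data" field is an assumption


playback = PlaybackQueue()
handle_message({"type": "audio_output", "data": "chunk-1"}, playback)
handle_message({"type": "audio_output", "data": "chunk-2"}, playback)
playback.play_next()
handle_message({"type": "user_interruption"}, playback)
print(playback.playing, len(playback.queue))  # None 0
```

Treating user_interruption and user_message identically keeps the handler simple: any new user input clears stale audio, whether or not EVI had finished generating.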
Pausing responses
The pausing feature allows you to halt EVI’s audio output while keeping the session active, which is useful for managing conversation flow. For instance, a developer might create a button that lets users pause EVI’s responses if they need time to brainstorm or reflect without interruption. During this pause, EVI continues to listen and transcribe, allowing the user to interject or resume the conversation without disrupting the session. When the user is ready, they can resume EVI’s response to continue the interaction seamlessly.
How to pause responses
To pause EVI’s responses, send a pause_assistant_message, which holds all Assistant messages until a resume_assistant_message is received. When resumed, EVI responds with consideration of any user input received during the pause.
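A pause button might be wired up roughly as below. This is a sketch under assumptions: PauseController is a hypothetical helper, the messages are assumed to be JSON objects whose type field carries the message name, and send stands in for whatever function writes to your WebSocket.

```python
import json


class PauseController:
    """Tracks pause state and emits the matching control messages."""

    def __init__(self, send):
        self._send = send  # callable that writes a string to the socket
        self.paused = False

    def pause(self):
        # Avoid sending a duplicate pause if already paused.
        if not self.paused:
            self._send(json.dumps({"type": "pause_assistant_message"}))
            self.paused = True

    def resume(self):
        # Only resume if a pause is currently in effect.
        if self.paused:
            self._send(json.dumps({"type": "resume_assistant_message"}))
            self.paused = False


sent = []  # collects outgoing messages in place of a real socket
controller = PauseController(sent.append)
controller.pause()
controller.pause()   # no-op: already paused
controller.resume()
print(sent)
```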
EVI while paused
- Response generation stops: EVI stops the generation and sending of new responses. (assistant_message and audio_output messages will not be received while paused.)
- Tool use is disabled: Any response involving tool use will also be disabled while paused. (tool_call_message, tool_response_message, and tool_error_message messages will not be received while paused.)
- Queued messages sent: Messages and audio queued before the pause_assistant_message are still processed and sent.
- Continued listening: EVI continues to “listen” and transcribe user input during the pause. Transcriptions of user audio are saved and sent to the LLM as User messages.
Charges will continue to accrue while EVI is paused. If you wish to completely pause both input and output you should instead disconnect and resume the chat when ready.
EVI when resumed
When EVI receives a resume_assistant_message, it generates a response that takes into account all user input received during the pause.
- Pausing vs. muting: Pausing EVI’s responses is distinct from muting user input. With muted input, EVI does not “hear” the user’s audio and therefore cannot respond to it. While paused, however, EVI continues to process user input and can respond when resumed.
- Response to paused input: Upon resuming, EVI may respond to multiple points or questions raised during the pause. However, by default, EVI prioritizes the latest user input rather than attempting to address all earlier points. For instance, if the user asks two questions while EVI is paused, EVI will generally respond to the second question, unless instructed to address each item.
Resuming chats
The resumability feature allows you to reconnect to an ongoing chat session, preserving all prior conversation context. This is especially useful in cases of unexpected network failures or when a user wishes to pick up the conversation at a later time, enabling continuity without losing progress.
Implementing resumability
See steps below for how to resume a chat:
- Establish initial connection: Make the initial handshake request to establish the WebSocket connection. Upon successful connection, you will receive a ChatMetadata message.
- Store the ChatGroup reference: Save the chat_group_id from the ChatMetadata message for future use.
- Resume chat: To resume a chat, include the stored chat_group_id in the resumed_chat_group_id query parameter of subsequent handshake requests.
When resuming a chat, you can specify a different EVI configuration than the one used in the previous session. However, changing the system prompt or supplemental LLM may result in unexpected behavior from EVI.
Additionally, chats cannot be resumed if data retention is disabled.
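The resume steps can be sketched as two small helpers: one that pulls chat_group_id out of a ChatMetadata message, and one that builds the handshake URL. The endpoint URL, the "chat_metadata" type string, the sample IDs, and the helper names are assumptions; authentication parameters are omitted and would need to be added for a real connection.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed WebSocket endpoint; check the API reference for the actual URL.
EVI_CHAT_URL = "wss://api.hume.ai/v0/evi/chat"


def extract_chat_group_id(raw_message: str) -> Optional[str]:
    """Return chat_group_id if the message is a ChatMetadata message."""
    message = json.loads(raw_message)
    if message.get("type") == "chat_metadata":  # assumed type string
        return message.get("chat_group_id")
    return None


def build_handshake_url(chat_group_id: Optional[str] = None, **params: str) -> str:
    """Build the handshake URL, adding resumed_chat_group_id when resuming."""
    query = dict(params)
    if chat_group_id is not None:
        query["resumed_chat_group_id"] = chat_group_id
    return f"{EVI_CHAT_URL}?{urlencode(query)}" if query else EVI_CHAT_URL


# Hypothetical ChatMetadata payload received on the first connection.
raw = json.dumps({
    "type": "chat_metadata",
    "chat_group_id": "group-123",
    "chat_id": "chat-456",
})
stored_id = extract_chat_group_id(raw)

# Later, reconnect using the stored ID.
print(build_handshake_url(stored_id))
```

Persisting the stored ID outside the process (for example, alongside the user's record) is what lets a user pick up the conversation after a network failure or at a later time.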