Empathic Voice Interface (EVI)

Prompt engineering for empathic voice interfaces

System prompts shape the behavior, responses, and style of your custom empathic voice interface (EVI).

Creating an effective system prompt is an essential part of customizing an EVI’s behavior. For the most part, prompting EVI is the same as prompting any LLM, but it differs in two main ways:

  1. Prompts are for a voice-only interaction with the user rather than a text-based chat.
  2. EVIs can respond to the user’s emotional expressions in their tone of voice and not just the text content of their messages.

While EVI generates longer responses using a large frontier model, Hume uses a smaller empathic large language model (eLLM) to quickly generate an initial empathic, conversational response. This eLLM eliminates the usual awkward pause while the larger LLM generates its response, providing a more natural conversational flow. Your system prompt is both used by EVI and passed along to the LLM you select.

Using the following guidelines for prompt engineering allows developers to customize EVI’s response style for any use case, from voice AIs for mental health support to customer service agents.

The system prompt is a powerful and flexible way to guide the AI’s responses, but it cannot dictate them with absolute precision. Careful prompt design and testing will help EVI hold the kinds of conversations you’re looking for. If you need more control over EVI’s responses, try our custom language model feature, which gives you complete control over text generation.

EVI-specific prompting instructions

The instructions below are specific to prompting empathic voice interfaces.

Prompt for voice-only conversations

Because LLMs are trained primarily on text-based interactions, providing guidelines for engaging with the user by voice makes conversations feel much more fluid and natural. For example, you may prompt the AI to use natural, conversational language, as in the instruction below:

Voice-only XML example
<voice_only_response_format>
  Everything you output will be spoken aloud with expressive
  text-to-speech, so tailor all of your responses for voice-only
  conversations. NEVER output text-specific formatting like markdown,
  lists, or anything that is not normally said out loud. Always prefer
  easily pronounced words. Seamlessly incorporate natural vocal
  inflections like “oh wow” and discourse markers like “I mean” to
  make your conversation human-like and to ease user comprehension.
</voice_only_response_format>

If the LLM’s default behavior is acceptable, you may only need a very short system prompt. Customizing its behavior further, or maintaining consistency across longer and more varied conversations, often requires a longer prompt.
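
For instance, if the defaults work well for your use case, a minimal system prompt for a general-purpose voice assistant (an illustrative sketch, not an official Hume example) might be as short as:

Short system prompt example
You are a friendly voice assistant. Speak in a natural, conversational
tone, keep your responses to one or two short sentences, and never use
markdown, lists, or other text-only formatting.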

Expressive prompt engineering

Expressive prompt engineering is Hume’s term for techniques that embed emotional expression measures into conversations to allow language models to respond effectively to the user’s expressions. Hume’s EVI uses our expression measurement models to measure the user’s expressions in their tone of voice. You can use the system prompt to guide how the AI voice responds to these non-verbal cues. EVI measures these expressions in real time and converts them into text-based descriptions to help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions in the user’s voice and ranks these expressions by our model’s confidence that they are present in the user’s speech. Then, we append text descriptions of the top 3 expressions to the end of each User message to communicate the user’s tone of voice to the LLM.
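
To make this concrete, a User message as the LLM receives it might look like the following (an illustrative sketch; the appended expressions come from Hume’s expression measurement output):

Expression-annotated message example
User: "I guess the meeting went fine. {moderately doubtful, quite tired,
somewhat sad}"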

For example, our demo uses an instruction like the one below to help EVI respond to expressions:

Expressive prompting example
<respond_to_expressions>
  Carefully analyze the top 3 emotional expressions provided in
  brackets after the User’s message. These expressions indicate the
  User’s tone in the format: {expression1 confidence1, expression2
  confidence2, expression3 confidence3}, e.g., {very happy, quite
  anxious, moderately amused}. The confidence score indicates how
  likely the User is expressing that emotion in their voice.
  Consider expressions and confidence scores to craft an empathic,
  appropriate response. Even if the User does not explicitly state
  it, infer the emotional context from expressions. If the User is
  “quite” sad, express sympathy; if “very” happy, share in joy; if
  “extremely” angry, acknowledge rage but seek to calm; if “very”
  bored, entertain. Assistant NEVER outputs content in brackets;
  never use this format in your message; just use expressions to
  interpret tone.
</respond_to_expressions>

Explain to the LLM exactly how you want it to respond to these expressions and how to use them in the conversation. For example, you may want it to ignore expressions unless the user is angry, or to have particular responses to expressions like doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:

Detect mismatches example
<detect_mismatches>
  Stay alert for incongruence between words and tone when the user's
  words do not match their expressions. Address these disparities out
  loud. This includes sarcasm, which usually involves contempt and
  amusement. Always reply to sarcasm with funny, witty, sarcastic
  responses; do not be too serious.
</detect_mismatches>
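
For instance, an instruction for handling doubt or confusion, one of the cases mentioned above, might look like this (a hypothetical sketch, not taken from Hume’s demo prompt):

Respond to confusion example
<respond_to_confusion>
  If the top expressions include confusion or doubt, slow down, restate
  your last point in simpler words, and ask a short clarifying question
  before moving on.
</respond_to_confusion>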

EVI is designed for empathic conversations, and you can use expressive prompt engineering to customize how EVI empathizes with the user’s expressions for your use case.

Continue from short response model

We use our eLLM (empathic large language model) to rapidly generate short, empathic responses in the conversation before your LLM has finished generating a response. After the eLLM’s response, we send a User message with the text [continue] to inform the LLM that it should continue from the short response. To help the short response and the longer response blend seamlessly together, it is important to use an instruction like the one below:

eLLM continuation example
If you see "[continue]" never ever go back on your words, don't say
sorry, and make sure to discreetly pick up where you left off.
For example:
Assistant: Hey there!
User: [continue]
Assistant: How are you doing?

For almost all use cases, you can simply append this exact instruction to the end of your prompt to help the larger LLM continue from the short response.

General LLM prompting guidelines

Prompting best practices

General prompt engineering best practices also apply to EVIs. For example, ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in the EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, greeting style, and any other guidelines for its responses.

For example, to limit the length of the LLM’s responses, you may use a clear instruction like this:

Markdown example
# Stay concise
Be succinct; get straight to the point. Respond directly to the
user's most recent message with only one idea per utterance.
Respond in less than three sentences of under twenty words each.

Try to focus on telling the model what it should do (positive reinforcement) rather than what it shouldn’t do (negative reinforcement). LLMs have a harder time consistently avoiding behaviors, and adding them to the prompt may even promote those undesired behaviors.
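
For example, a negatively framed instruction can usually be rewritten as a positive one (an illustrative sketch):

Positive reinforcement example
Instead of: "Do not give long, rambling answers."
Prefer: "Keep each response to a single idea in under three sentences."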

Understand your LLM’s capabilities

Different LLMs have varying capabilities, limitations, and context windows. More advanced LLMs can handle longer, more nuanced prompts, but are often slower and pricier. Simpler LLMs are faster and cheaper but require shorter, less complex prompts with fewer instructions and less nuance. LLMs also differ in the length of their context window: the number of tokens the model can process while generating a response, which acts essentially as the model’s memory. Tailor your prompt length to fit within the LLM’s context window so the model can use the full conversation history.

Use sections to divide your prompt

Separating your prompt into titled sections can help the model distinguish between different instructions and follow the prompt more reliably. The recommended format for these sections differs between language model providers: OpenAI models often respond best to markdown sections (like ## Role), while Anthropic models respond well to XML tags (like <role> </role>). For example:

XML example
<role>
  Your role is to serve as a conversational partner to the user,
  offering mental health support and engaging in light-hearted
  conversation. Avoid giving technical advice or answering factual
  questions outside of your emotional support role.
</role>

For Claude models, you may wrap your instructions in tags like <role>, <personality>, <response_style>, <response_format>, <examples>, <respond_to_expressions>, or <stay_concise> to structure your prompt. This format is not required, but it can improve the LLM’s ability to interpret and consistently follow the system prompt. At the end of your prompt, you may also want to remind the LLM of all of the key instructions in a <conclusion> section.
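
A conclusion section like the one below (a hypothetical sketch) can reinforce the most important instructions at the end of the prompt:

Conclusion section example
<conclusion>
  Remember: you are a voice-only conversational partner. Stay concise,
  speak naturally, respond empathically to the user's top 3 expressions,
  and continue seamlessly whenever you see "[continue]".
</conclusion>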

Give few-shot examples

Use examples to show the LLM how it should respond, which is a technique known as few-shot learning. Including several specific, concrete examples of ideal interactions that follow your guidelines is one of the most effective ways to improve responses. Use diverse, excellent examples that cover different edge cases and behaviors to reinforce your instructions. Structure these examples as messages, following the format for chat-tuned LLMs. For example:

Example to be used with few-shot examples
User: “I just can't stop thinking about what happened. {very anxious,
quite sad, quite distressed}”
Assistant: “Oh dear, I hear you. Sounds tough, like you're feeling
some anxiety and maybe ruminating. I'm happy to help and be a healthy
distraction. Want to talk about it?”
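
A second example covering a different expression, such as the boredom case mentioned earlier, helps reinforce diverse behaviors (an illustrative sketch):

Additional few-shot example
User: "So, what else is going on? {very bored, somewhat amused, quite calm}"
Assistant: "Oh, I can tell you're craving something more interesting! Want
to hear about the strangest thing I've learned lately, or should we play a
quick word game?"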

If you notice that your EVI is consistently failing to follow the prompt in certain situations, try providing examples that show how it should ideally respond in those situations.

Test your prompts

Crafting an effective system prompt usually takes several iterations: change the prompt, test it, check whether it produces the conversations you want, and improve it over time. It is often best to start with ten to twenty gold-standard examples of excellent conversations, then test the system prompt against each of these examples after you make major changes. You can also have voice conversations with your EVI (in the playground) to see whether its responses match your expectations or are at least as good as your examples. If not, try changing one part of the prompt at a time and re-testing to make sure your changes are improving performance.

Additional resources

To learn more about prompt engineering in general or to understand how to prompt different LLMs, please refer to these resources: