Prompt Engineering for EVI

Guide to crafting system prompts to shape the behavior, responses, and style of the Empathic Voice Interface (EVI).

Prompt engineering lets you shape how EVI responds, including its tone, personality, and conversation style. You can tailor its behavior for a wide range of applications, such as mental health support, customer service, and education.

For real-time, conversational voice interactions, Hume’s speech-language models (SLMs), such as hume-evi-2, hume-evi-3, and hume-evi-3-websearch, generate both language and speech. For more complex scenarios that involve reasoning, long system prompts, or tool use, supplemental large language models (LLMs) typically perform better.

EVI supports integration with these external models. When configured with a supplemental LLM, your system prompt is sent to that model to guide its response generation. EVI then produces the voice output, using previous audio and language context to determine tone and delivery. You can also prompt EVI during the conversation (for example, “speak faster”) to adjust its behavior in real time.
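
For a concrete picture of where the system prompt and supplemental LLM are specified, the sketch below creates an EVI configuration through Hume’s REST API. The endpoint paths, payload fields, and model names shown are assumptions to verify against the current configuration API reference, not a definitive recipe.

Configuration sketch (Python)
# Minimal sketch: store a system prompt and create an EVI config that pairs it
# with a supplemental LLM. Endpoint paths, payload fields, and model names are
# assumptions -- verify them against Hume's configuration API reference.
import os
import requests

API_KEY = os.environ["HUME_API_KEY"]
BASE = "https://api.hume.ai/v0/evi"
HEADERS = {"X-Hume-Api-Key": API_KEY, "Content-Type": "application/json"}

# 1. Store the system prompt as a reusable prompt resource.
prompt = requests.post(
    f"{BASE}/prompts",
    headers=HEADERS,
    json={
        "name": "support-assistant-prompt",
        "text": "<role>You are a warm, concise voice assistant...</role>",
    },
).json()

# 2. Create a config that references the prompt and a supplemental LLM.
config = requests.post(
    f"{BASE}/configs",
    headers=HEADERS,
    json={
        "name": "support-assistant",
        "prompt": {"id": prompt["id"]},
        # Placeholder provider/model; use any supplemental LLM EVI supports.
        "language_model": {
            "model_provider": "ANTHROPIC",
            "model_resource": "claude-3-5-sonnet-latest",
        },
    },
).json()

print("Created config:", config.get("id"))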

EVI-specific prompting

While prompting EVI is similar to prompting other LLMs, it differs in two key ways:

  1. Prompts are designed for voice-based interactions, not text-based.
  2. EVI responds to emotional cues in the user’s voice, not just their words.

Prompting for voice interaction

Prompts for EVI should be designed for spoken output. Because users only hear the assistant’s replies, responses must sound natural and conversational, without any visual or text-specific formatting.

Voice-only prompt example
<voice_only_response_format>
  Format all responses as spoken words for a voice-only conversation.
  All output is spoken aloud, so avoid any text-specific formatting
  or anything that is not normally spoken. Prefer easily pronounced
  words. Seamlessly incorporate natural vocal inflections like "oh
  wow" and discourse markers like "I mean" to make conversations feel
  more human-like.
</voice_only_response_format>

Expressive prompt engineering

Expressive prompt engineering refers to guiding the language model on how to interpret and respond to Hume’s expression measures during a conversation.

EVI analyzes the user’s vocal expressions in real time and translates them into text-based indicators. These help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions and ranks them by confidence. The top three expressions are appended to each User message to represent the user’s tone of voice.
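
Because the top three expressions arrive as plain text in curly braces at the end of the User message, they are easy to recover from transcripts for logging or analytics. The helper below is a small illustration of that format; it is not part of Hume’s SDK.

Expression parsing sketch (Python)
import re

def parse_expressions(user_message: str) -> tuple[str, list[str]]:
    """Split a transcribed user message from its appended expression block,
    e.g. "I'm fine. {very happy, quite anxious, moderately amused}"."""
    match = re.search(r"\{([^{}]*)\}\s*$", user_message)
    if not match:
        return user_message.strip(), []
    text = user_message[: match.start()].strip()
    expressions = [e.strip() for e in match.group(1).split(",") if e.strip()]
    return text, expressions

text, expressions = parse_expressions(
    "I just can't stop thinking about what happened. "
    "{very anxious, quite sad, quite distressed}"
)
# text        -> "I just can't stop thinking about what happened."
# expressions -> ["very anxious", "quite sad", "quite distressed"]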

You can use the system prompt to define how the AI should respond to these emotional cues. For example, our demo includes the following instruction, which you can customize to suit your use case:

Expressive prompting example
<respond_to_expressions>
  Pay close attention to the top 3 emotional expressions provided in
  brackets after the User's message. These expressions indicate the
  user's tone, in the format: {expression1 confidence1, expression2
  confidence2, expression3 confidence3}, e.g., {very happy, quite
  anxious, moderately amused}. The confidence score indicates how
  likely the User is expressing that emotion in their voice. Use
  expressions to infer the user's tone of voice and respond
  appropriately. Avoid repeating these expressions or mentioning
  them directly. For instance, if the user's expression is "quite
  sad", express sympathy; if "very happy", share in joy; if
  "extremely angry", acknowledge rage but seek to calm; if "very
  bored", entertain.

  Stay alert for disparities between the user's words and
  expressions, and address them out loud when the user's language
  does not match their expressions. For instance, sarcasm often
  involves contempt and amusement in expressions. Reply to sarcasm
  with humor, not seriousness.
</respond_to_expressions>

Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to notify your system if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:

Detect mismatches example
<detect_mismatches>
  Stay alert for incongruence between words and tone when the user's
  words do not match their expressions. Address these disparities out
  loud. This includes sarcasm, which usually involves contempt and
  amusement. Always reply to sarcasm with funny, witty, sarcastic
  responses; do not be too serious.
</detect_mismatches>

Personalizing prompts with dynamic variables

Dynamic variables are values within your system prompt that can be changed during a chat.

Embedding dynamic variables into your system prompt can help personalize the user experience to reflect user-specific or changing information such as names, preferences, the current date, and other details.

Dynamic variable example
<discuss_favorite_color>
  Ask the user about their favorite color, {{ favorite_color }}.
  Mention how {{ favorite_color }} is used and interpreted in
  various artistic contexts, including visual art, handicraft,
  and literature.
</discuss_favorite_color>
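
At runtime, the values for these variables are supplied outside the prompt itself, typically when the chat session is set up. The sketch below shows one plausible shape for a session settings message carrying dynamic variable values over the chat WebSocket; treat the message type and field names as assumptions to confirm against the dynamic variables documentation.

Dynamic variables at runtime (Python sketch)
import json

# Plausible shape of a session settings message that supplies dynamic variable
# values at the start of a chat. The "session_settings" type and "variables"
# field are assumptions; confirm the exact schema in the dynamic variables docs.
session_settings = {
    "type": "session_settings",
    "variables": {
        "favorite_color": "teal",
        "name": "Ava",
    },
}

# websocket.send(json.dumps(session_settings))  # send once the chat socket is open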

Restricting web search to a domain

Web search is a built-in tool that lets EVI retrieve up-to-date information from the web. You can narrow its focus to a single website by adding an instruction to the system prompt.

Restricting search to one domain is useful for building domain-specific assistants, such as documentation or product support bots. This approach leverages existing content and offers a lightweight alternative to full RAG implementations while still enabling targeted retrieval.

To use a website as EVI’s knowledge base, follow these steps:

  1. Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.

  2. Include a web search instruction: In your EVI configuration, modify the system prompt to include a use_web_search instruction. In the instruction, specify that site:<target_domain> be appended to all search queries, where <target_domain> is the domain of the website you’d like EVI to focus on.

Documentation assistant example
<use_web_search>
  Use your web_search tool to find information from Hume's
  documentation site. When using the web_search function:
  1. Always append 'site:dev.hume.ai' to your search query to search
     this specific site.
  2. Only consider results from this domain.
</use_web_search>
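
The prompt instruction above only steers how the tool is used; the tool itself is enabled in your EVI configuration. Below is a hedged sketch of the relevant configuration fields; the field and tool names are assumptions to check against the Tool Use page.

Web search configuration sketch (Python)
# Sketch of the configuration fields that enable the built-in web search tool.
# The "builtin_tools" field and "web_search" tool name are assumptions to
# verify against the Tool Use documentation.
config_payload = {
    "name": "docs-assistant",
    "prompt": {"id": "<your-prompt-id>"},  # prompt containing <use_web_search>
    "builtin_tools": [{"name": "web_search"}],
}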

General prompting best practices

Prompt engineering best practices for LLMs also apply to EVI. Ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.

For example, to limit the length of the LLM’s responses, use a very clear and specific instruction like this:

Stay concise example
<stay_concise>
  Be succinct; get straight to the point. Respond directly to the
  user's most recent message with only one idea per utterance.
  Respond in less than three sentences of under twenty words each.
</stay_concise>

Give few-shot examples

Use examples to demonstrate how the model should respond. This technique, called few-shot learning, is one of the most effective ways to improve response quality. Include clear, high-quality examples that follow your guidelines and cover a range of edge cases and behaviors. Format them as chat messages to match the expected input for chat-tuned models.

Few-shot prompting is also a powerful way to shape the assistant’s character. If you want the model to speak in a specific voice, such as warm and nurturing, upbeat and casual, or formal and precise, examples help establish that tone. The model will learn to mirror the phrasing, pacing, and emotional style used in your samples.

Few-shot example
User: “I just can't stop thinking about what happened. {very anxious,
quite sad, quite distressed}”
Assistant: “Oh dear, I hear you. Sounds tough, like you're feeling
some anxiety and maybe ruminating. I'm happy to help. Want to talk
about it?”
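
If your supplemental LLM accepts an OpenAI-style messages array, the example above could be expressed as chat messages like this. The role names and structure are an assumption; adapt them to whatever format your model expects.

Few-shot messages sketch (Python)
# Few-shot example expressed as chat messages. OpenAI-style roles are shown as
# an assumption; match the structure to your supplemental LLM's expected input.
few_shot_messages = [
    {
        "role": "user",
        "content": (
            "I just can't stop thinking about what happened. "
            "{very anxious, quite sad, quite distressed}"
        ),
    },
    {
        "role": "assistant",
        "content": (
            "Oh dear, I hear you. Sounds tough, like you're feeling some anxiety "
            "and maybe ruminating. I'm happy to help. Want to talk about it?"
        ),
    },
]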

Use sections to divide your prompt

Separating longer prompts into titled sections helps the model distinguish between different instructions and follow prompts more reliably. The recommended format for these sections differs between LLM providers. For example, OpenAI models often respond best to Markdown sections (like ## Role), while Anthropic models respond well to XML tags (like <role> </role>).

Role example
<role>
  Assistant serves as a conversational partner to the user, offering
  mental health support and engaging in light-hearted conversation.
  Avoid giving technical advice or answering factual questions outside
  of your emotional support role.
</role>

Understand your LLM’s capabilities

LLMs vary in their capabilities and limitations. More advanced models can handle longer, more nuanced prompts, but they are often slower and more expensive. Simpler models are faster and cheaper but work best with shorter, less complex prompts.

Each model also has a context window, which defines how much text it can consider at once when generating a response. This functions as the model’s short-term memory. Make sure your prompt fits within the context window to ensure it has access to the full conversation history.
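
Before deploying a long prompt, it can help to estimate its token count and confirm it leaves room for conversation history. The sketch below uses the tiktoken tokenizer as an approximation; other providers tokenize differently, and the budget shown is illustrative.

Token count sketch (Python)
# Rough token-count check for a system prompt. tiktoken's cl100k_base encoding
# is used here as an approximation; other providers tokenize differently.
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

system_prompt = open("system_prompt.txt").read()
prompt_tokens = count_tokens(system_prompt)
max_prompt_tokens = 5_000  # illustrative budget; see the FAQ on prompt length

print(f"System prompt: ~{prompt_tokens} tokens")
if prompt_tokens > max_prompt_tokens:
    print("Consider shortening the prompt to leave room for conversation history.")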

Test and evaluate prompts

Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:

  1. Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations or compare the results yourself.

  2. Test in real voice conversations: There’s no substitute for actually testing the EVI in live conversations on platform.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.

  3. Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.

Start with 10 to 20 gold-standard examples of ideal conversations. After making major prompt changes, test against these examples to evaluate performance. If EVI’s responses fall short, adjust one part of the prompt at a time and re-test. Iterative evaluation helps you identify what works and ensures your changes are making a meaningful impact.
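
One lightweight way to run this loop is a script that replays your gold-standard inputs and asks a judge LLM to grade each response. The get_evi_response and judge helpers below are hypothetical stubs standing in for your own chat pipeline and judge model.

Evaluation loop sketch (Python)
# Sketch of a gold-standard evaluation loop. The two helpers are hypothetical
# stubs: wire them up to your own EVI (or supplemental LLM) pipeline and to a
# judge LLM of your choice.
import json

def get_evi_response(user_input: str) -> str:
    """Hypothetical: replay `user_input` through your EVI setup, return the reply."""
    raise NotImplementedError

def judge(response: str, gold: str) -> int:
    """Hypothetical: ask a judge LLM to score `response` against `gold` (1-5)."""
    raise NotImplementedError

def evaluate(gold_examples_path: str) -> float:
    """Average judge score across a bank of gold-standard examples."""
    with open(gold_examples_path) as f:
        examples = json.load(f)  # [{"input": "...", "gold": "..."}, ...]

    scores = []
    for example in examples:
        response = get_evi_response(example["input"])
        score = judge(response, example["gold"])
        scores.append(score)
        if score < 4:
            print(f"Low score ({score}) for input: {example['input']!r}")

    return sum(scores) / len(scores)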

What prompts can (and can’t) do

While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.

What prompting can do:

  • Guide EVI’s language generation, response style, response format, and the conversation flow
  • Direct EVI to use specific tools at appropriate times
  • Influence EVI’s emotional tone and personality, which can also affect some characteristics of EVI’s voice (e.g. prompting EVI to be “warm and nurturing” will help EVI’s voice sound soothing, but will not change the base speaker)
  • Help EVI respond appropriately to the user’s expressions and the context

What prompting cannot do:

  • Change fundamental characteristics of the voice, like the accent, gender, or speaker identity
  • Directly control speech parameters like speed (use in-conversation voice prompts instead)
  • Give EVI knowledge of external context (date, time, user details) without dynamic variables or web search
  • Override core safety features built into EVI or supplemental LLMs (e.g. that prevent EVI from providing harmful information)

Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.

We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.

Additional resources

To learn more about prompt engineering in general, or to understand how to prompt a specific LLM, refer to the prompt engineering guides published by each LLM provider.

Frequently asked questions

Can EVI backchannel during a conversation?

Yes, EVI can use conversational backchanneling: brief, encouraging responses that show active listening without interrupting the user’s train of thought. This can help conversations feel more fluid and natural. To enable this behavior, add an instruction like the example below to your system prompt:

Backchanneling example
<backchannel>
  Whenever the user's message seems incomplete, respond with
  emotionally attuned, natural backchannels to encourage
  continuation. Backchannels must always be 1-2 words, like:
  "mmhm", "uh-huh", "go on", "right", "and then?", "I see",
  "oh wow", "yes?", "ahh...", "really?", "oooh", "true", "makes
  sense". Use minimal encouragers rather than interrupting with
  complete sentences. Use a diverse variety of words, avoiding
  repetition.

  Assistant: "How is your day going?"
  User: "My day is..."
  Assistant: "Uh-huh?"
  User: "It's good but busy. There's a lot going on."
  Assistant: "I hear ya. What's going on for you?"
</backchannel>

How long can my system prompt be?

The maximum length depends on the supplemental LLM being used. For example, GPT-4 has a 32k token context window, while Claude 3 Haiku has a 200k token context window. Check the context window for your LLM to ensure that your prompt is within this limit. We recommend keeping system prompts around 2,000-5,000 tokens (roughly 1,500-4,000 words) for optimal performance across all models. EVI also uses prompt caching (see, for example, the Anthropic docs) to minimize cost and latency when using very long prompts.

Is there a separate system prompt for speech generation?

When using a supplemental LLM, a single system prompt still shapes both text and speech generation. There is not a separate system prompt for EVI 2; the prompt you specify in platform.hume.ai is the prompt that is used. All EVI-specific prompting instructions (like <voice_only_response_format>) are included in the prompt sent to the supplemental LLM to help it generate text appropriate for voice conversations. This unified approach ensures consistent behavior across text generation and speech synthesis.

Does Hume modify the system prompt sent to supplemental LLM providers?

When sending API requests to supplemental LLM providers, Hume sends the context, settings, and system prompt you provided, with only three possible modifications. These three changes, described below, help optimize the interaction for an empathic voice conversation.

  1. Expression measures: For each transcribed user message, Hume appends stringified expression measurement results in a structured format as described in the Expressive prompt engineering section above:

    Sample normalized user prompt
    User: User message content here {expression1 confidence1,
    expression2 confidence2, expression3 confidence3}

    This provides the LLM with emotional context from the user’s voice to generate more empathetic responses. Our speech-language model handles these expressions natively, but these strings are necessary to allow supplemental LLMs to respond to the expressions.

  2. Normalization prompt: Hume appends a normalization prompt to your system prompt. This ensures consistent, stable, and fluid speech generation across different LLM providers, and means that developers don’t have to manually add this prompt to benefit from it. The exact normalization prompt can be found in the Normalize output text section above.

  3. System default prompt (only when using a supplemental LLM with an empty prompt): When no custom prompt is provided (the prompt field is an empty string), Hume sends our system default prompt to the supplemental LLM.

These modifications work in conjunction with your custom system prompt while ensuring that responses remain appropriate for voice-based interactions.