Prompt Engineering for EVI

Guide to crafting system prompts to shape the behavior, responses, and style of the Empathic Voice Interface (EVI).

Prompt engineering lets you shape how EVI responds, including its tone, personality, and conversation style. You can tailor its behavior for a wide range of applications, such as mental health support, customer service, and education.

For real-time, conversational voice interactions, Hume’s speech-language models (SLMs) (e.g., hume-evi-3 and hume-evi-3-websearch) can generate both language and speech. For more complex scenarios that involve reasoning, long system prompts, or tool use, supplemental large language models (LLMs) typically perform better.

EVI supports integration with these external models. When configured with a supplemental LLM, your system prompt is sent to that model to guide its response generation. EVI then produces the voice output, using previous audio and language context to determine tone and delivery. You can also prompt EVI during the conversation (for example, “speak faster”) to adjust its behavior in real time.

What prompts can (and can’t) do

While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.

What prompting can do:

  • Guide EVI’s language generation, response style, response format, and conversation flow
  • Direct EVI to use specific tools at appropriate times
  • Influence EVI’s emotional tone and personality, which can also affect some characteristics of EVI’s voice (e.g. prompting EVI to be “warm and nurturing” will help EVI’s voice sound soothing, but will not change the base speaker)
  • Help EVI respond appropriately to the user’s expressions and the context

What prompting cannot do:

  • Change fundamental characteristics of the voice, like the accent, gender, or speaker identity
  • Directly control speech parameters like speed (use in-conversation voice prompts instead)
  • Give EVI knowledge of external context (date, time, user details) without dynamic variables or web search
  • Override core safety features built into EVI or supplemental LLMs (e.g. that prevent EVI from providing harmful information)

Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.

We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.

General prompting best practices

Prompt engineering best practices for LLMs also apply to EVI. Ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.

For example, to limit the length of the LLM’s responses, use a very clear and specific instruction like this:

Stay concise example
<stay_concise>
Be succinct; get straight to the point. Respond directly to the
user's most recent message with only one idea per utterance.
Respond in less than three sentences of under twenty words each.
</stay_concise>

Give few-shot examples

Use examples to demonstrate how the model should respond. This technique, called few-shot learning, is one of the most effective ways to improve response quality. Include clear, high-quality examples that follow your guidelines and cover a range of edge cases and behaviors. Format them as chat messages to match the expected input for chat-tuned models.

Few-shot prompting is also a powerful way to shape the assistant’s character. If you want the model to speak in a specific voice, such as warm and nurturing, upbeat and casual, or formal and precise, examples help establish that tone. The model will learn to mirror the phrasing, pacing, and emotional style used in your samples.

Few-shot example
User: "I just can't stop thinking about what happened. {very anxious,
quite sad, quite distressed}"
Assistant: "Oh dear, I hear you. Sounds tough, like you're feeling
some anxiety and maybe ruminating. I'm happy to help. Want to talk
about it?"

Use sections to divide your prompt

Separating longer prompts into titled sections helps the model distinguish between different instructions and follow prompts more reliably. The recommended format for these sections differs between LLM providers. For example, OpenAI models often respond best to Markdown sections (like ## Role), while Anthropic models respond well to XML tags (like <role> </role>).

<role>
Assistant serves as a conversational partner to the user, offering
mental health support and engaging in light-hearted conversation.
Avoid giving technical advice or answering factual questions outside
of your emotional support role.
</role>

Understand your LLM’s capabilities

LLMs vary in their capabilities and limitations. More advanced models can handle longer, more nuanced prompts, but they are often slower and more expensive. Simpler models are faster and cheaper but work best with shorter, less complex prompts.

Each model also has a context window, which defines how much text it can consider at once when generating a response. This functions as the model’s short-term memory. Make sure your prompt fits within the context window to ensure it has access to the full conversation history.

Test and evaluate prompts

Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:

  1. Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations or compare the results yourself.

  2. Test in real voice conversations: There’s no substitute for testing EVI in live conversations on app.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.

  3. Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.

Start with 10 to 20 gold-standard examples of ideal conversations. After making major prompt changes, test against these examples to evaluate performance. If EVI’s responses fall short, adjust one part of the prompt at a time and re-test. Iterative evaluation helps you identify what works and ensures your changes are making a meaningful impact.
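The gold-standard workflow above can be sketched in a few lines of Python. This example uses a simple token-overlap score as a lightweight stand-in for a judge LLM; the example data and the `evaluate` helper are illustrative, not part of Hume’s SDK.

```python
# Sketch of gold-standard evaluation for prompt iterations (illustrative;
# token_overlap is a simple stand-in for a judge LLM's similarity rating).

def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def evaluate(responses: dict, gold_examples: dict, threshold: float = 0.5) -> list:
    """Return the prompts whose generated response scored below threshold."""
    return [
        prompt for prompt, gold in gold_examples.items()
        if token_overlap(responses.get(prompt, ""), gold) < threshold
    ]

gold_examples = {
    "How are you?": "Doing well, thanks! How about you?",
    "What's the weather?": "I'd have to check, want me to look it up?",
}
responses = {
    "How are you?": "Doing well, thanks! How about you?",      # matches gold
    "What's the weather?": "The weather is a complex system.", # off-target
}
print(evaluate(responses, gold_examples))  # → ["What's the weather?"]
```

Swapping `token_overlap` for a call to a judge LLM keeps the same loop while improving the quality of the comparison.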

EVI-specific prompting

While prompting EVI is similar to prompting other LLMs, it differs in two key ways:

  1. Prompts are designed for voice-based interactions, not text-based.
  2. EVI responds to emotional cues in the user’s voice, not just their words.

Prompting for voice interaction

Prompts for EVI should be designed for spoken output. Because users only hear the assistant’s replies, responses must sound natural and conversational, without any visual or text-specific formatting.

Voice-only prompt example
<voice_only_response_format>
Format all responses as spoken words for a voice-only conversation.
All output is spoken aloud, so avoid any text-specific formatting
or anything that is not normally spoken. Prefer easily pronounced
words. Seamlessly incorporate natural vocal inflections like "oh
wow" and discourse markers like "I mean" to make conversations feel
more human-like.
</voice_only_response_format>

Expressive prompt engineering

Expressive prompt engineering refers to guiding the language model on how to interpret and respond to Hume’s expression measures during a conversation.

EVI analyzes the user’s vocal expressions in real time and translates them into text-based indicators. These help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions and ranks them by confidence. The top three expressions are appended to each User message to represent the user’s tone of voice.

You can use the system prompt to define how the AI should respond to these emotional cues. For example, our demo includes the following instruction, which you can customize to suit your use case:

Expressive prompting example
<respond_to_expressions>
Pay close attention to the top 3 emotional expressions provided in
brackets after the User's message. These expressions indicate the
user's tone, in the format: {expression1 confidence1, expression2
confidence2, expression3 confidence3}, e.g., {very happy, quite
anxious, moderately amused}. The confidence score indicates how
likely the User is expressing that emotion in their voice. Use
expressions to infer the user's tone of voice and respond
appropriately. Avoid repeating these expressions or mentioning
them directly. For instance, if user expression is "quite sad",
express sympathy; if "very happy", share in joy; if "extremely
angry", acknowledge rage but seek to calm; if "very bored",
entertain.

Stay alert for disparities between the user's words and
expressions, and address them out loud when the user's language does
not match their expressions. For instance, sarcasm often involves
contempt and amusement in expressions. Reply to sarcasm with humor,
not seriousness.
</respond_to_expressions>

Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to notify your system if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:

Detect mismatches example
<detect_mismatches>
Stay alert for incongruence between words and tone when the user's
words do not match their expressions. Address these disparities out
loud. This includes sarcasm, which usually involves contempt and
amusement. Always reply to sarcasm with funny, witty, sarcastic
responses; do not be too serious.
</detect_mismatches>

Personalizing prompts with dynamic variables

Dynamic variables are values within your system prompt that can be changed during a chat.

Embedding dynamic variables into your system prompt can help personalize the user experience to reflect user-specific or changing information such as names, preferences, the current date, and other details.

<discuss_favorite_color>
Ask the user about their favorite color, {{ favorite_color }}.
Mention how {{ favorite_color }} is used and interpreted in
various artistic contexts, including visual art, handicraft,
and literature.
</discuss_favorite_color>
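Values for dynamic variables are supplied at runtime rather than hard-coded in the prompt. For example, over EVI’s WebSocket they can be provided in a session settings message; the sketch below assumes the `session_settings` message shape with a `variables` field (verify the exact fields against the API reference):

```
{
  "type": "session_settings",
  "variables": {
    "favorite_color": "cerulean"
  }
}
```

EVI substitutes these values wherever the corresponding {{ variable_name }} placeholders appear in the system prompt.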

Latency-friendly system prompts

In speech-to-speech experiences, system prompts should be optimized for fast turn-taking. The best performing prompts are small, stable, and focused on high-leverage behavioral guidance. Avoid stuffing the system prompt with large reference text or long libraries of examples unless you have a clear reason to pay the latency and cost tradeoff.

This matters most when you use a supplemental LLM. In that setup, your full system prompt is sent to the LLM on every turn, and larger prompts typically increase time-to-first-token (TTFT) and overall response latency.

Why long system prompts can hurt real-time UX

Large system prompts add tokens that the model must process before it can begin generating a response. In voice interfaces, that extra delay is immediately noticeable as conversational lag. Long prompts also reduce the space available in the model’s context window for recent conversation history, and they can make prompts harder to maintain over time.

If you need domain knowledge, personalization, or tool instructions, a more reliable approach is to keep a compact core system prompt and inject only the relevant context at runtime (for example, via dynamic variables, web search, or tools).

What not to put in the system prompt

Below are common categories that inflate prompts to 10k+ tokens without delivering proportional gains, especially in low-latency, voice-first applications.

  1. Large reference dumps

    • Full documentation pages, API specs, policy manuals, FAQs, pricing tables
    • Long lists of edge cases that rarely apply

    Instead, retrieve small, relevant snippets only when needed. If you want to use a website as a lightweight knowledge base, consider restricting web search to a domain.

  2. Dynamic, per-user, or per-session data

    • User profile details, account state, subscription tier, device state
    • Current date/time, location, or other session-specific facts
    • Entire transcripts or long conversation histories

    Instead, pass minimal dynamic context using dynamic variables, and keep the stable system prompt the same across users and sessions.

  3. Tool schemas and lengthy tool instructions

    • Full JSON schemas or OpenAPI definitions
    • Detailed error catalogs and recovery playbooks

    Instead, include only short guidance on when to use each tool. Keep strict validation and schema enforcement in your application code.

  4. Text-only formatting requirements

    • Markdown styling rules, tables, long bullet lists, or anything that is not normally spoken

    EVI prompts should be designed for voice output. If you need structured data for your application, use tools or function outputs rather than forcing spoken responses into a rigid format.

  5. Verbose reasoning instructions

    • Requests to “think step by step” or follow long internal checklists
    • Multiple layers of self-critique before answering

    Instead, give compact output constraints that improve spoken UX, such as brevity and asking at most one clarifying question when required.
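For point 3 above, short tool guidance can stand in for a full schema. A hypothetical sketch (the tool names are illustrative, not real built-ins):

```
<use_tools>
Use the check_order_status tool when the user asks about an existing
order. Use the escalate_to_human tool when the user is very
frustrated or explicitly asks for a person. Otherwise, answer from
your own knowledge.
</use_tools>
```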

A good low-latency pattern is to keep a small, stable core prompt and add only small, targeted runtime context.

A compact core prompt typically includes:

  • A clear role and scope
  • Voice-only response formatting
  • Brevity and turn-taking guidelines
  • Expressive prompting guidance (how to use expression measures)
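Putting those pieces together, a compact core prompt might look like the following sketch (the wording is illustrative; adapt it to your use case):

```
<role>
You are a friendly voice assistant for a product support line.
Stay within product support; politely decline unrelated requests.
</role>

<voice_only_response_format>
Everything you output will be spoken aloud, so use plain, easily
pronounced words and no text-specific formatting.
</voice_only_response_format>

<stay_concise>
Respond in one to three short sentences, with one idea per utterance.
</stay_concise>

<respond_to_expressions>
Use the top three expressions in brackets after each user message to
infer the user's tone and respond appropriately, without naming the
expressions directly.
</respond_to_expressions>
```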

Restricting web search to a domain

Web search is a built-in tool that lets EVI retrieve up-to-date information from the web. You can narrow its focus to a single website by adding an instruction to the system prompt.

Restricting search to one domain is useful for building domain-specific assistants, such as documentation or product support bots. This approach leverages existing content and offers a lightweight alternative to full RAG implementations while still enabling targeted retrieval.

To use a website as EVI’s knowledge base, follow these steps:

  1. Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.

  2. Include a web search instruction: In your EVI configuration, modify the system prompt to include a use_web_search instruction. In the instruction, specify that site:<target_domain> be appended to all search queries, where <target_domain> is the domain of the website you’d like EVI to focus on.

Documentation assistant example
<use_web_search>
Use your web_search tool to find information from Hume's
documentation site. When using the web_search function:
1. Always append 'site:dev.hume.ai' to your search query to search
this specific site.
2. Only consider results from this domain.
</use_web_search>

Prompt expansion

When using EVI with a supplemental LLM, Hume automatically appends additional instructions to your system prompt, a process called prompt expansion. These instructions optimize the LLM’s output for real-time voice conversations by ensuring generated text is well-suited for text-to-speech (TTS) synthesis.

What gets appended

Prompt expansion adds several categories of instructions to your system prompt:

  • Text normalization: Rules for converting numbers, dates, currencies, and other formats into easily speakable words (e.g., $50.25 becomes “fifty dollars and twenty-five cents”)
  • Expression response guidance: How to interpret and respond to the user’s emotional expressions that are appended to each user message
  • Conversation mode: Guidelines for natural, concise, voice-appropriate responses with brevity constraints and conversational warmth
  • Silence and continuation handling: How to re-engage silent users and seamlessly continue speech across split responses
  • Web search guidance (when enabled): When and how to use web search effectively
  • Conversational style: Encouragement for natural speech patterns, including reactions, self-corrections, and expressivity
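To make the text normalization category concrete, here is a small, illustrative sketch of the kind of rule involved: converting a currency amount into speakable words. This is not Hume’s implementation, just a demonstration of the transformation described above.

```python
# Illustrative sketch of a text-normalization rule: rewriting currency
# strings as speakable words, as prompt expansion instructs the LLM to do.

import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def number_to_words(n: int) -> str:
    """Spell out integers from 0 to 999 (enough for this demo)."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    hundreds, rest = divmod(n, 100)
    return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")

def speak_currency(text: str) -> str:
    """Rewrite $D or $D.CC amounts in `text` as speakable words."""
    def repl(m: re.Match) -> str:
        dollars, cents = int(m.group(1)), int(m.group(2) or 0)
        spoken = number_to_words(dollars) + " dollars"
        if cents:
            spoken += " and " + number_to_words(cents) + " cents"
        return spoken
    return re.sub(r"\$(\d+)(?:\.(\d{2}))?", repl, text)

print(speak_currency("That costs $50.25 today."))
# → "That costs fifty dollars and twenty-five cents today."
```

With prompt expansion enabled, the LLM applies normalization like this in its generated text, so your application code does not need to.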

Disabling prompt expansion

When using an external LLM with EVI, you can disable prompt expansion to take full control over the system prompt sent to your model. This is useful when you want to precisely manage every instruction the LLM receives, or when Hume’s expanded instructions conflict with your custom prompting strategy.

Disabling prompt expansion removes Hume’s built-in instructions for optimizing text for speech. Without these, you may experience:

  • Poor TTS rendering of numbers, dates, and symbols (e.g., “$500” read character-by-character instead of as “five hundred dollars”)
  • Loss of expression-aware responses, since the LLM won’t be instructed on how to interpret emotional cues appended to user messages
  • Less natural conversation flow, without built-in guidance for brevity, turn-taking, and voice-appropriate formatting

If you choose to disable prompt expansion, we recommend including your own voice-optimization instructions in your system prompt to maintain speech quality.

Handling silence and continuation

EVI uses two special messages to manage conversation flow: [continue] and [user silent]. These are automatically injected into the conversation as user messages when applicable, and the expanded prompt includes instructions that tell the LLM how to handle them. If you disable prompt expansion, the LLM will still receive these messages but won’t know how to interpret them without guidance. You can add the following instructions to your system prompt to preserve this behavior:

<continuing>
If the user's message is "[continue]", continue speaking from the
last assistant message. Do not repeat any language from the previous
assistant message. Example:

Assistant: "It's a sunny day today so I would recommend going
outside."
User: "[continue]"
Assistant: "You can go for a run or have a picnic!"
</continuing>
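The expanded prompt likewise covers the [user silent] message; if you disable prompt expansion, you can preserve that behavior with your own instruction. One possible sketch:

```
<user_silence>
If the user's message is "[user silent]", the user has not spoken
for a while. Gently re-engage them with a brief, open-ended question
or a short continuation of the current topic. Do not mention the
silence itself.
</user_silence>
```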

To learn how to configure your EVI deployment with prompt expansion disabled, see the API reference.

Additional resources

To learn more about prompt engineering in general, or to understand how to prompt different LLMs, refer to the prompt engineering guides published by LLM providers such as OpenAI and Anthropic.

Frequently asked questions

Can EVI backchannel?

Yes, EVI can use conversational backchanneling - brief, encouraging responses that show active listening without interrupting the user’s train of thought. This can help conversations feel more fluid and natural. To enable this behavior, add an instruction like the example below to your system prompt:

Backchanneling example
<backchannel>
Whenever the user's message seems incomplete, respond with
emotionally attuned, natural backchannels to encourage
continuation. Backchannels must always be 1-2 words, like:
"mmhm", "uh-huh", "go on", "right", "and then?", "I see",
"oh wow", "yes?", "ahh...", "really?", "oooh", "true", "makes
sense". Use minimal encouragers rather than interrupting with
complete sentences. Use a diverse variety of words, avoiding
repetition.
Assistant: "How is your day going?"
User: "My day is..."
Assistant: "Uh-huh?"
User: "It's good but busy. There's a lot going on."
Assistant: "I hear ya. What's going on for you?"
</backchannel>

What is the maximum length for system prompts?

The maximum prompt length depends on your language model configuration:

  • Supplemental LLMs (e.g., GPT, Claude): There is no hard character limit imposed by Hume on the prompt sent to the supplemental LLM. The effective limit is determined by the LLM’s context window (e.g., 128k tokens for GPT-4o, 200k tokens for Claude Sonnet). We recommend keeping system prompts around 2,000–5,000 tokens (roughly 1,500–4,000 words) for optimal performance. EVI also uses prompt caching (e.g., see Anthropic docs) to minimize cost and latency for longer prompts.
  • Hume SLMs (hume-evi-3, hume-evi-3-websearch): System prompts are limited to 8,000 characters. Prompts exceeding this limit will be truncated to 7,000 characters at runtime.

Note that Hume’s SLM is also used for speech generation when a supplemental LLM is configured. The SLM applies the same 7,000 character limit to the prompt it uses for guiding voice expression and prosody. This means that for very long prompts, any instructions affecting how EVI speaks (tone, pacing, emotional style) should be placed near the beginning of your prompt to ensure the speech model sees them. Instructions affecting what EVI says (knowledge, behavior, response content) can be placed anywhere, as the supplemental LLM receives the full prompt.

The EVI Playground displays a warning when your prompt exceeds 7,000 characters. This reflects the speech model’s effective limit after truncation. If you are using a supplemental LLM, your full prompt is still sent to that model without truncation - the warning applies only to the speech model’s processing of your prompt.

Is there a separate system prompt for speech generation?

When using a supplemental LLM, a single system prompt still shapes both text and speech generation. There is not a separate system prompt for EVI - the prompt you specify in app.hume.ai is the prompt that is used. All EVI-specific prompting instructions (like <voice_only_response_format>) are included in the prompt sent to the supplemental LLM to help it generate text appropriate for voice conversations. This unified approach ensures consistent behavior across text generation and speech synthesis.

Note that while the supplemental LLM receives your full prompt, the speech model has a separate character limit (see “What is the maximum length for system prompts?” above). For very long prompts, place voice-style instructions early to ensure the speech model processes them.

What does Hume send to supplemental LLM providers?

When sending API requests to supplemental LLM providers, Hume sends the context, settings, and system prompt you provided, with only three possible modifications. These three changes, described below, help optimize the interaction for an empathic voice conversation.

  1. Expression measures: For each transcribed user message, Hume appends stringified expression measurement results in a structured format as described in the Expressive prompt engineering section above:

    Sample normalized user prompt
    User: User message content here {expression1 confidence1,
    expression2 confidence2, expression3 confidence3}

    This provides the LLM with emotional context from the user’s voice to generate more empathetic responses. Our speech-language model handles these expressions natively, but these strings are necessary to allow supplemental LLMs to respond to the expressions.

  2. Prompt expansion: Hume appends additional instructions to your system prompt through prompt expansion. This ensures consistent, stable, and fluid speech generation across different LLM providers, and means that developers don’t have to manually add these instructions to benefit from them.

  3. System default prompt (only when using a supplemental LLM with an empty prompt): When no custom prompt is provided (the prompt field is an empty string), Hume sends our system default prompt to the supplemental LLM.

These modifications work in conjunction with your custom system prompt while ensuring that responses remain appropriate for voice-based interactions.