Prompt Engineering for EVI
Guide to crafting system prompts to shape the behavior, responses, and style of the Empathic Voice Interface (EVI).
Prompt engineering lets you shape how EVI responds, including its tone, personality, and conversation style. You can tailor its behavior for a wide range of applications, such as mental health support, customer service, and education.
For real-time, conversational voice interactions, Hume’s speech-language models (SLMs), such as hume-evi-3 and
hume-evi-3-websearch, can generate both language and speech. For more complex scenarios that
involve reasoning, long system prompts, or tool use, supplemental large language models (LLMs) typically perform
better.
EVI supports integration with these external models. When configured with a supplemental LLM, your system prompt is sent to that model to guide its response generation. EVI then produces the voice output, using previous audio and language context to determine tone and delivery. You can also prompt EVI during the conversation (for example, “speak faster”) to adjust its behavior in real time.
While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.
What prompting can do:
What prompting cannot do:
Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.
We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.
Prompt engineering best practices for LLMs also apply to EVI. Ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.
For example, to limit the length of the LLM’s responses, use a very clear and specific instruction like this:
Use examples to demonstrate how the model should respond. This technique, called few-shot prompting, is one of the most effective ways to improve response quality. Include clear, high-quality examples that follow your guidelines and cover a range of edge cases and behaviors. Format them as chat messages to match the expected input for chat-tuned models.
Few-shot prompting is also a powerful way to shape the assistant’s character. If you want the model to speak in a specific voice, such as warm and nurturing, upbeat and casual, or formal and precise, examples help establish that tone. The model will learn to mirror the phrasing, pacing, and emotional style used in your samples.
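As a rough illustration of formatting examples as chat turns, the sketch below renders (user, assistant) pairs into a transcript that can be embedded in a system prompt. The `User:` and `Assistant:` labels, the function name, and the sample dialogue are illustrative assumptions, not a format required by EVI.

```python
from typing import List, Tuple


def render_few_shot_examples(examples: List[Tuple[str, str]]) -> str:
    """Render (user, assistant) pairs as a chat-style transcript."""
    turns = []
    for user_text, assistant_text in examples:
        # The "User:" / "Assistant:" labels are an assumed convention.
        turns.append(f"User: {user_text}\nAssistant: {assistant_text}")
    return "\n\n".join(turns)


# Hypothetical examples that demonstrate a warm, nurturing tone.
examples = [
    ("I failed my exam today.",
     "Oh no, I'm so sorry. That's really disappointing. Want to talk it through?"),
    ("I got the job!",
     "That's wonderful news! Congratulations, you earned it."),
]

system_prompt = (
    "You are a warm, nurturing assistant. Mirror the tone of these examples.\n\n"
    + render_few_shot_examples(examples)
)
```

Keeping the examples in a list makes it easy to add edge cases later without rewriting the surrounding prompt text.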
Separating longer prompts into titled sections helps the model distinguish between different instructions and follow
prompts more reliably. The recommended format for these sections differs between LLM providers. For example,
OpenAI models often respond best to Markdown sections (like ## Role), while Anthropic models respond well to XML tags
(like <role> </role>).
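The difference between the two section styles can be sketched with a small helper that renders the same content either way. The function name, the `style` parameter, and the sample section text are hypothetical and exist only for illustration.

```python
from typing import Dict


def render_sections(sections: Dict[str, str], style: str = "markdown") -> str:
    """Render titled prompt sections as Markdown headings or XML tags."""
    parts = []
    for title, body in sections.items():
        if style == "markdown":
            parts.append(f"## {title}\n{body.strip()}")
        elif style == "xml":
            # Derive a tag name such as <response_length> from the title.
            tag = title.lower().replace(" ", "_")
            parts.append(f"<{tag}>\n{body.strip()}\n</{tag}>")
        else:
            raise ValueError(f"unknown style: {style!r}")
    return "\n\n".join(parts)


# Hypothetical prompt content.
sections = {
    "Role": "You are a friendly support agent for a bike shop.",
    "Response length": "Keep every reply under three sentences.",
}

markdown_prompt = render_sections(sections, style="markdown")
xml_prompt = render_sections(sections, style="xml")
```

Storing the sections once and rendering them per provider avoids maintaining two diverging copies of the same prompt.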
LLMs vary in their capabilities and limitations. More advanced models can handle longer, more nuanced prompts, but they are often slower and more expensive. Simpler models are faster and cheaper but work best with shorter, less complex prompts.
Each model also has a context window, which defines how much text it can consider at once when generating a response. This functions as the model’s short-term memory. Keep your system prompt well within the context window so that the model still has room for the full conversation history.
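To make the trade-off between prompt size and conversation history concrete, here is a minimal sketch that keeps only the most recent messages that fit alongside the system prompt. The four-characters-per-token estimate is a crude assumption (real tokenizers differ by model), and the function names are hypothetical.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def fit_history(
    system_prompt: str,
    history: List[Dict[str, str]],
    context_window: int,
) -> List[Dict[str, str]]:
    """Keep the newest messages that fit in the window next to the prompt."""
    budget = context_window - estimate_tokens(system_prompt)
    kept: List[Dict[str, str]] = []
    # Walk backwards so the most recent turns are kept first.
    for message in reversed(history):
        cost = estimate_tokens(message["content"])
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept
```

The larger the system prompt, the smaller the remaining budget, so fewer past turns survive.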
Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:
Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations or compare the results yourself.
Test in real voice conversations: There’s no substitute for actually testing EVI in live conversations on app.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.
Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.
Start with 10 to 20 gold-standard examples of ideal conversations. After making major prompt changes, test against these examples to evaluate performance. If EVI’s responses fall short, adjust one part of the prompt at a time and re-test. Iterative evaluation helps you identify what works and ensures your changes are making a meaningful impact.
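The evaluation loop described above might be sketched as follows. This version uses a simple string-similarity score from Python's standard library as a stand-in for a judge LLM or manual review; the `generate` callable, the threshold of 0.6, and the sample data are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def similarity(candidate: str, gold: str) -> float:
    """Character-level similarity in [0, 1]; a crude proxy for a judge LLM."""
    return SequenceMatcher(None, candidate.lower(), gold.lower()).ratio()


def evaluate_prompt(
    generate: Callable[[str], str],
    gold_examples: List[Dict[str, str]],
    threshold: float = 0.6,
) -> List[Dict[str, object]]:
    """Return the examples whose generated response scores below the threshold."""
    failures: List[Dict[str, object]] = []
    for example in gold_examples:
        candidate = generate(example["user"])
        score = similarity(candidate, example["gold"])
        if score < threshold:
            failures.append(
                {"user": example["user"], "candidate": candidate, "score": score}
            )
    return failures


# Hypothetical gold-standard bank; in practice, aim for 10 to 20 entries.
gold_examples = [
    {
        "user": "I'm so stressed about my deadline.",
        "gold": "That sounds stressful. What is the most urgent piece right now?",
    },
]
```

Here `generate` would wrap whatever produces a response for a given user message, such as a call to the supplemental LLM with the prompt under test. Re-running `evaluate_prompt` after each single prompt change shows whether that change helped.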
While prompting EVI is similar to prompting other LLMs, it differs in two key ways:
Prompts for EVI should be designed for spoken output. Because users only hear the assistant’s replies, responses must sound natural and conversational, without any visual or text-specific formatting.
Expressive prompt engineering refers to guiding the language model on how to interpret and respond to Hume’s expression measures during a conversation.
EVI analyzes the user’s vocal expressions in real time and translates them into text-based indicators. These help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions and ranks them by confidence. The top three expressions are appended to each User message to represent the user’s tone of voice.
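The ranking step can be pictured with a short sketch that picks the three highest-scoring expressions and attaches them to the user's text. The braces-and-scores layout, the function names, and the sample scores are assumptions made for illustration; they are not EVI's actual output format.

```python
from typing import Dict


def top_expressions(scores: Dict[str, float], k: int = 3) -> str:
    """Return the k highest-scoring expressions as a comma-separated string."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]
    return ", ".join(f"{name} ({score:.2f})" for name, score in ranked)


def annotate_user_message(text: str, scores: Dict[str, float]) -> str:
    """Append the top expressions to the transcribed text (hypothetical layout)."""
    return f"{text} {{{top_expressions(scores)}}}"


# Hypothetical expression scores for one utterance.
scores = {"Calmness": 0.12, "Frustration": 0.71, "Doubt": 0.40, "Joy": 0.05}
annotated = annotate_user_message("I'm fine.", scores)
```

In this made-up example the words say “I'm fine” while the top-ranked expression is Frustration, which is the kind of signal the LLM can be instructed to act on.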
You can use the system prompt to define how the AI should respond to these emotional cues. For example, our demo includes the following instruction, which you can customize to suit your use case:
Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to notify your system if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:
Dynamic variables are placeholder values within your system prompt that can be changed during a chat.
Embedding dynamic variables into your system prompt can help personalize the user experience to reflect user-specific or changing information such as names, preferences, the current date, and other details.
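Conceptually, a dynamic variable is a named placeholder that gets replaced with a current value. The local sketch below illustrates that idea only; the `{{name}}` placeholder syntax and the helper function are assumptions for this example, and the platform's own substitution mechanism may differ.

```python
import re
from typing import Dict

# Assumed placeholder syntax: {{variable_name}}, with optional inner spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_variables(template: str, values: Dict[str, str]) -> str:
    """Replace each placeholder with its value; fail loudly if one is missing."""
    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for dynamic variable {name!r}")
        return values[name]

    return PLACEHOLDER.sub(replace, template)


template = "Greet the user as {{name}}. Today's date is {{ current_date }}."
prompt = fill_variables(template, {"name": "Ada", "current_date": "Monday"})
```

Raising an error on a missing value is a deliberate choice here: an unfilled placeholder read aloud in a voice conversation is more disruptive than a failed request caught during testing.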
In speech-to-speech experiences, system prompts should be optimized for fast turn-taking. The best-performing prompts are small, stable, and focused on high-leverage behavioral guidance. Avoid stuffing the system prompt with large reference text or long libraries of examples unless you have a clear reason to accept the latency and cost tradeoff.
This matters most when you use a supplemental LLM. In that setup, your full system prompt is sent to the LLM on every turn, and larger prompts typically increase time-to-first-token (TTFT) and overall response latency.
Large system prompts add tokens that the model must process before it can begin generating a response. In voice interfaces, that extra delay is immediately noticeable as conversational lag. Long prompts also reduce the space available in the model’s context window for recent conversation history, and they can make prompts harder to maintain over time.
If you need domain knowledge, personalization, or tool instructions, a more reliable approach is to keep a compact core system prompt and inject only the relevant context at runtime (for example, via dynamic variables, web search, or tools).
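One way to picture this pattern is a helper that leaves the core prompt untouched and appends only as much runtime context as a small character budget allows. The core prompt text, the 500-character budget, and the function name are hypothetical.

```python
from typing import List

# Hypothetical compact core prompt that stays the same for every session.
CORE_PROMPT = (
    "You are a concise voice assistant for a bike shop. "
    "Keep replies under three sentences."
)


def with_runtime_context(
    core: str,
    snippets: List[str],
    max_context_chars: int = 500,
) -> str:
    """Append snippets, in priority order, until the character budget is used."""
    selected: List[str] = []
    used = 0
    for snippet in snippets:
        if used + len(snippet) > max_context_chars:
            break  # stop at the first snippet that does not fit
        selected.append(snippet)
        used += len(snippet)
    if not selected:
        return core
    context = "\n".join(f"- {snippet}" for snippet in selected)
    return f"{core}\n\nRelevant context:\n{context}"
```

Because the snippets are assumed to arrive in priority order, the budget drops the least relevant context first while the stable core never changes.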
Below are common categories that inflate prompts to 10k+ tokens without delivering proportional gains, especially in low-latency, voice-first applications.
Large reference dumps
Instead, retrieve small, relevant snippets only when needed. If you want to use a website as a lightweight knowledge base, consider restricting web search to a domain.
Dynamic, per-user, or per-session data
Instead, pass minimal dynamic context using dynamic variables, and keep the stable system prompt the same across users and sessions.
Tool schemas and lengthy tool instructions
Instead, include only short guidance on when to use each tool. Keep strict validation and schema enforcement in your application code.
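As a sketch of keeping validation in application code, the example below checks tool-call arguments against a small type map before the tool runs. The tool, its argument names, and the validator are hypothetical and are not part of any EVI API.

```python
from typing import Any, Dict, List

# Hypothetical schema for an order-lookup tool: argument name -> expected type.
ORDER_LOOKUP_SCHEMA: Dict[str, type] = {
    "order_id": str,
    "include_history": bool,
}


def validate_tool_args(args: Dict[str, Any], schema: Dict[str, type]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors: List[str] = []
    for name, expected in schema.items():
        if name not in args:
            errors.append(f"missing argument: {name}")
        elif not isinstance(args[name], expected):
            errors.append(f"{name} should be {expected.__name__}")
    for name in args:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")
    return errors
```

With checks like these in code, the system prompt only needs a sentence on when to call the tool, not a full restatement of its schema.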
Text-only formatting requirements
EVI prompts should be designed for voice output. If you need structured data for your application, use tools or function outputs rather than forcing spoken responses into a rigid format.
Verbose reasoning instructions
Instead, give compact output constraints that improve spoken UX, such as brevity and asking at most one clarifying question when required.
A good low-latency pattern is to keep a small, stable core prompt and add only small, targeted runtime context.
A compact core prompt typically includes:
Web search is a built-in tool that lets EVI retrieve up-to-date information from the web. You can narrow its focus to a single website by adding an instruction to the system prompt.
Restricting search to one domain is useful for building domain-specific assistants, such as documentation or product support bots. This approach leverages existing content and offers a lightweight alternative to full RAG implementations while still enabling targeted retrieval.
To use a website as EVI’s knowledge base, follow these steps:
Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.
Include a web search instruction: In your EVI configuration, modify the system prompt to include a
use_web_search instruction. In the instruction, specify that site:<target_domain> be appended to all search
queries, where the <target_domain> is the URL of the website you’d like EVI to focus on.
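A minimal sketch of generating such an instruction is shown below. The XML-style wrapper and the exact wording are assumptions modeled on the `use_web_search` name mentioned above, and stripping the URL scheme reflects the usual behavior of the `site:` search operator rather than a documented EVI requirement.

```python
def web_search_instruction(target_domain: str) -> str:
    """Build a system-prompt instruction that restricts search to one site."""
    domain = target_domain.strip()
    # The site: operator is normally given a bare domain, so drop the scheme.
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    domain = domain.rstrip("/")
    return (
        "<use_web_search>\n"
        "Use web search to answer questions about the product. "
        f"Always append 'site:{domain}' to every search query.\n"
        "</use_web_search>"
    )


# Hypothetical target site.
instruction = web_search_instruction("https://docs.example.com/")
```

The returned string can be added as its own section of the system prompt so that the restriction is easy to find and update.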
When using EVI with a supplemental LLM, Hume automatically appends additional instructions to your system prompt, a process called prompt expansion. These instructions optimize the LLM’s output for real-time voice conversations by ensuring generated text is well-suited for text-to-speech (TTS) synthesis.
Prompt expansion adds several categories of instructions to your system prompt:
For example, $50.25 becomes “fifty dollars and twenty-five cents”.
When using an external LLM with EVI, you can disable prompt expansion to take full control over the system prompt sent to your model. This is useful when you want to precisely manage every instruction the LLM receives, or when Hume’s expanded instructions conflict with your custom prompting strategy.
Disabling prompt expansion removes Hume’s built-in instructions for optimizing text for speech. Without these, you may experience:
If you choose to disable prompt expansion, we recommend including your own voice-optimization instructions in your system prompt to maintain speech quality.
EVI uses two special messages to manage conversation flow: [continue] and [user silent]. These are
automatically injected into the conversation as user messages when applicable, and the expanded prompt includes
instructions that tell the LLM how to handle them. If you disable prompt expansion, the LLM will still receive
these messages but won’t know how to interpret them without guidance. You can add the following instructions to
your system prompt to preserve this behavior:
To learn how to configure your EVI deployment with prompt expansion disabled, see the API reference.
To learn more about prompt engineering in general or to understand how to prompt different LLMs, please refer to these resources:
Yes, EVI can use conversational backchanneling - brief, encouraging responses that show active listening without interrupting the user’s train of thought. This can help conversations feel more fluid and natural. To enable this behavior, add an instruction like the example below to your system prompt:
What is the maximum length for system prompts?
The maximum prompt length depends on your language model configuration:
Hume’s speech-language models (hume-evi-3, hume-evi-3-websearch): System prompts are limited to 7,000 characters. Prompts exceeding this limit will be truncated to 7,000 characters at runtime.
Note that Hume’s SLM is also used for speech generation when a supplemental LLM is configured. The SLM applies the same 7,000 character limit to the prompt it uses for guiding voice expression and prosody. This means that for very long prompts, any instructions affecting how EVI speaks (tone, pacing, emotional style) should be placed near the beginning of your prompt to ensure the speech model sees them. Instructions affecting what EVI says (knowledge, behavior, response content) can be placed anywhere, as the supplemental LLM receives the full prompt.
The EVI Playground displays a warning when your prompt exceeds 7,000 characters. This reflects the speech model’s effective limit after truncation. If you are using a supplemental LLM, your full prompt is still sent to that model without truncation - the warning applies only to the speech model’s processing of your prompt.
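To check whether a voice-style instruction would survive truncation, a small sketch like the one below can help. It assumes that truncation keeps the first 7,000 characters of the prompt, which is consistent with the advice to place voice-style instructions early; the function names and sample prompts are hypothetical.

```python
SPEECH_MODEL_CHAR_LIMIT = 7000  # limit stated in this guide


def speech_model_view(prompt: str) -> str:
    """Return the part of the prompt the speech model is assumed to see."""
    return prompt[:SPEECH_MODEL_CHAR_LIMIT]


def visible_to_speech_model(prompt: str, instruction: str) -> bool:
    """True if the whole instruction falls inside the first 7,000 characters."""
    return instruction in speech_model_view(prompt)


# Hypothetical long prompt: a voice-style instruction plus reference text.
voice_style = "Speak slowly and warmly."
reference_text = "x" * 8000

early = f"{voice_style}\n{reference_text}"
late = f"{reference_text}\n{voice_style}"
```

In this example `early` keeps the style instruction within the speech model's view, while `late` pushes it past the cutoff even though a supplemental LLM would still receive it.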
When using a supplemental LLM, a single system prompt still shapes both text and speech generation. There is not a
separate system prompt for EVI - the prompt you specify in app.hume.ai is the prompt that is used. All
EVI-specific prompting instructions (like <voice_only_response_format>) are included in the prompt sent to the
supplemental LLM to help it generate text appropriate for voice conversations. This unified approach ensures
consistent behavior across text generation and speech synthesis.
Note that while the supplemental LLM receives your full prompt, the speech model has a separate character limit (see “What is the maximum length for system prompts?” above). For very long prompts, place voice-style instructions early to ensure the speech model processes them.
When sending API requests to supplemental LLM providers, Hume sends the context, settings, and system prompt you provided, with only three possible modifications. These three changes, described below, help optimize the interaction for an empathic voice conversation.
Expression measures: For each transcribed user message, Hume appends stringified expression measurement results in a structured format as described in the Expressive prompt engineering section above:
This provides the LLM with emotional context from the user’s voice to generate more empathetic responses. Our speech-language model handles these expressions natively, but these strings are necessary to allow supplemental LLMs to respond to the expressions.
Prompt expansion: Hume appends additional instructions to your system prompt through prompt expansion. This ensures consistent, stable, and fluid speech generation across different LLM providers, and means that developers don’t have to manually add these instructions to benefit from them.
System default prompt (only when using a supplemental LLM with an empty prompt): When no custom prompt is provided (the prompt field is an empty string), Hume sends our system default prompt to the supplemental LLM.
These modifications work in conjunction with your custom system prompt while ensuring that responses remain appropriate for voice-based interactions.