Prompt Engineering for EVI
Guide to crafting system prompts to shape the behavior, responses, and style of the Empathic Voice Interface (EVI).
Prompt engineering lets you shape how EVI responds, including its tone, personality, and conversation style. You can tailor its behavior for a wide range of applications, such as mental health support, customer service, and education.
For real-time, conversational voice interactions, Hume’s speech-language models (SLMs), such as `hume-evi-2`, `hume-evi-3`, and `hume-evi-3-websearch`, can generate both language and speech. For more complex scenarios that involve reasoning, long system prompts, or tool use, supplemental large language models (LLMs) typically perform better.
EVI supports integration with these external models. When configured with a supplemental LLM, your system prompt is sent to that model to guide its response generation. EVI then produces the voice output, using previous audio and language context to determine tone and delivery. You can also prompt EVI during the conversation (for example, “speak faster”) to adjust its behavior in real time.
EVI-specific prompting
While prompting EVI is similar to prompting other LLMs, it differs in two key ways:
- Prompts are designed for voice-based interactions, not text-based.
- EVI responds to emotional cues in the user’s voice, not just their words.
Prompting for voice interaction
Prompts for EVI should be designed for spoken output. Because users only hear the assistant’s replies, responses must sound natural and conversational, without any visual or text-specific formatting.
Expressive prompt engineering
Expressive prompt engineering refers to guiding the language model on how to interpret and respond to Hume’s expression measures during a conversation.
EVI analyzes the user’s vocal expressions in real time and translates them into text-based indicators. These help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions and ranks them by confidence. The top three expressions are appended to each User message to represent the user’s tone of voice.
You can use the system prompt to define how the AI should respond to these emotional cues. For example, our demo includes an instruction of this kind, which you can customize to suit your use case.
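A minimal sketch of such an instruction, assuming the top expressions are appended in braces after each user message (the demo’s exact wording may differ):

```
Pay attention to the expression labels that appear in {braces} after each user
message. They describe the user's tone of voice, not their words. Let them guide
your warmth, pacing, and level of detail, and acknowledge how the user seems to
feel when it is appropriate. Never read the expression labels aloud or mention
them explicitly.
```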
Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to notify your system if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:
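For instance, a mismatch-handling instruction might read (illustrative wording, adapt it to your use case):

```
If the user's words and their tone of voice disagree, for example if they say
"I'm fine" while sounding very sad or frustrated, gently acknowledge the
mismatch and invite them to share more, rather than taking the words at face
value.
```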
Personalizing prompts with dynamic variables
Dynamic variables are values in your system prompt that can be changed during a chat.
Embedding dynamic variables in your system prompt can help personalize the user experience to reflect user-specific or changing information, such as names, preferences, the current date, and other details.
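For example, assuming double curly braces as the variable placeholder syntax and illustrative variable names like `name` and `current_date`, a personalized prompt might include:

```
You are a friendly assistant helping {{ name }}. Today's date is
{{ current_date }}. Greet the user by name and keep their stated preferences
in mind throughout the conversation.
```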
Restricting web search to a domain
Web search is a built-in tool that lets EVI retrieve up-to-date information from the web. You can narrow its focus to a single website by adding an instruction to the system prompt.
Restricting search to one domain is useful for building domain-specific assistants, such as documentation or product support bots. This approach leverages existing content and offers a lightweight alternative to full RAG implementations while still enabling targeted retrieval.
To use a website as EVI’s knowledge base, follow these steps:
1. Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.
2. Include a web search instruction: In your EVI configuration, modify the system prompt to include a `use_web_search` instruction. In the instruction, specify that `site:<target_domain>` be appended to all search queries, where `<target_domain>` is the URL of the website you’d like EVI to focus on. An illustrative instruction is shown after this list.
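One way to write this instruction is as an XML-style section, following the instruction name mentioned above; the wording below is illustrative and docs.example.com is a placeholder domain:

```
<use_web_search>
Use your web search tool to answer questions that require up-to-date or
site-specific information. Append "site:docs.example.com" to every search
query so that all results come from that domain only. If no relevant results
are found there, say so instead of searching other sites.
</use_web_search>
```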
General prompting best practices
Prompt engineering best practices for LLMs also apply to EVI. Ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.
For example, to limit the length of the LLM’s responses, use a very clear and specific instruction like this:
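One possible instruction (the limits shown here are arbitrary; tune them for your use case):

```
Keep every response under three sentences and under 75 words. If a complete
answer would be longer, give the most important point first and offer to
continue if the user wants more detail.
```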
Give few-shot examples
Use examples to demonstrate how the model should respond. This technique, called few-shot learning, is one of the most effective ways to improve response quality. Include clear, high-quality examples that follow your guidelines and cover a range of edge cases and behaviors. Format them as chat messages to match the expected input for chat-tuned models.
Few-shot prompting is also a powerful way to shape the assistant’s character. If you want the model to speak in a specific voice, such as warm and nurturing, upbeat and casual, or formal and precise, examples help establish that tone. The model will learn to mirror the phrasing, pacing, and emotional style used in your samples.
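A brief illustrative example, formatted as alternating User and Assistant messages (the content is invented for illustration):

```
User: Ugh, I've been staring at this bug for hours and I'm getting nowhere.
Assistant: Oh no, hours? That sounds exhausting. Want to walk me through
what it's doing? Sometimes saying it out loud shakes something loose.

User: I finally got the tests passing!
Assistant: Yes! That's great news. What ended up being the culprit?
```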
Use sections to divide your prompt
Separating longer prompts into titled sections helps the model distinguish between different instructions and follow prompts more reliably. The recommended format for these sections differs between LLM providers. For example, OpenAI models often respond best to Markdown sections (like `## Role`), while Anthropic models respond well to XML tags (like `<role> </role>`).
XML example
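An illustrative sketch of an XML-sectioned prompt (section names and content are placeholders):

```
<role>
You are a patient, upbeat customer support assistant for Acme Co.
</role>

<response_style>
Keep responses short, conversational, and free of any text formatting.
</response_style>
```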
Markdown example
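The equivalent structure using Markdown headings (again, placeholder content):

```
## Role
You are a patient, upbeat customer support assistant for Acme Co.

## Response style
Keep responses short, conversational, and free of any text formatting.
```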
Understand your LLM’s capabilities
LLMs vary in their capabilities and limitations. More advanced models can handle longer, more nuanced prompts, but they are often slower and more expensive. Simpler models are faster and cheaper but work best with shorter, less complex prompts.
Each model also has a context window, which defines how much text it can consider at once when generating a response. This functions as the model’s short-term memory. Make sure your prompt fits comfortably within the context window so the model also has room for the full conversation history.
Test and evaluate prompts
Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:
- Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations (see the illustrative judge prompt below) or compare the results yourself.
- Test in real voice conversations: There’s no substitute for testing EVI in live conversations on platform.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.
- Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.
Start with 10 to 20 gold-standard examples of ideal conversations. After making major prompt changes, test against these examples to evaluate performance. If EVI’s responses fall short, adjust one part of the prompt at a time and re-test. Iterative evaluation helps you identify what works and ensures your changes are making a meaningful impact.
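For instance, a judge prompt for automated comparison might look roughly like this (the wording and rating scale are illustrative):

```
You are evaluating a voice assistant's reply. You will be given the
conversation so far, a gold-standard reply, and the assistant's actual reply.
Rate the actual reply from 1 to 5 on how well it matches the gold standard in
content, tone, and brevity, then explain your rating in one sentence.
```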
What prompts can (and can’t) do
While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.
What prompting can do:
- Guide EVI’s language generation, response style, response format, and the conversation flow
- Direct EVI to use specific tools at appropriate times
- Influence EVI’s emotional tone and personality, which can also affect some characteristics of EVI’s voice (e.g. prompting EVI to be “warm and nurturing” will help EVI’s voice sound soothing, but will not change the base speaker)
- Help EVI respond appropriately to the user’s expressions and the context
What prompting cannot do:
- Change fundamental characteristics of the voice, like the accent, gender, or speaker identity
- Directly control speech parameters like speed (use in-conversation voice prompts instead)
- Give EVI knowledge of external context (date, time, user details) without dynamic variables or web search
- Override core safety features built into EVI or supplemental LLMs (e.g. that prevent EVI from providing harmful information)
Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.
We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.
Additional resources
To learn more about prompt engineering in general or to understand how to prompt different LLMs, please refer to these resources:
- EVI prompt examples: See examples of EVI prompts, including the full Hume default prompt.
- Hume EVI playground: Test out your system prompts in live conversations with EVI, and see how it responds differently when you change configuration options.
- OpenAI tokenizer: Useful for counting the number of tokens in a system prompt for OpenAI models, which use the same tokenizer (tiktoken).
- OpenAI prompt engineering guidelines: For prompting OpenAI models like GPT-4.
- OpenAI playground: For testing and evaluating OpenAI prompts in a chat interface, including running evaluations.
- Anthropic prompt engineering guidelines: For prompting Anthropic models like Claude 3 Haiku.
- Anthropic console: For testing and evaluating Anthropic prompts in a chat interface, including evaluations and an automatic prompt improver.
- Fireworks model playground: For testing out open-source models served on Fireworks.
- Vercel AI playground: Try multiple prompts and LLMs in parallel to compare their responses.
- Perplexity Labs: Try different models, including open-source LLMs, to evaluate their responses and their latency.
- Prompt engineering guide: An open-source guide from DAIR.ai with general methods and advanced techniques for prompting a wide variety of LLMs.
- Artificial analysis benchmarks: Compare LLM characteristics and performance across different benchmarks, latency metrics, and more.
Frequently asked questions
Can EVI use backchanneling to avoid interrupting the user?
Yes, EVI can use conversational backchanneling: brief, encouraging responses that show active listening without interrupting the user’s train of thought. This can help conversations feel more fluid and natural. To enable this behavior, add an instruction like the example below to your system prompt:
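For example (the wording below is illustrative, not the only valid phrasing):

```
When the user pauses briefly but has not finished their thought, respond with
a short backchannel such as "mm-hmm", "I see", or "right" instead of a full
reply. Save longer responses for when the user has clearly finished speaking.
```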
What is the maximum length for system prompts?
The maximum length depends on the supplemental LLM being used. For example, GPT-4 has a 32k token context window, while Claude 3 Haiku has a 200k token context window. Check the context window for your LLM to ensure that your prompt is within this limit. We recommend keeping system prompts around 2000-5000 tokens (roughly 1500-4000 words) for optimal performance across all models. EVI also uses prompt caching (e.g. see Anthropic docs) to minimize the cost and latency when using very long prompts.
How do system prompts work with supplemental LLMs?
When using a supplemental LLM, a single system prompt still shapes both text and speech generation. There is not a separate system prompt for EVI 2: the prompt you specify in platform.hume.ai is the prompt that is used. All EVI-specific prompting instructions (like `<voice_only_response_format>`) are included in the prompt sent to the supplemental LLM to help it generate text appropriate for voice conversations. This unified approach ensures consistent behavior across text generation and speech synthesis.
How exactly does Hume change the payload (the transcript and prompt) sent to the LLM provider?
When sending API requests to supplemental LLM providers, Hume sends the context, settings, and system prompt you provided, with only three possible modifications. These three changes, described below, help optimize the interaction for an empathic voice conversation.
1. Expression measures: For each transcribed user message, Hume appends stringified expression measurement results in a structured format, as described in the Expressive prompt engineering section above (an illustrative sample appears after this list). This provides the LLM with emotional context from the user’s voice to generate more empathetic responses. Our speech-language model handles these expressions natively, but these strings are necessary to allow supplemental LLMs to respond to the expressions.
2. Normalization prompt: Hume appends a normalization prompt to your system prompt. This ensures consistent, stable, and fluid speech generation across different LLM providers, and means that developers don’t have to manually add this prompt to benefit from it. The exact normalization prompt can be found in the Normalize output text section above.
3. System default prompt (only when using a supplemental LLM with an empty prompt): When no custom prompt is provided (the prompt field is an empty string), Hume sends our system default prompt to the supplemental LLM.
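Sample normalized user message (illustrative only; the exact format and expression labels that Hume appends may differ):

```
I just can't get this to work, and the demo is tomorrow.
{very frustrated, quite anxious, somewhat tired}
```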
These modifications work in conjunction with your custom system prompt while ensuring that responses remain appropriate for voice-based interactions.