Prompt engineering for empathic voice interfaces
System prompts shape the behavior, responses, and style of your custom empathic voice interface (EVI).
Creating an effective system prompt is an essential part of customizing an EVI’s behavior. For the most part, prompting EVI is the same as prompting any LLM, but it differs in two important ways:
- Prompts are for a voice-only interaction with the user, rather than a text-based chat.
- EVIs can respond to the user’s emotional expressions in their tone of voice, not just the text content of their messages.
Further, EVI is interoperable with any supplemental LLM, allowing developers to select the best model for their use case. For fast, conversational, relatively simple interactions, Hume’s voice-language model EVI 2 can handle text generation. However, frontier LLMs will perform better for more complex use cases involving reasoning, long or nuanced prompts, tool use, and other requirements.
If you select a supplemental LLM, your system prompt is sent to this LLM, which then generates all of the language in the chat while EVI generates the voice. EVI’s voice-language model will still take into account the previous language and audio context to generate the appropriate tone of voice. It can also still be prompted in the chat to change its behavior (e.g. “speak faster”).
Prompt engineering allows developers to customize EVI’s response style for any use case, from voice AIs for mental health support to customer service agents and beyond.
The system prompt is a powerful and flexible way to guide EVI’s responses, but it cannot dictate AI responses with absolute precision. See the limits of prompting section for more information. Careful prompt design and testing will help EVI behave as intended. If you need more control over EVI’s responses, try using our custom language model feature for complete control of text generation.
EVI-specific prompting instructions
The instructions below are specific to prompting empathic voice interfaces - where the language model has to respond in a voice conversation to the user’s speech and their emotional expressions.
When a supplemental LLM is selected but no custom prompt is provided in the EVI API, we send our system default prompt to the LLM provider. You can use this prompt as a reference or starting point.
For examples of these prompting principles in action, see our EVI prompt examples repository.
If you find the default behavior of the LLM acceptable, then you may only need a very short system prompt. Customizing the LLM’s behavior further, or maintaining consistency across longer and more varied conversations, often requires a longer prompt.
Normalize output text
Our speech-language model works better with normalized text - text that can be easily spoken aloud. Non-normalized text like numbers, dates, equations, and special formatting can cause issues with speech synthesis. To ensure high quality speech output, all text should be converted into a natural, speakable format before being spoken aloud.
Hume automatically appends the text normalization prompt below to all prompts sent to supplemental LLMs. You do not need to include these instructions in your own prompt, as doing so would result in duplicate instructions.
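To make "normalized text" concrete, here is a minimal Python sketch that rewrites small whole numbers and a few symbols into speakable words. It is an illustration only, not Hume's normalization prompt or pipeline; the function names and the symbol table are assumptions made for this example.

```python
import re

# Hypothetical helpers for illustration; not part of any Hume SDK.
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
# Assumed spoken forms for a handful of symbols.
SYMBOLS = {"%": " percent", "&": " and ", "+": " plus ", "=": " equals "}


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def speakable(text: str) -> str:
    """Replace whole numbers below 1000 and a few symbols with spoken forms."""
    def spell(match):
        value = int(match.group())
        # Larger numbers are left untouched in this sketch.
        return number_to_words(value) if value < 1000 else match.group()

    text = re.sub(r"\d+", spell, text)
    for symbol, spoken in SYMBOLS.items():
        text = text.replace(symbol, spoken)
    # Collapse any doubled spaces introduced by the replacements.
    return re.sub(r"\s+", " ", text).strip()
```

For instance, `speakable("Save 25% on 3 items")` yields "Save twenty-five percent on three items". Dates, equations, and large numbers need more care than this sketch provides.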
Expressive prompt engineering
Expressive prompt engineering is our term for instructing language models on how to use Hume’s expression measures in conversations. EVI measures the user’s vocal expressions in real time and converts them into text-based indicators to help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions in the user’s voice and ranks these expressions by our model’s confidence that they are present. Text-based descriptions of the user’s top 3 expressions are appended to the end of each User message to indicate the user’s tone of voice. You can use the system prompt to guide how the AI voice responds to these non-verbal cues of the user’s emotional expressions.
For example, our demo uses an instruction like the one below to help EVI respond to expressions. You can customize this to explain to EVI how it should respond to the emotional expressions.
Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to alert you over email if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:
EVI is designed for empathic conversations, and you can use expressive prompt engineering to customize how EVI empathizes with the user’s expressions for your use case.
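The ranking step described in this section (pick the three most confident expressions and attach them to the user message as text) can be sketched in Python as follows. The score dictionary, the function names, and the curly-brace suffix format are assumptions for illustration; the exact string format Hume appends is not specified here.

```python
from typing import Dict, List


def top_expressions(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k expression names with the highest scores, highest first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def annotate_user_message(text: str, scores: Dict[str, float]) -> str:
    """Append a text description of the top expressions to a user message.

    The "{a, b, c}" suffix is a hypothetical format chosen for this sketch.
    """
    labels = ", ".join(top_expressions(scores))
    return f"{text} {{{labels}}}"


# Example with made-up confidence scores.
scores = {"joy": 0.1, "frustration": 0.8, "doubt": 0.5, "calmness": 0.3}
annotated = annotate_user_message("Why is this not working?", scores)
```

A system prompt can then refer to these labels, for example by telling the LLM to slow down and explain in more depth when "doubt" or "confusion" appears among them.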
Using dynamic variables in your prompt
Dynamic variables are values which can change during a conversation with EVI.
In order to function, dynamic variables must be manually defined within a chat’s session settings. To learn how to do so, visit our Conversational controls page.
Embedding dynamic variables into your system prompt can help personalize the user experience to reflect user-specific or changing information such as names, preferences, the current date, and other details.
In other words, dynamic variables may be used to customize EVI conversations with specific context for each user and each conversation. For example, you can adjust your system prompt to include conversation-specific information, such as a user’s favorite color or travel plans:
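As a rough local illustration of how a templated prompt resolves, the Python sketch below fills placeholders from a dictionary of values. In EVI the substitution is handled by the platform from the values defined in session settings; the `{{name}}` placeholder syntax, the function name, and the variable names here are assumptions made for this example.

```python
import re
from typing import Dict

# Assumed placeholder syntax for this sketch: {{variable_name}}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_variables(template: str, values: Dict[str, str]) -> str:
    """Replace each {{name}} placeholder with its value.

    Placeholders without a matching value are left unchanged so that
    missing variables are easy to spot during testing.
    """
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))

    return PLACEHOLDER.sub(substitute, template)


template = "Address the user as {{name}}. Their favorite color is {{favorite_color}}."
resolved = fill_variables(template, {"name": "Ada", "favorite_color": "green"})
```

Leaving unresolved placeholders visible, rather than replacing them with an empty string, makes it easier to catch a variable that was never defined for a session.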
Using a website as EVI’s knowledge base
Web search is a built-in tool that allows EVI to search the web for up-to-date information. However, instead of searching the entire web, you can configure EVI to search within a single website using a system prompt.
Constraining EVI’s knowledge to a specific website enables creating domain-specific chatbots. For example, you could use this approach to create documentation assistants or product-specific support bots. Because it reuses existing web content, this approach offers a quick alternative to a full RAG implementation while still providing targeted information retrieval.
To use a website as EVI’s knowledge base, follow these steps:
- Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.
- Include a web search instruction: In your EVI configuration, modify the system prompt to include a use_web_search instruction.
- Specify a target domain: In the instruction, specify that site:<target_domain> be appended to all search queries, where <target_domain> is the URL of the website you’d like EVI to focus on. For example, you can create a documentation assistant using an instruction like the one below:
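As a rough illustration of the site: scoping described in these steps, the Python sketch below shows what a search query looks like once the target domain is appended. In EVI the LLM is expected to do this itself when the system prompt instructs it to; the function name and the example domain are hypothetical.

```python
def scope_query(query: str, target_domain: str) -> str:
    """Append a site: operator so a search only covers target_domain."""
    domain = target_domain.strip()
    # Search operators expect a bare domain, so drop any URL scheme.
    for prefix in ("https://", "http://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    domain = domain.rstrip("/")
    return f"{query.strip()} site:{domain}"


# Hypothetical documentation site used only for this example.
scoped = scope_query("how do I enable tools", "https://docs.example.com/")
```

Checking a few scoped queries by hand in a search engine is a quick way to confirm that the target site is indexed well enough to serve as a knowledge base.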
General LLM prompting guidelines
Best practices for prompt engineering also apply to EVIs. For example, ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in the EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.
For example, to limit the length of the LLM’s responses, you may use a very clear and specific instruction like this:
Try to focus on telling the model what it should do (positive reinforcement) rather than what it shouldn’t do (negative reinforcement). LLMs have a harder time consistently avoiding behaviors, and adding undesired behaviors to the prompt may unintentionally promote them.
Test and evaluate prompts
Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:
- Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations or compare the results yourself.
- Test in real voice conversations: There’s no substitute for actually testing the EVI in live conversations on platform.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.
- Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.
Start with 10-20 gold-standard examples of excellent conversations. Test the system prompt against these examples after making major changes. If the EVI’s responses don’t meet your expectations, adjust one part of the prompt at a time and re-test to ensure your changes are improving performance. Evaluation is a vital component of prompting, and it’s the best way to ensure your changes are making an impact.
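A gold-standard check like the one described above can be sketched as a small Python harness. This is a minimal sketch, not a Hume tool: `generate` stands in for whatever function calls your LLM, and the character-level similarity from `difflib` is a crude stand-in for a judge LLM or human review. The threshold value is likewise an assumption.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple


def similarity(a: str, b: str) -> float:
    """Crude text similarity between 0.0 and 1.0 (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def evaluate(
    generate: Callable[[str], str],
    gold: List[Tuple[str, str]],
    threshold: float = 0.6,
) -> List[Tuple[str, float]]:
    """Return (user_message, score) for each example scoring below threshold."""
    failures = []
    for user_message, ideal in gold:
        score = similarity(generate(user_message), ideal)
        if score < threshold:
            failures.append((user_message, round(score, 2)))
    return failures


# Stub model for demonstration: answers one message well and one badly.
gold = [
    ("I'm stuck on step two.", "No problem, let's walk through step two together."),
    ("This is so annoying!", "I hear you, that sounds frustrating."),
]
canned = {gold[0][0]: gold[0][1], gold[1][0]: "zzzz"}
failures = evaluate(lambda message: canned[message], gold)
```

Re-running the same harness after each prompt change gives a consistent signal about whether the change helped, which is harder to judge from ad hoc conversations alone.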
Understand your LLM’s capabilities
Different LLMs have varying capabilities, limitations, and context windows. More advanced LLMs can handle longer, nuanced prompts, but are often slower and pricier. Simpler LLMs are faster and cheaper but require shorter, less complex prompts with fewer instructions and less nuance.
Some LLMs also have longer context windows - the number of tokens the model can process while generating a response, acting essentially as the model’s memory. Context windows range from 8k tokens (Gemma 7B), to 128k (GPT-4o), to 200k (Claude 3), to 2 million tokens (Gemini 1.5 Pro). Tailor your prompt length to fit within the LLM’s context window to ensure the model can use the full conversation history.
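To check whether a prompt plus conversation history is likely to fit, a rough budget can be estimated in Python. This sketch assumes about 0.75 words per token, a rule of thumb consistent with the "2000 tokens is roughly 1500 words" figure used elsewhere in this guide; the function names and the reserve size are assumptions. For exact counts, use the tokenizer for your model (for example, tiktoken for OpenAI models).

```python
import math
from typing import List

# Rule-of-thumb ratio; real tokenizers vary by model and language.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


def fits_context(
    prompt: str,
    history: List[str],
    context_window: int,
    reserve: int = 1000,
) -> bool:
    """Check that the prompt and history leave `reserve` tokens for the reply."""
    used = estimate_tokens(prompt) + sum(estimate_tokens(m) for m in history)
    return used + reserve <= context_window


# A 1500-word prompt is estimated at 2000 tokens.
prompt = "word " * 1500
ok_for_8k = fits_context(prompt, [], context_window=8000)
```

Because the history grows with every turn, a prompt that fits at the start of a chat can still crowd out earlier messages later on, so it helps to leave generous headroom on models with small context windows.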
Use sections to divide your prompt
Separating longer prompts into titled sections helps the model distinguish between different instructions and follow prompts more reliably. The recommended format for these sections differs between language model providers. For example, OpenAI models often respond best to Markdown sections (like ## Role), while Anthropic models respond well to XML tags (like <role> </role>). For example:
For Claude models, you may wrap your instructions in tags like <role>, <personality>, <response_style>, or <examples> to structure your prompt. This format is not required, but it can improve the LLM’s instruction-following. At the end of your prompt, it may also be helpful to remind the LLM of key instructions.
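If you maintain prompts for more than one provider, the two sectioning styles can be generated from the same content. The Python sketch below is one way to do that; the function name and the sample section text are hypothetical, and the tag names follow the examples mentioned above.

```python
from typing import Dict


def render_sections(sections: Dict[str, str], style: str = "xml") -> str:
    """Render named prompt sections as XML tags or Markdown headings."""
    parts = []
    for name, body in sections.items():
        if style == "xml":
            parts.append(f"<{name}>\n{body.strip()}\n</{name}>")
        elif style == "markdown":
            # response_style becomes "Response style"
            title = name.replace("_", " ").capitalize()
            parts.append(f"## {title}\n{body.strip()}")
        else:
            raise ValueError(f"unknown style: {style}")
    return "\n\n".join(parts)


# Placeholder section text for demonstration only.
sections = {
    "role": "You are a friendly guide for a hiking app.",
    "response_style": "Keep replies under three sentences.",
}
xml_prompt = render_sections(sections, "xml")
markdown_prompt = render_sections(sections, "markdown")
```

Keeping the section content in one place and rendering it per provider makes it easier to compare how different LLMs follow the same instructions.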
Give few-shot examples
Use examples to show the LLM how it should respond - a technique known as few-shot learning. Including several concrete examples of ideal interactions that follow your guidelines is one of the most effective ways to improve responses. Use excellent examples that cover different edge cases and behaviors to reinforce your instructions. Structure these examples as messages, following the format for chat-tuned LLMs. For example:
If you notice that your EVI consistently fails to follow the prompt in certain situations, try providing examples that show how it should ideally respond in those situations.
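Few-shot examples are easier to maintain as data than as hand-edited prompt text. The Python sketch below turns (user, assistant) pairs into a message-style transcript that can be pasted into a system prompt. The "User:" and "Assistant:" labels, the function name, and the sample exchanges are assumptions for illustration.

```python
from typing import List, Tuple


def format_examples(pairs: List[Tuple[str, str]]) -> str:
    """Render (user, assistant) pairs as a message-style transcript."""
    lines = []
    for user_text, assistant_text in pairs:
        lines.append(f"User: {user_text}")
        lines.append(f"Assistant: {assistant_text}")
        lines.append("")  # blank line between examples
    return "\n".join(lines).strip()


# Made-up exchanges covering two different situations.
examples = format_examples([
    ("I don't get it.", "No worries. Which part should we go over again?"),
    ("That worked, thanks!", "Great to hear! Anything else you want to try?"),
])
```

When EVI fails in a specific situation, adding one pair that shows the ideal response for that situation is usually a smaller and safer change than rewriting the instructions.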
The limits of prompting
While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.
What prompting can do:
- Guide EVI’s language generation, response style, response format, and the conversation flow
- Direct EVI to use specific tools at appropriate times
- Influence EVI’s emotional tone and personality, which can also affect some characteristics of EVI’s voice (e.g. prompting EVI to be “warm and nurturing” will help EVI’s voice sound soothing, but will not change the base speaker)
- Help EVI respond appropriately to the user’s expressions and the context
What prompting cannot do:
- Change fundamental characteristics of the voice, like the accent, gender, or speaker identity
- Directly control speech parameters like speed (use in-conversation voice prompts instead)
- Give EVI knowledge of external context (date, time, user details) without dynamic variables or web search
- Override core safety features built into EVI or supplemental LLMs (e.g. that prevent EVI from providing harmful information)
Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.
We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.
Additional resources
To learn more about prompt engineering in general or to understand how to prompt different LLMs, please refer to these resources:
- EVI prompt examples: See examples of EVI prompts, including the full Hume default prompt.
- Hume EVI playground: Test out your system prompts in live conversations with EVI, and see how it responds differently when you change configuration options.
- OpenAI tokenizer: Useful for counting the number of tokens in a system prompt for OpenAI models, which use the same tokenizer (tiktoken).
- OpenAI prompt engineering guidelines: For prompting OpenAI models like GPT-4.
- OpenAI playground: For testing and evaluating OpenAI prompts in a chat interface, including running evaluations.
- Anthropic prompt engineering guidelines: For prompting Anthropic models like Claude 3 Haiku.
- Anthropic console: For testing and evaluating Anthropic prompts in a chat interface, including evaluations and an automatic prompt improver.
- Fireworks model playground: For testing out open-source models served on Fireworks.
- Vercel AI playground: Try multiple prompts and LLMs in parallel to compare their responses.
- Perplexity Labs: Try different models, including open-source LLMs, to evaluate their responses and their latency.
- Prompt engineering guide: An open-source guide from DAIR.ai with general methods and advanced techniques for prompting a wide variety of LLMs.
- Artificial analysis benchmarks: Compare LLM characteristics and performance across different benchmarks, latency metrics, and more.
Frequently asked questions
Can EVI use backchanneling to avoid interrupting when the user pauses or has an incomplete thought?
Yes, EVI can use conversational backchanneling - brief, encouraging responses that show active listening without interrupting the user’s train of thought. This can help conversations feel more fluid and natural. To enable this behavior, add an instruction like the example below to your system prompt:
What is the maximum length for system prompts?
The maximum length depends on the supplemental LLM being used. For example, the original GPT-4 was offered with context windows of up to 32k tokens, while Claude 3 Haiku has a 200k token context window. Check the context window for your LLM to ensure that your prompt is within this limit. We recommend keeping system prompts around 2000-5000 tokens (roughly 1500-4000 words) for optimal performance across all models. EVI also uses prompt caching (see, for example, the Anthropic docs) to minimize the cost and latency when using very long prompts.
How do system prompts work with supplemental LLMs?
When using a supplemental LLM, a single system prompt still shapes both text and speech generation. There is not a separate system prompt for EVI 2 - the prompt you specify in platform.hume.ai is the prompt that is used. All EVI-specific prompting instructions (like <voice_only_response_format>) are included in the prompt sent to the supplemental LLM to help it generate text appropriate for voice conversations. This unified approach ensures consistent behavior across text generation and speech synthesis.
How exactly does Hume change the payload (the transcript and prompt) sent to the LLM provider?
When sending API requests to supplemental LLM providers, Hume sends the context, settings, and system prompt you provided, with only three possible modifications. These three changes, described below, help optimize the interaction for an empathic voice conversation.
- Expression measures: For each transcribed user message, Hume appends stringified expression measurement results in a structured format as described in the Expressive prompt engineering section above:
This provides the LLM with emotional context from the user’s voice to generate more empathetic responses. Our speech-language model handles these expressions natively, but these strings are necessary to allow supplemental LLMs to respond to the expressions.
- Normalization prompt: Hume appends a normalization prompt to your system prompt. This ensures consistent, stable, and fluid speech generation across different LLM providers, and means that developers don’t have to manually add this prompt to benefit from it. The exact normalization prompt can be found in the Normalize output text section above.
- System default prompt (only when using a supplemental LLM with an empty prompt): When no custom prompt is provided (the prompt field is an empty string), Hume sends our system default prompt to the supplemental LLM.
These modifications work in conjunction with your custom system prompt while ensuring that responses remain appropriate for voice-based interactions.
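The two system-prompt modifications described in this answer (fall back to the default prompt when the prompt field is empty, then append the normalization prompt) can be pictured with a short Python sketch. This is a mental model only, not Hume's implementation; the function name and the placeholder prompt strings are hypothetical.

```python
def build_system_prompt(
    custom_prompt: str,
    default_prompt: str,
    normalization_prompt: str,
) -> str:
    """Pick the custom or default prompt, then append normalization text."""
    # An empty string means no custom prompt was provided.
    base = custom_prompt if custom_prompt != "" else default_prompt
    return f"{base}\n\n{normalization_prompt}"


# Placeholder strings; the real default and normalization prompts differ.
DEFAULT = "<system default prompt>"
NORMALIZE = "<normalization prompt>"

with_custom = build_system_prompt("You are a calm tutor.", DEFAULT, NORMALIZE)
with_default = build_system_prompt("", DEFAULT, NORMALIZE)
```

Under this model, your custom prompt is never rewritten; text is only added after it, which is why repeating the normalization instructions in your own prompt would duplicate them.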