Prompt engineering for empathic voice interfaces
System prompts shape the behavior, responses, and style of your custom empathic voice interface (EVI).
Creating an effective system prompt is an essential part of customizing an EVI’s behavior. For the most part, prompting EVI is the same as prompting any LLM, but there are some important differences. Prompting for EVIs is different for two main reasons:
- Prompts are for a voice-only interaction with the user, rather than a text-based chat.
- EVIs can respond to the user’s emotional expressions in their tone of voice, not just the text content of their messages.
Further, EVI is interoperable with any supplemental LLM, allowing developers to select the best model for their use case. For fast, conversational, relatively simple interactions, Hume’s voice-language model EVI 2 can handle text generation. However, frontier LLMs will perform better for more complex use cases involving reasoning, long or nuanced prompts, tool use, and other requirements.
If you select a supplemental LLM, your system prompt is sent to this LLM, which then generates all of the language in the chat while EVI generates the voice. EVI’s voice-language model will still take into account the previous language and audio context to generate the appropriate tone of voice. It can also still be prompted in the chat to change its behavior (e.g. “speak faster”).
Prompt engineering allows developers to customize EVI’s response style for any use case, from voice AIs for mental health support to customer service agents and beyond.
The system prompt is a powerful and flexible way to guide EVI’s responses, but it cannot dictate AI responses with absolute precision. See the limits of prompting section for more information. Careful prompt design and testing will help EVI behave as intended. If you need more control over EVI’s responses, try using our custom language model feature for complete control of text generation.
EVI-specific prompting instructions
The instructions below are specific to prompting empathic voice interfaces. For examples of these principles in action, see our EVI prompt examples repository.
Prompt for voice conversations
Most LLMs are trained for text-based interactions. Thus, providing guidelines on how the LLM should speak helps voice conversations with EVI feel much more fluid and natural. For example, see the instruction below:
If you find the default behavior of the LLM acceptable, then you may only need a very short system prompt. Customizing the LLM’s behavior more and maintaining consistency in longer and more varied conversations often requires longer prompts.
Expressive prompt engineering
Expressive prompt engineering is our term for instructing language models on how to use Hume’s expression measures in conversations. EVI measures the user’s vocal expressions in real time and converts them into text-based indicators to help the LLM understand not just what the user said, but how they said it. EVI detects 48 distinct expressions in the user’s voice and ranks these expressions by our model’s confidence that they are present. Text-based descriptions of the user’s top 3 expressions are appended to the end of each User message
to indicate the user’s tone of voice. You can use the system prompt to guide how the AI voice responds to these non-verbal cues of the user’s emotional expressions.
For example, our demo uses an instruction like the one below to help EVI respond to expressions:
Explain to the LLM exactly how to respond to expressions. For example, you may want EVI to use a tool to alert you over email if the user is very frustrated, or to explain a concept in depth whenever the user expresses doubt or confusion. You can also instruct EVI to detect and respond to mismatches between the user’s tone of voice and the text content of their speech:
EVI is designed for empathic conversations, and you can use expressive prompt engineering to customize how EVI empathizes with the user’s expressions for your use case.
Using dynamic variables in your prompt
Dynamic variables are values which can change during a conversation with EVI.
In order to function, dynamic variables must be manually defined within a chat’s session settings. To learn how to do so, visit our Conversational controls page.
Embedding dynamic variables into your system prompt can help personalize the user experience to reflect user-specific or changing information such as names, preferences, the current date, and other details.
In other words, dynamic variables may be used to customize EVI conversations with specific context for each user and each conversation. For example, you can adjust your system prompt to include conversation-specific information, such as a user’s favorite color or travel plans:
Using a website as EVI’s knowledge base
Web search is a built-in tool that allows EVI to search the web for up-to-date information. However, instead of searching the entire web, you can configure EVI to search within a single website using a system prompt.
Constraining EVI’s knowledge to a specific website enables creating domain-specific chatbots. For example, you could use this approach to create documentation assistants or product-specific support bots. By leveraging existing web content, it provides a quick alternative to full RAG implementations while still offering targeted information retrieval.
To use a website as EVI’s knowledge base, follow these steps:
-
Enable web search: Before you begin, ensure web search is enabled as a built-in tool in your EVI configuration. For detailed instructions, visit our Tool Use page.
-
Include a web search instruction: In your EVI configuration, modify the system prompt to include a
use_web_search
instruction. -
Specify a target domain: In the instruction, specify that
site:<target_domain>
be appended to all search queries, where the<target_domain>
is the URL of the website you’d like EVI to focus on. For example, you can create a documentation assistant using an instruction like the one below:
General LLM prompting guidelines
Best practices for prompt engineering also apply to EVIs. For example, ensure your prompts are clear, detailed, direct, and specific. Include necessary instructions and examples in the EVI’s system prompt to set expectations for the LLM. Define the context of the conversation, EVI’s role, personality, tone, and any other guidelines for its responses.
For example, to limit the length of the LLM’s responses, you may use a very clear and specific instruction like this:
Try to focus on telling the model what it should do (positive reinforcement) rather than what it shouldn’t do (negative reinforcement). LLMs have a harder time consistently avoiding behaviors, and adding undesired behaviors to the prompt may unintentionally promote them.
Test and evaluate prompts
Crafting an effective, robust system prompt often requires several iterations. Here are some key techniques for testing prompts:
-
Use gold standard examples for evaluation: Create a bank of ideal responses, then generate responses with EVI (or the supplemental LLM you use) and compare them to your gold standards. You can use a “judge LLM” for automated evaluations or compare the results yourself.
-
Test in real voice conversations: There’s no substitute for actually testing the EVI in live conversations on platform.hume.ai to ensure it sounds right, has the appropriate tone, and feels natural.
-
Isolate prompt components: Test each part of the prompt separately to confirm they are all working as intended. This helps identify which specific elements are effective or need improvement.
Start with 10-20 gold-standard examples of excellent conversations. Test the system prompt against these examples after making major changes. If the EVI’s responses don’t meet your expectations, adjust one part of the prompt at a time and re-test to ensure your changes are improving performance. Evaluation is a vital component of prompting, and it’s the best way to ensure your changes are making an impact.
Understand your LLM’s capabilities
Different LLMs have varying capabilities, limitations, and context windows. More advanced LLMs can handle longer, nuanced prompts, but are often slower and pricier. Simpler LLMs are faster and cheaper but require shorter, less complex prompts with fewer instructions and less nuance.
Some LLMs also have longer context windows - the number of tokens the model can process while generating a response, acting essentially as the model’s memory. Context windows range from 8k tokens (Gemma 7B), to 128k (GPT-4o), to 200k (Claude 3), to 2 million tokens (Gemini 1.5 Pro). Tailor your prompt length to fit within the LLM’s context window to ensure the model can use the full conversation history.
Use sections to divide your prompt
Separating longer prompts into titled sections helps the model distinguish between different instructions and follow prompts more reliably. The recommended format for these sections differs between language model providers. For example, OpenAI models often respond best to markdown sections (like ## Role
), while Anthropic models respond well to XML tags (like <role> </role>
). For example:
For Claude models, you may wrap your instructions in tags like <role>
, <personality>
, <response_style>
, or <examples>
, to structure your prompt. This format is not required, but it can improve the LLM’s instruction-following. At the end of your prompt, it may also be helpful to remind the LLM of key instructions.
Give few-shot examples
Use examples to show the LLM how it should respond - a technique known as few-shot learning. Including several concrete examples of ideal interactions that follow your guidelines is one of the most effective ways to improve responses. Use excellent examples that cover different edge cases and behaviors to reinforce your instructions. Structure these examples as messages, following the format for chat-tuned LLMs. For example:
If you notice that your EVI consistently fails to follow the prompt in certain situations, try providing examples that show how it should ideally respond in those situations.
The limits of prompting
While prompting is a powerful tool for customizing EVI’s behavior, it has certain limitations. Below are some details on what prompting can and cannot accomplish.
What prompting can do:
- Guide EVI’s language generation, response style, response format, and the conversation flow
- Direct EVI to use specific tools at appropriate times
- Influence EVI’s emotional tone and personality, which can also affect some characteristics of EVI’s voice (e.g. prompting EVI to be “warm and nurturing” will help EVI’s voice sound soothing, but will not change the base speaker)
- Help EVI respond appropriately to the user’s expressions and the context
What prompting cannot do:
- Change fundamental characteristics of the voice, like the accent, gender, or speaker identity
- Directly control speech parameters like speed (use in-conversation voice prompts instead)
- Give EVI knowledge of external context (date, time, user details) without dynamic variables or web search
- Override core safety features built into EVI or supplemental LLMs (e.g. that prevent EVI from providing harmful information)
Importantly, the generated language does influence how the voice sounds - for example, excited text (e.g. “Oh wow, that’s so interesting!”) will make EVI’s voice sound excited. However, to fundamentally change the voice characteristics, use our voice customization feature instead.
We are actively working on expanding EVI’s ability to follow system prompts for both language and voice generation. For now, focus prompting on guiding EVI’s conversational behavior and responses while working within these constraints.
Additional resources
To learn more about prompt engineering in general or to understand how to prompt different LLMs, please refer to these resources:
- EVI prompt examples: See examples of EVI prompts, including the full Hume default prompt.
- Hume EVI playground: Test out your system prompts in live conversations with EVI, and see how it responds differently when you change configuration options.
- OpenAI tokenizer: Useful for counting the number of tokens in a system prompt for OpenAI models, which use the same tokenizer (tiktoken).
- OpenAI prompt engineering guidelines: For prompting OpenAI models like GPT-4.
- OpenAI playground: For testing and evaluating OpenAI prompts in a chat interface.
- Anthropic prompt engineering guidelines: For prompting Anthropic models like Claude 3 Haiku
- Anthropic console: For testing and evaluating Anthropic prompts in a chat interface.
- Fireworks model playground: For testing out open-source models served on Fireworks.
- Vercel AI playground: Try multiple prompts and LLMs in parallel to compare their responses.
- Perplexity Labs: Try different models, including open-source LLMs, to evaluate their responses and their latency.
- Prompt engineering guide: An open-source guide from DAIR.ai with general methods and advanced techniques for prompting a wide variety of LLMs.
- Artificial analysis benchmarks: Compare LLM characteristics and performance across different benchmarks, latency metrics, and more.