Empathic Voice Interface (EVI)

Using a custom language model

For more customization, you can generate your own text using a custom language model.

The information on this page lays out how our custom language model functionality works at a high level; however, for detailed instructions and commented code, please see our example GitHub repository.

Overview

The custom language model feature enables developers to integrate their own language models with Hume’s Empathic Voice Interface (EVI), facilitating the creation of highly configurable and personalized user experiences. You create a socket that receives Hume’s conversation thread history, apply whatever custom business logic you need, and send back the next text for EVI to say, which is then spoken to the user.

Using your own LLM is intended for developers who need deep configurability for their use case. This includes full text customization for use cases like:

  • Advanced conversation steering: Implement complex logic to steer conversations beyond basic prompting, including managing multiple system prompts.
  • Regulatory compliance: Directly control and modify text outputs to meet specific regulatory requirements.
  • Context-aware text generation: Leverage dynamic agent metadata, such as remaining conversation time, to inform text generation.
  • Real-time data access: Utilize search engines within conversations to access and incorporate up-to-date information.
  • Retrieval augmented generation (RAG): Employ retrieval augmented generation techniques to enrich conversations by integrating external data without the need to modify the system prompt.

For these cases, function calling alone isn’t flexible enough; a custom language model lets you build sophisticated workflows around your own text generation.

Custom language model flow diagram

Setup

Establish a Custom Text Socket

  • Initialization: See our example repository for instructions on setting up a custom text socket. This resource offers detailed guidance on both the setup process and the operational aspects of the code.
  • Hosting: Use a tunneling tool such as ngrok to publicly serve your socket. This step is needed so the Hume platform can connect to it.
  • Configuration: Create a voice configuration, specifying “Custom language model” as the Language Model, and your socket’s WSS URL as the Custom Language Model URL.
  • Make request: When making your request to the Hume platform, include the config_id parameter, setting its value to the ID of your voice configuration (see the connection sketch below).
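
As an illustration of the last step, a minimal connection sketch might look like the following. The WebSocket endpoint URL, query parameter names, and environment variable names here are assumptions for illustration; in practice you would follow the API reference or use one of Hume’s SDKs.

// Minimal sketch: opening an EVI chat session that uses your custom
// language model configuration. Endpoint URL and query parameter names
// are assumptions for illustration only.
import WebSocket from "ws";

const apiKey = process.env.HUME_API_KEY!;    // your Hume API key (assumed env var)
const configId = process.env.EVI_CONFIG_ID!; // ID of the voice configuration
                                             // that points at your socket

const socket = new WebSocket(
  `wss://api.hume.ai/v0/evi/chat?api_key=${apiKey}&config_id=${configId}`
);

socket.on("open", () => {
  console.log("Connected to EVI with the custom language model config");
});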

Communication Protocol

  • Receiving data: Your socket will receive JSON payloads containing conversation thread history from the Hume system.
  • Processing: Apply your custom business logic and utilize your language model to generate appropriate responses based on the received conversation history.
  • Sending responses: Transmit the generated text responses back to our platform through the established socket connection to be forwarded to the end user.
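
The sketch below shows one way this receive, process, and send loop might be wired up, assuming a Node.js server using the ws package; generateReply is a hypothetical stand-in for your own language model call, and the payload shapes follow the interfaces documented under Payload Structure below.

// Sketch of a custom text socket: receive Hume's conversation history,
// run your own business logic and language model, then send the reply
// back as assistant_input followed by assistant_end.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws: WebSocket) => {
  ws.on("message", async (data) => {
    const payload = JSON.parse(data.toString());
    const lastMessage = payload.messages?.[payload.messages.length - 1];

    // Only respond to user turns; ignore other payloads.
    if (lastMessage?.type !== "user_message") return;

    const reply = await generateReply(payload.messages); // your LLM / business logic

    ws.send(JSON.stringify({ type: "assistant_input", text: reply }));
    ws.send(JSON.stringify({ type: "assistant_end" }));
  });
});

// Hypothetical placeholder for your own model call.
async function generateReply(messages: unknown[]): Promise<string> {
  return "Hello from my custom language model.";
}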

For improved clarity and naturalness in generated text, we recommend transforming numerical values and abbreviations into their full verbal counterparts (e.g., converting “3” to “three” and “Dr.” to “doctor”).
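
As a rough sketch of that kind of normalization, the helper below expands a few abbreviations and small numbers into words before the text is sent back to Hume; the abbreviation map and number range are illustrative only, and a production system would likely use a dedicated number-to-words library.

// Illustrative normalization pass: expand abbreviations and digits 0-10
// into their verbal forms so the synthesized speech sounds natural.
const ABBREVIATIONS: { [key: string]: string } = {
  "Dr.": "doctor",
  "St.": "street",
  "etc.": "et cetera",
};

const SMALL_NUMBERS = [
  "zero", "one", "two", "three", "four", "five",
  "six", "seven", "eight", "nine", "ten",
];

function verbalize(text: string): string {
  let result = text;
  for (const [abbr, word] of Object.entries(ABBREVIATIONS)) {
    result = result.split(abbr).join(word);
  }
  // Replace standalone digits 0-10 with their word form.
  return result.replace(/\b(10|\d)\b/g, (match) => SMALL_NUMBERS[Number(match)]);
}

// verbalize("Dr. Smith arrives in 3 minutes.")
//   -> "doctor Smith arrives in three minutes."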

Payload Structure

Below is the interface representing the overall structure of the message payloads sent by Hume:

/*
 * Represents the overall structure of the Welcome message.
 */
export interface Welcome {
  // Array of message elements
  messages: MessageElement[];
  // Unique identifier for the session
  custom_session_id: string;
}

/*
 * Represents a single message element within the session.
 */
export interface MessageElement {
  // Type of the message (e.g., user_message, assistant_message)
  type: string;
  // The message content and related details
  message: Message;
  // Models related to the message, primarily prosody analysis
  models: Models;
  // Optional timestamp details for when the message was sent
  time?: Time;
}

/*
 * Represents the content of the message.
 */
export interface Message {
  // Role of the sender (e.g., user, assistant)
  role: string;
  // The textual content of the message
  content: string;
}

/*
 * Represents the models associated with a message.
 */
export interface Models {
  // Prosody analysis details of the message
  prosody: Prosody;
}

/*
 * Represents the prosody analysis scores.
 */
export interface Prosody {
  // Dictionary of prosody scores with emotion categories as keys
  // and their respective scores as values
  scores: { [key: string]: number };
}

/*
 * Represents the timestamp details of a message.
 */
export interface Time {
  // The start time of the message (in milliseconds)
  begin: number;
  // The end time of the message (in milliseconds)
  end: number;
}
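
As an example of consuming this payload, the sketch below pulls the most recent user message and its strongest prosody scores out of an incoming Welcome payload using the interfaces above; summarizing the top emotions this way (and how many to keep) is an illustrative choice, not part of the protocol.

// Sketch: extract the latest user turn and its top prosody scores so they
// can inform your prompt or business logic.
function summarizeLastUserTurn(payload: Welcome): string | null {
  const lastUser = [...payload.messages]
    .reverse()
    .find((m) => m.type === "user_message");
  if (!lastUser) return null;

  const topEmotions = Object.entries(lastUser.models.prosody.scores)
    .sort(([, a], [, b]) => b - a) // highest scores first
    .slice(0, 3)
    .map(([emotion, score]) => `${emotion} (${score.toFixed(2)})`);

  return `User said: "${lastUser.message.content}" | top emotions: ${topEmotions.join(", ")}`;
}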

Custom Session ID

For managing conversational state and connecting your frontend experiences with your backend data and logic, you should pass a custom_session_id in the SessionSettings message. When a custom_session_id is provided in the frontend’s SessionSettings message, the payloads Hume sends to your backend include this ID, so you can correlate frontend users with their incoming messages.

Using a custom_session_id will enable you to:

  • maintain user state on your backend
  • pause/resume conversations
  • persist conversations across sessions
  • match frontend and backend connections

We recommend passing a custom_session_id if you are using a Custom Language Model.
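
For instance, a frontend helper that sets the ID might look like the sketch below; the "session_settings" type name and field follow the SessionSettings message described above, but the exact wire format should be checked against the API reference.

// Sketch: send a SessionSettings message carrying your own session ID over
// an open EVI chat socket (browser WebSocket assumed).
function setCustomSessionId(eviSocket: WebSocket, sessionId: string): void {
  eviSocket.send(
    JSON.stringify({
      type: "session_settings",     // assumed message type name
      custom_session_id: sessionId, // your own identifier, e.g. a user or chat ID
    })
  );
}

// setCustomSessionId(eviSocket, "user-1234-chat-5678");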


Assistant Input and End Payload Format

These are the formats for sending messages to Hume:

assistant_input

The assistant_input payload is used to send text to the assistant. You can send multiple assistant_input payloads in a sequence to stream text to the assistant.

Format:

{
  "type": "assistant_input",
  "text": "your_text_here"
}

Example:

{
  "type": "assistant_input",
  "text": "Hello, how are you?"
}

assistant_end

The assistant_end payload indicates that your turn is over, signaling the end of the current stream of text inputs.

Format:

{
  "type": "assistant_end"
}

Streaming Text to the Assistant

You can send multiple assistant_input payloads consecutively to stream text to the assistant. Once you are done sending inputs, you must send an assistant_end payload to indicate the end of your turn.

Example Sequence:

Step 1: Start streaming text

{
  "type": "assistant_input",
  "text": "This is the first part of the text."
}

Step 2: Continue streaming text

{
  "type": "assistant_input",
  "text": "Here is the second part of the text."
}

Step 3: Indicate the end of your turn

{
  "type": "assistant_end"
}

Summary

  1. Send assistant_input payloads to stream text to the assistant.
  2. Send as many assistant_input payloads as needed.
  3. Send an assistant_end payload to indicate that your turn is over.

By following this format, you ensure proper communication with the assistant API, enabling smooth and efficient interactions.
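
Putting the streaming rules together, the sketch below forwards chunks from a streaming language model as assistant_input payloads and then closes the turn with assistant_end; the async-iterable chunk source is an assumption about your LLM client.

// Sketch: stream text chunks to Hume as assistant_input payloads, then
// signal the end of the turn with assistant_end. `chunks` stands in for
// whatever streaming interface your language model client exposes.
import { WebSocket } from "ws";

async function streamAssistantTurn(
  humeSocket: WebSocket,
  chunks: AsyncIterable<string>
): Promise<void> {
  for await (const chunk of chunks) {
    if (chunk.trim().length === 0) continue; // skip empty chunks
    humeSocket.send(JSON.stringify({ type: "assistant_input", text: chunk }));
  }
  // Our turn is over; no more text is coming for this response.
  humeSocket.send(JSON.stringify({ type: "assistant_end" }));
}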