Empathic Voice Interface (EVI)

Using a custom language model

For more customization, you can generate your own text using a custom language model.

The information on this page lays out how our custom language model functionality works at a high level; however, for detailed instructions and commented code, please see our example GitHub repository.

Overview

The custom language model feature enables developers to integrate their own language models with Hume’s Empathic Voice Interface (EVI), facilitating the creation of highly configurable and personalized user experiences. You create a socket that receives Hume’s conversation thread history, apply whatever custom business logic you need, and send back the next text for EVI to say, which is then spoken to the user.

Using your own LLM is intended for developers who need deep configurability for their use case. This includes full text customization for use cases like:

  • Advanced conversation steering: Implement complex logic to steer conversations beyond basic prompting, including managing multiple system prompts.
  • Regulatory compliance: Directly control and modify text outputs to meet specific regulatory requirements.
  • Context-aware text generation: Leverage dynamic agent metadata, such as remaining conversation time, to inform text generation.
  • Real-time data access: Utilize search engines within conversations to access and incorporate up-to-date information.
  • Retrieval augmented generation (RAG): Employ retrieval augmented generation techniques to enrich conversations by integrating external data without the need to modify the system prompt.

For these cases, function calling alone isn’t flexible enough; a custom language model lets you build sophisticated workflows around your own text generation.

Custom language model flow diagram

Setup

Establish a Custom Text Socket

  • Initialization: See our example repository for instructions on setting up a custom text socket. This resource offers detailed guidance on both the setup process and the operational aspects of the code.
  • Hosting: Use a tunneling tool such as ngrok to publicly serve your socket. This step is needed so the Hume platform can connect to it.
  • Configuration: Create a voice configuration, specifying “Custom language model” as the Language Model, and your socket’s WSS URL as the Custom Language Model URL.
  • Make request: When making your request to the Hume platform, include the config_id parameter, setting its value to the ID of your voice configuration (see the connection sketch below).
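
As an illustration of the last step, a minimal connection sketch might look like the following. The WebSocket endpoint URL, query parameter names, and environment variable names here are assumptions for illustration; in practice you would follow the API reference or use one of Hume’s SDKs.

// Minimal sketch: opening an EVI chat session that uses your custom
// language model configuration. Endpoint URL and query parameter names
// are assumptions for illustration only.
import WebSocket from "ws";

const apiKey = process.env.HUME_API_KEY!;    // your Hume API key (assumed env var)
const configId = process.env.EVI_CONFIG_ID!; // ID of the voice configuration
                                             // that points at your socket

const socket = new WebSocket(
  `wss://api.hume.ai/v0/evi/chat?api_key=${apiKey}&config_id=${configId}`
);

socket.on("open", () => {
  console.log("Connected to EVI with the custom language model config");
});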

Communication Protocol

  • Receiving data: Your socket will receive JSON payloads containing conversation thread history from the Hume system.
  • Processing: Apply your custom business logic and utilize your language model to generate appropriate responses based on the received conversation history.
  • Sending responses: Transmit the generated text responses back to our platform through the established socket connection to be forwarded to the end user.
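
The sketch below shows one way this receive, process, and send loop might be wired up, assuming a Node.js server using the ws package; generateReply is a hypothetical stand-in for your own language model call, and the payload shapes follow the interfaces documented under Payload Structure below.

// Sketch of a custom text socket: receive Hume's conversation history,
// run your own business logic and language model, then send the reply
// back as assistant_input followed by assistant_end.
import { WebSocketServer, WebSocket } from "ws";

const wss = new WebSocketServer({ port: 8080 });

wss.on("connection", (ws: WebSocket) => {
  ws.on("message", async (data) => {
    const payload = JSON.parse(data.toString());
    const lastMessage = payload.messages?.[payload.messages.length - 1];

    // Only respond to user turns; ignore other payloads.
    if (lastMessage?.type !== "user_message") return;

    const reply = await generateReply(payload.messages); // your LLM / business logic

    ws.send(JSON.stringify({ type: "assistant_input", text: reply }));
    ws.send(JSON.stringify({ type: "assistant_end" }));
  });
});

// Hypothetical placeholder for your own model call.
async function generateReply(messages: unknown[]): Promise<string> {
  return "Hello from my custom language model.";
}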

For improved clarity and naturalness in generated text, we recommend transforming numerical values and abbreviations into their full verbal counterparts (e.g., converting “3” to “three” and “Dr.” to “doctor”).
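
As a rough sketch of that kind of normalization, the helper below expands a few abbreviations and small numbers into words before the text is sent back to Hume; the abbreviation map and number range are illustrative only, and a production system would likely use a dedicated number-to-words library.

// Illustrative normalization pass: expand abbreviations and digits 0-10
// into their verbal forms so the synthesized speech sounds natural.
const ABBREVIATIONS: { [key: string]: string } = {
  "Dr.": "doctor",
  "St.": "street",
  "etc.": "et cetera",
};

const SMALL_NUMBERS = [
  "zero", "one", "two", "three", "four", "five",
  "six", "seven", "eight", "nine", "ten",
];

function verbalize(text: string): string {
  let result = text;
  for (const [abbr, word] of Object.entries(ABBREVIATIONS)) {
    result = result.split(abbr).join(word);
  }
  // Replace standalone digits 0-10 with their word form.
  return result.replace(/\b(10|\d)\b/g, (match) => SMALL_NUMBERS[Number(match)]);
}

// verbalize("Dr. Smith arrives in 3 minutes.")
//   -> "doctor Smith arrives in three minutes."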

Payload Structure

Below is the interface representing the overall structure of the message payloads sent by Hume:

/*
 * Represents the overall structure of the Welcome message.
 */
export interface Welcome {
  // Array of message elements
  messages: MessageElement[];
  // Unique identifier for the session
  custom_session_id: string;
}

/*
 * Represents a single message element within the session.
 */
export interface MessageElement {
  // Type of the message (e.g., user_message, assistant_message)
  type: string;
  // The message content and related details
  message: Message;
  // Models related to the message, primarily prosody analysis
  models: Models;
  // Optional timestamp details for when the message was sent
  time?: Time;
}

/*
 * Represents the content of the message.
 */
export interface Message {
  // Role of the sender (e.g., user, assistant)
  role: string;
  // The textual content of the message
  content: string;
}

/*
 * Represents the models associated with a message.
 */
export interface Models {
  // Prosody analysis details of the message
  prosody: Prosody;
}

/*
 * Represents the prosody analysis scores.
 */
export interface Prosody {
  // Dictionary of prosody scores with emotion categories as keys
  // and their respective scores as values
  scores: { [key: string]: number };
}

/*
 * Represents the timestamp details of a message.
 */
export interface Time {
  // The start time of the message (in milliseconds)
  begin: number;
  // The end time of the message (in milliseconds)
  end: number;
}
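
As an example of consuming this payload, the sketch below pulls the most recent user message and its strongest prosody scores out of an incoming Welcome payload using the interfaces above; summarizing the top emotions this way (and how many to keep) is an illustrative choice, not part of the protocol.

// Sketch: extract the latest user turn and its top prosody scores so they
// can inform your prompt or business logic.
function summarizeLastUserTurn(payload: Welcome): string | null {
  const lastUser = [...payload.messages]
    .reverse()
    .find((m) => m.type === "user_message");
  if (!lastUser) return null;

  const topEmotions = Object.entries(lastUser.models.prosody.scores)
    .sort(([, a], [, b]) => b - a) // highest scores first
    .slice(0, 3)
    .map(([emotion, score]) => `${emotion} (${score.toFixed(2)})`);

  return `User said: "${lastUser.message.content}" | top emotions: ${topEmotions.join(", ")}`;
}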

Custom Session ID

For managing conversational state and connecting your frontend experiences with your backend data and logic, you should pass a custom_session_id in the SessionSettings message. When a custom_session_id is provided in the frontend’s SessionSettings message, the payloads Hume sends to your backend include this ID, so you can correlate frontend users with their incoming messages.

Using a custom_session_id will enable you to:

  • maintain user state on your backend
  • pause/resume conversations
  • persist conversations across sessions
  • match frontend and backend connections

We recommend passing a custom_session_id if you are using a Custom Language Model.
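
For instance, a frontend helper that sets the ID might look like the sketch below; the "session_settings" type name and field follow the SessionSettings message described above, but the exact wire format should be checked against the API reference.

// Sketch: send a SessionSettings message carrying your own session ID over
// an open EVI chat socket (browser WebSocket assumed).
function setCustomSessionId(eviSocket: WebSocket, sessionId: string): void {
  eviSocket.send(
    JSON.stringify({
      type: "session_settings",     // assumed message type name
      custom_session_id: sessionId, // your own identifier, e.g. a user or chat ID
    })
  );
}

// setCustomSessionId(eviSocket, "user-1234-chat-5678");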


Assistant Input and End Payload Format

These are the formats for sending messages to Hume:

assistant_input

The assistant_input payload is used to send text to the assistant. You can send multiple assistant_input payloads in a sequence to stream text to the assistant.

Format:

{
  "type": "assistant_input",
  "text": "your_text_here"
}

Example:

{
  "type": "assistant_input",
  "text": "Hello, how are you?"
}

assistant_end

The assistant_end payload indicates that your turn is over, signaling the end of the current stream of text inputs.

Format:

{
  "type": "assistant_end"
}

Streaming Text to the Assistant

You can send multiple assistant_input payloads consecutively to stream text to the assistant. Once you are done sending inputs, you must send an assistant_end payload to indicate the end of your turn.

Example Sequence:

Step 1: Start streaming text

{
  "type": "assistant_input",
  "text": "This is the first part of the text."
}

Step 2: Continue streaming text

{
  "type": "assistant_input",
  "text": "Here is the second part of the text."
}

Step 3: Indicate the end of your turn

{
  "type": "assistant_end"
}

Summary

  1. Send assistant_input payloads to stream text to the assistant.
  2. Send as many assistant_input payloads as needed.
  3. Send an assistant_end payload to indicate that your turn is over.

By following this format, you ensure proper communication with the assistant API, enabling smooth and efficient interactions.
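
Putting the streaming rules together, the sketch below forwards chunks from a streaming language model as assistant_input payloads and then closes the turn with assistant_end; the async-iterable chunk source is an assumption about your LLM client.

// Sketch: stream text chunks to Hume as assistant_input payloads, then
// signal the end of the turn with assistant_end. `chunks` stands in for
// whatever streaming interface your language model client exposes.
import { WebSocket } from "ws";

async function streamAssistantTurn(
  humeSocket: WebSocket,
  chunks: AsyncIterable<string>
): Promise<void> {
  for await (const chunk of chunks) {
    if (chunk.trim().length === 0) continue; // skip empty chunks
    humeSocket.send(JSON.stringify({ type: "assistant_input", text: chunk }));
  }
  // Our turn is over; no more text is coming for this response.
  humeSocket.send(JSON.stringify({ type: "assistant_end" }));
}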