Using a custom language model

Use a custom language model to generate your own text, for maximum configurability.

To get started quickly, see the custom language model example in our examples GitHub repository.

Overview

The custom language model (CLM) feature allows you to use your own language model to drive EVI’s responses. When you configure a custom language model, EVI will send requests to your server with textual conversation history and emotional context. Your server is responsible for responding with the text that EVI should speak next.

A custom language model can be:

  • A frontier model from an LLM provider like OpenAI or Anthropic “wrapped” with custom pre-processing or post-processing logic.
  • A language model that you have trained and host yourself.
  • Anything that produces text: it doesn’t have to be an LLM.

CLMs are appropriate for use cases that involve deep configurability, for example:

  • Advanced conversation steering: Implement complex logic to steer conversations beyond basic prompting, including managing multiple system prompts or controlling all of the text outputs.
  • Regulatory compliance: Directly control, post-process, or modify text outputs to meet specific regulatory requirements.
  • Unreleased LLMs: Use non-public, proprietary LLMs for text generation while using EVI.
  • Retrieval augmented generation (RAG): Enrich conversations by integrating external data, without needing to modify the system prompt.

You should prefer conversational controls over a CLM for use cases that do not require deep configurability. When Hume connects directly to an upstream LLM provider, Hume covers the cost of usage, and latency is lower than when Hume connects to your CLM, which in turn connects to an upstream provider.

Set up the config

First, create a new config, or update an existing config, and select the “custom language model” option in the “Set up LLM” step. Enter the URL of your custom language model endpoint. If you are using the SSE interface (recommended), the URL should start with https:// and end with /chat/completions. If you are using the WebSocket interface, the URL should start with wss://. The endpoint needs to be accessible from the public Internet. If you are developing locally, you can use a service like ngrok to give your local server a publicly accessible URL.

Custom language model configuration
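If you prefer to create the config programmatically, you can call the /v0/evi/configs API directly. The sketch below is only an illustration: the language_model field names (model_provider, model_resource) and the X-Hume-Api-Key header reflect assumptions about the configs API, so check the configs API reference for the authoritative schema.

# A hedged sketch of creating a CLM config via the /v0/evi/configs API.
# Field names and the auth header are assumptions; consult the configs API reference.
import os
import requests

response = requests.post(
    "https://api.hume.ai/v0/evi/configs",
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},
    json={
        "name": "my-clm-config",
        "language_model": {
            "model_provider": "CUSTOM_LANGUAGE_MODEL",
            # SSE interface (recommended): an https:// URL ending in /chat/completions
            "model_resource": "https://my-clm.example.com/chat/completions",
        },
    },
)
response.raise_for_status()
print(response.json()["id"])  # the config_id to use when connecting to /v0/evi/chat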

Server-Sent Events

The recommended way to set up a CLM is to expose a POST /chat/completions endpoint that responds with a stream of Server-Sent Events (SSEs) in a format compatible with OpenAI’s POST /v1/chat/completions endpoint.

Please reference the project in our examples repo for a runnable example.

Because EVI expects the events to be in the same format as OpenAI’s chat completions, it is straightforward to build a CLM that simply “wraps” an OpenAI model with preprocessing or postprocessing logic. Wrapping a model from a different provider takes more effort: you will have to convert your model’s output into the OpenAI format.
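For example, if your model streams plain text deltas, you can wrap each delta in an OpenAI-style chat.completion.chunk object and serialize it as an SSE. The sketch below shows one way to do this; the deltas argument stands in for your own model’s output stream, and it assumes EVI only needs the text carried in choices[0].delta.content plus a terminating [DONE] event.

# A sketch of adapting a non-OpenAI model's text stream into OpenAI-style
# chat.completion.chunk events. `deltas` is any async iterable of plain text deltas
# produced by your own model.
import json
import time
import uuid
from typing import AsyncIterable

async def to_openai_sse(deltas: AsyncIterable[str], model: str = "my-model") -> AsyncIterable[str]:
    completion_id = f"chatcmpl-{uuid.uuid4()}"
    created = int(time.time())
    async for delta in deltas:
        chunk = {
            "id": completion_id,
            "object": "chat.completion.chunk",
            "created": created,
            "model": model,
            "choices": [{"index": 0, "delta": {"content": delta}, "finish_reason": None}],
        }
        yield "data: " + json.dumps(chunk) + "\n\n"
    # Signal the end of the completion, then end the SSE stream.
    final = {
        "id": completion_id,
        "object": "chat.completion.chunk",
        "created": created,
        "model": model,
        "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}],
    }
    yield "data: " + json.dumps(final) + "\n\n"
    yield "data: [DONE]\n\n"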

The following example shows how to build a CLM by “wrapping” an upstream LLM provided by OpenAI. The steps are:

  1. Listen for POST requests to /chat/completions.
  2. Parse the request and extract only the role and content fields from each message in the message history. (Hume also supplies prosody information and other metadata. In this example, we simply discard that information, but you might attempt to reflect it by adding or modifying the messages you pass upstream.)
  3. Use the OpenAI SDK to make a request to the upstream OpenAI POST /chat/completions endpoint, passing in the message history and "stream": true.
  4. Reformat the data from OpenAI into Server-Sent Events (the OpenAI API itself sends data as SSEs, but the OpenAI SDK automatically unwraps them, so you have to rewrap the chunks before transmitting them back to Hume).
  5. Stream the SSEs back to Hume.
from typing import AsyncIterable, Optional
import fastapi
from fastapi.responses import StreamingResponse
from openai.types.chat import ChatCompletionChunk, ChatCompletionMessageParam
import openai
import os
from fastapi import HTTPException, Security
from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials

app = fastapi.FastAPI()

"""
This script creates a FastAPI server that Hume will send requests to, and
the server will stream responses back to Hume.
To run, use: uvicorn sse.sse:app --reload
"""

client = openai.AsyncOpenAI(api_key=os.environ["OPENAI_API_KEY"])

async def get_response(
    raw_messages: list[dict],
    custom_session_id: Optional[str],
) -> AsyncIterable[str]:
    # Remove prosody scores and other Hume metadata
    messages: list[ChatCompletionMessageParam] = [
        {"role": m["role"], "content": m["content"]} for m in raw_messages
    ]

    chat_completion_chunk_stream = await client.chat.completions.create(
        messages=messages,
        model="gpt-4o",
        stream=True,
    )

    async for chunk in chat_completion_chunk_stream:
        yield "data: " + chunk.model_dump_json(exclude_none=True) + "\n\n"
    yield "data: [DONE]\n\n"

security = HTTPBearer()
API_KEY = "your-secret-key-here"  # Use environment variables in production

async def verify_token(credentials: HTTPAuthorizationCredentials = Security(security)):
    if credentials.credentials != API_KEY:
        raise HTTPException(status_code=401, detail="Invalid authentication token")
    return credentials.credentials

@app.post("/chat/completions", response_class=StreamingResponse)
async def root(
    request: fastapi.Request,
    token: str = Security(verify_token)
):
    """Chat completions endpoint with Bearer token authentication"""
    request_json = await request.json()
    messages = request_json["messages"]
    print(messages)

    custom_session_id = request.query_params.get("custom_session_id")
    print(custom_session_id)

    return StreamingResponse(
        get_response(messages, custom_session_id=custom_session_id),
        media_type="text/event-stream",
    )

Testing your SSE endpoint

To verify that you have successfully implemented an OpenAI-compatible POST /chat/completions endpoint, you can use the OpenAI SDK pointed at your server instead of api.openai.com. Below is an example verification script (it assumes your server is running on localhost:8000):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="http://localhost:8000",
    default_query={"custom_session_id": "123"},
    api_key="your-secret-key-here",  # Sent as a Bearer token
)

async def main():
    chat_completion_chunk_stream = await client.chat.completions.create(
        model="hume",
        messages=[],
        stream=True,
        extra_body={
            "messages": [
                {
                    "role": "user",
                    "content": "Hello, how are you?",
                    "time": {
                        "begin": 0,
                        "end": 1000,
                    },
                    "models": {
                        "prosody": {
                            "scores": {
                                "Sadness": 0.1,
                                "Joy": 0.2,
                            },
                        },
                    },
                },
            ],
        },
    )
    async for chunk in chat_completion_chunk_stream:
        print(chunk)

if __name__ == "__main__":
    asyncio.run(main())

Providing an API Key

If your SSE endpoint requires an API key, send it in the language_model_api_key field of a session_settings message when the session begins:

{
  "type": "session_settings",
  "language_model_api_key": "<your-secret-key-here>"
}

This will cause EVI to include the key as a Bearer token in the Authorization header of each request it sends to your endpoint, which is what the verify_token dependency in the example above checks for.
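For example, a client connecting to the chat WebSocket directly could send the session settings as its first message. The sketch below uses the raw websockets library; the api_key and config_id query parameters are assumptions here, so check the /v0/evi/chat reference (or use a Hume SDK) for the exact connection options.

# A sketch of sending session_settings with a language_model_api_key over the chat
# WebSocket. The query-parameter authentication shown here is an assumption; check the
# /v0/evi/chat reference for the supported options.
import asyncio
import json
import os

import websockets

async def main():
    url = (
        "wss://api.hume.ai/v0/evi/chat"
        f"?config_id={os.environ['HUME_CONFIG_ID']}"
        f"&api_key={os.environ['HUME_API_KEY']}"
    )
    async with websockets.connect(url) as socket:
        await socket.send(json.dumps({
            "type": "session_settings",
            "language_model_api_key": os.environ["CLM_API_KEY"],
        }))
        # ...continue with audio input/output handling...

if __name__ == "__main__":
    asyncio.run(main())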

WebSockets

We recommend using the SSE interface for your CLM. SSEs are simpler, allow for better security, and have better latency properties. In the past, the WebSocket interface was the only option, so the instructions are preserved here.

Please reference the project in our examples repo for a runnable example.

To use a CLM with WebSockets, the steps are:

1. Set up an EVI config

Use the web interface or the /v0/evi/configs API to create a configuration. Select “custom language model” and provide the URL of your WebSocket endpoint. If you are developing locally, you can use a service like ngrok to give your local server a publicly accessible URL.

2. The chat starts

Next, your frontend (or Twilio, if you are using the inbound phone calling endpoint) will connect to EVI via the /v0/evi/chat endpoint, passing the config_id of that configuration.

3. EVI connects to your CLM WebSocket endpoint

EVI will open a WebSocket connection to your server at the URL you provided when setting up the configuration. This connection is the CLM socket (as opposed to the Chat socket that is already open between the client and EVI).

4. EVI sends messages over the CLM socket

As the user interacts with EVI, EVI will send messages over the CLM socket to your server, containing the conversation history and emotional context.

5. Your server responds

Your server is responsible for sending two types of message back over the CLM socket to EVI:

  • assistant_input messages containing text to speak, and
  • assistant_end messages to indicate when the AI has finished responding, yielding the conversational turn back to the user.

You can send multiple assistant_input payloads consecutively to stream text to the assistant. Once you are done sending inputs, you must send an assistant_end payload to indicate the end of your turn.
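Below is a minimal sketch of a CLM WebSocket endpoint built with FastAPI. It accepts the CLM socket, waits for each payload from EVI, and replies with a stream of assistant_input messages followed by assistant_end. How you turn the incoming conversation history into reply text is up to you; generate_reply here is a hypothetical placeholder.

# A minimal sketch of a CLM WebSocket server with FastAPI. It replies to each incoming
# payload with assistant_input messages followed by assistant_end. `generate_reply` is
# a hypothetical placeholder for your own text-generation logic.
import json

import fastapi

app = fastapi.FastAPI()

def generate_reply(payload: dict) -> list[str]:
    """Hypothetical: inspect the conversation history in `payload` and produce text chunks."""
    return ["Hello!", " How can I help you today?"]

@app.websocket("/llm")
async def clm_socket(websocket: fastapi.WebSocket):
    await websocket.accept()
    try:
        while True:
            # Each payload contains the conversation history and emotional context.
            payload = json.loads(await websocket.receive_text())
            # Stream one or more assistant_input messages with text for EVI to speak...
            for text in generate_reply(payload):
                await websocket.send_text(json.dumps({
                    "type": "assistant_input",
                    "text": text,
                }))
            # ...then yield the turn back to the user.
            await websocket.send_text(json.dumps({"type": "assistant_end"}))
    except fastapi.WebSocketDisconnect:
        pass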

Custom Session IDs

For managing conversational state and connecting your frontend experiences with your backend data and logic, you should set a custom_session_id for the chat.

Using a custom_session_id will enable you to:

  • maintain user state on your backend
  • pause/resume conversations
  • persist conversations across sessions
  • match frontend and backend connections

There are two ways to set a custom_session_id:

  1. From the client: if your frontend connects to EVI via the /chat WebSocket endpoint, you can send a session_settings message over the WebSocket with the custom_session_id field set.
  2. From the CLM endpoint: if your CLM uses the SSE interface, you can set the custom_session_id by assigning it to the system_fingerprint field of the chat completion chunks you stream back. With WebSockets, you can include the custom_session_id on the assistant_input message. Use this option if you don’t have control over the WebSocket connection to the client (for example, if you are using the /v0/evi/twilio endpoint for inbound phone calling).
async for chunk in chat_completion_chunk_stream:
    chunk.system_fingerprint = "<your_id_here>"  # Replace with your custom_session_id
    yield "data: " + chunk.model_dump_json(exclude_none=True) + "\n\n"
yield "data: [DONE]\n\n"
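For the WebSocket variant, the custom_session_id goes directly on the assistant_input payload; inside the handler from the WebSocket sketch above, that would look something like:

# WebSocket variant: attach the custom_session_id to the assistant_input message itself.
await websocket.send_text(json.dumps({
    "type": "assistant_input",
    "text": "Hello!",
    "custom_session_id": "<your_id_here>",  # Replace with your custom_session_id
}))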

You only need to set the custom_session_id once per chat. EVI will remember the custom_session_id for the duration of the conversation.

After you set the custom_session_id: for SSE endpoints, it will be sent as a query parameter to your endpoint, for example POST https://api.example.com/chat/completions?custom_session_id=123. For WebSocket endpoints, it will be included as a top-level property on the incoming message.

If you are sourcing your CLM responses from OpenAI, be careful not to inadvertently override your intended custom_session_id with OpenAI’s system_fingerprint. If you are setting your own custom_session_id, always either delete system_fingerprint from OpenAI chunks before forwarding them to EVI, or override it with the desired custom_session_id.
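Concretely, in the SSE streaming loop that would look something like this (setting system_fingerprint to None removes it from the serialized chunk because of exclude_none=True):

# Either overwrite OpenAI's system_fingerprint with your own ID, or clear it entirely.
async for chunk in chat_completion_chunk_stream:
    chunk.system_fingerprint = custom_session_id  # or None to drop the field
    yield "data: " + chunk.model_dump_json(exclude_none=True) + "\n\n"
yield "data: [DONE]\n\n"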