Using a custom language model
For more customization, you can generate your own text using a custom language model.
The information on this page lays out how our custom language model functionality works at a high level; however, for detailed instructions and commented code, please see our example GitHub repository.
Overview
The custom language model feature lets developers integrate their own language models with Hume’s Empathic Voice Interface (EVI) to build highly configurable, personalized user experiences. You create a socket that receives the Hume conversation thread history and sends back the next text to say. Your backend socket can run whatever custom business logic you need; the response you return is then passed to the user.
Using your own LLM is intended for developers who need deep configurability for their use case. This includes full text customization for use cases like:
- Advanced conversation steering: Implement complex logic to steer conversations beyond basic prompting, including managing multiple system prompts.
- Regulatory compliance: Directly control and modify text outputs to meet specific regulatory requirements.
- Context-aware text generation: Leverage dynamic agent metadata, such as remaining conversation time, to inform text generation.
- Real-time data access: Utilize search engines within conversations to access and incorporate up-to-date information.
- Retrieval augmented generation (RAG): Employ retrieval augmented generation techniques to enrich conversations by integrating external data without the need to modify the system prompt.
For these cases, function calling alone isn’t customizable enough; a custom language model lets you build more sophisticated workflows around text generation.
Setup
Establish a Custom Text Socket
- Initialization: See our example repository for instructions on setting up a custom text socket. This resource offers detailed guidance on both the setup process and the operational aspects of the code.
- Hosting: Use Ngrok to publicly serve your socket. This step is needed to connect to the Hume system.
- Configuration: Create a voice configuration, specifying “Custom language model” as the Language Model, and your socket’s WSS URL as the Custom Language Model URL.
- Make request: When making your request to the Hume platform, include the config_id parameter, setting its value to the Voice configuration ID of your configuration.
Communication Protocol
- Receiving data: Your socket will receive JSON payloads containing conversation thread history from the Hume system.
- Processing: Apply your custom business logic and utilize your language model to generate appropriate responses based on the received conversation history.
- Sending responses: Transmit the generated text responses back to our platform through the established socket connection to be forwarded to the end user.
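The receive, process, and send steps above can be sketched as one handler. This is a minimal illustration under stated assumptions, not Hume's implementation: the `messages`, `role`, and `content` keys are a simplified, hypothetical payload shape, and `generate` and `send` are placeholders for your own model call and your socket's send method. How outgoing messages must be framed is covered under Assistant Input and End Payload Format.

```python
import json
from typing import Callable

History = list[dict]


def latest_user_text(history: History) -> str:
    """Return the most recent user utterance, or an empty string if there is none."""
    for message in reversed(history):
        # "role" and "content" are assumed field names, used for illustration only.
        if message.get("role") == "user":
            return message.get("content", "")
    return ""


def handle_payload(
    raw: str,
    generate: Callable[[History], str],
    send: Callable[[str], None],
) -> None:
    """Receive one JSON payload, apply custom logic, and send the reply text."""
    payload = json.loads(raw)              # 1. receive the conversation history
    history = payload.get("messages", [])  # assumed top-level key
    reply = generate(history)              # 2. run your business logic and model
    send(reply)                            # 3. hand the text to the socket layer
```

Passing `generate` and `send` as arguments keeps the business logic independent of the socket library, which makes the handler easy to unit test.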
For improved clarity and naturalness in generated text, we recommend transforming numerical values and abbreviations into their full verbal counterparts (e.g., converting “3” to “three” and “Dr.” to “doctor”).
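One way to apply this recommendation is a small normalization pass over the generated text before it is sent. The sketch below is illustrative only: the `ABBREVIATIONS` table and the `verbalize` helper are hypothetical names, and it deliberately handles only standalone integers from 0 to 99, leaving decimals and longer numbers untouched.

```python
import re

# Illustrative lookup tables; extend them for your own domain.
ABBREVIATIONS = {"Dr.": "doctor", "Mr.": "mister"}
ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# Matches a one- or two-digit integer that is not part of a longer number,
# a decimal, or an alphanumeric token.
STANDALONE_INT = re.compile(r"(?<![\w.])\d{1,2}(?!\w|\.\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def verbalize(text: str) -> str:
    """Expand known abbreviations and spell out small standalone integers."""
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = text.replace(abbreviation, spoken)
    return STANDALONE_INT.sub(lambda m: number_to_words(int(m.group())), text)
```

A production system would likely use a dedicated text-normalization library; the point here is only where such a step fits in the pipeline.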
Payload Structure
Below is the interface representing the overall structure of the message payloads sent by Hume:
Custom Session ID
For managing conversational state and connecting your frontend experiences with your backend data and logic, you should pass a custom_session_id in the SessionSettings message. When a custom_session_id is provided from the frontend SessionSettings message, the response sent from Hume to your backend includes this id, so you can correlate frontend users with their incoming messages.
Using a custom_session_id will enable you to:
- maintain user state on your backend
- pause/resume conversations
- persist conversations across sessions
- match frontend and backend connections
We recommend passing a custom_session_id if you are using a Custom Language Model.
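As a rough sketch of how a backend might use this id to maintain user state, the example below keeps per-session state in memory. It assumes the id arrives as a top-level `custom_session_id` key in the parsed payload; `SessionState`, `SessionStore`, and `on_payload` are hypothetical names, and a real deployment would likely back the store with a database so state survives restarts.

```python
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Per-user state kept on the backend between messages."""
    turns: int = 0
    notes: dict = field(default_factory=dict)


class SessionStore:
    """In-memory store keyed by custom_session_id (illustrative only)."""

    def __init__(self) -> None:
        self._sessions: dict[str, SessionState] = {}

    def get(self, payload: dict) -> SessionState:
        # Assumed key location; fall back to a shared bucket when no id is sent.
        session_id = payload.get("custom_session_id") or "anonymous"
        return self._sessions.setdefault(session_id, SessionState())


store = SessionStore()


def on_payload(payload: dict) -> int:
    """Look up the caller's state, count the turn, and return the running total."""
    state = store.get(payload)
    state.turns += 1
    return state.turns
```

Because the same id maps to the same `SessionState`, a reconnecting frontend that reuses its custom_session_id picks up where it left off.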
Assistant Input and End Payload Format
These are the formats for sending messages to Hume:
assistant_input
The assistant_input payload is used to send text to the assistant. You can send multiple assistant_input payloads in a sequence to stream text to the assistant.
Format:
Example:
assistant_end
The assistant_end payload indicates that your turn is over. This signals the end of the current stream of text inputs.
Format:
Streaming Text to the Assistant
You can send multiple assistant_input payloads consecutively to stream text to the assistant. Once you are done sending inputs, you must send an assistant_end payload to indicate the end of your turn.
Example Sequence:
Step 1: Start streaming text
Step 2: Continue streaming text
Step 3: Indicate the end of your turn
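The sequence above can be sketched as a generator that yields one message per text chunk and then closes the turn. The JSON field names `type` and `text` are assumptions made for illustration, and `stream_turn` is a hypothetical helper; check the example repository for the exact payload schema.

```python
import json
from typing import Iterable, Iterator


def stream_turn(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one assistant_input message per non-empty chunk, then assistant_end."""
    for chunk in chunks:
        if chunk:  # skip empty chunks so no blank inputs are sent
            # "type" and "text" are assumed field names.
            yield json.dumps({"type": "assistant_input", "text": chunk})
    # Always close the turn, even if no text was produced.
    yield json.dumps({"type": "assistant_end"})
```

Each yielded string would then be written to the open socket in order, for example by looping over `stream_turn(["Hello, ", "how can I help?"])` and passing every message to your socket's send method.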
Summary
- Send assistant_input payloads to stream text to the assistant.
- Send as many assistant_input payloads as needed.
- Send an assistant_end payload to indicate that your turn is over.
By following this format, you ensure proper communication with the assistant API, enabling smooth and efficient interactions.