Empathic Voice Interface (EVI)

Using a custom language model

For more customization, you can generate your own text using a custom language model.

The information on this page lays out how our custom language model functionality works at a high level; however, for detailed instructions and commented code, please see our example GitHub repository.

Overview

The custom language model feature enables developers to integrate their own language models with Hume’s Empathic Voice Interface (EVI), facilitating highly configurable and personalized user experiences. You create a socket that receives Hume’s conversation thread history and sends back the next text to say. Your backend can apply whatever custom business logic you need; the response you return is then delivered to the user.

Using your own LLM is intended for developers who need deep configurability. This includes full text customization for use cases like:

  • Advanced conversation steering: Implement complex logic to steer conversations beyond basic prompting, including managing multiple system prompts.
  • Regulatory compliance: Directly control and modify text outputs to meet specific regulatory requirements.
  • Context-aware text generation: Leverage dynamic agent metadata, such as remaining conversation time, to inform text generation.
  • Real-time data access: Utilize search engines within conversations to access and incorporate up-to-date information.
  • Retrieval augmented generation (RAG): Employ retrieval augmented generation techniques to enrich conversations by integrating external data without the need to modify the system prompt.

For these cases, function calling alone isn’t customizable enough; a custom language model lets you build sophisticated workflows around your text generation.

Setup

Establish a Custom Text Socket

  • Initialization: See our example repository for instructions on setting up a custom text socket. This resource offers detailed guidance on both the setup process and the operational aspects of the code.
  • Hosting: Use ngrok to serve your socket publicly. Hume needs a publicly reachable URL in order to connect to it.
  • Configuration: Create a voice configuration, specifying “Custom language model” as the Language Model, and your socket’s WSS URL as the Custom Language Model URL.
  • Make request: When making your request to the Hume platform, include the config_id parameter, setting its value to the ID of the voice configuration you just created (see the connection sketch below).
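
As a rough sketch, the snippet below opens an EVI chat session that uses your custom-language-model voice configuration. The endpoint, query parameters, and environment variable names are assumptions for illustration; consult the API reference and example repository for the exact connection details.

```python
# Sketch: opening an EVI chat session that uses your custom language model.
# The wss://api.hume.ai/v0/evi/chat endpoint and the api_key / config_id
# query parameters are assumptions for illustration; check the API reference
# for the current connection details.
import asyncio
import os

import websockets  # pip install websockets


async def connect_to_evi() -> None:
    api_key = os.environ["HUME_API_KEY"]        # your Hume API key
    config_id = os.environ["HUME_CONFIG_ID"]    # the voice configuration ID from the step above
    url = f"wss://api.hume.ai/v0/evi/chat?api_key={api_key}&config_id={config_id}"

    async with websockets.connect(url) as socket:
        # Because the configuration names a Custom Language Model URL, EVI will
        # call your socket for text generation during this chat session.
        first_message = await socket.recv()
        print(first_message)


if __name__ == "__main__":
    asyncio.run(connect_to_evi())
```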

Communication Protocol

  • Receiving data: Your socket will receive JSON payloads containing conversation thread history from the Hume system.
  • Processing: Apply your custom business logic and utilize your language model to generate appropriate responses based on the received conversation history.
  • Sending responses: Transmit the generated text back to our platform through the established socket connection, where it is forwarded to the end user (see the server sketch below).
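
For illustration, here is a minimal sketch of such a socket in Python. The payload fields (“messages”, “role”, “content”) and the plain {"text": ...} reply are placeholders, not the actual schema; the real payload and response formats are documented in the example repository.

```python
# Minimal sketch of a custom text socket. The payload fields ("messages",
# "role", "content") and the {"text": ...} reply are placeholders; see the
# example repository for the real message schema.
import asyncio
import json

import websockets  # pip install websockets


def generate_response(messages: list[dict]) -> str:
    """Placeholder for your LLM call and custom business logic."""
    last_turn = messages[-1].get("content", "") if messages else ""
    return f"You said: {last_turn}"


async def handle_connection(socket) -> None:
    async for raw in socket:
        payload = json.loads(raw)  # conversation thread history from Hume
        reply = generate_response(payload.get("messages", []))
        await socket.send(json.dumps({"text": reply}))  # forwarded on to the end user


async def main() -> None:
    # Serve locally on port 8000; expose it publicly (e.g., with ngrok) and use
    # the resulting wss:// URL as the Custom Language Model URL.
    async with websockets.serve(handle_connection, "0.0.0.0", 8000):
        await asyncio.Future()  # run until interrupted


if __name__ == "__main__":
    asyncio.run(main())
```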

For improved clarity and naturalness in generated text, we recommend transforming numerical values and abbreviations into their full verbal counterparts (e.g., converting “3” to “three” and “Dr.” to “doctor”).
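
As an illustrative sketch, a simple pre-processing pass can handle the most common cases before the text is sent back over the socket; the mappings below are examples, not an exhaustive list.

```python
# Illustrative normalization of digits and abbreviations before sending text
# back over the socket; the mappings are examples, not an exhaustive list.
import re

ABBREVIATIONS = {"Dr.": "doctor", "St.": "street", "etc.": "et cetera"}
DIGITS = {"0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
          "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine"}


def verbalize(text: str) -> str:
    for abbreviation, spoken in ABBREVIATIONS.items():
        text = text.replace(abbreviation, spoken)
    # Spell out standalone single digits; multi-digit numbers are better
    # handled by a dedicated library such as num2words.
    return re.sub(r"\b\d\b", lambda match: DIGITS[match.group()], text)


print(verbalize("Dr. Smith will see you in 3 minutes."))
# -> "doctor Smith will see you in three minutes."
```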