Empathic Voice Interface (EVI)

Hume’s Empathic Voice Interface (EVI) is the world’s first emotionally intelligent voice AI. It accepts live audio input and returns both generated audio and transcripts augmented with measures of vocal expression. By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice. These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.

We provide a suite of tools to integrate and customize EVI for your application, including a WebSocket API that handles audio and text transport, a REST API, and SDKs for TypeScript and Python to simplify integration into web and Python-based projects. Additionally, we provide open-source examples and a web widget as practical starting points for developers to explore and implement EVI’s capabilities within their own projects.

Figure: EVI communications flow diagram

Building with EVI

The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue: the user speaks, EVI listens and analyzes their vocal expression, and EVI generates an emotionally intelligent response.

EVI supports two authentication strategies: API key authentication and access token authentication.

Both methods require specifying the chosen authentication strategy and providing the corresponding credential in the request parameters of the EVI WebSocket endpoint. Learn more in Hume’s documentation on authentication strategies.
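
A minimal connection sketch, assuming the EVI chat endpoint at wss://api.hume.ai/v0/evi/chat and an api_key (or access_token) query parameter; verify both against the API reference:

```python
# Minimal EVI connection sketch. The endpoint path and query parameter
# names are assumptions; consult the API reference for the exact values.
import asyncio
import os

import websockets  # third-party: pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint path

async def main() -> None:
    api_key = os.environ["HUME_API_KEY"]
    # API key strategy: pass the key as a query parameter.
    # Access token strategy: pass ?access_token=<token> instead.
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        print(await ws.recv())  # first server message (session metadata)

asyncio.run(main())
```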

You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI. You can also send EVI text, and it will speak that text aloud.
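
The two input paths might look like the sketch below. The audio_input and assistant_input type names and their fields are assumptions here; verify them against the API reference:

```python
# Input-message sketch: "audio_input" streams base64-encoded chunks of
# the user's voice; "assistant_input" gives EVI text to speak aloud.
# Both type names and field names are assumptions.
import base64
import json

async def stream_mic_chunk(ws, chunk: bytes) -> None:
    # Stream one chunk of the user's voice input to EVI.
    await ws.send(json.dumps({
        "type": "audio_input",
        "data": base64.b64encode(chunk).decode("utf-8"),
    }))

async def speak_text(ws, text: str) -> None:
    # Have EVI speak the given text aloud.
    await ws.send(json.dumps({"type": "assistant_input", "text": text}))
```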

EVI will respond with the following (see the receive-loop sketch after this list):

  • The text of EVI’s reply
  • EVI’s expressive audio response
  • A transcript of the user’s message along with their vocal expression measures
  • Messages if the user interrupts EVI
  • A message to let you know if EVI has finished responding
  • Error messages if issues arise
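
A receive-loop sketch that maps the list above onto message types. The type strings are assumptions, and play and stop_playback are placeholder hooks; confirm the real names and schemas in the API reference:

```python
# Receive-loop sketch dispatching on assumed EVI message types.
import json

def play(b64_audio: str) -> None:   # placeholder: wire up real audio playback
    ...

def stop_playback() -> None:        # placeholder: halt any playing audio
    ...

async def receive_loop(ws) -> None:
    async for raw in ws:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "assistant_message":    # text of EVI's reply
            print("EVI:", msg["message"]["content"])
        elif kind == "audio_output":       # EVI's expressive audio (base64)
            play(msg["data"])
        elif kind == "user_message":       # transcript + expression measures
            print("You:", msg["message"]["content"])
        elif kind == "user_interruption":  # the user interrupted EVI
            stop_playback()
        elif kind == "assistant_end":      # EVI finished responding
            pass
        elif kind == "error":
            print("Error:", msg)
```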

Overview of EVI features

Basic capabilities

Transcribes speech (ASR)

Fast, accurate ASR, provided in partnership with Deepgram, returns a full transcript of the conversation, with Hume’s expression measures tied to each sentence.
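
For instance, a sketch of pulling the strongest expression measures off a transcript message, assuming a models.prosody.scores mapping of expression names to floats; the exact schema may differ:

```python
# Sketch: top expression measures from a user_message transcript.
# Assumes msg["models"]["prosody"]["scores"] maps expression names to
# floats; the exact schema may differ, so check the API reference.
def top_expressions(msg: dict, n: int = 3) -> list[tuple[str, float]]:
    scores = msg.get("models", {}).get("prosody", {}).get("scores", {})
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```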

Generates language responses (LLM)

Rapid language generation with our eLLM, blended seamlessly with configurable partner APIs (OpenAI, Anthropic, Fireworks).

Generates voice responses (TTS)

Streaming speech generation via our proprietary expressive text-to-speech model.

Responds with low latency

Immediate response provided by the fastest models running together on one service.

Empathic AI (eLLM) features

Responds at the right time

Uses your tone of voice for state-of-the-art end-of-turn detection — the true bottleneck to responding rapidly without interrupting you.

Understands users’ prosody

Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.

Forms its own natural tone of voice

Guided by the user’s prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.

Responds to expression

Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.

Always interruptible

Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.

Aligned with well-being

Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI will continue to learn from users’ reactions using our upcoming fine-tuning endpoint.

Developer tools

WebSocket API

Primary interface for real-time bidirectional interaction with EVI; handles audio and text transport.
REST API

A configuration API that allows developers to customize their EVI: the system prompt, speaking rate, voice, LLM, tools the EVI can use, and other options. The system prompt shapes an EVI’s behavior and its responses.
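
A sketch of creating a config over REST, assuming a configs endpoint at https://api.hume.ai/v0/evi/configs, an X-Hume-Api-Key header, and the field names shown; verify all of these against the API reference:

```python
# Sketch of creating an EVI configuration over REST. The endpoint path,
# header name, and field names are all assumptions.
import os

import requests  # third-party: pip install requests

resp = requests.post(
    "https://api.hume.ai/v0/evi/configs",  # assumed endpoint path
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},  # assumed header
    json={
        "name": "support-agent",
        "prompt": {"text": "You are a patient, empathetic support agent."},
        "language_model": {"model_provider": "ANTHROPIC"},  # partner LLM
    },
)
resp.raise_for_status()
print(resp.json())  # created config, referenced when connecting
```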

TypeScript SDK

Encapsulates complexities of audio and WebSockets for seamless integration into web applications.

Python SDK

Simplifies the process of integrating EVI into any Python-based project.

Open-source examples

Example repositories provide a starting point for developers and demonstrate EVI’s capabilities.
Web widget

An iframe widget that any developer can easily embed in their website, allowing visitors to speak with a conversational AI voice about the site’s content.

API limits

  • WebSocket connection limit: up to five (5) concurrent connections.
  • WebSocket duration limit: connections time out by default after thirty (30) minutes, or after ten (10) minutes of user inactivity. Duration limits may be adjusted by specifying the max_duration and inactivity fields in your EVI configuration (see the sketch after this list).
  • WebSocket message payload size limit: messages cannot exceed 16 MB.
  • Request rate limit: HTTP requests (e.g. to the configs endpoints) are limited to fifty (50) requests per second.
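
A sketch of those duration-limit fields in a configuration payload; the nesting and units are assumptions, so check the configuration reference:

```python
# Sketch of adjusting EVI duration limits in a config payload.
# Field nesting and units are assumptions; verify against the config docs.
timeouts = {
    "max_duration": {"enabled": True, "duration_secs": 1800},  # 30-minute cap
    "inactivity": {"enabled": True, "duration_secs": 600},     # 10-minute idle
}
```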

To request an increase in your concurrent connection limit, please submit the “Application to Increase EVI Concurrent Connections” found in the EVI section of the Profile Tab.