Empathic Voice Interface (EVI)

General Availability

EVI will be generally available in April 2024. To be notified of public access, sign up through the “Notify me of public access” link.

Hume’s Empathic Voice Interface (EVI) is the world’s first emotionally intelligent voice AI. It accepts live audio input and returns both generated audio and transcripts augmented with measures of vocal expression. By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice. These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.

We provide a suite of tools to integrate and customize EVI for your application, including a WebSocket API that handles audio and text transport, a REST API, and SDKs for TypeScript and Python to simplify integration into web and Python-based projects. Additionally, we provide open-source examples and a web widget as practical starting points for developers to explore and implement EVI’s capabilities within their own projects.

Building with EVI

The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue in which users speak, EVI listens and analyzes their expressions, and EVI generates emotionally intelligent responses.

You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI. You can also send EVI text, and it will speak that text aloud.
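As a rough sketch of the client side of this exchange, the helpers below build JSON messages for a chunk of audio and for text that EVI should speak. The message shapes (`audio_input`, `text_input`, and the `type`, `data`, and `text` fields), the base64 encoding, and the chunk size are assumptions made for illustration only; the actual wire format is defined by Hume's technical documentation.

```python
import base64
import json


def make_audio_message(chunk: bytes) -> str:
    """Wrap one chunk of raw audio in a JSON text frame (hypothetical format)."""
    # Assumption: binary audio is base64-encoded so it can travel inside JSON.
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio_input", "data": encoded})


def make_text_message(text: str) -> str:
    """Ask EVI to speak the given text aloud (hypothetical format)."""
    return json.dumps({"type": "text_input", "text": text})


def chunk_audio(audio: bytes, chunk_size: int = 4096):
    """Split a recording into fixed-size chunks to mimic live streaming."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

Each returned string would be sent as one frame over the open WebSocket connection, with audio chunks sent in the order they were captured.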

EVI will respond with:

  • The text of EVI’s reply
  • EVI’s expressive audio response
  • A transcript of the user’s message along with their vocal expression measures
  • Messages if the user interrupts EVI
  • A message to let you know if EVI has finished responding
  • Error messages if issues arise
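One way a client might consume the six kinds of response listed above is a small dispatcher that routes each incoming message by type. This is a minimal sketch, not Hume's SDK: the `type` values (`assistant_message`, `audio_output`, `user_message`, `user_interruption`, `assistant_end`, `error`) and the field names (`text`, `data`, `expressions`, `message`) are assumed for illustration, and `ConversationState` is a hypothetical helper.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ConversationState:
    """Client-side view of one conversation (illustrative only)."""
    transcript: list = field(default_factory=list)      # one dict per utterance
    playback_queue: list = field(default_factory=list)  # audio awaiting playback
    assistant_done: bool = False
    errors: list = field(default_factory=list)


def handle_message(raw: str, state: ConversationState) -> None:
    """Route one incoming JSON message by its (assumed) "type" field."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "assistant_message":       # text of EVI's reply
        state.assistant_done = False
        state.transcript.append({"speaker": "assistant", "text": msg["text"]})
    elif kind == "audio_output":          # EVI's expressive audio response
        state.playback_queue.append(msg["data"])
    elif kind == "user_message":          # user transcript with expression measures
        state.transcript.append({
            "speaker": "user",
            "text": msg["text"],
            "expressions": msg.get("expressions", {}),
        })
    elif kind == "user_interruption":     # the user spoke over EVI
        state.playback_queue.clear()      # drop audio that is now stale
    elif kind == "assistant_end":         # EVI has finished responding
        state.assistant_done = True
    elif kind == "error":
        state.errors.append(msg.get("message", "unknown error"))
    # Unknown types are ignored so that new message kinds do not crash the client.
```

Clearing the playback queue on an interruption keeps the client from continuing to play a reply the user has already talked over.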

Detailed technical documentation will be available upon public release. To find out when EVI is released, use the “Notify me of public access” link.

Overview of EVI features

Basic capabilities

Transcribes speech (ASR)

Fast and accurate ASR in partnership with Deepgram returns a full transcript of the conversation, with Hume’s expression measures tied to each sentence.

Generates language responses (LLM)

Rapid language generation with our eLLM, blended seamlessly with configurable partner APIs (OpenAI, Anthropic, Fireworks).

Generates voice responses (TTS)

Streaming speech generation via our proprietary expressive text-to-speech model.

Responds with low latency

Immediate response provided by the fastest models running together on one service.

Empathic AI (eLLM) features

Responds at the right time

Uses the user’s tone of voice for state-of-the-art end-of-turn detection, which is the true bottleneck to responding rapidly without interrupting the user.

Understands users’ prosody

Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.

Forms its own natural tone of voice

Guided by the users’ prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the users’ nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.

Responds to expression

Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.

Always interruptible

Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.

Aligned with well-being

Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI will continue to learn from users’ reactions using our upcoming fine-tuning endpoint.

Developer tools

WebSocket API

The primary interface for real-time, bidirectional interaction with EVI. It handles audio and text transport.

REST API

A configuration API that lets developers customize their EVI: the system prompt, speaking rate, voice, LLM, tools that EVI can use, and other options. The system prompt shapes EVI’s behavior and its responses.

TypeScript SDK

Encapsulates complexities of audio and WebSockets for seamless integration into web applications.

Python SDK

Simplifies the process of integrating EVI into any Python-based project.

Open source examples

Example repositories provide a starting point for developers and demonstrate EVI’s capabilities.

Web widget

An iframe widget that developers can embed in their website, allowing users to speak to a conversational AI voice about the site’s content.
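To make the configuration options listed under the REST API entry more concrete, the sketch below collects them in a small data class and serializes them to JSON. `EviConfig`, its field names, the default values, and the reading of `speaking_rate` as a multiplier are all hypothetical; the request format the REST API actually expects is not described here.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class EviConfig:
    """Hypothetical container for the options the REST API is said to expose."""
    system_prompt: str                 # shapes EVI's behavior and responses
    voice: str = "default"             # placeholder voice name
    speaking_rate: float = 1.0         # assumption: 1.0 means normal speed
    llm: str = "default"               # placeholder model identifier
    tools: list = field(default_factory=list)  # names of tools EVI may use

    def to_json(self) -> str:
        """Validate the options and serialize them as a JSON object."""
        if not self.system_prompt.strip():
            raise ValueError("system_prompt must not be empty")
        if self.speaking_rate <= 0:
            raise ValueError("speaking_rate must be positive")
        return json.dumps(asdict(self), sort_keys=True)
```

Validating on the client before sending keeps obviously malformed configurations from reaching the service at all.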