Empathic Voice Interface (EVI)

Hume’s Empathic Voice Interface (EVI) is the world’s first emotionally intelligent voice AI. It accepts live audio input and returns both generated audio and transcripts augmented with measures of vocal expression. By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice. These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.

We provide a suite of tools to integrate and customize EVI for your application, including a WebSocket API that handles audio and text transport, a REST API, and SDKs for TypeScript and Python to simplify integration into web and Python-based projects. Additionally, we provide open-source examples and a web widget as practical starting points for developers to explore and implement EVI’s capabilities within their own projects.

Building with EVI

The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue: users speak, EVI listens and analyzes their vocal expressions, and it generates emotionally intelligent responses.

You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI. You can also send EVI text, and it will speak that text aloud.

EVI will respond with:

  • The text of EVI’s reply
  • EVI’s expressive audio response
  • A transcript of the user’s message along with their vocal expression measures
  • Messages if the user interrupts EVI
  • A message to let you know if EVI has finished responding
  • Error messages if issues arise
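
For a concrete picture of this flow, here is a minimal sketch in Python using the websockets library. The endpoint URL, the api_key query parameter (see Authentication below), and the message type and field names are assumptions drawn from the behavior described above; confirm them against the API reference.

# Minimal EVI chat sketch (endpoint, query parameter, and message/field names are assumptions)
import asyncio
import json
import os

import websockets  # pip install websockets

EVI_CHAT_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed WebSocket chat endpoint

async def chat() -> None:
    uri = f"{EVI_CHAT_URL}?api_key={os.environ['HUME_API_KEY']}"
    async with websockets.connect(uri) as socket:
        # Send text for EVI to speak a reply to; audio input would be streamed as encoded chunks instead.
        await socket.send(json.dumps({"type": "user_input", "text": "Hello, EVI!"}))

        async for raw in socket:
            message = json.loads(raw)
            kind = message.get("type")
            if kind == "user_message":
                print("User transcript with expression measures:", message)
            elif kind == "assistant_message":
                print("Text of EVI's reply:", message)
            elif kind == "audio_output":
                pass  # base64-encoded audio of EVI's expressive voice response
            elif kind == "user_interruption":
                pass  # the user interrupted EVI; stop audio playback
            elif kind == "assistant_end":
                break  # EVI has finished responding
            elif kind == "error":
                raise RuntimeError(message)

asyncio.run(chat())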

Overview of EVI features

Basic capabilities

Transcribes speech (ASR)

Fast and accurate ASR in partnership with Deepgram returns a full transcript of the conversation, with Hume’s expression measures tied to each sentence.

Generates language responses (LLM)

Rapid language generation with our eLLM, blended seamlessly with configurable partner APIs (OpenAI, Anthropic, Fireworks).

Generates voice responses (TTS)

Streaming speech generation via our proprietary expressive text-to-speech model.

Responds with low latency

Immediate response provided by the fastest models running together on one service.

Empathic AI (eLLM) features

Responds at the right time

Uses your tone of voice for state-of-the-art end-of-turn detection — the true bottleneck to responding rapidly without interrupting you.

Understands users’ prosody

Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.

Forms its own natural tone of voice

Guided by the user’s prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.

Responds to expression

Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.

Always interruptible

Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.

Aligned with well-being

Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI will continue to learn from users’ reactions using our upcoming fine-tuning endpoint.

Developer tools

WebSocket API

Primary interface for real-time, bidirectional interaction with EVI; handles audio and text transport.

REST API

A configuration API that allows developers to customize their EVI: the system prompt, speaking rate, voice, LLM, tools the EVI can use, and other options. The system prompt shapes an EVI’s behavior and its responses. (A minimal configuration sketch follows these developer tools.)

TypeScript SDK

Encapsulates the complexities of audio and WebSockets for seamless integration into web applications.

Python SDK

Simplifies the process of integrating EVI into any Python-based project.

Open-source examples

Example repositories provide a starting point for developers and demonstrate EVI’s capabilities.

Web widget

An iframe widget that any developer can easily embed in their website, allowing users to speak to a conversational AI voice about your content.
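
To make the configuration workflow more concrete, here is a hedged Python sketch that creates a named EVI configuration over the REST API. The endpoint path, the X-Hume-Api-Key header, and the payload fields are assumptions for illustration; the full schema for system prompt, voice, LLM, and tool settings is defined in the configuration API reference.

# Hedged sketch: create an EVI configuration via the REST API
# (endpoint path, header name, and payload fields are assumptions; check the API reference)
import os

import requests  # pip install requests

response = requests.post(
    "https://api.hume.ai/v0/evi/configs",                    # assumed configuration endpoint
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},  # assumed API key header
    json={
        "name": "my-support-agent",  # illustrative configuration name
        # System prompt, voice, LLM, and tool settings are added here,
        # following the schema in the configuration API reference.
    },
    timeout=10,
)
response.raise_for_status()
print(response.json())  # the created configuration, including its id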

API limits

  • Request rate limit: limited to fifty (50) requests per second.
  • Payload size limit: messages cannot exceed 16MB in size.
  • WebSocket connection limit: limited to two (2) concurrent connections.
  • WebSocket duration limit: connections are subject to a timeout after thirty (30) minutes of activity, or after one (1) minute of inactivity.

To request an increase in your concurrent connection limit, please submit this form.

Authentication

The Empathic Voice Interface (EVI) supports two authentication strategies:

  1. OAuth strategy: this strategy is tailored for client-side development. It involves an additional step of generating a client ID and making an API request to fetch an access token. This extra step adds a layer of security by ensuring your API key does not get exposed.
  2. API key strategy: designed for server-side development, this strategy allows developers to establish an authenticated WebSocket connection directly using their API key. This eliminates the need for an additional access token request.

With either strategy, you establish an authenticated connection by specifying the authentication strategy and supplying the corresponding key or token in the request parameters of the EVI WebSocket endpoint. See step-by-step instructions for obtaining an access token below:

Obtain API keys

Your API key and client secret can both be accessed from the Portal:

  1. Sign in to Hume
  2. Navigate to the API Keys page

Fetch access token

Using your API key and client secret, you can now generate a client ID: concatenate your API key and client secret, separated by a colon (:), then Base64-encode the resulting string. With your client ID, initiate a POST request to https://api.hume.ai/oauth2-cc/token to receive your access token.

# Configuration variables
apiKey="${API_KEY}" # Sourced from environment variable or secure store
clientSecret="${CLIENT_SECRET}" # Sourced from environment variable or secure store

# Base64 encode API key and client secret
clientId=$(echo -n "$apiKey:$clientSecret" | base64)

# Perform the API request
response=$(curl -s --location 'https://api.hume.ai/oauth2-cc/token' \
  --header 'Content-Type: application/x-www-form-urlencoded' \
  --header "Authorization: Basic $clientId" \
  --data-urlencode 'grant_type=client_credentials')

Your access token can now be used to establish an authenticated WebSocket connection.
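
Below is a hedged Python sketch of the same token exchange followed by an authenticated connection. It assumes the token response contains an access_token field and that the chat endpoint accepts an access_token query parameter; once connected, messages are sent and received as in the earlier WebSocket sketch.

# Hedged sketch: exchange API key and client secret for an access token, then connect
import asyncio
import os

import requests    # pip install requests
import websockets  # pip install websockets

# Client-credentials token request (mirrors the curl example above;
# requests builds the same "Authorization: Basic base64(key:secret)" header).
token_response = requests.post(
    "https://api.hume.ai/oauth2-cc/token",
    auth=(os.environ["API_KEY"], os.environ["CLIENT_SECRET"]),
    data={"grant_type": "client_credentials"},
    timeout=10,
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]  # assumed response field name

async def connect() -> None:
    # The access_token query parameter is an assumption; confirm against the API reference.
    uri = f"wss://api.hume.ai/v0/evi/chat?access_token={access_token}"
    async with websockets.connect(uri) as socket:
        print("Authenticated WebSocket connection established.")
        # Send and receive messages as in the earlier sketch.

asyncio.run(connect())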