Empathic Voice Interface (EVI)

Hume’s Empathic Voice Interface (EVI) is the world’s first emotionally intelligent voice AI. It accepts live audio input and returns both generated audio and transcripts augmented with measures of vocal expression. By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice. These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.

We provide a suite of tools to integrate and customize EVI for your application, including a WebSocket API that handles audio and text transport, a REST API, and SDKs for TypeScript and Python to simplify integration into web and Python-based projects. Additionally, we provide open-source examples and a web widget as practical starting points for developers to explore and implement EVI’s capabilities within their own projects.

Figure: EVI communications flow diagram

Building with EVI

The main way to work with EVI is through a WebSocket connection that sends audio and receives responses in real time. This enables fluid, bidirectional dialogue: the user speaks, EVI listens and analyzes their vocal expression, and EVI generates an emotionally intelligent response.

EVI supports two authentication strategies: API key authentication and access token authentication.

Both methods require specifying the chosen authentication strategy and providing the corresponding credential in the request parameters of the EVI WebSocket endpoint. Learn more in Hume’s documentation on authentication strategies.
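
A minimal connection sketch, assuming the EVI chat endpoint at wss://api.hume.ai/v0/evi/chat and an api_key (or access_token) query parameter; verify both against the API reference:

```python
# Minimal EVI connection sketch. The endpoint path and query parameter
# names are assumptions; consult the API reference for the exact values.
import asyncio
import os

import websockets  # third-party: pip install websockets

EVI_URL = "wss://api.hume.ai/v0/evi/chat"  # assumed endpoint path

async def main() -> None:
    api_key = os.environ["HUME_API_KEY"]
    # API key strategy: pass the key as a query parameter.
    # Access token strategy: pass ?access_token=<token> instead.
    async with websockets.connect(f"{EVI_URL}?api_key={api_key}") as ws:
        print(await ws.recv())  # first server message (session metadata)

asyncio.run(main())
```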

You start a conversation by connecting to the WebSocket and streaming the user’s voice input to EVI. You can also send EVI text, and it will speak that text aloud.
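
The two input paths might look like the sketch below. The audio_input and assistant_input type names and their fields are assumptions here; verify them against the API reference:

```python
# Input-message sketch: "audio_input" streams base64-encoded chunks of
# the user's voice; "assistant_input" gives EVI text to speak aloud.
# Both type names and field names are assumptions.
import base64
import json

async def stream_mic_chunk(ws, chunk: bytes) -> None:
    # Stream one chunk of the user's voice input to EVI.
    await ws.send(json.dumps({
        "type": "audio_input",
        "data": base64.b64encode(chunk).decode("utf-8"),
    }))

async def speak_text(ws, text: str) -> None:
    # Have EVI speak the given text aloud.
    await ws.send(json.dumps({"type": "assistant_input", "text": text}))
```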

EVI will respond with the following (see the receive-loop sketch after this list):

  • The text of EVI’s reply
  • EVI’s expressive audio response
  • A transcript of the user’s message along with their vocal expression measures
  • Messages if the user interrupts EVI
  • A message to let you know if EVI has finished responding
  • Error messages if issues arise
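
A receive-loop sketch that maps the list above onto message types. The type strings are assumptions, and play and stop_playback are placeholder hooks; confirm the real names and schemas in the API reference:

```python
# Receive-loop sketch dispatching on assumed EVI message types.
import json

def play(b64_audio: str) -> None:   # placeholder: wire up real audio playback
    ...

def stop_playback() -> None:        # placeholder: halt any playing audio
    ...

async def receive_loop(ws) -> None:
    async for raw in ws:
        msg = json.loads(raw)
        kind = msg.get("type")
        if kind == "assistant_message":    # text of EVI's reply
            print("EVI:", msg["message"]["content"])
        elif kind == "audio_output":       # EVI's expressive audio (base64)
            play(msg["data"])
        elif kind == "user_message":       # transcript + expression measures
            print("You:", msg["message"]["content"])
        elif kind == "user_interruption":  # the user interrupted EVI
            stop_playback()
        elif kind == "assistant_end":      # EVI finished responding
            pass
        elif kind == "error":
            print("Error:", msg)
```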

Overview of EVI features

Basic capabilities

Transcribes speech (ASR)

Fast, accurate ASR, provided in partnership with Deepgram, returns a full transcript of the conversation, with Hume’s expression measures tied to each sentence.
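
For instance, a sketch of pulling the strongest expression measures off a transcript message, assuming a models.prosody.scores mapping of expression names to floats; the exact schema may differ:

```python
# Sketch: top expression measures from a user_message transcript.
# Assumes msg["models"]["prosody"]["scores"] maps expression names to
# floats; the exact schema may differ, so check the API reference.
def top_expressions(msg: dict, n: int = 3) -> list[tuple[str, float]]:
    scores = msg.get("models", {}).get("prosody", {}).get("scores", {})
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
```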

Generates language responses (LLM)

Rapid language generation with our eLLM, blended seamlessly with configurable partner APIs (OpenAI, Anthropic, Fireworks).

Generates voice responses (TTS)

Streaming speech generation via our proprietary expressive text-to-speech model.

Responds with low latency

Immediate response provided by the fastest models running together on one service.

Empathic AI (eLLM) features

Responds at the right time

Uses your tone of voice for state-of-the-art end-of-turn detection — the true bottleneck to responding rapidly without interrupting you.

Understands users’ prosody

Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our eLLM.

Forms its own natural tone of voice

Guided by the user’s prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more.

Responds to expression

Powered by our empathic large language model (eLLM), EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice.

Always interruptible

Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off.

Aligned with well-being

Trained on human reactions to optimize for positive expressions like happiness and satisfaction. EVI will continue to learn from users’ reactions using our upcoming fine-tuning endpoint.

Developer tools

WebSocket API

Primary interface for real-time bidirectional interaction with EVI; handles audio and text transport.
REST API

A configuration API that allows developers to customize their EVI: the system prompt, speaking rate, voice, LLM, tools the EVI can use, and other options. The system prompt shapes an EVI’s behavior and its responses.
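
A sketch of creating a config over REST, assuming a configs endpoint at https://api.hume.ai/v0/evi/configs, an X-Hume-Api-Key header, and the field names shown; verify all of these against the API reference:

```python
# Sketch of creating an EVI configuration over REST. The endpoint path,
# header name, and field names are all assumptions.
import os

import requests  # third-party: pip install requests

resp = requests.post(
    "https://api.hume.ai/v0/evi/configs",  # assumed endpoint path
    headers={"X-Hume-Api-Key": os.environ["HUME_API_KEY"]},  # assumed header
    json={
        "name": "support-agent",
        "prompt": {"text": "You are a patient, empathetic support agent."},
        "language_model": {"model_provider": "ANTHROPIC"},  # partner LLM
    },
)
resp.raise_for_status()
print(resp.json())  # created config, referenced when connecting
```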

TypeScript SDK

Encapsulates complexities of audio and WebSockets for seamless integration into web applications.

Python SDK

Simplifies the process of integrating EVI into any Python-based project.

Open-source examples

Example repositories provide a starting point for developers and demonstrate EVI’s capabilities.
Web widget

An iframe widget that any developer can easily embed in their website, allowing visitors to speak with a conversational AI voice about the site’s content.

API limits

  • WebSocket connection limit: up to five (5) concurrent connections.
  • WebSocket duration limit: connections time out by default after thirty (30) minutes, or after ten (10) minutes of user inactivity. Duration limits may be adjusted by specifying the max_duration and inactivity fields in your EVI configuration (see the sketch after this list).
  • WebSocket message payload size limit: messages cannot exceed 16 MB.
  • Request rate limit: HTTP requests (e.g. to the configs endpoints) are limited to fifty (50) requests per second.
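
A sketch of those duration-limit fields in a configuration payload; the nesting and units are assumptions, so check the configuration reference:

```python
# Sketch of adjusting EVI duration limits in a config payload.
# Field nesting and units are assumptions; verify against the config docs.
timeouts = {
    "max_duration": {"enabled": True, "duration_secs": 1800},  # 30-minute cap
    "inactivity": {"enabled": True, "duration_secs": 600},     # 10-minute idle
}
```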

To request an increase in your concurrent connection limit, please submit the “Application to Increase EVI Concurrent Connections” found in the EVI section of the Profile Tab.