Speech-to-speech (EVI)
We’re officially sunsetting EVI versions 1 and 2 on August 30, 2025. To keep things running smoothly, be sure to migrate to EVI 3 before then.
Hume’s Empathic Voice Interface (EVI) is an advanced, real-time emotionally intelligent voice AI. EVI measures users’ nuanced vocal modulations and responds to them using a speech-language model, which guides language and speech generation.
By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice.
These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.
Explore EVI 3, the latest iteration of the Empathic Voice Interface, in the demo below!
EVI features
- Basic capabilities
- Empathic AI Features
Quickstart
Kickstart your integration with our quickstart guides for Next.js, TypeScript, and Python. Each guide walks you through integrating the EVI API, capturing user audio, and playing back EVI’s response so you can get up and running quickly.
- Next.js: Build web applications using our React client SDK in Next.js.
- TypeScript: Develop server-side or frontend applications using our TypeScript SDK.
- Python: Create integrations in Python using our Python SDK.
Building with EVI
EVI chat sessions run over a real-time WebSocket connection, enabling fluid, interactive dialogue. Users speak naturally while EVI analyzes their vocal expression and responds with emotionally intelligent speech.
Authentication
REST endpoints support the API key authentication strategy. Specify your API key in the X-HUME-API-KEY header of your request.

The EVI WebSocket endpoint supports both the API key and token authentication strategies. Specify your API key or access token in the query parameters of your request.
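For illustration, here is a minimal TypeScript sketch of both strategies using plain fetch and WebSocket. The header name and query-parameter approach come from the description above; the endpoint URLs (https://api.hume.ai/v0/evi/configs and wss://api.hume.ai/v0/evi/chat) are assumptions for this example, so consult the API reference for the exact paths.

```typescript
// Hedged sketch: endpoint URLs are assumptions; the header and query
// parameter follow the authentication strategies described above.
// Requires a runtime with global fetch and WebSocket (modern browsers, recent Node).
const HUME_API_KEY = process.env.HUME_API_KEY!;

// REST: API key authentication via the X-HUME-API-KEY header.
async function listConfigs(): Promise<unknown> {
  const response = await fetch("https://api.hume.ai/v0/evi/configs", {
    headers: { "X-HUME-API-KEY": HUME_API_KEY },
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
}

// WebSocket: API key (or access token) passed as a query parameter.
const socket = new WebSocket(
  `wss://api.hume.ai/v0/evi/chat?api_key=${HUME_API_KEY}`
);
socket.addEventListener("open", () => console.log("Connected to EVI"));
```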
Configuration
Before starting a session, you’ll need a voice and a configuration.
- Design a voice, clone an existing one, or select one from Hume’s extensive Voice Library.
- Build an EVI configuration to define system behavior, voice selection, and other settings (a minimal creation sketch follows this list).
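The sketch below creates a configuration through the REST configs endpoint. The URL and the request body fields shown (name, voice) are illustrative assumptions rather than the authoritative schema; see the configuration guide for the exact fields.

```typescript
// Hedged sketch: creates an EVI configuration via the REST configs endpoint.
// The URL and body fields are illustrative assumptions; check the
// configuration guide for the exact schema.
async function createConfig(apiKey: string): Promise<unknown> {
  const response = await fetch("https://api.hume.ai/v0/evi/configs", {
    method: "POST",
    headers: {
      "X-HUME-API-KEY": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "my-evi-config",    // assumed field: configuration name
      voice: { name: "ITO" },   // assumed field: a voice selected from the Voice Library
    }),
  });
  if (!response.ok) throw new Error(`Config creation failed: ${response.status}`);
  return response.json();
}
```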
Connection
The EVI Playground is the easiest way to test your configuration. It lets you speak directly with EVI using your selected voice and settings, without writing any code.
To begin a conversation, connect to the EVI WebSocket URL and start streaming the user’s audio input via audio_input messages. EVI responds in real time with a sequence of structured messages (a minimal connection sketch follows this list):
- user_message: Message containing a transcript of the user’s message along with their vocal expression measures.
- assistant_message: Message containing EVI’s response content.
- audio_output: EVI’s response audio corresponding to the assistant_message.
- assistant_end: Message denoting the end of EVI’s response.
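As a rough illustration of this flow, the sketch below connects to the chat WebSocket, sends an audio_input message, and dispatches on the response types listed above. The URL, query parameter, and the assumed message shapes (a "type" field plus base64-encoded audio data) are illustrative only; the WebSocket reference documents the authoritative formats.

```typescript
// Hedged sketch of the EVI chat flow. The URL, query parameter, and message
// shapes are assumptions for illustration; see the WebSocket reference for
// the exact formats.
const socket = new WebSocket(
  `wss://api.hume.ai/v0/evi/chat?api_key=${process.env.HUME_API_KEY}`
);

// Stream a chunk of the user's audio as an audio_input message.
function sendAudioChunk(base64Audio: string): void {
  socket.send(JSON.stringify({ type: "audio_input", data: base64Audio }));
}

// Handle EVI's structured responses.
socket.addEventListener("message", (event) => {
  const message = JSON.parse(event.data.toString());
  switch (message.type) {
    case "user_message":      // transcript + vocal expression measures
      console.log("User said:", message.message?.content);
      break;
    case "assistant_message": // EVI's response content
      console.log("EVI replied:", message.message?.content);
      break;
    case "audio_output":      // audio corresponding to the assistant_message
      // Decode and queue message.data for playback here.
      break;
    case "assistant_end":     // end of EVI's response
      console.log("EVI finished speaking");
      break;
  }
});
```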
Developer tools
Hume provides a suite of developer tools to integrate and customize EVI.
- Connect with EVI via WebSocket, including message formats and response types.
- Manage EVI configurations and access your chat history.
- Use official SDKs to streamline integration in Python and web-based projects.
- Browse example projects demonstrating EVI integration in different frameworks.
API limits
- WebSocket connections limit: By default, EVI supports up to 5 concurrent connections for testing and development. EVI is designed to scale seamlessly, and we can support deployments with thousands of concurrent users. For production environments requiring higher capacity, you can request an increased limit by filling out our request form.
- WebSocket duration limit: Connections are subject to a default timeout after thirty (30) minutes, or after two (2) minutes of user inactivity. Duration limits may be adjusted by specifying the max_duration and inactivity fields in your EVI configuration (see the sketch after this list).
- WebSocket message size limit: Messages cannot exceed 16 MB in size.
- Request rate limit: HTTP requests (e.g., to the configs endpoints) are limited to one hundred (100) requests per second.
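A hedged sketch of adjusting the duration limits when creating a configuration is shown below. Only the field names max_duration and inactivity come from the limits above; the endpoint URL and the exact nesting and units of the timeout settings are assumptions, so check the configuration reference before relying on them.

```typescript
// Hedged sketch: adjusts the WebSocket duration limits via the EVI
// configuration. The URL and the exact shape of the timeout fields are
// assumptions; only the names max_duration and inactivity come from the
// documented limits.
async function createConfigWithTimeouts(apiKey: string): Promise<unknown> {
  const response = await fetch("https://api.hume.ai/v0/evi/configs", {
    method: "POST",
    headers: {
      "X-HUME-API-KEY": apiKey,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      name: "long-session-config",                            // assumed field
      timeouts: {                                             // assumed nesting
        max_duration: { enabled: true, duration_secs: 3600 }, // assumed shape
        inactivity: { enabled: true, duration_secs: 300 },    // assumed shape
      },
    }),
  });
  if (!response.ok) throw new Error(`Request failed: ${response.status}`);
  return response.json();
}
```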