Speech-to-speech (EVI)
Hume’s Empathic Voice Interface (EVI) is an advanced, real-time emotionally intelligent voice AI.
We’re officially sunsetting EVI versions 1 and 2 on August 30, 2025. To keep things running smoothly, be sure to migrate to EVI 3 before then.
EVI measures users’ nuanced vocal modulations and responds to them using a speech-language model, which guides language and speech generation.
By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice.
These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.
EVI 3 is the latest iteration of the Empathic Voice Interface.
Quickstart
Kickstart your integration with our quickstart guides for Next.js, TypeScript, and Python. Each guide walks you through integrating the EVI API, capturing user audio, and playing back EVI’s response so you can get up and running quickly.
- Build web applications using our React client SDK in Next.js.
- Develop server-side or frontend applications using our TypeScript SDK.
- Create integrations in Python using our Python SDK.
Building with EVI
EVI chat sessions run over a real-time WebSocket connection, enabling fluid, interactive dialogue. Users speak naturally while EVI analyzes their vocal expression and responds with emotionally intelligent speech.
Authentication
REST endpoints support the API key authentication strategy. Specify your API key in the X-HUME-API-KEY header of your request.
The EVI WebSocket endpoint supports both the API key and token authentication strategies. Specify your API key or access token in the query parameters of your request.
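The two strategies above can be sketched as follows. This is a minimal illustration, not the official client: the header name comes from this page, but the WebSocket URL and the query parameter names `api_key` and `access_token` are assumptions that should be checked against the API reference.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed for illustration; confirm the actual URL in the API reference.
EVI_WS_URL = "wss://api.hume.ai/v0/evi/chat"


def rest_headers(api_key: str) -> Dict[str, str]:
    """Headers for a REST request using the API key strategy."""
    return {"X-HUME-API-KEY": api_key}


def websocket_url(api_key: Optional[str] = None,
                  access_token: Optional[str] = None) -> str:
    """Build the WebSocket URL with exactly one credential in the query string."""
    if (api_key is None) == (access_token is None):
        raise ValueError("provide exactly one of api_key or access_token")
    # The parameter names below are assumptions, not confirmed by this page.
    if api_key is not None:
        params = {"api_key": api_key}
    else:
        params = {"access_token": access_token}
    return f"{EVI_WS_URL}?{urlencode(params)}"
```

Because query parameters can end up in logs and browser history, a short-lived access token is generally the safer choice for client-side code, with the API key kept on the server.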
Configuration
Before starting a session, you’ll need a voice and a configuration.
- Design a voice, clone an existing one, or select one from Hume’s extensive Voice Library.
- Build an EVI configuration to define system behavior, voice selection, and other settings.
Connection
The EVI Playground is the easiest way to test your configuration. It lets you speak directly with EVI using your selected voice and settings, without writing any code.
To begin a conversation, connect using the EVI WebSocket URL and start streaming the user’s audio input via audio_input messages. EVI responds in real time with a sequence of structured messages:
- user_message: Message containing a transcript of the user’s message along with their vocal expression measures.
- assistant_message: Message containing EVI’s response content.
- audio_output: EVI’s response audio corresponding with the assistant_message.
- assistant_end: Message denoting the end of EVI’s response.
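The message flow above can be sketched as a small dispatcher. The message type names come from this page; the payload fields used here (`type`, `text`, `data`) and the base64 encoding of audio are hypothetical and only illustrate the shape of the loop, so check the API reference for the actual schemas.

```python
import base64
import json
from typing import List, Tuple


def audio_input_message(chunk: bytes) -> str:
    """Wrap raw audio bytes as an audio_input message (assumed shape)."""
    # Assumption: audio travels base64-encoded in a `data` field.
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"type": "audio_input", "data": encoded})


def handle_message(raw: str,
                   audio_chunks: List[bytes],
                   transcript: List[Tuple[str, str]]) -> bool:
    """Process one message from EVI; return True once the response has ended."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "user_message":
        # Transcript of what the user said (field name is hypothetical).
        transcript.append(("user", msg.get("text", "")))
    elif kind == "assistant_message":
        # Text content of EVI's reply (field name is hypothetical).
        transcript.append(("assistant", msg.get("text", "")))
    elif kind == "audio_output":
        # Audio for the reply, queued for playback.
        audio_chunks.append(base64.b64decode(msg.get("data", "")))
    elif kind == "assistant_end":
        return True
    # Unknown message types are ignored rather than treated as errors.
    return False
```

In a real client, `handle_message` would be called for every frame received on the WebSocket, and the collected audio would be handed to a playback queue as it arrives rather than after assistant_end.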
Developer tools
Hume provides a suite of developer tools to integrate and customize EVI.
- Connect with EVI via WebSocket, including message formats and response types.
- Manage EVI configurations and access your chat history.
- Use official SDKs to streamline integration in Python and web-based projects.
- Browse example projects demonstrating EVI integration in different frameworks.
API limits
The following API limits apply to the EVI (speech-to-speech) API. For a quick overview, see the summary table at the end of the section.
Concurrent connections limit
The concurrent (WebSocket) connection limit is determined by your subscription tier.
EVI is designed to scale and can support thousands of concurrent users. If you need higher capacity:
- Upgrade to the Business or Enterprise tier.
- Submit the Sales & Partnerships Form.
Connection duration limit
The maximum chat duration is configurable in your EVI config.
- Minimum: 30 seconds
- Maximum: 1,800 seconds (30 minutes)
- Default: 1,800 seconds
Inactivity timeout
The inactivity timeout ends the session after a period with no detected speech and is configurable in your EVI config.
- Minimum: 30 seconds
- Maximum: 1,800 seconds (30 minutes)
- Default: 120 seconds (2 minutes)
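A configuration tool might check the connection duration and inactivity timeout against these bounds before submitting them. The sketch below uses the limits and defaults stated above; the function and the key names `max_duration_secs` and `inactivity_secs` are hypothetical and do not reflect the actual config schema.

```python
from typing import Dict, Optional

MIN_SECONDS = 30                # minimum for both settings
MAX_SECONDS = 1800              # maximum for both settings (30 minutes)
DEFAULT_MAX_DURATION = 1800     # default connection duration
DEFAULT_INACTIVITY = 120        # default inactivity timeout (2 minutes)


def resolve_timeouts(max_duration: Optional[int] = None,
                     inactivity: Optional[int] = None) -> Dict[str, int]:
    """Apply the documented defaults and reject out-of-range values."""
    resolved = {
        # Key names are hypothetical, chosen only for this example.
        "max_duration_secs": DEFAULT_MAX_DURATION if max_duration is None else max_duration,
        "inactivity_secs": DEFAULT_INACTIVITY if inactivity is None else inactivity,
    }
    for name, value in resolved.items():
        if not MIN_SECONDS <= value <= MAX_SECONDS:
            raise ValueError(
                f"{name}={value} is outside {MIN_SECONDS}-{MAX_SECONDS} seconds"
            )
    return resolved
```

Rejecting an out-of-range value, rather than silently clamping it, surfaces a misconfiguration before a session starts.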
Payload limit
Each WebSocket message must be no larger than 16 MB.
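A client can guard against oversized messages before sending. This sketch assumes that "16 MB" means 16,000,000 bytes, which is the more conservative reading if the limit is actually 16 MiB; the helper name is hypothetical.

```python
from typing import Union

# Assumption: decimal megabytes. This stays under the limit either way.
MAX_MESSAGE_BYTES = 16 * 1000 * 1000


def fits_payload_limit(message: Union[str, bytes]) -> bool:
    """Return True if the encoded message is within the payload limit."""
    # Text frames are measured by their UTF-8 encoded size, not character count.
    data = message.encode("utf-8") if isinstance(message, str) else message
    return len(data) <= MAX_MESSAGE_BYTES
```

If audio is sent base64-encoded, the encoded text is roughly a third larger than the raw bytes, so the check should run on the final serialized message rather than on the raw audio chunk.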
Rate limit
REST API traffic is limited to 100 requests per second.
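To stay under this limit, a client can throttle its own REST calls. The following is a minimal sliding-window sketch, not part of any official SDK; the class name and its parameters are hypothetical, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls in any `per_seconds` window."""

    def __init__(self, max_requests: int = 100, per_seconds: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.per_seconds = per_seconds
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.per_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```

A caller would check `try_acquire()` before each REST request and wait briefly and retry when it returns False. The server-side limit still applies, so rate-limit error responses should be handled as well.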