Speech-to-Speech (EVI)

Hume’s Empathic Voice Interface (EVI) is an advanced, real-time emotionally intelligent voice AI. EVI measures users’ nuanced vocal modulations and responds to them using a speech-language model, which guides language and speech generation.

By processing the tune, rhythm, and timbre of speech, EVI unlocks a variety of new capabilities, like knowing when to speak and generating more empathic language with the right tone of voice.

These features enable smoother and more satisfying voice-based interactions between humans and AI, opening new possibilities for personal AI, customer service, accessibility, robotics, immersive gaming, VR experiences, and much more.

To try EVI in your browser, use the EVI Playground in the Hume platform.

EVI features

Version comparison

| Feature | EVI 3 | EVI 4-mini |
| --- | --- | --- |
| Languages supported | English | English, Japanese, Korean, Spanish, French, Portuguese, Italian, German, Russian, Hindi, Arabic |
| Quick responses | Available | Unavailable |
| Supplemental LLM | Optional | Required |

Basic capabilities

| Feature | Description |
| --- | --- |
| Transcription (ASR) | Fast and accurate ASR returns a full transcript of the conversation, with Hume’s expression measures tied to each sentence. |
| Text response (LLM) | Rapid language generation with our speech-language model, optionally supplemented with configurable partner APIs (Anthropic, OpenAI, Google, Fireworks, and more). |
| Voice response (TTS) | Streamed speech generation via our speech-language model. |
| Low latency response | Immediate response provided by the fastest models running together on one service. |

Empathic AI Features

| Feature | Description |
| --- | --- |
| Responds at the right time | Uses your tone of voice for state-of-the-art end-of-turn detection, the true bottleneck to responding rapidly without interrupting you. |
| Understands users’ prosody | Provides streaming measurements of the tune, rhythm, and timbre of the user’s speech using Hume’s prosody model, integrated with our speech-language model. |
| Forms its own natural tone of voice | Guided by the user’s prosody and language, our model responds with an empathic, naturalistic tone of voice, matching the user’s nuanced “vibe” (calmness, interest, excitement, etc.). It responds to frustration with an apologetic tone, to sadness with sympathy, and more. |
| Responds to expression | Powered by our empathic speech-language model, EVI crafts responses that are not just intelligent but attuned to what the user is expressing with their voice. |
| Always interruptible | Stops rapidly whenever users interject, listens, and responds with the right context based on where it left off. |
| Multilingual | EVI 4-mini supports English, Japanese, Korean, Spanish, French, Portuguese, Italian, German, Russian, Hindi, and Arabic. |

Quickstart

Kickstart your integration with our quickstart guides for Next.js, TypeScript, and Python. Each guide walks you through integrating the EVI API, capturing user audio, and playing back EVI’s response so you can get up and running quickly.

Next.js Quickstart

Build web applications using our React client SDK in Next.js.

TypeScript Quickstart

Develop server-side or frontend applications using our TypeScript SDK.

Python Quickstart

Create integrations in Python using our Python SDK.

Building with EVI

EVI chat sessions run over a real-time WebSocket connection, enabling fluid, interactive dialogue. Users speak naturally while EVI analyzes their vocal expression and responds with emotionally intelligent speech.

Authentication

REST endpoints support the API key authentication strategy. Specify your API key in the X-HUME-API-KEY header of your request.

The EVI WebSocket endpoint supports both the API key and token authentication strategies. Specify your API key or access token in the query parameters of your request.
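The two strategies can be sketched in Python as follows. This is a minimal illustration, not an official client: the X-HUME-API-KEY header name comes from this page, while the endpoint URLs and the query parameter names api_key and access_token are assumptions that should be checked against the API reference.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumed endpoint locations; confirm them against the API reference.
REST_BASE_URL = "https://api.hume.ai/v0/evi"
WEBSOCKET_URL = "wss://api.hume.ai/v0/evi/chat"


def rest_headers(api_key: str) -> Dict[str, str]:
    """Headers for a REST request using the API key strategy."""
    return {"X-HUME-API-KEY": api_key}


def websocket_url(
    *, api_key: Optional[str] = None, access_token: Optional[str] = None
) -> str:
    """Build the WebSocket URL with exactly one credential in the query string.

    The parameter names `api_key` and `access_token` are assumptions.
    """
    if (api_key is None) == (access_token is None):
        raise ValueError("provide exactly one of api_key or access_token")
    if api_key is not None:
        params = {"api_key": api_key}
    else:
        params = {"access_token": access_token}
    return f"{WEBSOCKET_URL}?{urlencode(params)}"
```

Access tokens are generally the safer choice for code that runs in a browser, since the API key then stays on your server.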

Configuration

Before starting a session, you’ll need a voice and a configuration.

  • Design a voice, clone an existing one, or select one from Hume’s extensive Voice Library.
  • Build an EVI configuration to define system behavior, voice selection, and other settings.

Connection

The EVI Playground is the easiest way to test your configuration. It lets you speak directly with EVI using your selected voice and settings, without writing any code.

To begin a conversation, connect to the EVI WebSocket URL and start streaming the user’s audio input via audio_input messages. EVI responds in real time with a sequence of structured messages:

  • user_message: A transcript of the user’s message, along with their vocal expression measures.
  • assistant_message: The content of EVI’s response.
  • audio_output: The response audio corresponding to the assistant_message.
  • assistant_end: Marks the end of EVI’s response.
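The message flow above can be sketched as a small dispatcher. The four message type names come from this page. Everything else is an assumption made for illustration: that messages are JSON objects with a "type" field, that transcripts sit under message.content, and that audio travels as base64 in a "data" field. The state dictionary and both function names are hypothetical. No network code is shown; the functions only build and parse message strings.

```python
import base64
import json


def encode_audio_input(chunk: bytes) -> str:
    """Wrap a raw audio chunk in an audio_input message (assumed shape)."""
    payload = {"type": "audio_input", "data": base64.b64encode(chunk).decode("ascii")}
    return json.dumps(payload)


def handle_message(raw: str, state: dict) -> None:
    """Update `state` from one incoming message; field names are assumptions."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "user_message":
        state["transcript"].append(("user", msg["message"]["content"]))
    elif kind == "assistant_message":
        state["transcript"].append(("assistant", msg["message"]["content"]))
    elif kind == "audio_output":
        # Decode the audio so it can be queued for playback.
        state["audio"].append(base64.b64decode(msg["data"]))
    elif kind == "assistant_end":
        state["turn_complete"] = True
    # Any other message type is ignored in this sketch.
```

A caller would start with state = {"transcript": [], "audio": [], "turn_complete": False}, pass each incoming message to handle_message, and begin listening for the next user turn once turn_complete is set.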

Developer tools

Hume provides a suite of developer tools to integrate and customize EVI.

WebSocket API Reference

Connect with EVI via WebSocket, including message formats and response types.

REST API Reference

Manage EVI configurations and access your chat history.

SDKs

Use official SDKs to streamline integration in Python and web-based projects.

Sample code

Browse example projects demonstrating EVI integration in different frameworks.

API limits

The following limits apply to Hume’s Speech-to-Speech (EVI) API.

| Limit | Value |
| --- | --- |
| Concurrent sessions | Defined by your subscription tier |
| Maximum session duration | 30 minutes |
| Maximum message size (WebSocket) | 16 MB |
| Request rate limit (HTTP) | 100 requests/second |
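A client can guard against the session and message limits before the server enforces them. The helpers below are a hypothetical sketch: the 30-minute and 16 MB values come from the table above, but reading "16 MB" as 16,000,000 bytes is an assumption, chosen because it is the more conservative interpretation.

```python
import time
from typing import Optional

MAX_SESSION_SECONDS = 30 * 60           # 30-minute session limit
MAX_MESSAGE_BYTES = 16 * 1000 * 1000    # 16 MB, read as decimal megabytes (assumption)


def fits_in_one_message(payload: bytes) -> bool:
    """Return True if the payload is within the WebSocket message size limit."""
    return len(payload) <= MAX_MESSAGE_BYTES


def seconds_remaining(started_at: float, now: Optional[float] = None) -> float:
    """Seconds left in the session, never negative.

    `started_at` and `now` are readings from time.monotonic().
    """
    if now is None:
        now = time.monotonic()
    return max(0.0, MAX_SESSION_SECONDS - (now - started_at))
```

For example, an application could record time.monotonic() when the socket opens and reconnect shortly before seconds_remaining reaches zero.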

The EVI API supports thousands of concurrent sessions. To increase limits:

  1. Upgrade your account to Business or Enterprise.
  2. Submit the Sales & Partnerships form.