Empathic Voice Interface 2 (EVI 2)

Introducing EVI 2, our new voice-language foundation model, enabling human-like conversations with enhanced naturalness, emotional responsiveness, adaptability, and rich customization options for the voice and personality.

The Empathic Voice Interface 2 (EVI 2) introduces a new architecture that seamlessly integrates voice and language processing. This multimodal approach allows EVI 2 to understand and generate both language and voice, dramatically enhancing key features over EVI 1 while also enabling new capabilities.

EVI 2 can converse rapidly and fluently with users, understand a user’s tone of voice, generate any tone of voice, and even handle niche requests like rapping, changing its style, or speeding up its speech. The model excels at emulating a wide range of personalities, including their accents and speaking styles, and at maintaining personalities that are fun and interesting to interact with. Ultimately, EVI 2 can emulate the ideal personality for each application and user.

In addition, EVI 2 lets developers create custom voices using a new voice modulation method. Developers can adjust EVI 2’s base voices along several continuous scales, including gender, nasality, and pitch. This first-of-its-kind feature makes it possible to create voices that are unique to an application or even a single user. Further, this feature does not rely on voice cloning, which currently poses more risks than any other capability of this technology.
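
To make the idea of continuous voice scales concrete, here is a minimal, hypothetical Python sketch of how an application might represent a base voice plus per-scale adjustments. The `VoiceModulation` class, the scale names used as dictionary keys, and the clamping range of -100 to 100 are all illustrative assumptions, not Hume's documented schema or limits.

```python
from dataclasses import dataclass, field

# Hypothetical bounds for every continuous scale (NOT taken from Hume's documentation).
SCALE_MIN = -100.0
SCALE_MAX = 100.0


@dataclass
class VoiceModulation:
    """A base voice plus offsets along continuous scales (hypothetical schema)."""

    base_voice: str
    # Scale name -> offset from the base voice's default; 0.0 means unchanged.
    adjustments: dict[str, float] = field(default_factory=dict)

    def adjust(self, scale: str, value: float) -> "VoiceModulation":
        # Clamp to the assumed range so every resulting voice stays within bounds.
        clamped = max(SCALE_MIN, min(SCALE_MAX, float(value)))
        self.adjustments[scale] = clamped
        return self  # returning self allows chained calls


voice = VoiceModulation(base_voice="DACHER").adjust("pitch", 20).adjust("nasality", -150)
print(voice.adjustments)  # {'pitch': 20.0, 'nasality': -100.0}
```

Clamping pins an out-of-range request, such as a nasality of -150, to the boundary rather than rejecting it; an application could just as reasonably raise an error instead.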

The EVI 2 API is currently in beta, and we are continuing to improve the model. In the coming weeks and months, EVI 2 will sound better, speak more languages, follow more complex instructions, and use a wider range of tools.

Key improvements

  1. Improved voice quality: EVI 2 uses an advanced voice generation model connected to our eLLM, which can process and generate both text and audio. This results in more natural-sounding speech with better word emphasis, higher expressiveness, and more consistent vocal output.

  2. Faster responses: The integrated architecture of EVI 2 reduces end-to-end latency by 40% compared with EVI 1, to an average of around 500 ms. This speed improvement enables more responsive, human-like conversations.

  3. Enhanced emotional intelligence: By processing voice and language in the same model, EVI 2 can better understand the emotional context of user inputs and generate more empathic responses, both in terms of content and vocal tone.

  4. Custom voices and personality: EVI 2 offers new control over the AI’s voice characteristics. Developers can adjust various parameters to tailor EVI 2’s voice to their specific application needs. EVI 2 also supports in-conversation voice prompting, allowing users to dynamically modify EVI’s speaking style (e.g., “speak faster”, “sound excited”) during interactions.

  5. Cost-effectiveness: Despite its advanced capabilities, EVI 2 costs 30% less per minute than its predecessor, with pricing reduced from $0.102 to $0.0714 per minute.
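
The quoted prices make it straightforward to estimate spend. The sketch below assumes simple linear per-minute billing with no rounding or minimums (an assumption, not a statement of Hume's billing rules) and confirms that the new price is 30% below the old one, since 0.0714 / 0.102 = 0.7.

```python
# Per-minute prices quoted above, in US dollars.
EVI1_PRICE_PER_MIN = 0.102
EVI2_PRICE_PER_MIN = 0.0714


def session_cost(minutes: float, price_per_min: float) -> float:
    """Cost in USD for `minutes` of conversation, assuming linear per-minute pricing."""
    return minutes * price_per_min


def relative_savings(old_price: float, new_price: float) -> float:
    """Fraction by which `new_price` undercuts `old_price`."""
    return 1 - new_price / old_price


print(f"{relative_savings(EVI1_PRICE_PER_MIN, EVI2_PRICE_PER_MIN):.0%}")  # 30%
print(f"${session_cost(1000, EVI2_PRICE_PER_MIN):.2f}")  # $71.40
saved = session_cost(1000, EVI1_PRICE_PER_MIN) - session_cost(1000, EVI2_PRICE_PER_MIN)
print(f"${saved:.2f}")  # $30.60
```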

Beyond these improvements, EVI 2 also exhibits promising emerging capabilities, including speech output in multiple languages. We will make these capabilities available to developers as we scale up and improve the model.

We provide the same suite of tools to integrate and customize EVI 2 for your application as we do for EVI 1, and existing EVI developers can easily switch to the new system.

Building with EVI 2

Developers can start testing EVI 2 by simply creating an EVI config on the Hume platform. Just select EVI 2 as the version when creating your config.

To use EVI 2 through the API, create a configuration using the /v0/evi/configs endpoint and specify "evi_version": "2". Then use this config in a conversation with EVI via the /v0/evi/chat endpoint. Most aspects of using EVI, including authentication strategies, remain the same as described in the EVI documentation.
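
The chat step can be sketched as building a WebSocket URL that references a saved config. The host `wss://api.hume.ai`, the `config_id` and `api_key` query parameter names, and the `chat_url` helper are assumptions for illustration; confirm them, and the recommended authentication strategy, in the EVI documentation before relying on them.

```python
from urllib.parse import urlencode

# Assumed WebSocket endpoint; verify the host against the EVI documentation.
CHAT_BASE_URL = "wss://api.hume.ai/v0/evi/chat"


def chat_url(config_id: str, api_key: str) -> str:
    """Build a chat URL that attaches a saved config (query names are assumptions)."""
    query = urlencode({"config_id": config_id, "api_key": api_key})
    return f"{CHAT_BASE_URL}?{query}"


print(chat_url("my-config-id", "MY_API_KEY"))
# wss://api.hume.ai/v0/evi/chat?config_id=my-config-id&api_key=MY_API_KEY
```

Putting a long-lived API key in a URL is generally best kept to server-side code, since URLs can end up in logs.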

  1. In your configuration JSON, set the evi_version parameter to "2". Here’s an example of an EVI 2 config:
```json
{
  "evi_version": "2",
  "name": "EVI 2 config",
  "voice": {
    "provider": "HUME_AI",
    "name": "DACHER"
  }
}
```
  2. Using a config like the above, make a POST request to the /v0/evi/configs endpoint to save the config.
  3. Specify any other custom settings you need.
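
The steps above can be sketched with Python's standard library. The REST host `https://api.hume.ai` and the `X-Hume-Api-Key` header name are assumptions to verify against the EVI documentation, and `build_config_request` and `save_config` are hypothetical helper names. Building the request separately from sending it lets you inspect it without making a network call.

```python
import json
import urllib.request

# Assumed REST endpoint and auth header name; verify against the EVI documentation.
CONFIGS_URL = "https://api.hume.ai/v0/evi/configs"
API_KEY_HEADER = "X-Hume-Api-Key"

# The example EVI 2 config from step 1.
EVI2_CONFIG = {
    "evi_version": "2",
    "name": "EVI 2 config",
    "voice": {"provider": "HUME_AI", "name": "DACHER"},
}


def build_config_request(config: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request that saves `config`."""
    return urllib.request.Request(
        CONFIGS_URL,
        data=json.dumps(config).encode("utf-8"),
        headers={"Content-Type": "application/json", API_KEY_HEADER: api_key},
        method="POST",
    )


def save_config(config: dict, api_key: str) -> dict:
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(build_config_request(config, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_config_request(EVI2_CONFIG, "MY_API_KEY")
print(req.get_method(), req.full_url)  # POST https://api.hume.ai/v0/evi/configs
```

Calling `save_config(EVI2_CONFIG, api_key)` performs the actual POST. The response should include an identifier for the saved config, which the chat connection then references; check the API reference for the exact field name.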

EVI 2 timeline

EVI 2 is available now, with full feature parity with EVI 1, including support for supplemental LLMs, custom language models, tool use, built-in tools like web search, and all configuration options.

From September to December 2024, the Hume team will focus on improving the reliability and quality of EVI 2. The team will ensure that all the features of the EVI 1 API work consistently in EVI 2.

In late December 2024, the EVI 1 API will be deprecated and sunset. Developers will need to migrate from EVI 1 to EVI 2 for ongoing support and new features.

Clear migration guidelines will be provided ahead of time, and our team will ensure that only minor changes are required to make applications work with EVI 2.

Feature comparison: EVI 1 vs EVI 2

This table provides a comprehensive comparison of features between EVI 1 and EVI 2, highlighting the new capabilities introduced in the latest version.

| Feature | EVI 1 | EVI 2 |
|---|---|---|
| Voice quality | Similar to best TTS solutions | Significantly improved naturalness, clarity, and expressiveness |
| Response latency | ~900-2000 ms | ~500-800 ms (about 2x faster) |
| Emotional intelligence | Empathic responses informed by expression measures | End-to-end understanding of voice augmented with emotional intelligence training |
| Base voices | 3 core voice options (Kora, Dacher, Ito) | 4 new high-quality base voice options with expressive personalities (7 total) |
| Voice customizability | Supported: can select base voices and adjust voice parameters | Supported: extensive customization with parameter adjustments (e.g., pitch, huskiness, nasality) |
| In-conversation voice prompting | Not supported | Supported (e.g., “speak faster”, “sound more excited”, change accents) |
| Multimodal processing | Transcription augmented with high-dimensional voice measures | Fully integrated voice and language processing within a single model, along with transcripts and expression measures |
| Supplemental LLMs | Supported | Supported |
| Tool use and web search | Supported | Supported |
| Custom language model (CLM) | Supported | Supported |
| Configuration options | Extensive support | Extensive support (same options as EVI 1) |
| TypeScript SDK support | Supported | Supported |
| Python SDK support | Supported | Supported |
| Multilingual support | English only | Expanded support for multiple languages planned for Q4 2024 |
| Cost | $0.102 per minute | $0.0714 per minute (30% reduction) |

Frequently Asked Questions