Language Model

Choose which language model to use for EVI's response generation.

EVI lets you specify the language model used for response generation during chat sessions. The model you choose strongly shapes the kind of responses EVI generates.

EVI supports several native speech-language models that are optimized for emotional intelligence and conversational use cases. It also supports supplemental language models.

API Reference

See our API reference for how to specify a language model in your EVI configuration.
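
As a rough illustration of what selecting a language model in a configuration might look like, the sketch below builds a request body in Python. The field names (`evi_version`, `language_model`, `model_provider`, `model_resource`, `temperature`) and the provider identifier `HUME_AI` are assumptions made for this example, and the configuration name is hypothetical; confirm the exact schema in the API reference before relying on it.

```python
import json


def build_evi_config(name, evi_version, model_provider, model_resource, temperature=None):
    """Return a request body that selects a language model for an EVI configuration.

    All key names here are assumptions for illustration; check the API reference.
    """
    language_model = {
        "model_provider": model_provider,
        "model_resource": model_resource,
    }
    # Only include temperature when the caller sets it explicitly.
    if temperature is not None:
        language_model["temperature"] = temperature
    return {
        "name": name,
        "evi_version": evi_version,
        "language_model": language_model,
    }


config = build_evi_config(
    name="support-agent",         # hypothetical configuration name
    evi_version="3",
    model_provider="HUME_AI",     # assumed provider identifier
    model_resource="hume-evi-3",  # a model name listed on this page
)
print(json.dumps(config, indent=2))
```

The resulting dictionary would be sent as the JSON body of the configuration request. Keeping the language model in its own nested object makes it straightforward to swap providers without touching the rest of the configuration.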

Supported language models

Hume’s speech-language model

Hume offers native speech-language models. These models are multimodal: they process language and audio together. This allows EVI to understand and generate both language and voice in the same latent space, resulting in more coherent and contextually aware responses.

Hume speech-language model support by EVI version:

Model    EVI 1    EVI 2    EVI 3
hume-evi-2
hume-evi-3
hume-evi-3-websearch

External LLMs

Developers may also choose from leading external language models such as Claude, GPT, Gemini, and many others. For a complete list of external LLMs Hume natively supports, see our API Reference.

Latency

The landscape of large language models (LLMs) and their providers is constantly evolving, so the supplemental LLM that is fastest with EVI changes over time.

The key factor in perceived latency with EVI is the time to first token (TTFT); lower is better. The model and provider combination with the lowest TTFT will be the fastest.

There is a tradeoff between speed and quality: larger models are slower but easier to prompt. We recommend testing several supplemental LLM options when implementing EVI.
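
One way to compare candidates is to time how long each takes to produce its first token. The sketch below is a minimal, provider-agnostic harness: it assumes each candidate can be wrapped in a zero-argument callable that starts a request and returns an iterable of tokens. The names `measure_ttft`, `fastest`, and `fake_stream` are hypothetical helpers for this example, and the fake streams stand in for real streaming clients.

```python
import statistics
import time
from typing import Callable, Dict, Iterable

StreamStarter = Callable[[], Iterable[str]]


def measure_ttft(start_stream: StreamStarter) -> float:
    """Return seconds from starting the request until the first token arrives."""
    start = time.perf_counter()
    for _ in start_stream():
        # Stop timing as soon as the first token is received.
        return time.perf_counter() - start
    raise ValueError("stream produced no tokens")


def fastest(candidates: Dict[str, StreamStarter], trials: int = 3) -> str:
    """Return the candidate name with the lowest median TTFT over several trials."""
    medians = {
        name: statistics.median(measure_ttft(starter) for _ in range(trials))
        for name, starter in candidates.items()
    }
    return min(medians, key=medians.get)


def fake_stream(delay: float) -> StreamStarter:
    """Stand-in for a real streaming client: waits `delay` seconds, then yields tokens."""
    def start() -> Iterable[str]:
        time.sleep(delay)
        return iter(["Hello", " world"])
    return start


candidates = {"slow-model": fake_stream(0.05), "fast-model": fake_stream(0.01)}
print(fastest(candidates))
```

Using the median over a few trials reduces the effect of a single slow request. In practice, measurements should be taken from the region and network where the application will run, since TTFT varies by provider and location.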

Artificial Analysis offers a useful dashboard for comparing model and provider latencies.

Custom language model

For specific application requirements, the API supports integrating custom language models, offering flexibility to tailor conversational behavior to your domain.

Custom Language Model Guide

See our guide for details on how to specify and use your custom language model for response generation.

Pricing

Using an external language model incurs additional cost. You can view estimated pricing by model on the Billing page when you are logged in to the Hume Platform. The cost of your external language model usage will be added to your monthly bill.