Welcome to Hume AI

Hume is a research lab and technology company with a mission to ensure that artificial intelligence is built to serve human goals and emotional well-being.

Hume develops two categories of models: speech-language models that interpret and generate expressive speech, and expression measurement models that analyze vocal, facial, and verbal expression.

These models are available through three APIs: the Empathic Voice Interface (EVI) for real-time voice interaction, Text-to-speech (TTS) for expressive speech synthesis, and Expression Measurement for analyzing expression in media and text.

Speech-to-speech (EVI)

Hume’s Empathic Voice Interface (EVI) is an advanced, emotionally intelligent real-time voice AI. EVI measures nuanced vocal modulations in users’ speech and responds through a speech-language model that guides both language and speech generation. Trained on millions of human interactions, our speech-language model unites language modeling and text-to-speech with better EQ, prosody, end-of-turn detection, interruptibility, and alignment.

  • Interviewing & Coaching: Simulate lifelike interviews or leadership coaching sessions with dynamic tone adjustment.
  • Digital Companions: Build emotionally aware companions for seniors, kids, or mental wellness support.
  • Digital Assistants: Respond with empathy and modulate tone to reduce user frustration or improve engagement.
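Because EVI is a real-time interface, integration happens over a streaming connection rather than one-off requests. The sketch below shows roughly what opening a session looks like; the WebSocket endpoint, the api_key query parameter, and the use of the `websockets` library are all assumptions for illustration, so consult the EVI reference for the exact endpoint and authentication scheme.

```python
# A minimal sketch of opening an EVI chat session. The URL and the
# api_key query parameter below are assumptions, not the documented
# endpoint -- verify against the EVI API reference.
from urllib.parse import urlencode

def build_evi_url(api_key: str, base: str = "wss://api.hume.ai/v0/evi/chat") -> str:
    """Construct the (assumed) EVI WebSocket URL with the API key attached."""
    return f"{base}?{urlencode({'api_key': api_key})}"

# Connecting would then look roughly like:
#   import asyncio, websockets
#   async with websockets.connect(build_evi_url("YOUR_KEY")) as ws:
#       await ws.send(audio_chunk)   # stream microphone audio in
#       reply = await ws.recv()      # receive expressive speech out
```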

Text-to-speech (TTS)

Octave TTS is the first text-to-speech system built on LLM intelligence. Unlike conventional TTS that merely “reads” words, Octave is a “speech-language model” that understands what words mean in context, unlocking a new level of expressiveness and nuance.

  • Creative Tools: Narration for video, podcasting, and audiobooks.
  • Education/Coaching: Deliver lessons with an engaging, emotionally varied voice.
  • Digital Avatars: Give realistic voice to AI-powered characters in apps, games, or virtual experiences.
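Because Octave understands language in context, a request can pair the text to speak with a natural-language description of how it should sound. The sketch below illustrates that shape; the endpoint, the X-Hume-Api-Key header, and the field names are assumptions based on a typical JSON speech API, so verify them against the TTS API reference before use.

```python
# A minimal sketch of an Octave TTS request body. Field names here
# ("utterances", "text", "description") are assumptions for
# illustration, not the documented schema.
import json

def build_tts_payload(text: str, description: str) -> dict:
    """Build a request body pairing text with a description of delivery."""
    return {
        "utterances": [
            {"text": text, "description": description},
        ]
    }

payload = build_tts_payload(
    "Welcome back! How can I help today?",
    "warm, upbeat customer-support agent",
)
body = json.dumps(payload)

# Sending it would look roughly like:
#   import requests
#   resp = requests.post(
#       "https://api.hume.ai/v0/tts",              # assumed endpoint
#       headers={"X-Hume-Api-Key": "YOUR_KEY"},    # assumed header
#       json=payload,
#   )
```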

Expression Measurement

Hume’s state-of-the-art expression measurement models for the voice, face, and language are built on 10+ years of research and advances in semantic space theory pioneered by Alan Cowen. Our expression measurement models capture hundreds of distinct dimensions of human expression in audio, video, and images.

  • Health & Wellness: Monitor patient tone and emotion during therapy or check-ins.
  • Call Center Analytics: Detect caller frustration or distress for triage and escalation.
  • UX/CX Research: Analyze user interviews and testing sessions for sentiment trends.
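Analyzing recorded media like the use cases above typically means submitting a batch job that names the media to analyze and the models to run. The sketch below shows one plausible request shape; the model names ("face", "prosody", "language"), field names, and endpoint are assumptions, so check the Expression Measurement API reference for the real schema.

```python
# A minimal sketch of a batch expression-measurement job request.
# Model and field names below are assumptions for illustration.
def build_measurement_job(urls: list, models: list) -> dict:
    """Request analysis of the given media URLs with the chosen models."""
    return {
        "urls": urls,
        "models": {name: {} for name in models},  # {} = model defaults
    }

job = build_measurement_job(
    ["https://example.com/interview.mp4"],  # hypothetical media URL
    ["face", "prosody", "language"],
)

# Submitting would look roughly like:
#   import requests
#   resp = requests.post(
#       "https://api.hume.ai/v0/batch/jobs",       # assumed endpoint
#       headers={"X-Hume-Api-Key": "YOUR_KEY"},    # assumed header
#       json=job,
#   )
#   # then poll the returned job id until results are ready
```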

Voice

Voice defines how speech is delivered, shaping tone, pacing, accent, and personality. It plays a central role in how listeners perceive meaning and emotion.

All voices in Hume’s platform are powered by Octave, a speech-language model built on LLM intelligence. Octave enables expressive, context-aware speech generation from both text and natural language descriptions.

Voices can be used across both EVI and TTS to tailor how content is spoken.
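In practice, tailoring delivery means attaching a voice to the speech you generate. The sketch below shows one plausible way a saved voice might be referenced by name in a TTS utterance; the "voice" field, its shape, and the voice name are assumptions (voices may also be referenced by id, and EVI configures voices separately), so consult the voices documentation for the actual mechanism.

```python
# A minimal sketch of attaching a named voice to an utterance.
# The "voice" field and its shape are assumptions for illustration.
def with_voice(utterance: dict, voice_name: str) -> dict:
    """Return a copy of the utterance with a named voice attached."""
    return {**utterance, "voice": {"name": voice_name}}

utt = with_voice(
    {"text": "Chapter one begins on a cold morning."},
    "Narrator",  # hypothetical saved voice name
)
```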

SDKs

Jumpstart your development with SDKs built for Hume APIs. They handle authentication, requests, and workflows to make integration straightforward. With support for React, TypeScript, and Python, our SDKs provide the tools you need to build efficiently across different environments.
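One of the chores the SDKs take off your plate is attaching credentials to every request. The sketch below shows that plumbing by hand under the assumption that the API key travels in an X-Hume-Api-Key header; the header name is an assumption, and the official SDKs handle this internally.

```python
# A minimal sketch of the auth plumbing the SDKs manage for you.
# The X-Hume-Api-Key header name is an assumption for illustration.
def auth_headers(api_key: str) -> dict:
    """Headers for an authenticated Hume API request (assumed scheme)."""
    return {
        "X-Hume-Api-Key": api_key,
        "Content-Type": "application/json",
    }

headers = auth_headers("YOUR_KEY")
# Every REST call would then pass these headers, e.g.:
#   requests.get("https://api.hume.ai/v0/...", headers=headers)
```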

Example Code

Explore step-by-step guides and sample projects for integrating Hume APIs. Our GitHub repositories include ready-to-use code and open-source SDKs to support your development process in various environments.

Get Support

Need help? Our team is here to support you with any questions or challenges.