Expression Measurement API

Hume's state-of-the-art expression measurement models for the voice, face, and language.

Intro

Hume’s state-of-the-art expression measurement models for the voice, face, and language are built on 10+ years of research and advances in computational approaches to emotion science (semantic space theory) pioneered by our team. Our expression measurement models capture hundreds of dimensions of human expression in audio, video, and images.

Measurements

  • Facial Expression, including subtle facial movements often seen as expressing love or admiration, awe, disappointment, or cringes of empathic pain, along 48 distinct dimensions of emotional meaning. The Facial Expression model can also optionally output FACS 2.0 measurements, our model of facial movement that includes traditional Action Units (AUs such as “Inner brow raise” and “Nose crinkle”) as well as facial descriptions (“Smile”, “Wink”, “Hand over mouth”, “Hand over eyes”).
  • Speech Prosody, or the non-linguistic tone, rhythm, and timbre of speech, spanning 48 distinct dimensions of emotional meaning.
  • Vocal Burst, including laughs, sighs, huhs, hmms, cries and shrieks (to name a few), along 48 distinct dimensions of emotional meaning.
  • Emotional Language, or the emotional tone of transcribed text, along 53 distinct dimensions of emotional meaning.

These expressive behaviors are complex and multifaceted, which is why each model measures them along dozens of distinct dimensions rather than collapsing them into a handful of discrete categories.
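Each of these models can be enabled independently in a single request. As a minimal sketch, here is how a batch job with all four models might be started; the endpoint path, header, and field names are assumptions drawn from Hume's API reference at the time of writing, so verify them against the current docs.

```python
# A minimal sketch, not a definitive implementation: the endpoint, header,
# and field names are assumptions taken from Hume's API reference.
import requests  # pip install requests

API_KEY = "YOUR_API_KEY"  # placeholder credential

response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",
    headers={"X-Hume-Api-Key": API_KEY},
    json={
        # Enable any subset of the four models; an empty object requests
        # that model's default configuration.
        "models": {
            "face": {},      # Facial Expression
            "prosody": {},   # Speech Prosody
            "burst": {},     # Vocal Burst
            "language": {},  # Emotional Language
        },
        "urls": ["https://example.com/interview.mp4"],  # hypothetical media URL
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["job_id"])  # poll this job ID for predictions once complete
```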

To learn more about how to use our models, visit our API reference.

Model training

The models were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data gathered using the methods described in these papers: Deep learning reveals what vocal bursts express in different cultures and Deep learning reveals what facial expressions mean to people in different cultures.

While our models measure nuanced expressions that people most typically describe with emotion labels, it’s important to remember that they are not a direct readout of what someone is experiencing. The facial and vocal models will sometimes report different emotional meanings for the same moment, and that is expected: emotional experience is subjective, and its expression is multimodal and context-dependent.

Try out the models

Learn how you can use the Expression Measurement API through both REST and WebSockets.
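As a minimal WebSocket sketch, the following streams raw text to the Emotional Language model. The wss://api.hume.ai/v0/stream/models endpoint and the models/raw_text/data message fields are assumptions taken from Hume's API reference at the time of writing; verify them before relying on this.

```python
# A minimal streaming sketch under assumed endpoint and message shapes.
import asyncio
import json

import websockets  # pip install websockets


async def measure_text(text: str, api_key: str) -> dict:
    uri = "wss://api.hume.ai/v0/stream/models"
    # websockets >= 14 names this kwarg `additional_headers`;
    # older releases call it `extra_headers`.
    async with websockets.connect(
        uri, additional_headers={"X-Hume-Api-Key": api_key}
    ) as ws:
        # raw_text=True asks the endpoint to treat `data` as plain text
        # rather than a base64-encoded media file.
        await ws.send(json.dumps({
            "models": {"language": {}},
            "raw_text": True,
            "data": text,
        }))
        return json.loads(await ws.recv())


if __name__ == "__main__":
    result = asyncio.run(measure_text("I can't believe it worked!", "YOUR_API_KEY"))
    print(result)
```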

REST and WebSocket endpoints provide access to the same Hume models, but with different speed and scale tradeoffs. All models share a common response format that associates a score with each detected expression. Scores indicate the degree to which a human rater would assign an expression to a given sample of video, text, or audio.
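Because the response format is shared, downstream handling can be model-agnostic. Here is a minimal sketch of ranking expressions by score, assuming you have already navigated the model-specific nesting down to the list of name/score objects:

```python
# A minimal sketch: `emotions` is assumed to be the shared-format list that
# pairs each expression name with a score, e.g. [{"name": ..., "score": ...}].
from typing import Dict, List, Tuple


def top_expressions(emotions: List[Dict], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (name, score) pairs."""
    ranked = sorted(emotions, key=lambda e: e["score"], reverse=True)
    return [(e["name"], round(e["score"], 3)) for e in ranked[:k]]


# Hand-written fragment in the documented shape, for illustration only:
sample = [
    {"name": "Joy", "score": 0.71},
    {"name": "Amusement", "score": 0.64},
    {"name": "Surprise (positive)", "score": 0.22},
]
print(top_expressions(sample, k=2))  # [('Joy', 0.71), ('Amusement', 0.64)]
```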

Specific expressions by modality

Our models measure 53 expressions identified through the subtleties of emotional language and 48 expressions discerned from facial cues, vocal bursts, and speech prosody.

Expression | Language | Face/Burst/Prosody
Admiration | ✓ | ✓
Adoration | ✓ | ✓
Aesthetic Appreciation | ✓ | ✓
Amusement | ✓ | ✓
Anger | ✓ | ✓
Annoyance | ✓ |
Anxiety | ✓ | ✓
Awe | ✓ | ✓
Awkwardness | ✓ | ✓
Boredom | ✓ | ✓
Calmness | ✓ | ✓
Concentration | ✓ | ✓
Confusion | ✓ | ✓
Contemplation | ✓ | ✓
Contempt | ✓ | ✓
Contentment | ✓ | ✓
Craving | ✓ | ✓
Desire | ✓ | ✓
Determination | ✓ | ✓
Disappointment | ✓ | ✓
Disapproval | ✓ |
Disgust | ✓ | ✓
Distress | ✓ | ✓
Doubt | ✓ | ✓
Ecstasy | ✓ | ✓
Embarrassment | ✓ | ✓
Empathic Pain | ✓ | ✓
Enthusiasm | ✓ |
Entrancement | ✓ | ✓
Envy | ✓ | ✓
Excitement | ✓ | ✓
Fear | ✓ | ✓
Gratitude | ✓ |
Guilt | ✓ | ✓
Horror | ✓ | ✓
Interest | ✓ | ✓
Joy | ✓ | ✓
Love | ✓ | ✓
Nostalgia | ✓ | ✓
Pain | ✓ | ✓
Pride | ✓ | ✓
Realization | ✓ | ✓
Relief | ✓ | ✓
Romance | ✓ | ✓
Sadness | ✓ | ✓
Sarcasm | ✓ |
Satisfaction | ✓ | ✓
Shame | ✓ | ✓
Surprise (negative) | ✓ | ✓
Surprise (positive) | ✓ | ✓
Sympathy | ✓ | ✓
Tiredness | ✓ | ✓
Triumph | ✓ | ✓