Expression Measurement
Hume’s state-of-the-art expression measurement models for voice, face, and language.
Intro
Hume’s state-of-the-art expression measurement models for voice, face, and language are built on 10+ years of research and advances in computational approaches to emotion science (semantic space theory) pioneered by our team. Our expression measurement models capture hundreds of dimensions of human expression in audio, video, and images.
Measurements
- Facial Expression, including subtle facial movements often seen as expressing love or admiration, awe, disappointment, or cringes of empathic pain, along 48 distinct dimensions of emotional meaning. Our Facial Expression model can also optionally output FACS 2.0 measurements, our model of facial movements, including traditional Action Units (AUs such as “Inner brow raise” and “Nose crinkle”) and facial descriptions (“Smile”, “Wink”, “Hand over mouth”, “Hand over eyes”); see the configuration sketch after this list for how these outputs can be requested.
- Speech Prosody, or the non-linguistic tone, rhythm, and timbre of speech, spanning 48 distinct dimensions of emotional meaning.
- Vocal Burst, including laughs, sighs, huhs, hmms, cries, and shrieks (to name a few), along 48 distinct dimensions of emotional meaning.
- Emotional Language, or the emotional tone of transcribed text, along 53 distinct dimensions of emotional meaning.
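As a rough illustration of how these modalities come together in a single job, the sketch below enables several models at once, including the optional FACS 2.0 output for the face model. The field names are assumptions for illustration, not the exact request schema; consult the API reference for the authoritative shape.

```python
# Illustrative only: these field names are assumptions, not the exact
# Expression Measurement request schema. See the API reference.
job_config = {
    "models": {
        "face": {
            "facs": {},          # hypothetical flag for the optional FACS 2.0 Action Units
            "descriptions": {},  # hypothetical flag for facial descriptions ("Smile", "Wink", ...)
        },
        "prosody": {},   # speech prosody: 48 dimensions
        "burst": {},     # vocal bursts: 48 dimensions
        "language": {},  # emotional language: 53 dimensions
    },
    "urls": ["https://example.com/interview.mp4"],  # media to analyze
}
```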
Expressions are complex and multifaceted; they should not be treated as direct inferences of emotional experience. To learn more about the science behind expression measurement, visit the About the science page.
To learn more about how to use our models, visit our API reference.
Model training
The models were trained on human intensity ratings of large-scale, experimentally controlled emotional expression data gathered using the methods described in these papers: Deep learning reveals what vocal bursts express in different cultures and Deep learning reveals what facial expressions mean to people in different cultures.
While our models measure nuanced expressions that people most typically describe with emotion labels, it’s important to remember that they are not a direct readout of what someone is experiencing. Sometimes the outputs of the facial and vocal models will suggest different emotional meanings; this is expected, since emotional experience is subjective and its expression is multimodal and context-dependent.
Try out the models
Learn how you can use the Expression Measurement API through both REST and WebSockets.
Use REST endpoints to process batches of videos, images, text, or audio files.
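For example, a batch job might be submitted roughly as in the following sketch. The endpoint path, header name, and payload fields are assumptions drawn from the description above; check the API reference for the authoritative schema.

```python
import requests

API_KEY = "..."  # your Hume API key

# Assumed endpoint, header name, and payload shape, for illustration only.
response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",
    headers={"X-Hume-Api-Key": API_KEY},
    json={
        "models": {"face": {}, "prosody": {}, "language": {}},
        "urls": ["https://example.com/interview.mp4"],
    },
)
response.raise_for_status()
job = response.json()
print("Submitted batch job:", job)  # poll the job until predictions are ready
```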
Use WebSocket endpoints when you need real-time predictions, such as processing a webcam or microphone stream.
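A real-time stream might look roughly like the sketch below, which scores a single webcam frame. The streaming URL, authentication header, and message shape are assumptions for illustration; see the API reference for the actual protocol.

```python
import asyncio
import base64
import json

import websockets  # pip install websockets

API_KEY = "..."  # your Hume API key

async def score_one_frame(path: str) -> None:
    # Assumed streaming URL and message shape; check the API reference.
    uri = "wss://api.hume.ai/v0/stream/models"
    async with websockets.connect(
        uri,
        additional_headers={"X-Hume-Api-Key": API_KEY},  # "extra_headers" on older websockets versions
    ) as ws:
        with open(path, "rb") as f:
            frame_b64 = base64.b64encode(f.read()).decode()
        # Request facial expression scores for one base64-encoded frame.
        await ws.send(json.dumps({"models": {"face": {}}, "data": frame_b64}))
        print(json.loads(await ws.recv()))  # per-frame expression scores

asyncio.run(score_one_frame("frame.jpg"))
```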
REST and WebSocket endpoints provide access to the same Hume models, but with different speed and scale tradeoffs. All models share a common response format, which associates a score with each detected expression. Scores indicate the degree to which a human rater would assign an expression to a given sample of video, text, or audio.
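Whichever modality produced a prediction, each detected expression comes back paired with a score, so ranking the strongest expressions is straightforward. The sketch below uses a simplified flat list of name/score pairs; in the real payload these pairs are nested under each model’s predictions.

```python
# A simplified stand-in for one model's output in the shared response format:
# each detected expression is paired with a score.
emotions = [
    {"name": "Amusement", "score": 0.71},
    {"name": "Joy", "score": 0.64},
    {"name": "Awkwardness", "score": 0.12},
]

# Rank expressions by how strongly a human rater would assign them to the sample.
for emotion in sorted(emotions, key=lambda e: e["score"], reverse=True)[:3]:
    print(f"{emotion['name']}: {emotion['score']:.2f}")
```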
Specific expressions by modality
Our models measure 53 expressions identified through the subtleties of emotional language and 48 expressions discerned from facial cues, vocal bursts, and speech prosody.
Train your own custom model
Our Custom Models API builds on our expression measurement models and state-of-the-art eLLMs to bring custom insights to your application. Using transfer learning from those models, it can predict almost any outcome more accurately than language alone, whether that’s toxicity, depressed mood, driver drowsiness, or any other metric that matters to your users.