Resources

About the Science

What is it about speaking in person that allows us to understand each other so much more accurately than text alone? It isn’t what we say—it’s the way we say it. Science consistently demonstrates that expressions convey information that is vital for social interaction and forms the building blocks of empathy.

That being said, expressions aren’t direct windows into the human mind. Measuring and interpreting expressive behavior is a complex and nuanced task that is the subject of ongoing scientific research.

The scientists at Hume AI have run some of the largest-ever psychology studies to better understand how humans express themselves. By investigating expressions around the world and what they mean to the people making them, we’ve mapped out the nuances of expression in the voice, language, and face in unprecedented detail. We’ve published this research in the world’s leading scientific journals and, for the first time, translated it into cutting-edge machine learning models.

These models, shaped by a new understanding of human expression, include:

  • Facial Expression
  • Speech Prosody
  • Vocal Bursts
  • Emotional Language

Modalities

Facial Expression

Facial expression is the most well-studied modality of expressive behavior, but the overwhelming focus has been on six discrete categories of facial movement or time-consuming manual annotations of facial movements (the scientifically useful, but outdated, Facial Action Coding System). Our research shows that these approaches capture less than 30% of what typical facial expressions convey.

Hume’s Facial Emotional Expression model generates 48 outputs encompassing the dimensions of emotional meaning people reliably attribute to facial expressions. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior. They should not be treated as direct inferences of emotional experience.

Hume’s FACS 2.0 model is a new-generation automated facial action coding system (FACS). With 55 outputs encompassing 26 traditional action units (AUs) and 29 other descriptive features (e.g., smile, scowl), FACS 2.0 is even more comprehensive than manual FACS annotations.

Our facial expression models are packaged with face detection and work on both images and videos.
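
As a minimal sketch of how an image or video might be submitted to the facial expression model, the snippet below posts a media URL to Hume's batch API. The endpoint, header name, model key, and response field are assumptions based on the public API reference and may differ; the API key and media URL are placeholders.

```python
import requests

HUME_API_KEY = "your-api-key"  # placeholder credential

# Assumed batch endpoint and "face" model key; verify against the current API docs.
response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",
    headers={"X-Hume-Api-Key": HUME_API_KEY, "Content-Type": "application/json"},
    json={
        "models": {"face": {}},                        # request the facial expression model
        "urls": ["https://example.com/interview.mp4"], # image or video to analyze (placeholder)
    },
)
response.raise_for_status()

# The job id field name is an assumption; the job is polled later for per-frame scores.
job_id = response.json()["job_id"]
print("Submitted job:", job_id)
```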

In addition to our image-based facial expression models, we also offer an Anonymized Facemesh model for applications in which it is essential to keep personally identifiable data on-device (e.g., for compliance with local laws). Instead of face images, our facemesh model processes facial landmarks detected using Google’s MediaPipe library. It achieves about 80% accuracy relative to our image-based model.
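
Because the facemesh model consumes landmarks rather than images, the landmark extraction step can run entirely on-device. The sketch below uses MediaPipe's Face Mesh solution to turn a frame into normalized landmark coordinates; only those coordinates, not the face image, would then be sent for analysis. The image path is a placeholder.

```python
import cv2
import mediapipe as mp

# Load a frame and run MediaPipe Face Mesh locally (placeholder file path).
image = cv2.imread("frame.jpg")
with mp.solutions.face_mesh.FaceMesh(static_image_mode=True, max_num_faces=1) as face_mesh:
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    # Normalized (x, y, z) coordinates for each of the 468 mesh points.
    landmarks = [
        (lm.x, lm.y, lm.z)
        for lm in results.multi_face_landmarks[0].landmark
    ]
    # `landmarks` (not the raw image) is what would be passed to the facemesh model.
    print(f"Extracted {len(landmarks)} landmarks")
```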

To read more about the team’s research on facial expressions, check out our publications in American Psychologist (2018), Nature (2021), and iScience (2024).

Speech Prosody

Speech prosody is not about the words you say, but the way you say them. It is distinct from language (words) and from non-linguistic vocal utterances.

Our Speech Prosody model generates 48 outputs encompassing the 48 dimensions of emotional meaning that people reliably distinguish from variations in speech prosody. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior. They should not be treated as direct inferences of emotional experience.

Our Speech Prosody model is packaged with speech detection and works on both audio files and videos.
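
Since the model returns a score for each of the 48 dimensions, a common first step is to rank them and keep the strongest few as a summary of a speech segment. The sketch below assumes a simple list of name/score entries; the actual response schema may be structured differently, and the sample values are made up.

```python
# Hypothetical per-segment prosody output: one {"name", "score"} entry
# for each of the 48 dimensions (only a few shown here, with placeholder scores).
prediction = [
    {"name": "Calmness", "score": 0.71},
    {"name": "Interest", "score": 0.64},
    {"name": "Doubt", "score": 0.12},
]

# Rank the dimensions and keep the top few as a summary of the segment.
top = sorted(prediction, key=lambda entry: entry["score"], reverse=True)[:3]
for emotion in top:
    print(f'{emotion["name"]}: {emotion["score"]:.2f}')
```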

To read more about the team’s research on speech prosody, check out our publications in Nature Human Behaviour (2019) and Proceedings of the 31st ACM International Conference on Multimedia (2023).

Vocal Bursts

Non-linguistic vocal utterances, including sighs, laughs, oohs, ahhs, umms, and shrieks (to name but a few), are a particularly powerful and understudied modality of expressive behavior. Recent studies reveal that they reliably convey distinct emotional meanings that are extremely well-preserved across most cultures.

Non-linguistic vocal utterances have different acoustic characteristics than the emotional intonation of speech (prosody) and need to be modeled separately.

Our Vocal Burst Expression model generates 48 outputs encompassing the distinct dimensions of emotional meaning that people distinguish in vocal bursts. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior. They should not be treated as direct inferences of emotional experience.

Our Vocal Burst Description model provides a more descriptive and categorical view of nonverbal vocal expressions (“gasp,” “mhm,” etc.) intended for use cases such as audio captioning. It generates 67 descriptors, including 30 call types (“sigh,” “laugh,” “shriek,” etc.) and 37 common onomatopoeia transliterations of vocal bursts (“hmm,” “ha,” “mhm,” etc.).
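
For the audio-captioning use case, the descriptive outputs can be mapped to caption tokens. The sketch below is illustrative only: the descriptor dictionary, score format, and threshold are placeholders rather than the actual response schema.

```python
# Hypothetical call-type scores for one detected vocal burst (placeholder values).
descriptors = {"Laugh": 0.82, "Sigh": 0.05, "Gasp": 0.03, "Mhm": 0.01}

CAPTION_THRESHOLD = 0.5  # arbitrary cutoff: only caption confidently detected call types

best_type, best_score = max(descriptors.items(), key=lambda kv: kv[1])
if best_score >= CAPTION_THRESHOLD:
    caption_token = f"[{best_type.lower()}]"  # e.g. "[laugh]" inserted into a transcript
    print(caption_token)
```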

Our vocal burst models are packaged with non-linguistic vocal utterance detection and work on both audio files and videos.

To read more about the team’s research on vocal bursts, check out our publications in American Psychologist (2019), Interspeech 2022, ICASSP 2023, and Nature Human Behaviour (2023).

Emotional Language

The words we say include explicit disclosures of emotion and implicit emotional connotations. These meanings are complex and high-dimensional.

From written or spoken words, our Emotional Language model generates 53 outputs encompassing different dimensions of emotion that people often perceive from language. As with every model, the labels for each dimension are proxies for how people tend to label the underlying patterns of behavior. They should not be treated as direct inferences of emotional experience.

Our Emotional Language model is packaged with speech transcription and works on audio files, videos, and text.

Our Named Entity Recognition (NER) model can also identify topics or entities (people, places, organizations, etc.) mentioned in speech or text, along with the tone of the language they are associated with, as identified by our Emotional Language model.
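
One way to use the two models together is to attach the tone of the surrounding sentence to each recognized entity. The sketch below assumes illustrative entity and sentence structures, not the real response schema, and the sample text and scores are placeholders.

```python
# Hypothetical NER output: each entity records which sentence it was found in.
entities = [
    {"text": "Acme Corp", "type": "Organization", "sentence_index": 0},
]

# Hypothetical Emotional Language output, summarized per sentence.
sentence_emotions = [
    {"sentence": "I love working with Acme Corp.", "top_emotion": "Admiration", "score": 0.77},
]

# Associate each mentioned entity with the tone of the sentence it appears in.
for entity in entities:
    tone = sentence_emotions[entity["sentence_index"]]
    print(f'{entity["text"]} ({entity["type"]}): {tone["top_emotion"]} {tone["score"]:.2f}')
```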


Published Research

You can access a comprehensive list of our published research papers along with PDFs for download here.