How do I interpret my results?

Understanding outputs from Hume's face, speech prosody, vocal burst, and language models.

Our models capture the widest-ever range of facial, speech, vocal, and language modulations with distinct emotional meanings. We label each of their outputs with emotion terms like “amusement” and “doubt,” not because the outputs always correspond to those emotional experiences (they often do not, given that expressions can differ from one modality to another), but because scientific studies show that these kinds of labels are the most precise language we have for describing expressions.

Our models generate JSON or CSV output files containing a value, typically ranging from 0 to 1, for each output dimension in each segment of the input file. Higher values indicate a greater intensity of the facial movements or vocal modulations most strongly associated with the corresponding emotion label.
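For example, here is a minimal sketch of reading scores from a JSON results file. The field names (`predictions`, `emotions`, `name`, `score`) and the file name are illustrative assumptions; the exact structure of your output will depend on the model and output format you selected.

```python
import json

# Load a results file (hypothetical name); the real structure may differ.
with open("results.json") as f:
    results = json.load(f)

# Assume each segment of the input carries a list of emotion dimensions,
# each with a "name" and a 0-to-1 "score".
for segment in results["predictions"]:
    for emotion in segment["emotions"]:
        print(f'{emotion["name"]}: {emotion["score"]:.3f}')
```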

A given expression will contain a blend of various emotions, and our models identify features that are associated with each emotional dimension. The score for each dimension is proportional to the likelihood that a human would perceive that emotion in the expression.

Specifically, the scores reflect the likelihood that an average human perceiver would use that emotion dimension to describe a given expression. The models were trained on human intensity ratings gathered using the methods described in this paper: Deep learning reveals what vocal bursts express in different cultures.
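Because each score reflects the likelihood that a perceiver would apply that label, a common way to summarize a segment is to rank its dimensions by score. The snippet below is a sketch under the same assumed name/score structure as above; the sample values are made up for illustration.

```python
# Rank the emotion dimensions for one segment by score to surface the
# labels an average perceiver would be most likely to use.
def top_emotions(emotions, k=3):
    return sorted(emotions, key=lambda e: e["score"], reverse=True)[:k]

# Illustrative scores for a single segment (not real model output).
segment_emotions = [
    {"name": "Amusement", "score": 0.71},
    {"name": "Doubt", "score": 0.12},
    {"name": "Calmness", "score": 0.44},
]

for emotion in top_emotions(segment_emotions):
    print(f'{emotion["name"]}: {emotion["score"]:.2f}')
```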

While our models measure nuanced expressions that people most typically describe with emotion labels, it's important to remember that they are not a direct readout of what someone is experiencing. Emotional experience is subjective, and its expression is multimodal and context-dependent. Moreover, at any given time, the facial expression outputs may differ considerably from the vocal expression outputs. For these reasons, it's important to follow best practices when interpreting outputs.