The emotional intonation of speech (prosody) reliably conveys at least 18 distinct dimensions of meaning (Brooks et al., 2022; Tzirakis et al., 2022).
Speech prosody is not about the words you say, but the way you say them. It is distinct from language (words) and from non-linguistic vocal utterances.
Our Speech Prosody Model generates 48 outputs encompassing the 18+ dimensions that people distinguish. The 48 outputs also include alternative conceptualizations, provided for the sake of interpretation and alignment across our different models. As with every model, the label for each dimension is a proxy for how people tend to label the underlying pattern of behavior. The labels should not be treated as direct inferences of emotional experience.
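As an illustration of working with a set of labeled outputs like these, the following minimal Python sketch ranks the scores for one speech segment and returns the highest-scoring labels. It assumes the outputs are available as a mapping from label name to numeric score; that structure, the function name `top_expressions`, and the label names and values shown are hypothetical and chosen only for illustration. The actual response format is described in the API reference.

```python
from typing import Dict, List, Tuple


def top_expressions(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring labels, highest score first."""
    if k < 0:
        raise ValueError("k must be non-negative")
    # Sort by score, descending; break ties alphabetically so output is stable.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Hypothetical scores for one speech segment (illustrative labels and values,
# not real model output).
example_scores = {
    "Amusement": 0.62,
    "Calmness": 0.10,
    "Confusion": 0.25,
    "Interest": 0.41,
}

top = top_expressions(example_scores, k=2)
for label, score in top:
    print(f"{label}: {score:.2f}")
```

A ranking like this summarizes which labels best describe how the speech sounds. In keeping with the caveat above, a high score indicates a pattern that people tend to describe with that label, not a measurement of what the speaker feels.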
Our Speech Prosody Model is packaged with speech detection and works on both audio and video files. See the API reference for further details.