Speech prosody

Measure emotional expression from the tone, rhythm, and timbre of speech.

The speech prosody model measures 48 dimensions of emotional expression from the non-linguistic qualities of speech, specifically how something is said rather than what is said. It analyzes pitch, pace, intensity, and other vocal characteristics to capture emotional nuances in audio and video. Recommended input filetypes: .wav, .mp3, .mp4.

Job configuration

Batch API

The following parameters are available when configuring the prosody model for Batch API jobs.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| granularity | string | utterance | Level at which predictions are generated. One of word, sentence, utterance, or conversational_turn. |
| identify_speakers | boolean | false | When enabled, identifies and labels different speakers in the audio (speaker diarization). |
| window | object | | Sliding window job configuration with length (seconds, min 0.5) and step (seconds, min 0.5). Useful for analyzing long audio at regular intervals instead of natural speech boundaries. |

Streaming API

The prosody model is not configurable in the Streaming API. Enable it by passing an empty object:

import asyncio

from hume import AsyncHumeClient
from hume.expression_measurement.stream.stream.types import Config

async def main():
    client = AsyncHumeClient(api_key="<YOUR_API_KEY>")
    # An empty prosody object enables the model with its default settings.
    async with client.expression_measurement.stream.connect(
        options={"config": Config(prosody={})}
    ) as socket:
        result = await socket.send_file("audio.mp3")
        print(result)

asyncio.run(main())

Example job configuration

curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "prosody": {
        "granularity": "sentence",
        "identify_speakers": true
      }
    },
    "urls": ["https://example.com/audio.mp3"]
  }'

The example job configuration above applies to the Batch API. In the Streaming API, the prosody model uses default settings and does not accept job configuration parameters.
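For readers who prefer Python to curl, the same request can be assembled with the standard library. This is a minimal sketch, not an official client: the helper name `build_prosody_job_request` is hypothetical, and the endpoint, header names, and body fields are copied from the curl example above. The request is only built here, not sent.

```python
import json
import urllib.request

BATCH_JOBS_URL = "https://api.hume.ai/v0/batch/jobs"


def build_prosody_job_request(api_key, urls, granularity="utterance",
                              identify_speakers=False):
    """Build (but do not send) a POST request mirroring the curl example.

    Hypothetical helper for illustration only.
    """
    payload = {
        "models": {
            "prosody": {
                "granularity": granularity,
                "identify_speakers": identify_speakers,
            }
        },
        "urls": list(urls),
    }
    return urllib.request.Request(
        BATCH_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Hume-Api-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_prosody_job_request(
    "<YOUR_API_KEY>",
    ["https://example.com/audio.mp3"],
    granularity="sentence",
    identify_speakers=True,
)
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Keeping construction separate from sending makes it easy to inspect the JSON body before a job is submitted.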

Output

Each prediction includes:

  • Time interval: the begin and end timestamps in seconds
  • Emotion scores: scores for each of the 48 expressions

{
  "grouped_predictions": [
    {
      "id": "unknown",
      "predictions": [
        {
          "text": "I'm so happy to see you",
          "time": {
            "begin": 0.32,
            "end": 1.84
          },
          "confidence": 0.95,
          "speaker_confidence": null,
          "emotions": [
            { "name": "Admiration", "score": 0.107 },
            { "name": "Joy", "score": 0.482 },
            ...
          ]
        }
      ]
    }
  ]
}
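A common next step is to pull the highest-scoring expressions out of each prediction. The sketch below assumes the response has already been parsed into a Python dict with the shape shown above; the helper names `top_emotions` and `summarize` are hypothetical, and the sample data is a trimmed copy of the example with only the two emotions that the example spells out.

```python
def top_emotions(prediction, n=3):
    """Return the n highest-scoring (name, score) pairs for one prediction."""
    ranked = sorted(prediction["emotions"], key=lambda e: e["score"], reverse=True)
    return [(e["name"], e["score"]) for e in ranked[:n]]


def summarize(result, n=3):
    """Flatten grouped predictions into one row per prediction."""
    rows = []
    for group in result["grouped_predictions"]:
        for prediction in group["predictions"]:
            rows.append({
                "group_id": group["id"],
                "begin": prediction["time"]["begin"],
                "end": prediction["time"]["end"],
                "text": prediction.get("text"),
                "top": top_emotions(prediction, n),
            })
    return rows


# Trimmed copy of the sample response; a real response carries all 48 scores.
result = {
    "grouped_predictions": [
        {
            "id": "unknown",
            "predictions": [
                {
                    "text": "I'm so happy to see you",
                    "time": {"begin": 0.32, "end": 1.84},
                    "confidence": 0.95,
                    "speaker_confidence": None,
                    "emotions": [
                        {"name": "Admiration", "score": 0.107},
                        {"name": "Joy", "score": 0.482},
                    ],
                }
            ],
        }
    ]
}

summary = summarize(result, n=1)
for row in summary:
    print(row["begin"], row["end"], row["text"], row["top"])
```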

Granularity

The granularity parameter controls how speech is segmented before predictions are generated. This parameter is only available in the Batch API.

| Value | Description |
| --- | --- |
| word | One prediction per transcribed word. Provides the most detailed temporal resolution. |
| sentence | One prediction per sentence, as determined by natural speech pauses and punctuation. |
| utterance | One prediction per utterance, a continuous segment of speech separated by pauses. This is the default. |
| conversational_turn | One prediction per speaker turn. Requires identify_speakers to be enabled. |
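The constraints in the table can be checked on the client before a job is submitted. This is a minimal sketch under the rules stated above (four allowed values, utterance as the default, and conversational_turn requiring identify_speakers); the function name `make_prosody_config` is hypothetical, and whether the service itself rejects invalid combinations is not covered here.

```python
GRANULARITIES = ("word", "sentence", "utterance", "conversational_turn")


def make_prosody_config(granularity="utterance", identify_speakers=False):
    """Return a prosody config dict after checking the documented constraints."""
    if granularity not in GRANULARITIES:
        raise ValueError(
            f"granularity must be one of {GRANULARITIES}, got {granularity!r}"
        )
    if granularity == "conversational_turn" and not identify_speakers:
        raise ValueError("conversational_turn requires identify_speakers=True")
    return {"granularity": granularity, "identify_speakers": identify_speakers}


default_config = make_prosody_config()
turn_config = make_prosody_config("conversational_turn", identify_speakers=True)
```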

Sliding window

The window parameter provides an alternative to granularity-based segmentation. Instead of splitting audio at natural speech boundaries, it analyzes the audio in fixed-length windows that advance by a fixed step.

curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "prosody": {
        "window": {
          "length": 4.0,
          "step": 1.0
        }
      }
    },
    "urls": ["https://example.com/audio.mp3"]
  }'

  • length: Duration of each window in seconds (minimum 0.5).
  • step: How far to advance between windows in seconds (minimum 0.5). A step smaller than the length creates overlapping windows.
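To see how length and step interact, the sketch below enumerates the intervals a sliding window would cover for a clip of known duration. It is an illustration of the arithmetic only: the function name `window_intervals` is hypothetical, and it assumes windows start at 0 and that only windows fitting entirely inside the clip are counted. How the service treats a trailing partial window is not stated here.

```python
def window_intervals(duration, length, step):
    """List (begin, end) pairs, in seconds, for full windows within duration."""
    if length < 0.5 or step < 0.5:
        raise ValueError("length and step must each be at least 0.5 seconds")
    intervals = []
    index = 0
    # Multiply rather than accumulate to limit floating-point drift.
    while index * step + length <= duration:
        begin = index * step
        intervals.append((begin, begin + length))
        index += 1
    return intervals


def overlap_seconds(length, step):
    """Seconds shared by two consecutive windows (0 when step >= length)."""
    return max(length - step, 0.0)


# A 6-second clip with the settings from the example (length 4.0, step 1.0).
intervals = window_intervals(6.0, 4.0, 1.0)
overlap = overlap_seconds(4.0, 1.0)
```

With these settings, consecutive windows share 3 seconds of audio, so each stretch of speech is scored several times; a larger step reduces that redundancy at the cost of temporal resolution.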

Expressions

The speech prosody model measures the following 48 expressions. These are the same expressions measured by the facial expression and vocal burst models.

| Admiration | Confusion | Empathic Pain | Pride |
| Adoration | Contempt | Entrancement | Realization |
| Aesthetic Appreciation | Contentment | Envy | Relief |
| Amusement | Craving | Excitement | Romance |
| Anger | Desire | Fear | Sadness |
| Anxiety | Determination | Guilt | Satisfaction |
| Awe | Disappointment | Horror | Shame |
| Awkwardness | Disgust | Interest | Surprise (negative) |
| Boredom | Distress | Joy | Surprise (positive) |
| Calmness | Doubt | Love | Sympathy |
| Concentration | Ecstasy | Nostalgia | Tiredness |
| Contemplation | Embarrassment | Pain | Triumph |