The Expression Measurement API is being sunset.
The speech prosody model measures 48 dimensions of emotional expression from the non-linguistic qualities of speech, specifically
how something is said rather than what is said. It analyzes pitch, pace, intensity, and other vocal characteristics
to capture emotional nuances in audio and video. Recommended input filetypes: .wav, .mp3, .mp4.
The following parameters are available when configuring the prosody model for Batch API jobs.
The prosody model is not configurable in the Streaming API. Enable it by passing an empty object for the prosody model in the request configuration.
The example job configuration above applies to the Batch API. In the Streaming API, the prosody model uses default settings and does not accept job configuration parameters.
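As a rough illustration of the difference between the two APIs, the sketch below builds the model configuration as plain Python dictionaries. The key names `models` and `prosody` and both helper function names are assumptions made for illustration, not a confirmed request schema. The `granularity` and `window` parameters and the 0.5-second minimums come from the parameter descriptions on this page. Requiring `length` and `step` together is a simplification of this sketch.

```python
from typing import Any, Dict, Optional


def batch_prosody_config(
    granularity: Optional[str] = None,
    window_length: Optional[float] = None,
    window_step: Optional[float] = None,
) -> Dict[str, Any]:
    """Build a hypothetical Batch API model configuration for prosody.

    The "models"/"prosody" nesting is an assumption for illustration.
    """
    prosody: Dict[str, Any] = {}
    if granularity is not None:
        prosody["granularity"] = granularity
    if window_length is not None or window_step is not None:
        # Simplification in this sketch: a window needs both values.
        if window_length is None or window_step is None:
            raise ValueError("window needs both length and step")
        # Both minimums are stated in the parameter descriptions.
        if window_length < 0.5 or window_step < 0.5:
            raise ValueError("window length and step must be at least 0.5 seconds")
        prosody["window"] = {"length": window_length, "step": window_step}
    return {"models": {"prosody": prosody}}


def streaming_prosody_config() -> Dict[str, Any]:
    """Streaming accepts no prosody parameters, so pass an empty object."""
    return {"models": {"prosody": {}}}
```

Keeping the two builders separate makes it harder to send Batch-only parameters to the Streaming API by accident.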
Each prediction includes:
begin and end timestamps in seconds
The granularity parameter controls how speech is segmented before predictions are generated. This parameter is only
available in the Batch API.
The window parameter provides an alternative to granularity-based segmentation. Instead of splitting audio at natural
speech boundaries, it analyzes the audio in fixed-length, overlapping windows.
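To make fixed-length windowing concrete, the sketch below computes the time span each window would cover for a clip of a given duration, using a window length and a step (how far each window advances), both in seconds. The function name `window_bounds` is hypothetical. Clamping the final window to the end of the clip is an assumption of this sketch, since the service's handling of a trailing partial window is not described here.

```python
from typing import List, Tuple


def window_bounds(duration: float, length: float, step: float) -> List[Tuple[float, float]]:
    """Return (begin, end) times in seconds for fixed-length windows.

    Illustrative only: the last window is clamped to the clip duration,
    which is an assumption rather than documented behavior.
    """
    if length < 0.5 or step < 0.5:
        raise ValueError("length and step must be at least 0.5 seconds")
    bounds: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration:
        end = min(start + length, duration)
        bounds.append((start, end))
        if end >= duration:
            break  # the clip is fully covered
        start += step
    return bounds
```

For a 10-second clip with a length of 4 and a step of 2, this yields the windows (0, 4), (2, 6), (4, 8), and (6, 10), so each window overlaps the previous one by 2 seconds.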
length: Duration of each window in seconds (minimum 0.5).
step: How far to advance between windows in seconds (minimum 0.5). A step smaller than the length creates overlapping windows.
The speech prosody model measures the following 48 expressions. These are the same expressions measured by the facial expression and vocal burst models.