Emotional language

Measure emotional expression, sentiment, and toxicity from the meaning and tone of text.

The emotional language model measures 53 dimensions of emotional expression from the meaning and tone of text. Five of these are not available in the other models (face, prosody, vocal burst): Annoyance, Disapproval, Enthusiasm, Gratitude, and Sarcasm. Recommended input file types: .txt, .mp3, .wav, .mp4.

You can optionally enable sentiment analysis and toxicity detection alongside emotion scores. The NER model can also be run alongside emotional language to identify named entities in text.

Job configuration

Parameter          Type     Default  Description
granularity        string   word     Level at which predictions are generated. See Granularity for available values.
identify_speakers  boolean  false    When enabled, identifies and labels different speakers in transcribed audio. Batch API only.
sentiment          object   -        Include this field to enable sentiment analysis. Returns a distribution over a 9-point scale.
toxicity           object   -        Include this field to enable toxicity detection. Returns scores for 6 categories.

Example job configuration

curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "language": {
        "granularity": "sentence",
        "sentiment": {},
        "toxicity": {}
      }
    },
    "urls": ["https://example.com/audio.mp3"]
  }'
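
The same request can be assembled in Python. The following is a minimal sketch using only the standard library; the helper name build_job_request and its keyword arguments are illustrative assumptions, not part of any Hume SDK. It builds the request without sending it, so the payload can be inspected first.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
BATCH_JOBS_URL = "https://api.hume.ai/v0/batch/jobs"


def build_job_request(api_key, urls, granularity="sentence",
                      sentiment=True, toxicity=True):
    """Build (but do not send) a batch job request for the language model.

    Hypothetical helper for illustration only.
    """
    language = {"granularity": granularity}
    # Sentiment and toxicity are enabled by including an empty object.
    if sentiment:
        language["sentiment"] = {}
    if toxicity:
        language["toxicity"] = {}
    payload = {"models": {"language": language}, "urls": list(urls)}
    return urllib.request.Request(
        BATCH_JOBS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Hume-Api-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_job_request("<YOUR_API_KEY>", ["https://example.com/audio.mp3"])
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To submit the job: urllib.request.urlopen(request)
```

Passing sentiment=False or toxicity=False omits the corresponding field, which leaves that analysis disabled.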

Output

Each prediction includes:

  • Text: the analyzed text segment
  • Position: the begin and end character indices
  • Emotion scores: scores for each of the 53 expressions
  • Sentiment: distribution over the 9-point scale (when enabled)
  • Toxicity: scores for each toxicity category (when enabled)

{
  "grouped_predictions": [
    {
      "id": "unknown",
      "predictions": [
        {
          "text": "I'm so happy to see you",
          "position": {
            "begin": 0,
            "end": 23
          },
          "emotions": [
            { "name": "Admiration", "score": 0.107 },
            { "name": "Joy", "score": 0.482 },
            ...
          ],
          "sentiment": [
            { "name": "1", "score": 0.01 },
            ...
            { "name": "9", "score": 0.04 }
          ],
          "toxicity": [
            { "name": "toxic", "score": 0.001 },
            ...
          ]
        }
      ]
    }
  ]
}
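
As a rough illustration of how this structure can be consumed, the sketch below walks the grouped predictions and ranks emotions by score. It assumes the input is the parsed object shown above; where that object sits inside a full API response is not covered here, and the helper names are hypothetical.

```python
def iter_predictions(result):
    """Yield (group_id, prediction) pairs from a parsed result object."""
    for group in result["grouped_predictions"]:
        for prediction in group["predictions"]:
            yield group["id"], prediction


def top_emotions(prediction, n=3):
    """Return the n highest-scoring (name, score) pairs for one prediction."""
    ranked = sorted(prediction["emotions"], key=lambda e: e["score"], reverse=True)
    return [(e["name"], e["score"]) for e in ranked[:n]]


sample = {
    "grouped_predictions": [{
        "id": "unknown",
        "predictions": [{
            "text": "I'm so happy to see you",
            "position": {"begin": 0, "end": 23},
            # Only the two expressions shown in the excerpt above are included.
            "emotions": [
                {"name": "Admiration", "score": 0.107},
                {"name": "Joy", "score": 0.482},
            ],
        }],
    }]
}

for group_id, prediction in iter_predictions(sample):
    print(group_id, prediction["text"], top_emotions(prediction, n=1))
```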

Granularity

The granularity parameter controls how text is segmented before predictions are generated.

Value                API        Description
word                 Both       One prediction per word. Provides the most detailed resolution. This is the default.
sentence             Both       One prediction per sentence.
utterance            Both       One prediction per utterance, a continuous segment of text separated by pauses or punctuation.
conversational_turn  Batch      One prediction per speaker turn. Requires identify_speakers to be enabled.
passage              Streaming  One prediction for the entire text of the streaming payload.
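
The availability rules in this table can be checked on the client before a job is submitted. The sketch below encodes the table as a lookup; the function name and the "batch"/"streaming" labels are assumptions made for illustration, and the API itself may report such problems differently.

```python
# Availability of each granularity value, copied from the table above.
GRANULARITY_APIS = {
    "word": {"batch", "streaming"},
    "sentence": {"batch", "streaming"},
    "utterance": {"batch", "streaming"},
    "conversational_turn": {"batch"},
    "passage": {"streaming"},
}


def check_granularity(granularity, api, identify_speakers=False):
    """Return a list of problems with a granularity choice (empty if none)."""
    if granularity not in GRANULARITY_APIS:
        return [f"unknown granularity: {granularity!r}"]
    problems = []
    if api not in GRANULARITY_APIS[granularity]:
        problems.append(f"{granularity!r} is not available in the {api} API")
    if granularity == "conversational_turn" and not identify_speakers:
        problems.append("'conversational_turn' requires identify_speakers to be enabled")
    return problems


print(check_granularity("sentence", "batch"))
print(check_granularity("passage", "batch"))
print(check_granularity("conversational_turn", "batch", identify_speakers=True))
```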

Sentiment

When sentiment is enabled, each prediction includes a probability distribution over a 9-point scale, where 1 represents the most negative sentiment and 9 represents the most positive.

"sentiment": [
  { "name": "1", "score": 0.01 },
  { "name": "2", "score": 0.02 },
  { "name": "3", "score": 0.05 },
  { "name": "4", "score": 0.10 },
  { "name": "5", "score": 0.30 },
  { "name": "6", "score": 0.25 },
  { "name": "7", "score": 0.15 },
  { "name": "8", "score": 0.08 },
  { "name": "9", "score": 0.04 }
]
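
A distribution like this is often easier to work with as a single number. One possible reduction, shown here as a sketch and not as something the API returns, is to take the most likely rating together with the probability-weighted mean rating. The function name is hypothetical.

```python
def summarize_sentiment(distribution):
    """Return (most_likely_rating, expected_rating) for a sentiment distribution."""
    ratings = [(int(item["name"]), item["score"]) for item in distribution]
    most_likely = max(ratings, key=lambda pair: pair[1])[0]
    total = sum(score for _, score in ratings)
    # Normalize by the total in case the scores do not sum to exactly 1.
    expected = sum(rating * score for rating, score in ratings) / total
    return most_likely, expected


# Scores copied from the example distribution above.
scores = [0.01, 0.02, 0.05, 0.10, 0.30, 0.25, 0.15, 0.08, 0.04]
sentiment = [{"name": str(i), "score": s} for i, s in enumerate(scores, start=1)]

most_likely, expected = summarize_sentiment(sentiment)
print(most_likely, round(expected, 2))  # 5 5.65
```

For the example above, the most likely rating is 5 and the weighted mean is about 5.65, which is slightly positive on the 1 to 9 scale.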

Toxicity

When toxicity is enabled, each prediction includes scores for the following categories:

Category       Description
toxic          General toxicity
severe_toxic   Severe or extreme toxicity
obscene        Obscene or vulgar language
threat         Threatening language
insult         Insulting language
identity_hate  Hate speech targeting identity groups
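
A common way to act on these scores is to flag the categories that exceed a cutoff. The sketch below assumes the toxicity field is a list of name/score objects, as in the Output example; the 0.5 threshold and the sample scores are arbitrary values chosen for illustration, not recommendations from the API.

```python
def flag_toxicity(toxicity, threshold=0.5):
    """Return {category: score} for categories scoring at or above threshold."""
    return {
        item["name"]: item["score"]
        for item in toxicity
        if item["score"] >= threshold
    }


# Made-up scores for illustration; real values come from the API response.
example_toxicity = [
    {"name": "toxic", "score": 0.81},
    {"name": "severe_toxic", "score": 0.05},
    {"name": "obscene", "score": 0.12},
    {"name": "threat", "score": 0.02},
    {"name": "insult", "score": 0.64},
    {"name": "identity_hate", "score": 0.01},
]

print(flag_toxicity(example_toxicity))
```

A suitable threshold depends on the application and is best tuned against labeled examples.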

Transcription

When processing audio or video with the language model, Hume transcribes speech to text before analysis. Transcription settings go in a top-level transcription object, separate from models.

Parameter             Type     Default  Description
language              string   null     BCP-47 language tag (e.g., en, fr, ja). When null, the language is auto-detected.
identify_speakers     boolean  false    Enable speaker diarization in the transcript.
confidence_threshold  number   0.5      Minimum confidence for including transcribed text. Range: 0.0 to 1.0.

curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "language": {
        "granularity": "sentence"
      }
    },
    "transcription": {
      "language": "en",
      "confidence_threshold": 0.5
    },
    "urls": ["https://example.com/audio.mp3"]
  }'

Named Entity Recognition (NER)

The NER model identifies people, places, organizations, and other entities in text. It can be run alongside the emotional language model.

NER accepts one job configuration parameter:

Parameter          Type     Default  Description
identify_speakers  boolean  false    When enabled, identifies and labels different speakers in transcribed audio.

curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "language": {
        "granularity": "sentence"
      },
      "ner": {
        "identify_speakers": true
      }
    },
    "urls": ["https://example.com/audio.mp3"]
  }'

Expressions

The emotional language model measures the following 53 expressions. The 5 expressions marked with * are unique to the language model and not available in the face, prosody, or vocal burst models.

Admiration              Contempt        Enthusiasm*   Pain
Adoration               Contentment     Entrancement  Pride
Aesthetic Appreciation  Craving         Envy          Realization
Amusement               Desire          Excitement    Relief
Anger                   Determination   Fear          Romance
Annoyance*              Disappointment  Gratitude*    Sadness
Anxiety                 Disapproval*    Guilt         Sarcasm*
Awe                     Disgust         Horror        Satisfaction
Awkwardness             Distress        Interest      Shame
Boredom                 Doubt           Joy           Surprise (negative)
Calmness                Ecstasy         Love          Surprise (positive)
Concentration           Embarrassment   Nostalgia     Sympathy
Confusion               Empathic Pain                 Tiredness
Contemplation                                         Triumph