Facial expression

Measure facial expressions, action units, and facial descriptions from images and video.

The facial expression model measures 48 dimensions of emotional expression from facial movements in images and video. It detects subtle expressions often associated with emotions such as admiration, awe, and empathic pain. Recommended input file types: .png, .jpeg, .mp4.

You can optionally enable two additional outputs alongside emotion scores: FACS 2.0 action units and facial descriptions. The facemesh model can also be run alongside facial expression to capture detailed face geometry.

Job configuration

The face model is configurable in both Batch API and Streaming API jobs.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `facs` | object | | Include this field to enable FACS 2.0 predictions alongside emotion scores. |
| `descriptions` | object | | Include this field to enable facial description predictions (e.g., "Smile", "Frown", "Wink"). |
| `identify_faces` | boolean | `false` | When enabled, assigns a consistent identifier to each detected face across frames, useful for tracking individuals in video. |
| `fps_pred` | number | `3` | Number of frames per second to process in video input. Lower values improve performance; higher values increase temporal resolution. |
| `prob_threshold` | number | `0.99` | Minimum confidence threshold for face detection. Faces detected below this threshold are excluded from results. |
| `min_face_size` | integer | `60` | Minimum bounding box size in pixels. Faces smaller than this are ignored. |
| `save_faces` | boolean | `false` | Batch API only. When enabled, extracts detected faces into the job artifacts ZIP file. |

Example job configuration

```shell
curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "face": {
        "fps_pred": 3,
        "identify_faces": true,
        "facs": {},
        "descriptions": {}
      }
    },
    "urls": ["https://example.com/video.mp4"]
  }'
```
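The same request can be submitted from any HTTP client. Below is a minimal sketch in Python using the `requests` library; the endpoint and header come from the curl example above, while the `job_id` response field is an assumption about the job-creation response rather than something shown in this section.

```python
import requests

API_KEY = "<YOUR_API_KEY>"

# Same job configuration as the curl example above.
config = {
    "models": {
        "face": {
            "fps_pred": 3,
            "identify_faces": True,
            "facs": {},
            "descriptions": {},
        }
    },
    "urls": ["https://example.com/video.mp4"],
}

response = requests.post(
    "https://api.hume.ai/v0/batch/jobs",
    headers={"X-Hume-Api-Key": API_KEY, "Content-Type": "application/json"},
    json=config,
)
response.raise_for_status()

# Assumption: the job-creation response carries the new job's ID as "job_id".
print("Submitted job:", response.json()["job_id"])
```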

Output

Each prediction includes:

  • Bounding box: the x, y, w, h coordinates of the detected face
  • Detection confidence: the probability that the detection is a face
  • Face ID: a consistent identifier when identify_faces is enabled
  • Emotion scores: scores for each of the 48 expressions
  • FACS scores: action unit scores when facs is enabled
  • Description scores: facial description scores when descriptions is enabled
```json
{
  "grouped_predictions": [
    {
      "id": "unknown",
      "predictions": [
        {
          "frame": 0,
          "time": 0.0,
          "prob": 0.9994,
          "box": {
            "x": 187.88,
            "y": 197.70,
            "w": 401.67,
            "h": 561.43
          },
          "emotions": [
            { "name": "Admiration", "score": 0.107 },
            { "name": "Joy", "score": 0.482 },
            ...
          ],
          "facs": [
            { "name": "AU12", "score": 0.82 },
            ...
          ],
          "descriptions": [
            { "name": "Smile", "score": 0.91 },
            ...
          ]
        }
      ]
    }
  ]
}
```

When identify_faces is enabled, each group’s id is replaced with a unique face identifier that persists across frames, allowing you to track the same individual throughout a video.
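To work with this output programmatically, you can walk `grouped_predictions` and pull out each face's strongest expressions per frame. A minimal sketch, assuming the response JSON has already been saved to a file (the filename `predictions.json` is a placeholder) with the shape shown above:

```python
import json

def top_emotions(face_predictions: dict, k: int = 3) -> None:
    """Print the k highest-scoring expressions for every face in every frame."""
    for group in face_predictions["grouped_predictions"]:
        face_id = group["id"]  # a persistent identifier when identify_faces is on
        for pred in group["predictions"]:
            ranked = sorted(pred["emotions"], key=lambda e: e["score"], reverse=True)
            top = ", ".join(f'{e["name"]} ({e["score"]:.2f})' for e in ranked[:k])
            print(f'face={face_id} frame={pred["frame"]} t={pred["time"]:.1f}s: {top}')

with open("predictions.json") as f:
    top_emotions(json.load(f))
```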

Expressions

The facial expression model measures the following 48 expressions. Each score indicates the degree to which a human rater would identify that expression in the given sample.

| | | | |
| --- | --- | --- | --- |
| Admiration | Confusion | Empathic Pain | Pride |
| Adoration | Contempt | Entrancement | Realization |
| Aesthetic Appreciation | Contentment | Envy | Relief |
| Amusement | Craving | Excitement | Romance |
| Anger | Desire | Fear | Sadness |
| Anxiety | Determination | Guilt | Satisfaction |
| Awe | Disappointment | Horror | Shame |
| Awkwardness | Disgust | Interest | Surprise (negative) |
| Boredom | Distress | Joy | Surprise (positive) |
| Calmness | Doubt | Love | Sympathy |
| Concentration | Ecstasy | Nostalgia | Tiredness |
| Contemplation | Embarrassment | Pain | Triumph |

FACS 2.0

The Facial Action Coding System (FACS) measures individual facial muscle movements called action units. When facs is enabled, predictions include intensity scores for each of the following action units.

| Action unit | Description |
| --- | --- |
| AU1 | Inner Brow Raise |
| AU2 | Outer Brow Raise |
| AU4 | Brow Lowerer |
| AU5 | Upper Lid Raise |
| AU6 | Cheek Raise |
| AU7 | Lids Tight |
| AU9 | Nose Wrinkle |
| AU10 | Upper Lip Raiser |
| AU11 | Nasolabial Furrow Deepener |
| AU12 | Lip Corner Puller |
| AU14 | Dimpler |
| AU15 | Lip Corner Depressor |
| AU16 | Lower Lip Depress |
| AU17 | Chin Raiser |
| AU18 | Lip Pucker |
| AU19 | Tongue Show |
| AU20 | Lip Stretch |
| AU22 | Lip Funneler |
| AU23 | Lip Tightener |
| AU24 | Lip Presser |
| AU25 | Lips Part |
| AU26 | Jaw Drop |
| AU27 | Mouth Stretch |
| AU28 | Lips Suck |
| AU32 | Bite |
| AU34 | Puff |
| AU37 | Lip Wipe |
| AU38 | Nostril Dilate |
| AU43 | Eye Closure |
| AU53 | Head Up |
| AU54 | Head Down |

FACS 2.0 also includes the following hand-occlusion outputs: Hand over Mouth, Hand over Eyes, Hand over Forehead, Hand over Face, and Hand touching Face / Head.
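Action unit scores can be combined into higher-level heuristics of your own. As one illustration (not part of the API), a Duchenne smile is classically characterized in FACS by co-occurring AU6 (Cheek Raise) and AU12 (Lip Corner Puller). A minimal sketch, assuming a prediction dict shaped like the output example above:

```python
def facs_score(pred: dict, unit: str) -> float:
    """Return the score for a named action unit, or 0.0 if absent."""
    return next((au["score"] for au in pred.get("facs", []) if au["name"] == unit), 0.0)

def looks_like_duchenne_smile(pred: dict, threshold: float = 0.5) -> bool:
    # Duchenne smile heuristic: both AU6 (Cheek Raise) and AU12 (Lip Corner
    # Puller) active. The 0.5 cutoff is an arbitrary illustration, not an
    # API-recommended value.
    return facs_score(pred, "AU6") >= threshold and facs_score(pred, "AU12") >= threshold
```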

Facial descriptions

When descriptions is enabled, predictions include scores for high-level facial descriptions. These provide an intuitive, human-readable summary of the face’s appearance.

| | | | |
| --- | --- | --- | --- |
| Beaming | Frown | Laugh | Squint |
| Biting lip | Gasp | Licking lip | Sulking |
| Cheering | Glare | Pout | Tongue out |
| Cringe | Glaring | Scowl | Wide-eyed |
| Cry | Grimace | Smile | Wince |
| Eyes closed | Grin | Smirk | Wrinkled nose |
| Face in hands | Jaw drop | Snarl | |
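Because descriptions are human-readable, one common pattern is to keep only the labels above a cutoff and treat them as tags. A minimal sketch over the same assumed prediction shape as above:

```python
def description_tags(pred: dict, threshold: float = 0.8) -> list[str]:
    """Return facial description labels above the (illustrative) threshold."""
    return [d["name"] for d in pred.get("descriptions", []) if d["score"] >= threshold]

# For the output example above, description_tags(pred) would return ["Smile"].
```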

Facemesh

The facemesh model detects 478 facial landmark points, providing detailed face geometry data. This is useful for applications that need precise face shape analysis beyond emotional expression.

Facemesh has no configurable parameters. Enable it alongside the face model by passing an empty object:

```shell
curl -X POST "https://api.hume.ai/v0/batch/jobs" \
  -H "X-Hume-Api-Key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{
    "models": {
      "face": {
        "identify_faces": true
      },
      "facemesh": {}
    },
    "urls": ["https://example.com/video.mp4"]
  }'
```

Each facemesh prediction includes an array of 478 3D facial landmark coordinates for the detected face.
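Landmark output can be loaded into an array for geometric analysis. A minimal sketch using NumPy, assuming each prediction exposes its landmarks as a list of `{x, y, z}` points; the exact response field name is not shown above, so `landmarks` here is a placeholder:

```python
import numpy as np

def landmark_stats(landmarks: list[dict]) -> None:
    """Compute simple geometry from a facemesh prediction's landmark list."""
    # "landmarks" is a placeholder: a list of 478 {"x": ..., "y": ..., "z": ...} points.
    pts = np.array([[p["x"], p["y"], p["z"]] for p in landmarks])  # shape (478, 3)
    centroid = pts.mean(axis=0)                  # mean position of all landmarks
    extent = pts.max(axis=0) - pts.min(axis=0)   # bounding-box size per axis
    print(f"centroid={centroid}, extent={extent}")
```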