Facial expression
The facial expression model measures 48 dimensions of emotional expression from facial movements in images and video.
It detects subtle expressions often associated with emotions such as admiration, awe, and empathic pain.
Recommended input filetypes: .png, .jpeg, .mp4.
You can optionally enable two additional outputs alongside emotion scores: FACS 2.0 action units and facial descriptions. The facemesh model can also be run alongside facial expression to capture detailed face geometry.
Job configuration
The face model is configurable in both Batch API and Streaming API jobs.
Example job configuration
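A minimal batch job configuration might look like the following. The parameter names shown (fps_pred, prob_threshold, identify_faces, min_face_size, facs, descriptions) are illustrative of the face model's options; confirm the exact names and defaults against the API reference for your API version.

```json
{
  "models": {
    "face": {
      "fps_pred": 3,
      "prob_threshold": 0.99,
      "identify_faces": true,
      "min_face_size": 60,
      "facs": {},
      "descriptions": {}
    }
  },
  "urls": ["https://example.com/video.mp4"]
}
```

Passing an empty object for facs or descriptions enables those optional outputs; omitting them returns emotion scores only.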
Output
Each prediction includes:
- Bounding box: the x, y, w, h coordinates of the detected face
- Detection confidence: the probability that the detection is a face
- Face ID: a consistent identifier when identify_faces is enabled
- Emotion scores: scores for each of the 48 expressions
- FACS scores: action unit scores when facs is enabled
- Description scores: facial description scores when descriptions is enabled
When identify_faces is enabled, each group’s id is replaced with a unique face identifier that persists across
frames, allowing you to track the same individual throughout a video.
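As a sketch of working with this output, the snippet below ranks the emotion scores for a single detected face. The field names ("emotions", "name", "score", "box", "prob") mirror the structure described above but are assumptions; check the actual response payload for your job before relying on them.

```python
def top_emotions(prediction, k=3):
    """Return the k highest-scoring emotions as (name, score) pairs."""
    ranked = sorted(prediction["emotions"], key=lambda e: e["score"], reverse=True)
    return [(e["name"], e["score"]) for e in ranked[:k]]

# Hypothetical prediction for one detected face, for illustration only.
sample = {
    "box": {"x": 12.0, "y": 40.5, "w": 110.0, "h": 114.0},
    "prob": 0.998,
    "emotions": [
        {"name": "Admiration", "score": 0.07},
        {"name": "Awe", "score": 0.31},
        {"name": "Calmness", "score": 0.22},
    ],
}

print(top_emotions(sample, k=2))
```

Because every expression receives a score, sorting like this is usually more informative than thresholding a single value.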
Expressions
The facial expression model measures the following 48 expressions. Each score indicates the degree to which a human rater would identify that expression in the given sample.
FACS 2.0
The Facial Action Coding System (FACS) measures individual facial muscle movements called action units. When facs is
enabled, predictions include intensity scores for each of the following action units.
Facial descriptions
When descriptions is enabled, predictions include scores for high-level facial descriptions. These provide an
intuitive, human-readable summary of the face’s appearance.
Facemesh
The facemesh model detects 478 facial landmark points, providing detailed face geometry data. This is useful for applications that need precise face shape analysis beyond emotional expression.
Facemesh has no configurable parameters. Enable it alongside the face model by passing an empty object:
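For example, a job configuration enabling both models might look like the following (the overall shape is illustrative; confirm it against the API reference):

```json
{
  "models": {
    "face": {},
    "facemesh": {}
  }
}
```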
Each facemesh prediction includes an array of 478 3D facial landmark coordinates for each detected face.

