SPACE : Speech-driven Portrait Animation with Controllable Expression

We present SPACE, a method for generating high-resolution, expressive videos with realistic head pose, using just speech and a single image. It uses a multi-stage approach, combining the controllability of facial landmarks with the high-quality synthesis power of a pretrained face generator. SPACE also allows for the control of emotions and their intensities. Our method outperforms prior methods in objective metrics for image quality and facial motions and is strongly preferred by users in pair-wise comparisons.

Speech-driven animation of a portrait, with control over the output pose, emotions, and intensities of expressions
Pose     Generated Transferred Generated Generated
Emotion     Neutral Neutral Happy Surprise
Emotion     Neutral Neutral Sad Fear
Pose     Generated Transferred Generated Generated