Facial Action Unit Detection with Transformers
The Facial Action Coding System is a taxonomy for fine-grained facial expression analysis. This paper proposes a method for detecting Facial Action Units (FAUs), which describe specific facial muscle movements, from an input image. FAU detection is formulated as a multi-task learning problem: image features and attention maps are fed to a separate branch for each action unit, and each branch extracts discriminative feature embeddings trained with a new loss function, the Center Contrastive (CC) loss. We employ a new FAU correlation network, based on a transformer encoder architecture, to capture the relationships between different action units across the wide range of expressions in the training data. The resulting features are shown to yield high classification performance. We validate our design choices, including the use of the CC and Tversky loss functions, in ablation experiments. We show that the proposed method outperforms state-of-the-art techniques on two public datasets, BP4D and DISFA, with an absolute F1-score improvement of over 2% on each.
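For readers unfamiliar with the Tversky loss mentioned above, the following is a minimal sketch of its standard formulation for a single action unit treated as a binary label. It is not the implementation used in this work: the function name `tversky_loss`, the weights `alpha` and `beta`, and the smoothing term `eps` are illustrative assumptions, and the abstract does not state the values used or how the loss is combined with the CC loss.

```python
import numpy as np


def tversky_loss(probs, targets, alpha=0.5, beta=0.5, eps=1e-7):
    """Standard Tversky loss for one binary label (hypothetical helper).

    probs   -- predicted probabilities in [0, 1], one per sample
    targets -- ground-truth labels in {0, 1}, one per sample
    alpha   -- weight on false positives (assumed value)
    beta    -- weight on false negatives (assumed value)
    """
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=float)

    # Soft counts of true positives, false positives and false negatives.
    tp = np.sum(probs * targets)
    fp = np.sum(probs * (1.0 - targets))
    fn = np.sum((1.0 - probs) * targets)

    # Tversky index; eps avoids division by zero when a batch has no positives.
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


if __name__ == "__main__":
    # Toy batch: one active and one inactive sample for a single action unit.
    print(tversky_loss([0.8, 0.2], [1, 0]))
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient. Raising `beta` above `alpha` penalizes missed activations more heavily than false alarms, which is a common reason to choose this loss when positive labels are rare.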