UNFACE: Fine-grained facial analysis for unmasking hidden information
The human face is a fundamental source of information for understanding the behavior of individuals. Traditionally, this has been exploited in computer vision for the recognition of identity and expressions, but it has recently been suggested that the information that can be extracted from the face goes well beyond this and can be indicative of deception, heart rate, psychological states, or even psychiatric disorders such as autism or depression. Some of this information, however, might not be apparent or might even be hidden from us, and can only be recovered by means of specialized techniques. An iconic example is the detection of cardiac heart rate by amplifying the subtle color changes of the face due to blood flow, which are invisible to the human eye.
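As a toy illustration of this principle (a minimal sketch, not the project's implementation), the following recovers a pulse rate from a simulated per-frame facial color signal: the mean green value of a face region is band-pass filtered to the plausible cardiac range, and the spectral peak is read out as beats per minute. The frame rate, filter band and synthetic signal are all illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fps = 30.0                                  # assumed camera frame rate
t = np.arange(0, 20, 1 / fps)               # 20 s of video
# Stand-in for the mean green value of the face region per frame:
# a faint 1.2 Hz (72 bpm) pulse buried in noise.
trace = 0.05 * np.sin(2 * np.pi * 1.2 * t) + np.random.randn(t.size)

# Band-pass 0.7-4 Hz (42-240 bpm), the physiologically plausible range.
b, a = butter(3, [0.7, 4.0], btype="band", fs=fps)
filtered = filtfilt(b, a, trace)

# The dominant frequency of the filtered trace estimates the heart rate.
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(filtered.size, d=1 / fps)
print(f"Estimated heart rate: {60 * freqs[np.argmax(spectrum)]:.0f} bpm")
```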
The goal of the UNFACE project has been to address fine-grained facial analysis to unmask different sources of information hidden in the face. The project has delivered research results both in fundamental facial analysis algorithms (e.g. landmark localization and tracking, facial expression analysis, head pose estimation, and facial surface reconstruction) and in a few selected application areas that demonstrate the practical relevance of the developed methods (e.g. affective computing, automatic lip-reading, dysmorphology analysis and deception detection).

Among the achievements of the UNFACE project, we highlight:
1. The design of advanced deep learning architectures for accurate and robust tracking of facial landmarks under realistic (in-the-wild) scenarios, for which the resulting models have been made publicly available.
2. The use of spectral decomposition methods to improve the accuracy of facial expression analysis in 3D, as well as to improve dense surface correspondences for 3D facial reconstruction.
3. The creation of the first 3D baby face model, built exclusively from infant facial surfaces within an innovative pipeline based on the aforementioned spectral correspondences, which can also automatically derive the model template instead of requiring a pre-existing one, as other state-of-the-art methods commonly do.
4. The development of a database for lie detection based on a competitive game scenario that promotes the frequent and motivated use of lies by the participants, recorded with multiple cameras that provide both 2D and 3D information of the participants' faces.
5. The development of data-driven representations that make continuous lip-reading possible in Spanish, and potentially in other languages, without the need to replicate the huge data resources needed to train other state-of-the-art lip-reading systems, which in practice constrains their applicability to English.

Principal Investigators: Xavier Binefa & Federico Sukno

This project was funded by the 2017 call of the "Programa Estatal de Fomento de la Investigación Científica y Técnica de Excelencia" from the Spanish Ministry of Economy, Industry and Competitiveness.

Index of Project Results:
1. Composite recurrent network with internal denoising for facial alignment in still and video images in the wild, Image and Vision Computing, 2021.
2. Survey on 3D face reconstruction from uncalibrated images, Computer Science Review, 2021.
3. 3D Fetal Face Reconstruction from Ultrasound Imaging, GRAPP 2021.
4. An Enhanced Adversarial Network with Combined Latent Features for Spatio-Temporal Facial Affect Estimation in the Wild, VISAPP 2021.
5. Spectral Correspondence Framework for Building a 3D Baby Face Model, FG 2020.
6. End-to-end facial and physiological model for Affective Computing and applications, FG 2020.
7. Refining the resolution of craniofacial dysmorphology in bipolar disorder as an index of brain dysmorphogenesis, Psychiatry Research, 291: 113243, 2020.
8. CoGANs for Unsupervised Visual Speech Adaptation to New Speakers, ICASSP 2020.
9. Tensor Decomposition and Non-linear Manifold Modeling for 3D Head Pose Estimation, International Journal of Computer Vision, 127(10): 1565–1585, 2019.
10. Three-Dimensional Face Reconstruction from Uncalibrated Photographs: Application to Early Detection of Genetic Syndromes, MICCAI CLIP 2019.
11. Robust facial alignment with internal denoising auto-encoder, CRV 2019.
12. Lip-Reading with Limited-Data Network, EUSIPCO 2019.
13. Fully end-to-end composite recurrent convolution network for deformable facial tracking in the wild, FG 2019.
14. Heatmap-guided balanced deep convolution networks for family classification in the wild, FG 2019.
15. Optimizing Phoneme-to-Viseme Mapping for Continuous Lip-Reading in Spanish, Communications in Computer and Information Science, 2019.
16. Multi-instance dynamic ordinal random fields for weakly supervised facial behavior analysis, IEEE Transactions on Image Processing, 27(8): 3969–3982, 2018.
17. Automatic local shape spectrum analysis for 3D facial expression recognition, Image and Vision Computing, 79: 86–98, 2018.
18. 3D head pose estimation using tensor decomposition and non-linear manifold modeling, 3DV 2018.
19. Survey on Automatic Lip-Reading in the Era of Deep Learning, Image and Vision Computing, 78: 53–72, 2018.
20. A quantitative comparison of methods for 3D face reconstruction from 2D images, FG 2018.

Further activities related to the project:
· 4 PhD Theses
· 12 Final bachelor/master projects

Composite recurrent network with internal denoising for facial alignment in still and video images in the wild
D. Aspandi, O. Martinez, F.M. Sukno and X. Binefa
Image and Vision Computing, 111(7): 104189, 2021.
Facial alignment is an essential task for many higher-level facial analysis applications, such as animation, human activity recognition and human-computer interaction. Although the recent availability of big datasets and powerful deep-learning approaches has enabled major improvements in state-of-the-art accuracy, the performance of current approaches can severely deteriorate when dealing with images in highly unconstrained conditions, which limits the real-life applicability of such models. In this paper, we propose a composite recurrent tracker with internal denoising that jointly addresses both single-image facial alignment and deformable facial tracking in the wild. Specifically, we incorporate multilayer LSTMs to model temporal dependencies of variable length and introduce an internal denoiser that selectively enhances the input images to improve the robustness of our overall model. We achieve this by combining four different sub-networks that specialize in each of the key tasks that are required, namely face detection, bounding-box tracking, facial region validation and facial alignment with internal denoising. These blocks are endowed with novel algorithms, resulting in a facial tracker that is accurate, robust to in-the-wild settings and resilient against drifting. We demonstrate this by testing our model on the 300-W and Menpo datasets for single-image facial alignment, and on the 300-VW dataset for deformable facial tracking. Comparison against 20 other state-of-the-art methods demonstrates the excellent performance of the proposed approach.
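The following is a minimal, illustrative sketch of the architecture pattern described above (an internal denoiser feeding a CNN encoder and a multilayer LSTM that regresses landmarks per frame); it is not the paper's exact network, and all layer sizes are hypothetical.

```python
import torch
import torch.nn as nn

class DenoisingTracker(nn.Module):
    def __init__(self, n_landmarks=68, hidden=256):
        super().__init__()
        # Internal denoiser: a small convolutional auto-encoder.
        self.denoiser = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1),
        )
        # Frame encoder: CNN features pooled to a fixed-size vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Multilayer LSTM handles variable-length temporal context.
        self.lstm = nn.LSTM(64, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 2 * n_landmarks)  # (x, y) per landmark

    def forward(self, frames):                 # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)               # (B*T, 3, H, W)
        x = self.encoder(self.denoiser(x))     # (B*T, 64)
        x, _ = self.lstm(x.view(b, t, -1))     # (B, T, hidden)
        return self.head(x).view(b, t, -1, 2)  # (B, T, 68, 2)

coords = DenoisingTracker()(torch.randn(2, 5, 3, 128, 128))
print(coords.shape)  # torch.Size([2, 5, 68, 2])
```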

Survey on 3D face reconstruction from uncalibrated images
A. Morales, G. Piella and F.M. Sukno
Computer Science Review, 40(5): 100400, 2021.
Recently, a lot of attention has been focused on the incorporation of 3D data into face analysis and its applications. Despite providing a more accurate representation of the face, 3D facial images are more complex to acquire than 2D pictures. As a consequence, great effort has been invested in developing systems that reconstruct 3D faces from an uncalibrated 2D image. However, the 3D-from-2D face reconstruction problem is ill-posed, so prior knowledge is needed to restrict the solution space. In this work, we review 3D face reconstruction methods proposed in the last decade, focusing on those that only use 2D pictures captured under uncontrolled conditions. We present a classification of the proposed methods based on the technique used to add prior knowledge, considering three main strategies, namely statistical model fitting, photometry, and deep learning, and reviewing each of them separately. In addition, given the relevance of statistical 3D facial models as prior knowledge, we explain the construction procedure and provide a list of the most popular publicly available 3D facial models. After this exhaustive study of 3D-from-2D face reconstruction approaches, we observe that the deep learning strategy has been growing rapidly over the last few years, becoming the standard choice in replacement of the once widespread statistical model fitting. Unlike the other two strategies, photometry-based methods have decreased in number due to the need for strong underlying assumptions that limit the quality of their reconstructions compared to statistical model fitting and deep learning methods. The review also identifies current challenges and suggests avenues for future research.
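As a minimal illustration of the statistical model fitting strategy discussed in the survey, the sketch below fits a toy morphable model (mean shape plus PCA modes, here random placeholders) to 2D landmarks by minimising the reprojection error under a scaled-orthographic camera with a simple coefficient prior.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

rng = np.random.default_rng(0)
n_pts, n_modes = 68, 10
mean = rng.normal(size=(n_pts, 3))            # placeholder mean 3D shape
basis = 0.1 * rng.normal(size=(n_modes, n_pts, 3))  # placeholder PCA modes

def reconstruct(alpha):
    # Shape = mean + linear combination of deformation modes.
    return mean + np.tensordot(alpha, basis, axes=1)

def residuals(p, obs2d, reg=0.1):
    alpha = p[:n_modes]
    R = Rotation.from_rotvec(p[n_modes:n_modes + 3]).as_matrix()
    s, t = p[n_modes + 3], p[n_modes + 4:]
    proj = s * reconstruct(alpha) @ R[:2].T + t   # scaled orthographic
    # Reprojection error plus a prior keeping coefficients plausible.
    return np.concatenate([(proj - obs2d).ravel(), reg * alpha])

# Synthetic "observed" landmarks from known parameters, then recovery:
alpha_true = rng.normal(size=n_modes)
R_true = Rotation.from_rotvec([0.1, 0.2, 0.0]).as_matrix()
obs2d = 1.5 * reconstruct(alpha_true) @ R_true[:2].T + np.array([10.0, 5.0])

x0 = np.zeros(n_modes + 6)
x0[n_modes + 3] = 1.0                         # identity pose, unit scale
fit = least_squares(residuals, x0, args=(obs2d,))
print("recovered scale:", round(fit.x[n_modes + 3], 3))  # ~1.5 (up to local minima)
```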

3D Fetal Face Reconstruction from Ultrasound Imaging
A. Alomar, A. Morales, K. Vellve, A.R. Porras, F. Crispi, M.G. Linguraru, G. Piella and F.M. Sukno
Proc. 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Vol. 4: VISAPP, pp. 615–624, 2021.
The fetal face contains essential information for the evaluation of congenital malformations and fetal brain function, as its development is driven by genetic factors at early stages of embryogenesis. Three-dimensional ultrasound (3DUS) can provide information about the facial morphology of the fetus, but its use for prenatal diagnosis is challenging due to imaging noise, fetal movements, limited field-of-view, low soft-tissue contrast, and occlusions. In this paper, we propose a fetal face reconstruction algorithm from 3DUS images based on a novel statistical morphable model of newborn faces, the BabyFM.

We test the feasibility of using newborn statistics to accurately reconstruct fetal faces by fitting the regularized morphable model to the noisy 3DUS images. The algorithm is capable of reconstructing the whole facial morphology of babies from one or several ultrasound scans in order to handle adverse conditions (e.g. missing parts, noisy data), and it has the potential to aid in-utero diagnosis of conditions that involve facial dysmorphology.
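A minimal sketch of the kind of regularized fitting described above, under simplifying assumptions: a toy model is fit directly to noisy, partially occluded 3D points with a robust (Huber) loss and an L2 prior on the coefficients. The model, noise level and occlusion pattern are placeholders, not the paper's setup.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)
n_pts, n_modes = 200, 8
mean = rng.normal(size=(n_pts, 3))                    # placeholder mean shape
basis = 0.1 * rng.normal(size=(n_modes, n_pts, 3))    # placeholder modes

def shape(alpha):
    return mean + np.tensordot(alpha, basis, axes=1)

alpha_true = rng.normal(size=n_modes)
visible = rng.random(n_pts) < 0.7                     # simulate occlusions
scan = shape(alpha_true)[visible] + 0.02 * rng.normal(size=(visible.sum(), 3))

def residuals(alpha, reg=0.5):
    r = (shape(alpha)[visible] - scan).ravel()        # 3D point residuals
    return np.concatenate([r, reg * alpha])           # + coefficient prior

# Huber loss down-weights outliers typical of noisy ultrasound surfaces.
fit = least_squares(residuals, np.zeros(n_modes), loss="huber", f_scale=0.05)
print("max coefficient error:", round(np.abs(fit.x - alpha_true).max(), 3))
```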

An Enhanced Adversarial Network with Combined Latent Features for Spatio-Temporal Facial Affect Estimation in the Wild
D. Aspandi, F.M. Sukno, B. Schuller and X. Binefa
Proc. 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Online & Streaming), Vol. 4: VISAPP, pp. 172–181, 2021.
Affective Computing has recently attracted the attention of the research community due to its numerous applications in diverse areas. In this context, the emergence of video-based data makes it possible to enrich the widely used spatial features with temporal information. However, such spatio-temporal modelling often results in very high-dimensional feature spaces and large volumes of data, making training difficult and time consuming. This paper addresses these shortcomings by proposing a novel model that efficiently extracts both spatial and temporal features of the data by means of its enhanced temporal modelling based on latent features. Our proposed model consists of three major networks, coined Generator, Discriminator, and Combiner, which are trained in an adversarial setting combined with curriculum learning to enable our adaptive attention modules. In our experiments, we show the effectiveness of our approach by reporting our competitive results on both the AFEW-VA and SEWA datasets, suggesting that temporal modelling improves the affect estimates in both qualitative and quantitative terms. Furthermore, we find that the inclusion of attention mechanisms leads to the highest accuracy improvements, as their weights seem to correlate well with the appearance of facial movements, both in terms of temporal localisation and intensity. Finally, we observe a sequence length of around 160 ms to be the optimum for temporal modelling, which is consistent with other relevant findings utilising similar lengths.

Spectral Correspondence Framework for Building a 3D Baby Face Model
A. Morales, A.R. Porras, L. Tu, M.G. Linguraru, G. Piella and F.M. Sukno
Proc. 15th IEEE International Conference on Automatic Face and Gesture Recognition, Buenos Aires, Argentina, pp. 507–514, 2020.
Early detection of facial dysmorphology (variations from normal facial geometry) is essential for the timely detection of genetic conditions, which has a significant impact on reducing the mortality and morbidity associated with them. A model encoding the normal variability in the healthy population can serve as a reference to quantify the often subtle facial abnormalities that are present in young patients with such conditions.
In this paper, we present the first facial model constructed exclusively from newborn data, the Baby Face Model (BabyFM). Our model is built from 3D scans with an innovative pipeline based on least squares conformal maps (LSCMs). LSCMs are piece-wise linear mappings that project the training faces to a common 2D space while minimising the conformal distortion. This process improves the correspondences between 3D faces, which is particularly important for the identification of subtle dysmorphology. We evaluate the ability of our BabyFM to recover the baby's facial morphology from a set of 2D images by comparing it to state-of-the-art facial models. We also compare it to models built following an analogous pipeline to the one proposed in this paper but using non-rigid iterative closest point (NICP) to establish dense correspondences between the training faces. The results show that our model reconstructs the facial morphology of babies with significantly smaller errors than the state-of-the-art models (p < 10^-4) and the "NICP models" (p < 0.01).
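Once dense correspondence is established, the model-building step itself reduces to a PCA over registered surfaces. The sketch below shows only that final step (the LSCM correspondence pipeline is the paper's actual contribution and is not reproduced here); sizes and data are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
n_scans, n_vertices = 40, 1000
# One row per registered scan, flattened to (x1, y1, z1, x2, ...).
X = rng.normal(size=(n_scans, 3 * n_vertices))

mean_face = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean_face, full_matrices=False)

k = 10                                    # retained deformation modes
modes = Vt[:k]                            # (k, 3*n_vertices) PCA basis
stdev = S[:k] / np.sqrt(n_scans - 1)      # per-mode standard deviation

# Synthesise a new face from k coefficients (+2 sd along mode 0):
coeffs = np.zeros(k)
coeffs[0] = 2.0
face = (mean_face + (coeffs * stdev) @ modes).reshape(n_vertices, 3)
print(face.shape)                         # (1000, 3)
```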

End-to-end facial and physiological model for Affective Computing and applications
J. Comas, D. Aspandi and X. Binefa
Proc. 15th IEEE International Conference on Automatic Face and Gesture Recognition, Buenos Aires, Argentina, pp. 507–514, 2020.
In recent years, affective computing and its applications have become a fast-growing research topic. Furthermore, the rise of deep learning has introduced significant improvements in emotion recognition systems compared to classical methods. In this work, we propose a multi-modal emotion recognition model based on deep learning techniques using the combination of peripheral physiological signals and facial expressions. Moreover, we improve the proposed models by introducing latent features extracted from our internal Bio Auto-Encoder (BAE). Both models are trained and evaluated on the AMIGOS dataset, reporting valence, arousal, and emotion state classification. Finally, to demonstrate a possible medical application of affective computing using deep learning techniques, we apply the proposed method to the assessment of anxiety therapy. For this purpose, a reduced multi-modal database has been collected by recording facial expressions and peripheral signals, such as the electrocardiogram (ECG) and galvanic skin response (GSR), of each patient. Valence and arousal estimates were extracted using our proposed model across the duration of the therapy, successfully capturing the different emotional changes in the temporal domain.

CoGANs for Unsupervised Visual Speech Adaptation to New Speakers
A. Fernandez-Lopez, A. Karaali, N. Harte and F.M. Sukno
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, pp. 6294–6298, 2020.
Audio-Visual Speech Recognition (AVSR) faces the difficult task of exploiting acoustic and visual cues simultaneously. Augmenting speech with the visual channel creates its own challenges, e.g. every person has unique mouth movements, making the generalization of visual models very difficult. This factor motivates our focus on the generalization of speaker-independent (SI) AVSR systems, especially in noisy environments, by exploiting the visual domain. Specifically, we are the first to explore the visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker. We adapt an AVSR system trained in a source domain to decode samples in a target domain without the need for labels in the target domain. For the domain adaptation to the unknown speaker, we use Coupled Generative Adversarial Networks to automatically learn a joint distribution of multi-domain images. We evaluate our character-based AVSR system on the TCD-TIMIT dataset and obtain up to a 10% average improvement with respect to the equivalent AVSR system.
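A minimal sketch of the CoGAN weight-sharing idea referenced above: two generators driven by the same latent code share their early layers and differ only in their final, domain-specific layers, so corresponding outputs form samples from a joint distribution over the two domains. Dimensions and layers are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

shared = nn.Sequential(               # shared high-level decoding layers
    nn.Linear(64, 256), nn.ReLU(),
    nn.Linear(256, 512), nn.ReLU(),
)
head_src = nn.Linear(512, 32 * 32)    # source-domain rendering layer
head_tgt = nn.Linear(512, 32 * 32)    # target-domain rendering layer

z = torch.randn(8, 64)                # one latent code per sample pair
h = shared(z)
img_src = torch.tanh(head_src(h)).view(-1, 1, 32, 32)
img_tgt = torch.tanh(head_tgt(h)).view(-1, 1, 32, 32)
# Each generator is trained against its own discriminator; because the
# latent code and early layers are shared, corresponding images form a
# sample from a joint distribution over the two domains, without paired data.
print(img_src.shape, img_tgt.shape)
```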

Refining the resolution of craniofacial dysmorphology in bipolar disorder as an index of brain dysmorphogenesis
S. Katina, B.D. Kelly, M.A. Rojas, F.M. Sukno, A. McDermott, R.J. Hennessy, A. Lane, P.F. Whelan, A.W. Bowman and J.L. Waddington
Psychiatry Research, 291: 113243, 2020.
As understanding of the genetics of bipolar disorder increases, controversy endures regarding whether the origins of this illness include early maldevelopment. Clarification would be facilitated by a 'hard' biological index of fetal developmental abnormality, among which craniofacial dysmorphology bears the closest embryological relationship to brain dysmorphogenesis. Therefore, 3D laser surface imaging was used to capture the facial surface of 21 patients with bipolar disorder and 45 control subjects; 21 patients with schizophrenia were also studied.

Surface images were subjected to geometric morphometric analysis in non-affine space for more incisive resolution of subtle, localised dysmorphologies that might distinguish patients from controls. Complex and more biologically informative, non-linear changes distinguished bipolar patients from control subjects. On a background of minor dysmorphology of the upper face, maxilla, midface and periorbital regions, bipolar disorder was characterised primarily by the following dysmorphologies: (a) retrusion and shortening of the premaxilla, nose, philtrum, lips and mouth (the frontonasal prominences), with (b) some protrusion and widening of the mandible-chin. The topography of facial dysmorphology in bipolar disorder indicates disruption to early development in the frontonasal process and, on embryological grounds, cerebral dysmorphogenesis in the forebrain, most likely between the 10th and 15th weeks of fetal life.
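As background for readers unfamiliar with geometric morphometrics, the sketch below shows the standard first step: Procrustes superimposition of landmark configurations, after which per-landmark displacement fields can be compared between groups. This is a generic illustration on random data, not the study's full non-affine analysis.

```python
import numpy as np
from scipy.spatial import procrustes

rng = np.random.default_rng(0)
reference = rng.normal(size=(30, 3))       # 30 anatomical landmarks
subject = reference + 0.05 * rng.normal(size=(30, 3))  # subtle differences

# Superimpose: position, scale and rotation are removed;
# `disparity` is the residual sum of squared differences.
ref_std, subj_std, disparity = procrustes(reference, subject)

# Per-landmark displacement field after superimposition: the signal
# that group-level shape statistics probe.
displacement = np.linalg.norm(subj_std - ref_std, axis=1)
print(round(disparity, 4), round(displacement.mean(), 4))
```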

Tensor Decomposition and Non-linear Manifold Modeling for 3D Head Pose Estimation
D. Derkach, A. Ruiz and F.M. Sukno
International Journal of Computer Vision, 127(10): 1565–1585, 2019.
Head pose estimation is a challenging computer vision problem with important applications in different scenarios such as human-computer interaction or face recognition. In this paper, we present a 3D head pose estimation algorithm based on non-linear manifold learning. A key feature of the proposed approach is that it allows modeling the underlying 3D manifold that results from the combination of rotation angles. To do so, we use tensor decomposition to generate separate subspaces for each variation factor and show that each of them has a clear structure that can be modeled with cosine functions of a unique shared parameter per angle. Such a representation provides a deep understanding of data behavior. We show that the proposed framework can be applied to a wide variety of input features and can be used for different purposes. Firstly, we test our system on a publicly available database consisting of 2D images and show that the cosine functions can be used to synthesize rotated versions of an object from which we see only a 2D image at a specific angle. Further, we perform 3D head pose estimation experiments using two other types of features: automatic landmarks and histogram-based 3D descriptors. We evaluate our approach on two publicly available databases, and demonstrate that angle estimations can be performed by optimizing the combination of these cosine functions to achieve state-of-the-art performance.
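The sketch below illustrates the core observation on a toy example, using a plain SVD as the single-factor case of the tensor decomposition: when features vary with one rotation angle, their subspace coordinates follow cosine curves of a shared parameter, and an unseen sample's angle can be recovered by fitting that parameter. Data and dimensions are synthetic.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
angles = np.deg2rad(np.arange(-60, 61, 5))        # training yaw angles
w1, w2, b = rng.normal(size=(3, 50))              # random feature directions
# Synthetic features whose variation is driven purely by the rotation:
F = np.outer(np.cos(angles), w1) + np.outer(np.sin(angles), w2) + b

# Angle-mode subspace via SVD; U[:, 0] and U[:, 1] trace cosine curves
# of the same angle parameter, with different amplitude and phase.
U, S, Vt = np.linalg.svd(F - F.mean(0), full_matrices=False)

def subspace_coords(f):
    return (f - F.mean(0)) @ Vt[:2].T / S[:2]

def model_coords(a):
    return subspace_coords(np.cos(a) * w1 + np.sin(a) * w2 + b)

# Estimate the angle of an unseen (noisy) feature vector:
test_angle = np.deg2rad(23.0)
f_test = np.cos(test_angle) * w1 + np.sin(test_angle) * w2 + b \
         + 0.01 * rng.normal(size=50)
target = subspace_coords(f_test)
est = minimize_scalar(lambda a: np.sum((model_coords(a) - target) ** 2),
                      bounds=(-np.pi / 2, np.pi / 2), method="bounded")
print(round(np.rad2deg(est.x), 1))                # ~23 degrees
```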

Figure: Tensor decomposition of multi-view data yields manifold subspaces whose components follow trigonometric curves.

Three-Dimensional Face Reconstruction from Uncalibrated Photographs: Application to Early Detection of Genetic Syndromes
L. Tu, A.R. Porras, A. Morales, D.A. Perez, G. Piella, F.M. Sukno and M.G. Linguraru
Proc. 8th MICCAI Clinical Image-based Procedures Workshop, Shenzhen, China, pp. 182–189, 2019.
Facial analysis from photography supports the early
identification of genetic syndromes, but clinically-acquired uncalibrated
images suffer from image pose and illumination variability. Although 3D
photography overcomes some of the challenges of 2D images, 3D scanners are
not typically available. We present an optimization method for 3D face
reconstruction from uncalibrated 2D photographs of the face using a novel
statistical shape model of the infant face. First, our method creates an
initial estimation of the camera pose for each 2D photograph using the
average shape of the statistical model and a set of 2D facial landmarks.
Second, it calculates the camera pose and the parameters of the statistical
model by minimizing the distance between the projection of the estimated 3D
face in the image plane of each camera and the observed 2D face geometry.
Using the reconstructed 3D faces, we automatically extract a set of 3D
geometric and appearance descriptors and we use them to train a classifier to
identify facial dysmorphology associated with genetic syndromes. We evaluated
our face reconstruction method on 3D photographs of 54 subjects (age range
0–3 years), and we obtained a point-to-surface error of 2.01 ± 0.54%, which
was a significant improvement over 2.98 ± 0.64% using state-of-the-art
methods (p < 0.001). Our classifier detected genetic syndromes
from the reconstructed 3D faces from the 2D photographs with 100% sensitivity
and 92.11% specificity.

Robust facial alignment with internal denoising auto-encoder
D. Aspandi, O. Martinez, F.M. Sukno and X. Binefa
Proc. 16th Conference on Computer & Robot Vision, Ontario, Canada, pp. 143–150, 2019.
The development of facial alignment models is growing rapidly thanks to the availability of large facial landmark datasets and powerful deep learning models. However, important challenges still remain for facial alignment models to work on images under extreme conditions, such as severe occlusions or large variations in pose and illumination. Current attempts to overcome this limitation have mainly focused on building robust feature extractors, with the assumption that the model will be able to discard the noise and select only the meaningful features. However, such an assumption ignores the importance of understanding the noise that characterizes unconstrained images, which has been shown to benefit computer vision models if used appropriately in the learning strategy. Thus, in this paper we investigate the introduction of specialized modules for noise detection and removal, in combination with our state-of-the-art facial alignment module, and show that this leads to improved robustness both to synthesized noise and to in-the-wild conditions. The proposed model is built by combining two major sub-networks: an internal image denoiser (based on the auto-encoder architecture) and a facial landmark localiser (based on the Inception-ResNet architecture). Our results on the 300-W and Menpo datasets show that our model can effectively handle different types of synthetic noise, which also leads to enhanced robustness in real-world unconstrained settings, reaching top state-of-the-art accuracy.
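A minimal sketch of the internal-denoiser training strategy described above: a small convolutional auto-encoder is trained to undo synthetic corruptions (noise plus a crude occlusion), so it can suppress similar artifacts before landmark localisation. The architecture, noise model and data are illustrative placeholders.

```python
import torch
import torch.nn as nn

denoiser = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

clean = torch.rand(16, 3, 64, 64)             # placeholder face crops
for step in range(100):
    # Corrupt with Gaussian noise and a random occluding patch.
    noisy = clean + 0.1 * torch.randn_like(clean)
    noisy[:, :, 20:36, 20:36] = 0.0           # crude synthetic occlusion
    loss = nn.functional.mse_loss(denoiser(noisy), clean)
    opt.zero_grad()
    loss.backward()
    opt.step()
print(loss.item())
```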

Lip-Reading with Limited-Data Network
A. Fernandez-Lopez and F.M. Sukno
Proc. 27th European Signal Processing Conference, A Coruña, Spain, 2019.
The development of Automatic Lip-Reading (ALR) systems is currently dominated by Deep Learning (DL) approaches. However, DL systems generally face two main issues, related to the amount of data and the complexity of the model. To find a balance between the amount of available training data and the number of parameters of the model, in this work we introduce an end-to-end ALR system that combines CNNs and LSTMs and can be trained without large-scale databases. To this end, we propose to split the training by modules, automatically generating weak labels per frame, termed visual units. These weak visual units are representative enough to guide the CNN to extract meaningful features that, when combined with the context provided by the temporal module, are sufficiently informative to train an ALR system in a very short time and with no need for manual labeling. The system is evaluated on the well-known OuluVS2 database for sentence-level classification. We obtain an accuracy of 91.38%, which is comparable to state-of-the-art results but, differently from most previous approaches, we do not require the use of external training data.
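A minimal sketch of the weak-label idea, under the assumption that frame descriptors are already available: clustering the descriptors yields per-frame "visual units" whose indices serve as weak targets to pre-train the CNN front-end without manual labelling. The descriptors and cluster count below are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
frames = rng.normal(size=(5000, 128))       # one descriptor per frame

kmeans = KMeans(n_clusters=40, n_init=10, random_state=0).fit(frames)
weak_labels = kmeans.labels_                # per-frame "visual units"
print(np.bincount(weak_labels)[:10])
# The CNN is then trained to predict `weak_labels` from raw frames;
# the temporal module (LSTM) later consumes its features in sequence.
```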

Fully end-to-end composite recurrent convolution network for deformable facial tracking in the wild
D. Aspandi, O. Martinez, F.M. Sukno and X. Binefa
Proc. 14th International Conference on Automatic Face & Gesture Recognition, Lille, France, 2019.
Human facial tracking is an important task in computer vision, which has recently lost pace compared to other facial analysis tasks. The majority of currently available trackers have two major limitations: they make little use of temporal information and they rely on handcrafted features, without taking full advantage of the large annotated datasets that have recently become available. In this paper we present a fully end-to-end facial tracking model based on current state-of-the-art deep model architectures that can be effectively trained from the available annotated facial landmark datasets. We build our model from the recently introduced general object tracker Re3, which allows modeling the short- and long-term temporal dependency between frames by means of its internal Long Short-Term Memory (LSTM) layers. Facial tracking experiments on the challenging 300-VW dataset show that our model can produce state-of-the-art accuracy and far lower failure rates than competing approaches. We specifically compare the performance of our approach modified to work in tracking-by-detection mode and show that, as such, it can produce results that are comparable to state-of-the-art trackers. However, upon activation of our tracking mechanism, the results improve significantly, confirming the advantage of taking into account temporal dependencies.

Heatmap-guided balanced deep convolution networks for family classification in the wild
D. Aspandi, O. Martinez and X. Binefa
Proc. 14th International Conference on Automatic Face & Gesture Recognition, Lille, France, 2019.
Automatic kinship recognition using computer vision, which aims to infer the blood relationship between individuals by comparing only their facial features, has started to gain attention recently. The introduction of large kinship datasets, such as Families In The Wild (FIW), has enabled large-scale dataset modeling using state-of-the-art deep learning models. Among kinship recognition tasks, the family classification task has been lacking significant progress because its difficulty increases with the number of family members. Furthermore, most current state-of-the-art approaches do not perform any data pre-processing (which could improve model accuracy) and are trained without a regularizer (which results in models susceptible to overfitting). In this paper, we present the Deep Family Classifier (DFC), a deep learning model for family classification in the wild. We build our model by combining two sub-networks: an internal Image Feature Enhancer, which operates by removing image noise and provides an additional facial heatmap layer, and a Family Class Estimator, trained with strong regularizers and a compound loss. We observe progressive improvement in accuracy during the validation phase, with state-of-the-art results of 16.89% for track 2 of the RFIW2019 challenge and 17.08% for the family classification task on the FIW dataset.

Optimizing Phoneme-to-Viseme Mapping for Continuous Lip-Reading in Spanish
A. Fernandez-Lopez and F.M. Sukno
Computer Vision, Imaging and Computer Graphics - Theory and Applications, Communications in Computer and Information Science book series, Vol. 983, pp. 305–328, 2019.
Speech is the most used communication method between humans and is considered a multisensory process. Even though there is a popular belief that speech is something that we hear, there is overwhelming evidence that the brain treats speech as something that we both hear and see. Much of the research has focused on Automatic Speech Recognition (ASR) systems, treating speech primarily as an acoustic form of communication. In recent years, there has been an increasing interest in systems for Automatic Lip-Reading (ALR), although exploiting the visual information has proved to be challenging. One of the main problems in ALR is how to make the system robust to the visual ambiguities that appear at the word level. These ambiguities make the definition of the minimum distinguishable unit of the video domain confusing and imprecise. In contrast to the audio domain, where the phoneme is the standard minimum auditory unit, there is no consensus on the definition of the minimum visual unit (the viseme). In this work, we focus on the automatic construction of a phoneme-to-viseme mapping based on visual similarities between phonemes to maximize word recognition. We investigate the usefulness of different phoneme-to-viseme mappings, obtaining the best results for intermediate vocabulary lengths. We construct an automatic system that uses DCT and SIFT descriptors to extract the main characteristics of the mouth region and HMMs to model the statistical relations of both viseme and phoneme sequences. We test our system on two Spanish corpora with continuous speech (AV@CAR and VLRF), containing 19 and 24 speakers, respectively. Our results indicate that we are able to recognize 47% (resp. 51%) of the phonemes and 23% (resp. 21%) of the words for AV@CAR and VLRF. We also show additional results that support the usefulness of visemes. Experiments on a comparable ALR system trained exclusively using phonemes at all its stages confirm the existence of strong visual ambiguities between groups of phonemes. This fact, and the higher word accuracy obtained when using phoneme-to-viseme mappings, justify the usefulness of visemes instead of the direct use of phonemes for ALR.
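As an illustration of how such a mapping can be derived (a generic sketch, not necessarily the paper's exact procedure), phonemes that a visual classifier frequently confuses can be merged into shared visemes by hierarchical clustering of a symmetrised confusion matrix:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
n_phonemes = 24
# Placeholder confusion matrix; in practice it comes from evaluating a
# phoneme-level ALR system on held-out data.
C = rng.random((n_phonemes, n_phonemes))
C = (C + C.T) / 2                        # symmetrise confusions
np.fill_diagonal(C, 1.0)

dist = 1.0 - C                           # more confusion => smaller distance
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")

n_visemes = 10                           # intermediate vocabulary size
viseme_of = fcluster(Z, t=n_visemes, criterion="maxclust")
print(viseme_of)                         # phoneme index -> viseme id
```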

3D head pose estimation using tensor decomposition and non-linear manifold modeling
D. Derkach, A. Ruiz and F.M. Sukno
Proc. International Conference on 3D Vision, Verona, Italy, pp. 505–513, 2018.
Head pose estimation is a challenging computer vision problem with important applications in different scenarios such as human-computer interaction or face recognition. In this paper, we present an algorithm for 3D head pose estimation using only depth information from Kinect sensors. A key feature of the proposed approach is that it allows modeling the underlying 3D manifold that results from the combination of pitch, yaw and roll variations. To do so, we use tensor decomposition to generate separate subspaces for each variation factor and show that each of them has a clear structure that can be modeled with cosine functions of a unique shared parameter per angle. Such a representation provides a deep understanding of data behavior, and angle estimations can be performed by optimizing the combination of these cosine functions. We evaluate our approach on two publicly available databases, and achieve top state-of-the-art performance.

Multi-instance dynamic ordinal random fields for weakly supervised facial behavior analysis
A. Ruiz, O. Rudovic, X. Binefa and M. Pantic
IEEE Transactions on Image Processing, 27(8): 3969–3982, 2018.
We propose a multi-instance learning (MIL) approach for weakly supervised learning problems, where a training set is formed by bags (sets of feature vectors, or instances) and only labels at bag level are provided. Specifically, we consider the multi-instance dynamic ordinal regression (MI-DOR) setting, where the instance labels are naturally represented as ordinal variables and bags are structured as temporal sequences. To this end, we propose multi-instance dynamic ordinal random fields (MI-DORF). In this framework, we treat instance labels as temporally dependent latent variables in an undirected graphical model. Different MIL assumptions are modelled via newly introduced high-order potentials relating bag and instance labels within the energy function of the model. We also extend our framework to address the partially observed MI-DOR problem, where a subset of instance labels is available during training. We show, on the tasks of weakly supervised facial action unit and pain intensity estimation, that the proposed framework outperforms alternative learning approaches. Furthermore, we show that MI-DORF can be employed to largely reduce the data annotation effort in this context.
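For intuition, a common MIL assumption of the kind MI-DORF encodes as high-order potentials is that the observed ordinal bag label equals the maximum of the latent per-frame labels; a toy illustration follows.

```python
import numpy as np

instance_labels = np.array([0, 0, 1, 2, 3, 3, 1, 0])  # latent, per frame
bag_label = instance_labels.max()                     # observed, per sequence
assert bag_label == 3
# Learning inverts this: given only bag labels, infer per-frame ordinal
# labels consistent with the max-assumption and their temporal dynamics.
```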

Automatic local shape spectrum analysis for 3D facial expression recognition
D. Derkach and F.M. Sukno
Image and Vision Computing, 79: 86–98, 2018.
We investigate the problem of Facial Expression Recognition (FER) using 3D data. Building from one of the most successful frameworks for facial analysis using exclusively 3D geometry, we extend the analysis from a curve-based representation to a spectral representation, which allows a complete description of the underlying surface that can be further tuned to the desired level of detail. Spectral representations are based on the decomposition of the geometry into its spatial frequency components, much like a Fourier transform, which are related to intrinsic characteristics of the surface. In this work, we propose the use of Graph Laplacian Features (GLFs), which result from the projection of local surface patches onto a common basis obtained from the Graph Laplacian eigenspace. We extract patches around facial landmarks and include a state-of-the-art localization algorithm to allow for fully automatic operation. The proposed approach is tested on the three most popular databases for 3D FER (BU-3DFE, Bosphorus and BU-4DFE) in terms of expression and AU recognition. Our results show that the proposed GLFs consistently outperform the curve-based approach as well as the most popular alternative for spectral representation, Shape-DNA, which is based on the Laplace-Beltrami operator and cannot provide a stable basis that guarantees that the extracted signatures for the different patches are directly comparable. Interestingly, the accuracy improvement brought by GLFs is obtained at a lower computational cost. Considering the extraction of patches as a common step between the three compared approaches, the curve-based framework requires a costly elastic deformation between corresponding curves (e.g. based on splines) and Shape-DNA requires computing an eigen-decomposition of every new patch to be analyzed. In contrast, GLFs only require the projection of the patch geometry onto the Graph Laplacian eigenspace, which is common to all patches and can therefore be pre-computed off-line. We also show that 14 automatically detected landmarks are enough to achieve high FER and AU detection rates, only slightly below those obtained when using sets of manually annotated landmarks.
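A minimal sketch of the GLF construction under simplifying assumptions (a 4-connected grid stands in for the resampled patch topology): because all patches share one graph, a single Laplacian eigenbasis is precomputed, and each patch is described by projecting its geometry onto it, making signatures directly comparable across patches.

```python
import numpy as np

n = 8                                     # patch resampled to an 8x8 grid
N = n * n
# Adjacency of a 4-connected grid (the topology shared by all patches):
A = np.zeros((N, N))
for i in range(n):
    for j in range(n):
        k = i * n + j
        if i + 1 < n:
            A[k, k + n] = A[k + n, k] = 1
        if j + 1 < n:
            A[k, k + 1] = A[k + 1, k] = 1
L = np.diag(A.sum(1)) - A                 # graph Laplacian

# Common spectral basis, computed once and reused for every patch:
eigvals, eigvecs = np.linalg.eigh(L)
basis = eigvecs[:, :20]                   # 20 low-frequency components

patch_depth = np.random.default_rng(0).normal(size=N)  # placeholder geometry
glf = basis.T @ patch_depth               # 20-D spectral signature
print(glf.shape)                          # (20,)
```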

Manual AU annotations for high-intensity expressions in the BU-3DFE database

Survey on Automatic Lip-Reading in the Era of Deep Learning
A. Fernandez-Lopez and F.M. Sukno
Image and Vision Computing, 78: 53–72, 2018.
In the last few years, there has been an increasing interest in developing systems for Automatic Lip-Reading (ALR). Similarly to other computer vision applications, methods based on Deep Learning (DL) have become very popular and have made it possible to substantially push forward the achievable performance. In this survey, we review ALR research during the last decade, highlighting the progression from approaches previous to DL (which we refer to as traditional) toward end-to-end DL architectures. We provide a comprehensive list of the audio-visual databases available for lip-reading, describing what tasks they can be used for, their popularity and their most important characteristics, such as the number of speakers, vocabulary size, recording settings and total duration. In correspondence with the shift toward DL, we show that there is a clear tendency toward large-scale datasets targeting realistic application settings and large numbers of samples per class. On the other hand, we summarize, discuss and compare the different ALR systems proposed in the last decade, separately considering traditional and DL approaches. We provide a quantitative analysis of the different systems by organizing them in terms of the task that they target (e.g. recognition of letters or digits and words or sentences) and comparing their reported performance on the most commonly used datasets. As a result, we find that DL architectures perform similarly to traditional ones for simpler tasks but report significant improvements in more complex tasks, such as word or sentence recognition, with up to 40% improvement in word recognition rates. Hence, we provide a detailed description of the available ALR systems based on end-to-end DL architectures and identify a tendency to focus on the modeling of temporal context as the key to advancing the field. Such modeling is dominated by recurrent neural networks due to their ability to retain context at multiple scales (e.g. short- and long-term information). In this sense, current efforts tend toward techniques that allow a more comprehensive modeling and interpretability of the retained context.

A quantitative comparison of methods for 3D face reconstruction from 2D images
A. Morales, G. Piella, O. Martinez and F.M. Sukno
Proc. IEEE International Conference on Automatic Face & Gesture Recognition, Xi'an, China, 2018.
In the past years, many studies have highlighted the relation between deviations from normal facial morphology (dysmorphology) and some genetic and mental disorders. Recent advances in methods for reconstructing the 3D geometry of the face from 2D images open new possibilities for dysmorphology research without the need for specialized 3D imaging equipment. However, it is unclear whether these methods can reconstruct the facial geometry with the required accuracy.

In this paper we present a comparative study of some of the most relevant approaches for 3D face reconstruction from 2D images, including photometric stereo, deep learning and 3D Morphable Model fitting. We address the comparison in qualitative and quantitative terms using a public database consisting of 2D images and 3D scans from 100 people. Interestingly, we find that some methods produce quite noisy reconstructions that do not seem realistic, whereas others look more natural. However, the latter do not seem to adequately capture the geometric variability that exists between different subjects and produce reconstructions that look very similar across individuals, thus questioning their fidelity.
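For reference, a typical quantitative criterion in such comparisons is the mean distance from reconstructed vertices to the ground-truth scan; the sketch below uses a nearest-neighbour (point-to-point) surrogate for point-to-surface distance on random placeholder point sets.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
ground_truth = rng.normal(size=(5000, 3))           # scan vertices
reconstruction = ground_truth[:2000] + 0.01 * rng.normal(size=(2000, 3))

# Distance from each reconstructed vertex to its nearest scan point.
dists, _ = cKDTree(ground_truth).query(reconstruction)
print("mean error:", round(dists.mean(), 4))
```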

The research activities in this project have contributed directly or indirectly to the development of undergraduate and graduate students, as briefly summarized below:
María de Araceli Morales, Advanced tools for facial analysis: application to newborns. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (in progress, 2018–2022). Supervisors: F. Sukno and G. Piella.
Decky Aspandi Latif, Deep Spatio-Temporal Neural Network for Facial Analysis. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (05-03-2021). Supervisor: X. Binefa.
Adriana Fernandez-López, Learning of Meaningful Visual Representations for Continuous Lip-Reading. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (04-03-2021). Supervisor: F. Sukno.
Dmytro Derkach, Spectrum analysis methods for 3D facial expression recognition and head pose estimation. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (03-12-2018). Supervisor: F. Sukno.
Nuria Rodriguez Díaz, Lie Detection based on Deep Learning applied to a collected and annotated dataset. Bachelor Thesis, Double Degree in Audiovisual Systems and Computer Engineering, Universitat Pompeu Fabra (2020). Supervisor: X. Binefa.
Antonia Alomar Adrover, 3D fetal face reconstruction from ultrasound imaging. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2020). Supervisors: F. Sukno, G. Piella and A. Morales.
Mar Ferrer Ferrer, Multimodal fusion of video signals for remote evaluation of emotional/cognitive processing. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2020). Supervisors: F. Sukno and A. Pereda.
Lie Jin Wang, Emotion Control for E-Learning (original title: Control d'emocions per E-Learning). Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2020). Supervisor: X. Binefa.
Nadia Cosor Oltra, Anomaly Detection from a Ranking with Unsupervised Machine Learning Methods (original title: Detecció d'Anomalies a Partir d'un Rànquing amb Mètodes de Machine Learning No Supervisats). Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2020). Supervisor: X. Binefa.
Joaquim Comas Martinez, Multimodal emotion recognition based on facial and physiological signals and its application in affective computing. Master Thesis, Joint Master in Computer Vision, Universitat Autònoma de Barcelona (2019). Supervisor: X. Binefa.
Gary Stefano Ulloa Rodríguez, Deep affective computing: automatic recognition of human emotion using facial features. Bachelor Thesis, Computer Engineering, Universitat Pompeu Fabra (2019). Supervisors: O. Martinez and X. Binefa.
Gemma Alaix i Granell, Multimodal analysis for automatic affect recognition. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2019). Supervisor: X. Binefa.
Sergi Solà Casas, Attentional Mechanism for Affective Computing. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2019). Supervisor: X. Binefa.
Guillem Garcia Gómez, Heart rate variability measurement from facial videos to detect stress. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2018). Supervisor: F. Sukno.
Joaquim Comas Martinez, Study and analysis of cardiac signals and their relation to emotions (original title: Estudi i anàlisi de senyals cardíaques i la seva relació amb les emocions). Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2018). Supervisor: X. Binefa.
Paula Catalan Rabaneda, Lip-Reading Visual Passwords for User Authentication. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2018). Supervisors: F. Sukno and A. Fernandez.

Acknowledgements
The Principal Investigators, Prof. Xavier Binefa & Dr. Federico Sukno, would like to thank all those involved in the activities listed above, as well as the Ministry of Economy, Industry and Competitiveness, which funded this project through grant TIN2017-90124-P.