eSCANFace Project

Projects

eSCANFace: Early Screening of Craniofacial Anomalies in Newborn Faces

Congenital anomalies are a major cause of infant mortality and childhood morbidity, affecting 2-3% of newborns. It is estimated that 30% to 40% of genetic disorders produce alterations in the normal morphology of the face and the head (dysmorphology), which can impact swallowing, breathing, hearing, vision, speech, and, more importantly, cognitive development.

Thus, craniofacial anomalies have been highlighted as an index of developmental disturbance at early stages of life. Initial diagnosis is often based on visual inspection by pediatricians but, unfortunately, dysmorphology is hard to identify in this way, and massive genetic screening is expensive and impractical. For these reasons, there is a growing interest in using facial imaging as a low-cost tool for genetic pre-screening, i.e., to highlight suspicious cases for further study. The objectives of this project are to develop the technology necessary to make such early screening more accurate, more accessible, and more comprehensive, and to allow its deployment as early as possible in life.

Because dysmorphology patterns tend to be subtle in most disorders, and they can affect any of the spatial components of the face (right-left, cranio-caudal, anterior-posterior), the main hypothesis in this project is that advanced 3D modeling techniques can lead to a more accurate characterization and screening of craniofacial anomalies in infants and fetuses. This hypothesis is supported by previous findings, both from the project team and from other researchers, highlighting that 3D analysis of facial dysmorphology is superior to 2D analysis.

Of particular interest for this project is the BabyFM, a 3D Morphable Model (3DMM) for babies that we designed in collaboration with Children's National Hospital in Washington, DC. Since their introduction 20 years ago, 3DMMs have played a central role in most applications involving 3D facial analysis, including the recent data-driven approaches based on deep learning. However, previous 3DMMs were built from adults and, although some also included children, none included babies. Thus, the BabyFM constitutes a key advantage for the project team.

An additional advantage of the BabyFM is that it allows recovering the 3D facial geometry from one or more uncalibrated pictures. This is especially relevant when targeting the analysis of newborns, since it avoids the use of expensive specialized machinery, making the technology more accessible. We will also extend the analysis to fetal data, for which our preliminary results suggest that an adequate adaptation of models built from newborns could serve as statistical constraints to guide the representation of fetal geometry, which is expected to improve accuracy given the higher quality of the data used to construct the model. Moreover, the use of a unified underlying model should facilitate the integration of the different sources of information, as well as the synthetic generation of magnified 3D patterns to enable a more comprehensive visualization of the identified dysmorphologies.

Principal Investigators: Gemma Piella & Federico Sukno

This project was funded by the 2020 call of the “Programa Estatal de Generación de Conocimiento y Fortalecimiento Científico y Tecnológico” from the Spanish Ministry of Science and Innovation.

Index of Project Results:

1. BabyFM: Towards accurate 3D baby facial models using spectral decomposition and asymmetry swapping. Computers in Biology and Medicine, 2025.

2. 3D imaging and geometric morphometrics of facial dysmorphology and asymmetry indicate gestational timings of dysmorphogenesis in schizophrenia and bipolar disorder. European Neuropsychopharmacology, 2025.

3. Automatic Facial Axes Standardization of 3D Fetal Ultrasound Images. MICCAI ASMUS, 2024.

4. PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows. BMVC, 2024.

5. OBBabyFace: Oriented Bounding Box for Infant Face Detection. DELTA, 2024.

6. Deep learning-based standardisation of the anatomical fetal facial axes in 3D prenatal ultrasounds. ISUOG, 2024.

7. Loss of normal facial asymmetry in schizophrenia and bipolar disorder: Implications for development of brain asymmetry in psychotic illness. Psychiatry Research, 2024.

8. Accuracy and repeatability of fetal facial measurements in 3D ultrasound: A longitudinal study. Early Human Development, 2024.

9. Deep adaptative spectral zoom for improved remote heart rate estimation. FG, 2024.

10. Three-Dimensional Face Reconstruction from Uncalibrated Photographs: Application to Early Detection of Genetic Syndromes. MICCAI CLIP, 2019.

11. BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs. Pattern Recognition, 2023.

12. An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis. Computer Methods and Programs in Biomedicine, 2023.

13. Prenatal facial landmarks' location at 20 and 26 weeks of gestation using 3D segmentation tools: reproducibility and feasibility – preliminary results. ISUOG, 2022.

14. Look-alike humans identified by facial recognition algorithms show genetic similarities. Cell Reports, 2022.

15. End-to-end lip-reading without large-scale data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.

16. Reconstruction of the fetus face from three-dimensional ultrasound using a newborn face statistical shape model. Computer Methods and Programs in Biomedicine, 2022.

17. Audio-visual gated-sequenced neural networks for affect recognition. IEEE Transactions on Affective Computing, 2022.

18. Transferring 3D facial expressions from adults to children. WSCG, 2022.

19. Efficient remote photoplethysmography with temporal derivative modules and time-shift invariant loss. CVPRW, 2022.

20. Survey on 3D face reconstruction from uncalibrated images. Computer Science Review, 2021.

21. 3D Fetal Face Reconstruction from Ultrasound Imaging. GRAPP, 2021.


Further activities related to the project

· 4 PhD Theses

· 15 Final bachelor/master projects


3D imaging and geometric morphometrics of facial dysmorphology and asymmetry indicate gestational timings of dysmorphogenesis in schizophrenia and bipolar disorder

J.L. Waddington and F.M. Sukno

European Neuropsychopharmacology, 93: 1-2, 2025.

Related findings on the topography of facial dysmorphology across these developmental fields have also been reported in 22q11.2 deletion syndrome, which is associated with a 25-fold increase in risk for psychotic symptoms. Technical refinements have subsequently allowed geometric morphometric analysis in non-affine space for incisive resolution in bipolar disorder of more subtle, localised dysmorphologies across these three developmental fields, particularly in the frontonasal-forebrain prominence, which implicate disruption of processes during GW 10–15. The most recent studies have investigated a yet more fundamental domain of development: the extent to which vertebrate morphogenesis proceeds symmetrically or involves the embryonic breaking of left-right symmetry to create asymmetry, whereby quantitative differences emerge between the left and right sides of a given structure. For example, a cardinal feature of normal subjects is brain asymmetry, including the frontal lobes, with that asymmetry postulated to be disrupted in schizophrenia. We have demonstrated that the geometry of normal facial asymmetries, primarily in the frontonasal-forebrain prominence, shows commonalities with that of normal frontal lobe asymmetries; furthermore, these normal asymmetries in the frontonasal-forebrain prominence are markedly reduced in schizophrenia and reduced also in bipolar disorder, with residual retention of asymmetries. These findings implicate a trans-diagnostic process that involves loss of facial asymmetries across GW 7–14 and are consistent with the still controversial loss of brain asymmetries in psychotic illness.

Full paper

BabyFM: Towards accurate 3D baby facial models using spectral decomposition and asymmetry swapping

A. Morales, A. Alomar, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno

Computers in Biology and Medicine, 186: 109652, 2025.

In this paper, we present the first publicly available 3D statistical facial shape model of babies, the Baby Face Model (BabyFM). Constructing a model of the facial geometry of babies entails specific challenges, such as occlusions, extreme and uncontrollable expressions, and data shortage. We address these challenges by proposing (1) a non-template-dependent method that jointly estimates a 3D baby-specific facial template and the point-to-point correspondences; (2) a novel method to establish correspondences based on the spectral decomposition of the Laplace-Beltrami operator, which provides a more robust theoretical foundation than state-of-the-art methods; and (3) an asymmetry-swapping strategy to alleviate the shortage of large-scale datasets by decoupling the identity-related and the asymmetry-related shape deformation fields. The latter leads to a data augmentation technique that we integrate within the Gaussian Process Morphable Model framework, providing a simple way of combining synthetic or sample covariance functions. We exhaustively evaluate each stage of our method and demonstrate that (1) when targeting the 3D facial geometry of a baby, a specific model of babies is needed, since the pre-built publicly available models constructed from adults or older children are not able to accurately represent the facial shape of babies; (2) our spectral approach improves correspondence accuracy with respect to state-of-the-art methods; and (3) the proposed data augmentation technique enhances the robustness of the BabyFM.
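To give a flavour of the spectral idea (a minimal sketch, not the authors' implementation): spectral methods work with the low-frequency eigenvectors of a Laplacian built from the mesh connectivity. The toy below uses the combinatorial graph Laplacian L = D - A as a simple stand-in for the Laplace-Beltrami operator:

```python
import numpy as np

def laplacian_spectrum(adjacency, k=4):
    """Return the k smallest eigenvalues/eigenvectors of the graph
    Laplacian L = D - A, a combinatorial stand-in for the
    Laplace-Beltrami operator of a mesh."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))
    L = D - A
    vals, vecs = np.linalg.eigh(L)   # symmetric matrix -> real spectrum
    return vals[:k], vecs[:, :k]

# Toy "mesh": a cycle of 6 vertices.
A = np.zeros((6, 6))
for i in range(6):
    A[i, (i + 1) % 6] = A[(i + 1) % 6, i] = 1.0

vals, vecs = laplacian_spectrum(A)
# The smallest eigenvalue of a connected graph Laplacian is 0 (constant
# eigenvector); higher modes encode increasingly fine shape "frequencies".
```

Comparing the values of these low-frequency eigenvectors across two meshes yields pose-robust per-vertex descriptors that can drive correspondence; the paper's actual construction is considerably more elaborate (mesh-aware operators, handling of eigenvector sign and ordering).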

Full paper


Automatic Facial Axes Standardization of 3D Fetal Ultrasound Images

A. Alomar, R. Rubio, L. Salort, G. Albaixes, A. Payá, G. Piella and F.M. Sukno

Proc. 5th International Workshop on Simplifying Medical Ultrasound, ASMUS 2024, in Conjunction with MICCAI, Vol 4, pp. 88–98, Marrakesh, Morocco, 2024.

Craniofacial anomalies indicate early developmental disturbances and are usually linked to many genetic syndromes. Early diagnosis is critical, yet ultrasound (US) examinations often fail to identify these features. This study presents an AI-driven tool to assist clinicians in standardizing fetal facial axes/planes in 3D US, reducing sonographer workload and facilitating facial evaluation. Our network, structured into three blocks (feature extractor, rotation and translation regression, and spatial transformer), processes three orthogonal 2D slices to estimate the transformations needed to standardize the facial planes in the 3D US. These transformations are applied to the original 3D US using a differentiable module (the spatial transformer block), yielding a standardized 3D US and the corresponding 2D facial standard planes. The dataset used consists of 1180 fetal facial 3D US images acquired between weeks 20 and 35 of gestation. Results show that our network considerably reduces inter-observer rotation variability in the test set, with a mean geodesic angle difference of 14.12° ± 18.27° and a Euclidean angle error of 7.45° ± 14.88°. These findings demonstrate the network's ability to effectively standardize facial axes, which is crucial for consistent fetal facial assessments. In conclusion, the proposed network demonstrates potential for improving the consistency and accuracy of fetal facial assessments in clinical settings, facilitating early evaluation of craniofacial anomalies.


Full paper

PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows

J. Comas, A. Alomar, A. Ruiz and F.M. Sukno

Proc. 35th British Machine Vision Conference, Glasgow, UK, 2024.

In recent years, deep learning methods have shown impressive results for camera-based remote physiological signal estimation, clearly surpassing traditional methods. However, the performance and generalization ability of Deep Neural Networks heavily depend on rich training data truly representing the different factors of variation encountered in real applications. Unfortunately, many current remote photoplethysmography (rPPG) datasets lack diversity, particularly in darker skin tones, leading to biased performance of existing rPPG approaches. To mitigate this bias, we introduce PhysFlow, a novel method for augmenting skin diversity in remote heart rate estimation using conditional normalizing flows. PhysFlow adopts end-to-end training optimization, enabling simultaneous training of supervised rPPG approaches on both original and generated data. Additionally, we condition our model using CIELAB color space skin features directly extracted from the facial videos, without the need for skin-tone labels. We validate PhysFlow on publicly available datasets, UCLA-rPPG and MMPD, demonstrating reduced heart rate error, particularly in dark skin tones. Furthermore, we demonstrate its versatility and adaptability across different data-driven rPPG methods.
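For readers unfamiliar with the conditioning features: CIELAB values are obtained from sRGB through a standard colour-space conversion. The sketch below (plain NumPy, synthetic pixel values, not the paper's pipeline) computes the mean Lab value of a set of hypothetical skin pixels:

```python
import numpy as np

def srgb_to_lab(rgb):
    """Convert sRGB values in [0, 1] (shape (..., 3)) to CIELAB (D65)."""
    rgb = np.asarray(rgb, dtype=float)
    # Undo the sRGB gamma to get linear RGB.
    lin = np.where(rgb <= 0.04045, rgb / 12.92, ((rgb + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> CIE XYZ (D65 white point).
    M = np.array([[0.4124564, 0.3575761, 0.1804375],
                  [0.2126729, 0.7151522, 0.0721750],
                  [0.0193339, 0.1191920, 0.9503041]])
    xyz = lin @ M.T
    xyz /= np.array([0.95047, 1.0, 1.08883])   # normalise by the white point
    # XYZ -> Lab nonlinearity.
    d = 6.0 / 29.0
    f = np.where(xyz > d ** 3, np.cbrt(xyz), xyz / (3 * d ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])
    b = 200.0 * (f[..., 1] - f[..., 2])
    return np.stack([L, a, b], axis=-1)

# Hypothetical "skin pixels"; the mean Lab triplet is the kind of
# label-free skin-tone feature that could condition the flow.
pixels = np.array([[0.80, 0.60, 0.50],
                   [0.75, 0.55, 0.45]])
mean_lab = srgb_to_lab(pixels).mean(axis=0)
```

As a sanity check, pure white maps to L ≈ 100 with a ≈ b ≈ 0; darker skin tones yield lower L values, which is what makes L a convenient skin-tone coordinate.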

 

 

Full paper

Poster

 

OBBabyFace: Oriented Bounding Box for Infant Face Detection

J.C. Reyes-Hernández, A. Alomar, R. Rubio, G. Piella and F.M. Sukno

Proc. 5th International Conference on Deep Learning Theory and Applications, Dijon, France, 2024.

This study presents an infant-specific face detection approach that addresses the existing gap in facial detection for non-adults, where the typical bias is toward adult faces. A new infant face dataset, comprising 8,862 images with diverse orientations, was created to enhance Deep Learning (DL) models' ability to accurately detect infant faces. We introduce Oriented Bounding Boxes (OBB) to account for the greater variability in face orientations observed in infants, offering precise alignment to their orientation, a significant improvement over traditional Axis-Aligned Bounding Boxes (AABB). Employing the YOLOv8-OBB architecture, our model is trained and compared against state-of-the-art models such as RetinaFace and MogFace. The results show that our approach outperforms state-of-the-art methods in precision and recall, particularly in non-frontal facial orientations. The proposed infant face detector marks a major advancement in pediatric face detection technology, offering a robust foundation for future advancements in medical monitoring and developmental diagnosis.

Full paper


Deep learning-based standardisation of the anatomical fetal facial axes in 3D prenatal ultrasounds

A. Alomar, R. Rubio, A. Payá, G. Piella and F.M. Sukno

34th World Congress on Ultrasound in Obstetrics & Gynecology, Budapest, Hungary, Volume 64, Issue S1 p. 109-109, 2024.

Objectives: We propose an AI-driven tool designed to assist clinicians in standardising the detection of facial planes in 3D ultrasound (US) imaging. It aims to minimise variability across detections while simultaneously mitigating the effects of interobserver variability in fetal facial assessment.

Methods: We used 445 fetal facial 3D US images acquired between 20 and 26 weeks of gestation. The data was split into 80% for training and 20% for validation. We defined the three orthogonal standard planes of the fetal face using anatomical landmarks and computed the 3D ground truth (GT) transformations to achieve alignment with these planes. A deep learning architecture was trained to take as input 3 orthogonal slices from the 3D US and output the 3D translation and rotation needed to achieve the standard anatomical facial axes. The network is composed of 4 blocks: feature extractor, translation regressor, rotation regressor, and differentiable spatial transform (see figure 1). To assess the resulting standard planes, the average error between the 3D transformation estimated by the network and the 3D GT transform was computed. Also, the PSNR and the SSIM between the estimated and the GT planes were calculated.

Results: The network accurately estimates 3D transformations for standardising the fetal facial axes, obtaining a translation error of 4.55 mm, a rotation error of 17.4°, a PSNR of 18.6 dB, and an SSIM of 0.657 in the validation set. The estimated standard facial slices closely match the GT standard facial planes.

Conclusions: Our method effectively standardises the fetal facial axes, facilitating the measurement and assessment of the fetal face to diagnose fetal abnormalities. Consequently, this tool has the potential to reduce interobserver variability and alleviate the time and burden associated with locating these planes.
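For reference, the geodesic angle between two rotations (the rotation-error metric used above) is the angle of the relative rotation R1ᵀR2, recovered from its trace. A minimal NumPy sketch, with an arbitrary example angle:

```python
import numpy as np

def geodesic_angle_deg(R1, R2):
    """Geodesic distance on SO(3): the angle of the relative
    rotation R1^T R2, recovered from its trace."""
    R = R1.T @ R2
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))

def rot_z(deg):
    """Rotation about the z-axis by `deg` degrees."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Example: a 17.4-degree rotation about z relative to the identity.
angle = geodesic_angle_deg(rot_z(0.0), rot_z(17.4))
```

The `np.clip` guards against floating-point values just outside [-1, 1], which would otherwise make `arccos` return NaN for near-identical rotations.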


Loss of normal facial asymmetry in schizophrenia and bipolar disorder: implications for development of brain asymmetry in psychotic illness

F.M. Sukno, B.D. Kelly, A. Lane, S. Katina, M.A. Rojas, P.F. Whelan and J.L. Waddington

Psychiatry Research, 342: 116213, 2024.

This study applied 3D facial imaging and geometric morphometric analysis to compare facial asymmetries in schizophrenia, bipolar disorder, and healthy controls. The geometry of normal facial asymmetries, primarily in the frontonasal-forebrain prominence, shows commonalities with that of normal frontal lobe asymmetries. These normal asymmetries are markedly reduced in schizophrenia and reduced also in bipolar disorder, with residual retention of asymmetries. The findings implicate a trans-diagnostic process involving loss of facial asymmetries across gestational weeks 7–14 and are consistent with the still controversial loss of brain asymmetries in psychotic illness.

Fig 3

Full paper


Accuracy and repeatability of fetal facial measurements in 3D ultrasound: A longitudinal study

N. González-Aranceta, A. Alomar, R. Rubio, S. Maya-Enero, A. Payá, G. Piella and F.M. Sukno.

Early Human Development, 193: 106021, 2024.

Objective: Fetal face measurements in prenatal ultrasound can aid in identifying craniofacial abnormalities in the developing fetus. However, the accuracy and reliability of ultrasound measurements can be affected by factors such as fetal position, image quality, and the sonographer's expertise. This study assesses the accuracy and reliability of fetal facial measurements in prenatal ultrasound. Additionally, the temporal evolution of measurements is studied, comparing prenatal and postnatal measurements.

Methods: Three different experts located up to 23 facial landmarks in 49 prenatal 3D ultrasound scans from normal Caucasian fetuses at weeks 20, 26, and 35 of gestation. Intra- and inter-observer variability was obtained. Postnatal facial measurements were also obtained at 15 days and 1 month postpartum.

Results: Most facial landmarks exhibited low errors, with overall intra- and inter-observer errors of 1.01 mm and 1.60 mm, respectively. Landmarks on the nose were found to be the most reliable, while the most challenging ones were those located on the ears and eyes. Overall, scans obtained at 26 weeks of gestation presented the best trade-off between observer variability and landmark visibility. The temporal evolution of the measurements revealed that the lower face area had the highest rate of growth throughout the latest stages of pregnancy.

Conclusions: Craniofacial landmarks can be evaluated using 3D fetal ultrasound, especially those located on the nose, mouth, and chin. Despite its limitations, this study provides valuable insights into prenatal and postnatal biometric changes over time, which could aid in developing predictive models for postnatal measurements based on prenatal data.

Full paper


Deep adaptative spectral zoom for improved remote heart rate estimation

J. Comas, A. Ruiz and F.M. Sukno

18th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Istanbul, Turkey, 2024.

Recent advances in remote heart rate measurement, motivated by data-driven approaches, have notably enhanced accuracy. However, these improvements primarily focus on recovering the rPPG signal, overlooking the implicit challenges of estimating the heart rate (HR) from the derived signal. While many methods employ the Fast Fourier Transform (FFT) for HR estimation, the performance of the FFT is inherently limited by its frequency resolution. In contrast, the Chirp-Z Transform (CZT), a generalization of the FFT, can refine the spectrum to the narrow-band range of interest for heart rate, providing improved frequency resolution and, consequently, more accurate estimation. This paper presents the advantages of employing the CZT for remote HR estimation and introduces a novel data-driven adaptive CZT estimator. The objective of our proposed model is to tailor the CZT to match the characteristics of each specific dataset sensor, facilitating a more optimal and accurate estimation of HR from the rPPG signal without compromising generalization across diverse datasets. This is achieved through a Sparse Matrix Optimization (SMO). We validate the effectiveness of our model through exhaustive evaluations on three publicly available datasets (UCLA-rPPG, PURE, and UBFC-rPPG), employing both intra- and cross-database performance metrics. The results reveal outstanding heart rate estimation capabilities, establishing the proposed approach as a robust and versatile estimator for any rPPG method.
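The spectral-zoom idea can be illustrated with a short sketch on a synthetic signal (not the paper's learned estimator): evaluate the spectrum only on a dense frequency grid inside the heart-rate band. SciPy's `scipy.signal.czt` and `scipy.signal.zoom_fft` provide fast implementations of the same operation; the plain-NumPy version below evaluates the DTFT on the band directly:

```python
import numpy as np

fs = 30.0                        # assumed camera frame rate (Hz)
t = np.arange(0, 10, 1 / fs)     # 10 s synthetic "rPPG" trace
x = np.sin(2 * np.pi * 1.2 * t)  # ground-truth pulse at 1.2 Hz = 72 bpm

# Dense grid restricted to a plausible heart-rate band (42-180 bpm).
# The zoomed grid spacing (~0.13 bpm) is far finer than the ~6 bpm
# bins of a plain 10 s FFT.
freqs = np.linspace(0.7, 3.0, 1024)

# Direct evaluation of the DTFT on that grid (what the CZT computes fast).
spectrum = np.abs(np.exp(-2j * np.pi * freqs[:, None] * t[None, :]) @ x)
hr_bpm = 60.0 * freqs[np.argmax(spectrum)]
```

The direct evaluation above costs O(N·M); the CZT achieves the same band-limited spectrum in O((N+M) log(N+M)) via Bluestein's algorithm, which is why it is the practical choice.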

Full paper


BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs

A. Morales, A. Alomar, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno

Pattern Recognition 139, 109367, 2023

We present BabyNet, a 3D face reconstruction system that aims at recovering the 3D facial geometry of babies from uncalibrated photographs. Since the 3D facial geometry of babies differs substantially from that of adults, baby-specific facial reconstruction systems are needed. BabyNet consists of two stages: 1) a 3D graph convolutional autoencoder that learns a latent space of the baby 3D facial shape; and 2) a 2D encoder that maps photographs to the 3D latent space based on representative features extracted using transfer learning. In this way, using the pre-trained 3D decoder, we can recover a 3D face from 2D images. We evaluate BabyNet and show that 1) methods based on adult datasets cannot model the 3D facial geometry of babies, which proves the need for a baby-specific method, and 2) BabyNet outperforms classical model-fitting methods even when a baby-specific 3D morphable model, such as the BabyFM, is used.

Full paper


An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis

A. Urru, A. Nakaki, O. Benkarim, F. Crovetto, L. Segalés, V. Comte, N. Hahner, E. Eixarch, E. Gratacos, F. Crispi, G. Piella and M.A. González Ballester

Computer Methods and Programs in Biomedicine 230, 107334, 2023

Background and Objective: The automatic segmentation of perinatal brain structures in magnetic resonance imaging (MRI) is of utmost importance for the study of brain growth and related complications. While different methods exist for adult and pediatric MRI data, there is a lack of automatic tools for the analysis of perinatal imaging.

Methods: In this work, a new pipeline for fetal and neonatal segmentation has been developed. We also report the creation of two new fetal atlases, and their use within the pipeline for atlas-based segmentation, based on novel registration methods. The pipeline is also able to extract cortical and pial surfaces and compute features, such as curvature, local gyrification index, sulcal depth, and thickness.

Results: Results show that the introduction of the new templates together with our segmentation strategy leads to accurate results when compared to expert annotations, as well as better performances when compared to a reference pipeline (developing Human Connectome Project (dHCP)), for both early and late-onset fetal brains.

Conclusions: These findings show the potential of the presented atlases and the whole pipeline for application in fetal, neonatal, and longitudinal studies, which could lead to dramatic improvements in the understanding of perinatal brain development.


Fig. 1

Full paper


Prenatal facial landmarks' location at 20 and 26 weeks of gestation using 3D segmentation tools: reproducibility and feasibility – preliminary results

R. Rubio, A. Alomar, S. Maya, A. Payá, G. Piella and F.M. Sukno

32nd World Congress in Ultrasound in Obstetrics & Gynecology, London, UK. Volume 60, Issue S1 p. 106-106, 2022.

Objectives: The aim of this study is to analyse the feasibility and reproducibility of locating fetal facial landmarks (lmks) on 3D ultrasound (US) volumes.

Methods: We examined 11 cases of low-risk Caucasian pregnant women. We acquired 3D US volumes at weeks 20 and 26. The volumes were automatically segmented (binary thresholding) using 3DSlicer. Then, two observers located the visible lmks out of the 23 anatomical lmks considered (figure 1, left), using the two best 3D segmented volumes per week and case.

Results: We assessed the feasibility and the inter-observer variability of fetal facial landmarking at weeks 20 and 26. Obs1 and Obs2 placed an average of 11.46 and 10.95 lmks/mesh, respectively. The lmk error/mesh between observers depends on the lmk considered (mean 1.96 ± 1.24 mm, range between 0.78 and 3.47 mm). The most reliable lmks are sn, prn, and ls, where the error between observers is lower than 2 mm. The largest discrepancies are obtained when comparing eye lmks (ex, en). A clear difference in the visibility of each landmark can be observed between weeks 20 and 26 (figure 1). At week 26, 9 lmks around the nose are visible in more than 80% of the scans, versus only 2 lmks at week 20.

Conclusions: To conclude, 20-week 3D US volumes are difficult to landmark, whereas 26-week volumes are easier, as finer details are present and more lmks can be located. Lmks located in 3D US volumes have low inter-observer variability but insufficient precision and accuracy for facial analysis due to noise and poor 3D US quality. However, a 3D morphable model could play a crucial role in fetal face analysis at week 26, when more facial lmks can be located and used to initialise the model.
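As a concrete illustration of the inter-observer error metric (hypothetical coordinates, not the study's data): the error per landmark is the Euclidean distance between the two observers' placements, averaged over the landmarks of a mesh:

```python
import numpy as np

# Hypothetical 3D landmark placements (in mm) from two observers
# for the same mesh: rows are landmarks, columns are x, y, z.
obs1 = np.array([[10.0, 5.0, 2.0],
                 [22.0, 7.5, 1.0],
                 [15.0, 3.0, 4.0]])
obs2 = obs1 + np.array([[1.0, 0.0, 0.0],   # 1 mm off along x
                        [0.0, 2.0, 0.0],   # 2 mm off along y
                        [0.0, 0.0, 0.0]])  # exact agreement

# Per-landmark inter-observer error and the per-mesh mean.
per_landmark = np.linalg.norm(obs1 - obs2, axis=1)
mean_error = per_landmark.mean()
```

Aggregating `mean_error` over meshes, and `per_landmark` over cases, yields respectively the per-mesh mean and the per-landmark ranges reported above.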


Look-alike humans identified by facial recognition algorithms show genetic similarities

R.S. Joshi, M. Rigau, C.A García-Prieto, M. Castro de Moura, D. Piñeyro, S. Moran, V. Davalos, P. Carrión, M. Ferrando-Bernal, I. Olalde, C. Lalueza-Fox, A. Navarro, C. Fernández-Tena, D. Aspandi, F.M. Sukno, X. Binefa, A. Valencia and M. Esteller

Cell Reports, 40(8): 111257, 2022.

The human face is one of the most visible features of our unique identity as individuals. Interestingly, monozygotic twins share almost identical facial traits and the same DNA sequence but could exhibit differences in other biometrical parameters. The expansion of the world wide web and the possibility to exchange pictures of humans across the planet has increased the number of people identified online as virtual twins or doubles that are not family related. Herein, we have characterized in detail a set of “look-alike” humans, defined by facial recognition algorithms, for their multiomics landscape. We report that these individuals share similar genotypes and differ in their DNA methylation and microbiome landscape. These results not only provide insights about the genetics that determine our face but also might have implications for the establishment of other human anthropometric properties and even personality characteristics.

Full paper


End-to-end lip-reading without large-scale data

A. Fernandez-Lopez and F.M. Sukno

IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30: 2076-2090, 2022.

The development of Automatic Lip-Reading (ALR) systems for continuous speech recognition has so far been limited to English, since this is the only language with large-scale datasets sufficient to train end-to-end ALR systems. In this work, we show that it is possible to train competitive end-to-end ALR systems in alternative languages with challenging small-scale data, as long as the appropriate restrictions are made to the learning process of the visual front-end objective. To this end, we hypothesize that the visual front-end should be trained in a self-supervised setting, allowing it to target its own visual units. We specifically define visual units as a collection of visually similar images constrained by linguistics and provide an algorithmic implementation to automatically generate them. We show that visual units can be used to add an intermediate classification task between the visual and temporal modules that facilitates meaningful learning of visual features and, as a consequence, reduces the amount of data required to train an end-to-end ALR system. Additionally, we present a data augmentation strategy for enriching the temporal context. We synthesize realistic video sequences by appropriately combining character-like sub-sequences from existing videos. We test the proposed ALR system on i) the VLRF dataset, a small-scale database that is one of the largest in Spanish, and achieve 44.77% CER and 72.90% WER, which are competitive with the state-of-the-art and significant for this volume of training material; and ii) the TCD-TIMIT dataset, a comparable medium-scale database in English, where we achieve 36.58% CER and 56.29% WER, which are also state-of-the-art results in speaker-dependent experiments.

Mapping into visual units of the phrase "Miraba el reloj" for sets with 15 and 11 visual units.

Example of approximate automatic annotations per frame for 3 different sequences.

Full paper


Reconstruction of the fetus face from three-dimensional ultrasound using a newborn face statistical shape model

A. Alomar, A. Morales, K. Vellvé, A.R. Porras, F. Crispi, M.G. Linguraru, G. Piella and F.M. Sukno

Computer Methods and Programs in Biomedicine 221, 106893, 2022

Background and objective: The fetal face is an essential source of information in the assessment of congenital malformations and neurological anomalies. Disturbance in early stages of development can lead to a wide range of effects, from subtle changes in facial and neurological features to characteristic facial shapes observed in craniofacial syndromes. Three-dimensional ultrasound (3D US) can provide more detailed information about the facial morphology of the fetus than the conventional 2D US, but its use for pre-natal diagnosis is challenging due to imaging noise, fetal movements, limited field-of-view, low soft-tissue contrast, and occlusions.

Methods: In this paper, we propose the use of a novel statistical morphable model of newborn faces, the BabyFM, for fetal face reconstruction from 3D US images. We test the feasibility of using newborn statistics to accurately reconstruct fetal faces by fitting the regularized morphable model to the noisy 3D US images.

Results: The results indicate that the reconstructions are quite accurate in the central face and less reliable in the lateral regions (mean point-to-surface error of 2.35 mm vs 4.86 mm). The algorithm is able to reconstruct the whole facial morphology of babies from US scans while handling adverse conditions (e.g. missing parts, noisy data).

Conclusions: The proposed algorithm has the potential to aid in-utero diagnosis for conditions that involve facial dysmorphology.
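The regularized model fitting described in the Methods can be sketched, under strong simplifications, as a linear least-squares problem with a statistical prior on the model coefficients (toy 1-D data and hypothetical numbers; not the actual BabyFM fitting pipeline):

```python
import numpy as np

def fit_morphable_model(target, mean, basis, eigvals, lam=1.0):
    """Regularized least-squares fit of a linear morphable model
    shape(a) = mean + basis @ a to a noisy target. The Tikhonov term
    lam * a_i^2 / eigval_i penalizes statistically unlikely coefficients,
    which is what keeps the fit stable on noisy, incomplete data."""
    A = basis.T @ basis + lam * np.diag(1.0 / eigvals)
    b = basis.T @ (target - mean)
    coeffs = np.linalg.solve(A, b)
    return mean + basis @ coeffs, coeffs

# Toy 1-D "face" of 6 points with a 2-mode model.
rng = np.random.default_rng(1)
mean = np.zeros(6)
basis = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], float)
eigvals = np.array([4.0, 1.0])           # per-mode variance of the model
true = mean + basis @ np.array([2.0, -1.0])
noisy = true + 0.05 * rng.standard_normal(6)  # stand-in for US noise
recon, coeffs = fit_morphable_model(noisy, mean, basis, eigvals, lam=0.1)
print(np.round(coeffs, 2))
```

Increasing `lam` pulls the solution towards the model mean, trading data fidelity for statistical plausibility; the real fitting additionally handles correspondence and missing regions.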

 

Fig. 2

Full paper

 

Audio-visual gated-sequenced neural networks for affect recognition

D. Aspandi, F.M. Sukno, B.W. Schuller and X. Binefa

IEEE Transactions on Affective Computing 14 (3), 2193-2208, 2022.

The interest in automatic emotion recognition and the larger field of Affective Computing has recently gained momentum. The current emergence of large, video-based affect datasets offering rich multi-modal inputs facilitates the development of the deep learning-based models for automatic affect analysis that currently hold the state of the art. However, recent approaches cannot fully exploit these modalities due to the use of oversimplified fusion schemes. Furthermore, the efficient use of the temporal information inherent to these huge datasets remains largely unexplored, hindering potential progress. In this work, we propose a multi-modal, sequence-based neural network with gating mechanisms for valence- and arousal-based affect recognition. Our model consists of three major networks: firstly, a latent-feature generator that extracts compact representations from both modalities, which have been artificially degraded to add robustness; secondly, a multi-task discriminator that estimates both the input identity and a first-step emotion quadrant; thirdly, a sequence-based predictor with attention and gating mechanisms that effectively merges both modalities and exploits this information through sequence modelling. In our experiments on the SEMAINE and SEWA affect datasets, we observe the impact of both proposed methods with a progressive increase in accuracy. We further show in our ablation studies how the internal attention weights and gating coefficients impact the quality of our model's estimates. Finally, we demonstrate state-of-the-art accuracy through comparisons with current alternatives on both datasets.
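As an illustration of the gating idea, the sketch below shows a per-dimension gate deciding how much each (projected) modality contributes to the fused representation. All dimensions and weights are hypothetical; this is a simplification of the paper's architecture, not its implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(audio, visual, Wg, Wa, Wv):
    """Gated fusion sketch: a learned gate (here with random weights) decides,
    per output dimension, how much of each modality's projected features
    enters the merged representation, instead of naive concatenation."""
    gate = sigmoid(np.concatenate([audio, visual]) @ Wg)   # values in (0, 1)
    return gate * (audio @ Wa) + (1.0 - gate) * (visual @ Wv)

# Hypothetical sizes: 4-d audio features, 4-d visual features, 3-d fused output.
rng = np.random.default_rng(0)
a, v = rng.standard_normal(4), rng.standard_normal(4)
Wg = rng.standard_normal((8, 3))
Wa, Wv = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
fused = gated_fusion(a, v, Wg, Wa, Wv)
print(fused.shape)  # (3,)
```

Because the gate is input-dependent, the fusion can lean on audio when the face is occluded and on video when the audio is degraded, which a fixed fusion scheme cannot do.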

 


Full paper

 

Transferring 3D facial expressions from adults to children


A. Alomar, A. Morales, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno

Proc. 30th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, pp 109 – 118, 2022

Diagnosis of craniofacial conditions is shifting towards pre- and peri-natal stages, since early assessment has been shown to be crucial for the effective treatment of functional and developmental aspects in children. 3D Morphable Models are a valuable tool for such evaluation. However, the limited availability of data on 3D newborn geometry, and highly variable imaging environments, challenge the construction of 3D baby face models. Our hypothesis is that constructing a bi-linear baby face model that decouples identity and expression enables improved assessment of craniofacial morphology and brain function. Thus, given that adult and infant facial expression configurations are very similar, and that 3D facial expressions in babies are difficult to scan in a controlled manner, we propose transferring the facial expressions from the available FaceWarehouse (FW) database to baby scans to construct a baby-specific bi-linear expression model. First, we define a spatial mapping between the BabyFM and the FW. Then, we propose an automatic neutralization to remove the expressions from the facial scans. Finally, we apply expression transfer to obtain a complete data tensor. We test the performance and generalization of the resulting bi-linear model on a test set. Results show that the obtained model allows us to successfully and realistically manipulate the facial expressions of babies while keeping them decoupled from identity variations.
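The bi-linear identity/expression decoupling can be illustrated with a toy core tensor contracted with separate identity and expression coefficient vectors (all sizes and data below are hypothetical, not the actual BabyFM tensor):

```python
import numpy as np

# Bilinear face model sketch: vertices = core contracted with an identity
# vector and an expression vector, so the two factors are controlled
# independently (hypothetical sizes).
n_verts, n_id, n_expr = 9, 4, 3
rng = np.random.default_rng(0)
core = rng.standard_normal((n_verts, n_id, n_expr))  # learned from the data tensor

def synthesize(w_id, w_expr):
    """Contract the core tensor with identity and expression weights."""
    return np.einsum("vie,i,e->v", core, w_id, w_expr)

baby = rng.standard_normal(n_id)       # one fixed identity
neutral = np.array([1.0, 0.0, 0.0])    # expression coefficients
smile = np.array([0.0, 1.0, 0.0])
# Same identity, two expressions: geometry changes, identity weights do not.
face_a, face_b = synthesize(baby, neutral), synthesize(baby, smile)
print(face_a.shape)
```

Completing the data tensor via expression transfer is what makes this factorization possible: every identity must be observed (or synthesized) under every expression before the core can be estimated.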


Full paper

 

 

Efficient remote photoplethysmography with temporal derivative modules and time-shift invariant loss

J. Comas, A. Ruiz and F.M. Sukno

Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2182-2191, New Orleans, Louisiana, USA, 2022.

We present a lightweight neural model for remote heart rate estimation focused on the efficient spatio-temporal learning of facial photoplethysmography (PPG) based on i) modelling of PPG dynamics by combinations of multiple convolutional derivatives, and ii) increased flexibility of the model to learn possible offsets between the facial video PPG and the ground truth. PPG dynamics are modelled by a Temporal Derivative Module (TDM) constructed by the incremental aggregation of multiple convolutional derivatives, emulating a Taylor series expansion up to the desired order. Robustness to ground truth offsets is handled by the introduction of TALOS (Temporal Adaptive LOcation Shift), a new temporal loss to train learning-based models. We verify the effectiveness of our model by reporting accuracy and efficiency metrics on the public PURE and UBFC-rPPG datasets. Compared to existing models, our approach shows competitive heart rate estimation accuracy with a much lower number of parameters and lower computational cost.
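The idea behind a time-shift tolerant loss such as TALOS can be sketched as taking the minimum error over a small range of temporal offsets between prediction and ground truth (a simplified stand-in on toy signals, not the paper's exact formulation):

```python
import numpy as np

def shift_invariant_loss(pred, target, max_shift=3):
    """Time-shift tolerant loss sketch: evaluate the MSE at every temporal
    offset within [-max_shift, max_shift] and keep the minimum, so the model
    is not penalized for a small lag between video PPG and ground truth."""
    losses = []
    for s in range(-max_shift, max_shift + 1):
        losses.append(np.mean((np.roll(pred, s) - target) ** 2))
    return min(losses)

# Toy pulse signal and a prediction that is identical but 2 samples late.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
target = np.sin(5 * t)
pred = np.roll(target, 2)
plain_mse = np.mean((pred - target) ** 2)        # punished for the lag
tolerant = shift_invariant_loss(pred, target)    # lag absorbed
print(round(plain_mse, 4), round(tolerant, 4))
```

A plain MSE would force the network to learn the dataset's annotation lag; absorbing small offsets lets the capacity go to the PPG waveform itself.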


Full paper

 

 

 

Survey on 3D face reconstruction from uncalibrated images


A. Morales, G. Piella and F.M. Sukno

Computer Science Review, 40(5): 100400, 2021.

In recent years, much attention has focused on the incorporation of 3D data into face analysis and its applications. Despite providing a more accurate representation of the face, 3D facial images are more complex to acquire than 2D pictures. As a consequence, great effort has been invested in developing systems that reconstruct 3D faces from an uncalibrated 2D image. However, the 3D-from-2D face reconstruction problem is ill-posed, and thus prior knowledge is needed to restrict the solution space. In this work, we review 3D face reconstruction methods proposed in the last decade, focusing on those that only use 2D pictures captured under uncontrolled conditions. We present a classification of the proposed methods based on the technique used to add prior knowledge, considering three main strategies, namely statistical model fitting, photometry, and deep learning, and review each of them separately. In addition, given the relevance of statistical 3D facial models as prior knowledge, we explain their construction procedure and provide a list of the most popular publicly available 3D facial models. After this exhaustive study of 3D-from-2D face reconstruction approaches, we observe that the deep learning strategy has been growing rapidly over the last few years, becoming the standard choice in place of the formerly widespread statistical model fitting. Unlike the other two strategies, photometry-based methods have decreased in number due to the strong underlying assumptions they require, which limit the quality of their reconstructions compared to statistical model fitting and deep learning methods. The review also identifies current challenges and suggests avenues for future research.
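The construction procedure for a statistical 3D facial model mentioned in the survey boils down, in its simplest form, to PCA on registered shapes (toy data below; real models additionally require dense correspondence, alignment, and far larger datasets):

```python
import numpy as np

def build_3dmm(shapes):
    """PCA construction sketch of a statistical face model: stack registered
    shapes (n_samples x n_coords), subtract the mean, and take the principal
    components of the centered data as the deformation basis."""
    mean = shapes.mean(axis=0)
    U, S, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    eigvals = S ** 2 / (len(shapes) - 1)   # variance explained per mode
    return mean, Vt, eigvals               # new_face = mean + Vt.T @ coeffs

# Hypothetical mini-dataset: 5 registered "faces" of 12 coordinates each.
rng = np.random.default_rng(0)
shapes = rng.standard_normal((5, 12))
mean, basis, eigvals = build_3dmm(shapes)
print(basis.shape)  # (5, 12): no more modes than training samples
```

This is also why model capacity is bounded by the training set: a model built only from adults cannot express baby-specific shape variation, which motivates baby-specific models such as the BabyFM.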


Full paper

 

3D Fetal Face Reconstruction from Ultrasound Imaging

A. Alomar, A. Morales, K. Vellvé, A.R. Porras, F. Crispi, M.G. Linguraru, G. Piella and F.M. Sukno

Proc. 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Vol 4 VISAPP, pp. 615–624, 2021.

The fetal face contains essential information for the evaluation of congenital malformations and fetal brain function, as its development is driven by genetic factors at early stages of embryogenesis. Three-dimensional ultrasound (3DUS) can provide information about the facial morphology of the fetus, but its use for prenatal diagnosis is challenging due to imaging noise, fetal movements, limited field-of-view, low soft-tissue contrast, and occlusions. In this paper, we propose a fetal face reconstruction algorithm from 3DUS images based on a novel statistical morphable model of newborn faces, the BabyFM. We test the feasibility of using newborn statistics to accurately reconstruct fetal faces by fitting the regularized morphable model to the noisy 3DUS images. The algorithm is capable of reconstructing the whole facial morphology of babies from one or several ultrasound scans while handling adverse conditions (e.g. missing parts, noisy data), and it has the potential to aid in-utero diagnosis of conditions that involve facial dysmorphology.

Full paper


Presentation (video)

Further activities related to the project

The research activities in this project have contributed, directly or indirectly, to the training of undergraduate and graduate students, as briefly summarized below:

 

Antonia Alomar Adrover. Perinatal 3D face reconstruction and analysis for early diagnosis of craniofacial anomalies. PhD Thesis, Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions (In progress, 2021 – 2025), Supervisors: F.M. Sukno and G. Piella

 

Joaquim Comas Martinez. Deep facial analysis for remote emotion estimation in real world scenarios

PhD Thesis, Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions (in progress, 2021 – 2025). Supervisors: F.M. Sukno and A. Ruiz

 

Mireia Masías Bruns. Normalizing flows for Neuroimaging Research

PhD Thesis, Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions (04-03-2021). Supervisors: G. Piella and M.A. Gonzalez Ballester

 

María Araceli Morales. Statistical modelling of the baby face for 3D face reconstruction

PhD Thesis, Universitat Pompeu Fabra. Departament de Tecnologies de la Informació i les Comunicacions (In progress, 2018 – 2022), Supervisors: F.M. Sukno and G. Piella

 

Marc Aguilar Velazquez. Baby face generation and edition through text-guided diffusion models

Bachelor Thesis, Mathematical Engineering in Data Science, Universitat Pompeu Fabra (2024). Supervisors: A. Alomar, G. Piella and F.M. Sukno

 

Xavier Vives i Sanchez. Multimodal database for emotion recognition from facial expressions and physiological signals. Bachelor Thesis, Mathematical Engineering in Data Science, Universitat Pompeu Fabra (2024). Supervisors: J. Comas, F.M. Sukno

 

Judith Recober Martín. Biomechanical regularization in a deep learning network for a fetal MRI registration and segmentation pipeline. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2023). Supervisors: V. Comte, G. Piella. M.A. Gonzalez.

 

Sofia Cárdenas Linares. Expression recognition in ancient masks

Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2023).

Supervisor: F.M. Sukno, A. Orsingher

 

Alexander Joel Vera Moncayo. A high-quality dataset for emotion recognition based on contactless physiological signals

Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2023).

Supervisors: J. Comas, F.M. Sukno

 


Alba Puyuelo Citoler. The face as a window to the brain: Development of a pipeline for facial segmentation from magnetic resonance scans. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: G. Piella, F.M. Sukno

 

Nerea González Aranceta. On the basis of impaired functional connectivity in psychosis: the role of structural substrate and neurometabolites contribution. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: M. Masias, G. Piella.

 

Pablo Mesa. Fetus Facial Landmark Detection on 3D Ultrasounds with Deep Learning

Master Thesis, Computational Biomedical Engineering, Universitat Pompeu Fabra (2022).

Supervisors: A. Alomar, G. Piella, F.M. Sukno

 

Yadira Ronquillo. Continuous Lip Reading in Spanish

Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022).

Supervisors: F.M. Sukno, A. Fernandez-Lopez

 

Maria Pujol Gil. Cerebral maturation measures on autism spectrum disorder and attention deficit/hyperactivity disorder patients based on MRI scans. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: G. Piella, D. Paretto

 

Daniel Mateos Manjón. Image processing in Atari games to facilitate reinforcement learning. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2021).

Supervisors: A. Jonsson, F.M. Sukno

 

Adrian Blanco Barco. Image processing for Artificial Intelligence

Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2021).

Supervisors: A. Jonsson, F.M. Sukno

 

Anna Harris Martinez. Brain alterations between attention deficit/hyperactivity disorder and autism spectrum disorder. A volumetric, structural and functional study. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2021). Supervisors: G. Piella, D. Paretto

 

Angel Bazan. Use of unsupervised techniques for brain MRI stratification

Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2021).

Supervisors: M. Masias, G. Piella

 


Antonia Alomar Adrover. Automatic 3D segmentation algorithm of the fetal face using 3D ultrasound imaging. Master Thesis, Computational Biomedical Engineering, Universitat Pompeu Fabra (2021).

Supervisors: A. Morales, G. Piella, F.M. Sukno

 

Acknowledgements

The Principal Investigators, Prof. Gemma Piella and Dr. Federico Sukno, would like to thank all those involved in the activities listed above, as well as the Ministry of Science, Innovation and Universities, which funded this project through grant PID2020-114083GB-I00.
