eSCANFace:
Early Screening of Craniofacial Anomalies in Newborn Faces
Congenital anomalies are a major cause of
infant mortality and childhood morbidity, affecting 2-3% of newborns. It is
estimated that 30% to 40% of genetic disorders produce alterations in the
normal morphology of the face and the head (dysmorphology), which can impact
swallowing, breathing, hearing, vision, speech, and, more importantly,
cognitive development. Thus, craniofacial anomalies have been
highlighted as an index of developmental disturbance at early stages of life.
Initial diagnosis is often based on visual inspection from pediatricians but,
unfortunately, dysmorphology is hard to identify in this way, and massive
genetic screening is expensive and impractical. For these reasons, there is a
growing interest in using facial imaging as a low-cost tool for genetic
pre-screening, i.e., to highlight suspicious cases for further study. The
objectives of this project are to develop the technology necessary to make
such early screening more accurate, more accessible, and more comprehensive,
and to allow its deployment as early as possible in life. Because dysmorphology patterns tend to
be subtle in most disorders, and they can affect any of the spatial
components of the face (right-left, cranio-caudal, anterior-posterior), the
main hypothesis in this project is that advanced 3D modeling techniques can
lead to a more accurate characterization and screening of craniofacial
anomalies in infants and fetuses. This hypothesis is supported by previous
findings highlighting that 3D analysis of facial dysmorphology is superior to
2D analysis, both from the project team and from other researchers. Of particular interest for this project
is the BabyFM, a 3D Morphable Model (3DMM) for babies that we designed in
collaboration with Children's National Hospital in Washington, D.C. Since
their introduction 20 years ago, 3DMMs have played a central role in most
applications involving 3D facial analysis, including the recent data-driven
approaches based on deep learning. However, previous 3DMMs were built from
adults and, although sometimes they also included children, none of them
included babies. Thus, the BabyFM constitutes a key advantage for the project
team. An additional advantage of the BabyFM is that it allows
recovering the 3D facial geometry from one or more
uncalibrated pictures. This is especially relevant when targeting the
analysis of newborns, since it avoids the use of expensive specialized machinery,
making the technology more accessible. We will also extend the analysis to
fetal data, for which our preliminary results suggest that an adequate
adaptation of models built from newborns could serve as statistical
constraints to guide the representation of fetal geometry, which is expected
to improve accuracy given the higher quality of the data used to construct
the model. Moreover, the use of a unified underlying model should facilitate
the integration of the different sources of information, as well as the
synthetic generation of magnified 3D patterns to address a more comprehensive
visualization of the identified dysmorphologies.

Principal Investigators: Gemma Piella & Federico Sukno

This project was funded by the 2020 call of the “Programa Estatal de Generación de Conocimiento y Fortalecimiento Científico y Tecnológico” from the Spanish Ministry of Science and Innovation.

Index of Project Results:

1. BabyFM: Towards accurate 3D baby facial models
using spectral decomposition and asymmetry swapping. Computers in Biology and Medicine, 2025.
2. 3D imaging and geometric morphometrics of facial dysmorphology and asymmetry indicate gestational timings of dysmorphogenesis in schizophrenia and bipolar disorder. European Neuropsychopharmacology, 2025.
3. Automatic Facial Axes Standardization of 3D Fetal Ultrasound Images. MICCAI ASMUS, 2024.
4. PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows. BMVC, 2024.
5. OBBabyFace: Oriented Bounding Box for Infant Face Detection. DELTA, 2024.
6. Deep learning-based standardisation of the anatomical fetal facial axes in 3D prenatal ultrasounds. ISUOG, 2024.
7. Loss of normal facial asymmetry in schizophrenia and bipolar disorder: Implications for development of brain asymmetry in psychotic illness. Psychiatry Research, 2024.
8. Accuracy and repeatability of fetal facial measurements in 3D ultrasound: A longitudinal study. Early Human Development, 2024.
9. Deep adaptive spectral zoom for improved remote heart rate estimation. FG, 2024.
10. Three-Dimensional Face Reconstruction from Uncalibrated Photographs: Application to Early Detection of Genetic Syndromes. MICCAI CLIP, 2019.
11. BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs. Pattern Recognition, 2023.
12. An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis. Computer Methods and Programs in Biomedicine, 2023.
13. Prenatal facial landmarks' location at 20 and 26 weeks of gestation using 3D segmentation tools: reproducibility and feasibility – preliminary results. ISUOG, 2022.
14. Look-alike humans identified by facial recognition algorithms show genetic similarities. Cell Reports, 2022.
15. End-to-end lip-reading without large-scale data. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2022.
16. Reconstruction of the fetus face from three-dimensional ultrasound using a newborn face statistical shape model. Computer Methods and Programs in Biomedicine, 2022.
17. Audio-visual gated-sequenced neural networks for affect recognition. IEEE Transactions on Affective Computing, 2022.
18. Transferring 3D facial expressions from adults to children. WSCG, 2022.
19. Efficient remote photoplethysmography with temporal derivative modules and time-shift invariant loss. CVPRW, 2022.
20. Survey on 3D face reconstruction from uncalibrated images. Computer Science Review, 2021.
21. 3D Fetal Face Reconstruction from Ultrasound Imaging. GRAPP, 2021.

Further activities related to the project:
· 4 PhD theses
· 15 Final bachelor/master projects
3D imaging and geometric morphometrics of facial dysmorphology and asymmetry indicate gestational timings of dysmorphogenesis in schizophrenia and bipolar disorder

J.L. Waddington and F.M. Sukno
European Neuropsychopharmacology, 93: 1-2, 2025.
Related findings on the topography of facial
dysmorphology across these developmental fields have also been reported in
22q11.2 deletion syndrome, which is associated with a 25-fold increase in
risk for psychotic symptoms. Technical refinements have subsequently allowed
geometric morphometric analysis in non-affine space for incisive resolution
in bipolar disorder of more subtle, localised dysmorphologies across these
three developmental fields, particularly in the frontonasal-forebrain
prominence, which implicate disruption of processes during GW 10–15. The most
recent studies have investigated a yet more fundamental domain of
development: the extent to which vertebrate morphogenesis proceeds
symmetrically or involves the embryonic breaking of left-right symmetry to
create asymmetry whereby quantitative differences emerge between the left and
right sides of a given structure. For example, a cardinal feature of normal
subjects is brain asymmetry, including the frontal lobes, with that asymmetry
postulated to be disrupted in schizophrenia. We have demonstrated that the
geometry of normal facial asymmetries, primarily in the frontonasal-forebrain
prominence, shows commonalities with that of normal frontal lobe asymmetries;
furthermore, these normal asymmetries in the frontonasal-forebrain prominence
are markedly reduced in schizophrenia and reduced also in bipolar disorder
with residual retention of asymmetries. These findings implicate a
trans-diagnostic process that involves loss of facial asymmetries across GW
7–14 and are consistent with still controversial loss of brain asymmetries in
psychotic illness.
BabyFM: Towards accurate 3D baby facial models using spectral decomposition and asymmetry swapping

A. Morales, A. Alomar, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno
Computers in Biology and Medicine, 186: 109652, 2025.
In this paper, we present the first publicly
available 3D statistical facial shape model of babies, the Baby Face Model
(BabyFM). Constructing a model of the facial geometry of babies entails
specific challenges, such as occlusions, extreme and uncontrollable
expressions, and data shortage. We address these challenges by proposing (1)
a non-template dependent method that jointly estimates a 3D facial
baby-specific template and the point-to-point correspondences; (2) a novel
method to establish correspondences based on the spectral decomposition of
the Laplace-Beltrami operator, which provides a more robust theoretical
foundation than state-of-the-art methods; and (3) an asymmetry-swapping
strategy to alleviate the shortage of large scale datasets by decoupling the
identity-related and the asymmetry-related shape deformation fields. The
latter leads to a data augmentation technique that we integrate within the
Gaussian Process Morphable Model framework, providing a simple way of
combining synthetic or sample covariance functions. We exhaustively evaluate
each stage of our method and demonstrate that (1) when aiming at the 3D
facial geometry of a baby, a specific model of babies is needed, since the
pre-built publicly available models constructed with adults or older children
are not able to accurately represent the facial shape of babies; (2) our
spectral approach improves correspondence accuracy with respect to
state-of-the-art methods; and (3) the proposed data augmentation technique
enhances the robustness of the BabyFM.
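In general terms, a 3DMM such as the BabyFM represents any face as a mean mesh plus a linear combination of learned deformation modes. The sketch below is a generic linear shape model with made-up dimensions, not the BabyFM's actual data or interface:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 5 vertices (x, y, z flattened), 2 deformation modes.
n_vertices, n_modes = 5, 2
mean_shape = rng.normal(size=3 * n_vertices)        # mean face geometry
modes = rng.normal(size=(3 * n_vertices, n_modes))  # deformation basis
stddevs = np.array([2.0, 1.0])                      # per-mode standard deviations

def sample_face(alpha):
    """Face instance = mean shape + weighted sum of deformation modes."""
    return mean_shape + modes @ (stddevs * alpha)

neutral = sample_face(np.zeros(n_modes))   # alpha = 0 recovers the mean face
face = sample_face(np.array([1.0, -0.5]))  # a non-trivial instance
```

Fitting such a model to data amounts to searching for the coefficient vector alpha whose instance best explains the observations, which is why a baby-specific basis matters: coefficients of an adult basis cannot reach baby-like shapes.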
Automatic Facial Axes Standardization of 3D Fetal Ultrasound Images

A. Alomar, R. Rubio, L. Salort, G. Albaixes, A. Payá, G. Piella and F.M. Sukno
Proc. 5th International Workshop on Simplifying Medical Ultrasound (ASMUS 2024), in conjunction with MICCAI, Vol. 4, pp. 88–98, Marrakesh, Morocco, 2024.
Craniofacial anomalies indicate early developmental
disturbances and are usually linked to many genetic syndromes. Early
diagnosis is critical, yet ultrasound (US) examinations often fail to
identify these features. This study presents an AI-driven tool to assist
clinicians in standardizing fetal facial axes/planes in 3D US, reducing
sonographer workload and facilitating facial evaluation. Our network,
structured into three blocks (feature extractor, rotation and translation
regression, and spatial transformer), processes three orthogonal 2D slices to
estimate the necessary transformations for standardizing the facial planes in
the 3D US. These transformations are applied to the original 3D US using a
differentiable module (the spatial transformer block), yielding a
standardized 3D US and the corresponding 2D facial standard planes. The
dataset used consists of 1180 fetal facial 3D US images acquired between
weeks 20 and 35 of gestation. Results show that our network considerably
reduces inter-observer rotation variability in the test set, with a mean
geodesic angle difference of 14.12 ± 18.27 and a Euclidean angle error of
7.45 ± 14.88. These findings demonstrate the network’s ability to effectively
standardize facial axes, crucial for consistent fetal facial assessments. In
conclusion, the proposed network demonstrates potential for improving the
consistency and accuracy of fetal facial assessments in clinical settings,
facilitating early evaluation of craniofacial anomalies.
PhysFlow: Skin tone transfer for remote heart rate estimation through conditional normalizing flows

J. Comas, A. Alomar, A. Ruiz and F.M. Sukno
Proc. 35th British Machine Vision Conference (BMVC), Glasgow, UK, 2024.
In recent years, deep learning methods have shown
impressive results for camera-based remote physiological signal estimation,
clearly surpassing traditional methods. However, the performance and
generalization ability of Deep Neural Networks heavily depend on rich
training data truly representing different factors of variation encountered
in real applications. Unfortunately, many current remote photoplethysmography
(rPPG) datasets lack diversity, particularly in darker skin tones, leading to
biased performance of existing rPPG approaches. To mitigate this bias, we introduce
PhysFlow, a novel method for augmenting skin diversity in remote heart rate
estimation using conditional normalizing flows. PhysFlow adopts end-to-end
training optimization, enabling simultaneous training of supervised rPPG
approaches on both original and generated data. Additionally, we condition
our model using CIELAB color space skin features directly extracted from the
facial videos without the need for skin-tone labels. We validate PhysFlow on
publicly available datasets, UCLA-rPPG and MMPD, demonstrating reduced heart
rate error, particularly in dark skin tones. Furthermore, we demonstrate its
versatility and adaptability across different data-driven rPPG methods.
OBBabyFace: Oriented Bounding Box for Infant Face Detection

J.C. Reyes-Hernández, A. Alomar, R. Rubio, G. Piella and F.M. Sukno
Proc. 5th International Conference on Deep Learning Theory and Applications (DELTA), Dijon, France, 2024.
This study presents an infant-specific face
detection approach that addresses the existing gap in facial detection for
non-adults, where the typical bias is toward adult faces. A new infant faces
dataset was created to enhance Deep Learning (DL) models’ ability to
accurately detect infant faces, comprising 8,862 images with diverse
orientations. We introduce Oriented Bounding Boxes (OBB) to account for
greater variability in face orientations observed in infants, offering
precise alignment to their orientation, a significant improvement over
traditional Axis-Aligned Bounding Boxes (AABB). Employing the YOLOv8-OBB
architecture, our model is trained and compared against state-of-the-art
models such as RetinaFace and MogFace. The results show that our approach
outperforms state-of-the-art methods in precision and recall, particularly in
non-frontal facial orientations. The proposed infant face detector marks a
major advancement in pediatric face detection technology, offering a robust
foundation for future advancements in medical monitoring and developmental
diagnosis.
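To illustrate the difference from axis-aligned boxes, an oriented box (center, size, angle) can be expanded into its four corner points, the representation needed to reason about rotated overlap. This is a small numpy sketch under our own parameterization, not the YOLOv8-OBB API:

```python
import numpy as np

def obb_corners(cx, cy, w, h, angle_deg):
    """Corner points of an oriented box given center, size and rotation."""
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    # Axis-aligned corners around the origin, then rotate and translate.
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])

# A 40x20 box rotated by 90 degrees: its footprint becomes 20 wide, 40 tall,
# which an axis-aligned box of the original size could not capture.
corners = obb_corners(100.0, 50.0, 40.0, 20.0, 90.0)
```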
Deep learning-based standardisation of the anatomical fetal facial axes in 3D prenatal ultrasounds

A. Alomar, R. Rubio, A. Payá, G. Piella and F.M. Sukno
34th World Congress on Ultrasound in Obstetrics & Gynecology (ISUOG), Budapest, Hungary, Volume 64, Issue S1, p. 109, 2024.
Objectives: We propose an AI-driven tool designed to assist clinicians in standardising the detection of facial planes in 3D ultrasound (US) imaging. It aims to minimise variability across detections while simultaneously mitigating the effects of interobserver variability in fetal facial assessment.

Methods: We used 445 fetal facial 3D US images acquired between 20 and 26 weeks of gestation. The data was split into 80% for training and 20% for validation. We defined the three orthogonal standard planes of the fetal face using anatomical landmarks and computed the 3D ground truth (GT) transformations to achieve alignment with these planes. A deep learning architecture was trained to take as input 3 orthogonal slices from the 3D US and output the 3D translation and rotation needed to achieve the standard anatomical facial axes. The network is composed of 4 blocks: feature extractor, translation regressor, rotation regressor, and differentiable spatial transform (see figure 1). To assess the resulting standard planes, the average error between the 3D transformation estimated by the network and the 3D GT transform was computed. The PSNR and SSIM between the estimated and the GT planes were also calculated.

Results: The network accurately estimates 3D transformations for standardising the fetal facial axes, obtaining a translation error of 4.55 mm, a rotation error of 17.4°, a PSNR of 18.6 dB, and an SSIM of 0.657 on the validation set. The estimated standard facial slices closely match the GT standard facial planes.

Conclusions: Our method effectively standardises the fetal facial axes, facilitating the measurement and assessment of the fetal face to diagnose fetal abnormalities. Consequently, this tool has the potential to reduce interobserver variability and alleviate the time and burden associated with locating these planes.
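As a reminder of the image-similarity metric quoted above, PSNR compares two images through their mean squared error; a minimal numpy sketch (illustrative, not the evaluation code used in the study):

```python
import numpy as np

def psnr_db(ref, est, max_val=1.0):
    """Peak signal-to-noise ratio in dB between a reference and an estimate."""
    mse = np.mean((ref.astype(np.float64) - est.astype(np.float64)) ** 2)
    if mse == 0:
        return np.inf  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

ref = np.zeros((8, 8))
est = ref + 0.1            # uniform error of 0.1 -> MSE = 0.01
value = psnr_db(ref, est)  # 10 * log10(1 / 0.01) = 20 dB
```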
Loss of normal facial asymmetry in schizophrenia and bipolar disorder: Implications for development of brain asymmetry in psychotic illness

F.M. Sukno, B.D. Kelly, A. Lane, S. Katina, M.A. Rojas, P.F. Whelan and J.L. Waddington
Psychiatry Research, 342: 116213, 2024.
Audio-Visual Speech Recognition (AVSR) faces the difficult
task of exploiting acoustic and visual cues simultaneously. Augmenting speech
with the visual channel creates its own challenges, e.g. every person has
unique mouth movements, making the generalization of visual models very
difficult. This factor motivates our focus on the generalization of
speaker-independent (SI) AVSR systems especially in noisy environments by
exploiting the visual domain. Specifically, we are the first to explore the
visual adaptation of an SI-AVSR system to an unknown and unlabelled speaker.
We adapt an AVSR system trained in a source domain to decode samples in a
target domain without the need for labels in the target domain. For the
domain adaptation of the unknown speaker, we use Coupled Generative
Adversarial Networks to automatically learn a joint distribution of
multi-domain images. We evaluate our character-based AVSR system on the
TCD-TIMIT dataset and obtain up to a 10% average improvement with respect to
its AVSR system equivalent.
Accuracy and repeatability of fetal facial measurements in 3D ultrasound: A longitudinal study

N. González-Aranceta, A. Alomar, R. Rubio, S. Maya-Enero, A. Payá, G. Piella and F.M. Sukno
Early Human Development, 193: 106021, 2024.
Objective: Fetal face measurements in prenatal ultrasound can aid in identifying craniofacial abnormalities in the developing fetus. However, the accuracy and reliability of ultrasound measurements can be affected by factors such as fetal position, image quality, and the sonographer's expertise. This study assesses the accuracy and reliability of fetal facial measurements in prenatal ultrasound. Additionally, the temporal evolution of the measurements is studied, comparing prenatal and postnatal measurements.

Methods: Three different experts located up to 23 facial landmarks in 49 prenatal 3D ultrasound scans from normal Caucasian fetuses at weeks 20, 26, and 35 of gestation. Intra- and inter-observer variability was obtained. Postnatal facial measurements were also obtained at 15 days and 1 month postpartum.

Results: Most facial landmarks exhibited low errors, with overall intra- and inter-observer errors of 1.01 mm and 1.60 mm, respectively. Landmarks on the nose were found to be the most reliable, while the most challenging were those located on the ears and eyes. Overall, scans obtained at 26 weeks of gestation presented the best trade-off between observer variability and landmark visibility. The temporal evolution of the measurements revealed that the lower face area had the highest rate of growth throughout the latest stages of pregnancy.

Conclusions: Craniofacial landmarks can be evaluated using 3D fetal ultrasound, especially those located on the nose, mouth, and chin. Despite its limitations, this study provides valuable insights into prenatal and postnatal biometric changes over time, which could aid in developing predictive models for postnatal measurements based on prenatal data.
Deep adaptive spectral zoom for improved remote heart rate estimation

J. Comas, A. Ruiz and F.M. Sukno
Proc. 18th IEEE International Conference on Automatic Face and Gesture Recognition (FG), Istanbul, Turkey, 2024.
Recent advances in remote heart rate measurement,
motivated by data-driven approaches, have notably enhanced accuracy. However,
these improvements primarily focus on recovering the rPPG signal, overlooking
the implicit challenges of estimating the heart rate (HR) from the derived
signal. While many methods employ the Fast Fourier Transform (FFT) for HR
estimation, the performance of the FFT is inherently affected by a limited
frequency resolution. In contrast, the Chirp-Z Transform (CZT), a
generalization form of FFT, can refine the spectrum to the narrow-band range
of interest for heart rate, providing improved frequential resolution and,
consequently, more accurate estimation. This paper presents the advantages of
employing the CZT for remote HR estimation and introduces a novel data-driven
adaptive CZT estimator. The objective of our proposed model is to tailor the
CZT to match the characteristics of each specific dataset sensor,
facilitating a more optimal and accurate estimation of HR from the rPPG
signal without compromising generalization across diverse datasets. This is
achieved through a Sparse Matrix Optimization (SMO). We validate the
effectiveness of our model through exhaustive evaluations on three publicly
available datasets (UCLA-rPPG, PURE, and UBFC-rPPG), employing both intra- and
cross-database performance metrics. The results reveal outstanding heart rate
estimation capabilities, establishing the proposed approach as a robust and
versatile estimator for any rPPG method.
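The spectral-zoom idea can be illustrated with a band-limited DFT evaluated only over the plausible heart-rate range, which is the effect a narrow-band CZT achieves efficiently; a minimal numpy sketch on a synthetic signal, not the paper's learned estimator:

```python
import numpy as np

def zoomed_spectrum(x, fs, f_lo=0.7, f_hi=3.0, n_bins=512):
    """Evaluate the spectrum only inside [f_lo, f_hi] Hz at fine resolution."""
    n = np.arange(len(x))
    freqs = np.linspace(f_lo, f_hi, n_bins)
    # Direct evaluation of the DTFT on the band of interest.
    kernel = np.exp(-2j * np.pi * np.outer(freqs, n) / fs)
    return freqs, np.abs(kernel @ x)

fs = 30.0                        # camera frame rate (Hz)
t = np.arange(0, 10, 1 / fs)     # 10 s clip: FFT resolution would be 6 bpm
x = np.sin(2 * np.pi * 1.2 * t)  # synthetic rPPG component at 72 bpm
freqs, mag = zoomed_spectrum(x, fs)
hr_bpm = 60.0 * freqs[np.argmax(mag)]  # close to 72 bpm
```

With a plain 10-second FFT the bin spacing is 0.1 Hz (6 bpm), whereas the zoomed evaluation above resolves the 0.7-3 Hz band at roughly 0.0045 Hz (0.27 bpm) per bin.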
BabyNet: Reconstructing 3D faces of babies from uncalibrated photographs

A. Morales, A. Alomar, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno
Pattern Recognition, 139: 109367, 2023.
We present a 3D face reconstruction system that aims
at recovering the 3D facial geometry of babies from uncalibrated photographs,
BabyNet. Since the 3D facial geometry of babies differs substantially from
that of adults, baby-specific facial reconstruction systems are needed.
BabyNet consists of two stages: 1) a 3D graph convolutional autoencoder
that learns a latent space of the baby 3D facial shape; and 2) a 2D encoder that
maps photographs to the 3D latent space based on representative features
extracted using transfer learning. In this way, using the pre-trained 3D
decoder, we can recover a 3D face from 2D images. We evaluate BabyNet and
show that 1) methods based on adult datasets cannot model the 3D facial
geometry of babies, which proves the need for a baby-specific method, and 2)
BabyNet outperforms classical model-fitting methods even when a baby-specific
3D morphable model, such as the BabyFM, is used.
An automatic pipeline for atlas-based fetal and neonatal brain segmentation and analysis

A. Urru, A. Nakaki, O. Benkarim, F. Crovetto, L. Segalés, V. Comte, N. Hahner, E. Eixarch, E. Gratacos, F. Crispi, G. Piella and M.A. González Ballester
Computer Methods and Programs in Biomedicine, 230: 107334, 2023.
Background and Objective: The automatic segmentation of perinatal brain structures in magnetic resonance imaging (MRI) is of utmost importance for the study of brain growth and related complications. While different methods exist for adult and pediatric MRI data, there is a lack of automatic tools for the analysis of perinatal imaging.

Methods: In this work, a new pipeline for fetal and neonatal segmentation has been developed. We also report the creation of two new fetal atlases, and their use within the pipeline for atlas-based segmentation, based on novel registration methods. The pipeline is also able to extract cortical and pial surfaces and compute features such as curvature, local gyrification index, sulcal depth, and thickness.

Results: Results show that the introduction of the new templates together with our segmentation strategy leads to accurate results when compared to expert annotations, as well as better performance when compared to a reference pipeline (the developing Human Connectome Project, dHCP), for both early and late-onset fetal brains.

Conclusions: These findings show the potential of the presented atlases and the whole pipeline for application in fetal, neonatal, and longitudinal studies, which could lead to dramatic improvements in the understanding of perinatal brain development.
Prenatal facial landmarks' location at 20 and 26 weeks of gestation using 3D segmentation tools: reproducibility and feasibility – preliminary results

R. Rubio, A. Alomar, S. Maya, A. Payá, G. Piella and F.M. Sukno
32nd World Congress on Ultrasound in Obstetrics & Gynecology (ISUOG), London, UK, Volume 60, Issue S1, p. 106, 2022.
Objectives: The aim of this study is to analyse the feasibility and reproducibility of fetal facial landmark (lmk) location on 3D ultrasound (US) volumes.

Methods: We examined 11 cases of low-risk Caucasian pregnant women. We acquired 3D US volumes at weeks 20 and 26. The volumes were automatically segmented (binary thresholding) using 3DSlicer. Then, two observers located the visible lmks out of the 23 anatomical lmks considered (figure 1, left), using the best 3D segmented volumes (2 per week per case).

Results: We assessed the feasibility and the inter-observer variability of fetal facial landmarking at weeks 20 and 26. Obs1 and Obs2 placed an average of 11.46 and 10.95 lmks/mesh, respectively. The lmk error/mesh between observers depends on the lmk considered (mean 1.96 ± 1.24 mm, range between 0.78 and 3.47 mm). The most reliable lmks are sn, prn, and ls, where the error between observers is lower than 2 mm. The largest discrepancies are obtained when comparing eye lmks (ex, en). A clear difference in the visibility of each landmark can be observed (figure 1) between weeks 20 and 26. At week 26, 9 lmks around the nose are visible in more than 80% of the scans, whereas only 2 lmks are at week 20.

Conclusions: 20-week 3D US volumes are difficult to landmark, whereas 26-week volumes are easier as finer details are present and more lmks can be located. Lmks located in 3D US volumes have low inter-observer variability but insufficient precision and accuracy for facial analysis due to noise and poor 3D US quality. However, a 3D morphable model could play a crucial role in fetal face analysis at week 26, when more facial lmks can be located and used to initialise the model.
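The inter-observer figures above are per-landmark Euclidean distances between the two observers' annotations; a minimal numpy sketch of how such errors are computed (synthetic coordinates, purely illustrative):

```python
import numpy as np

def landmark_errors_mm(obs1, obs2):
    """Per-landmark Euclidean distance between two observers' annotations.

    obs1, obs2: (n_landmarks, 3) arrays of 3D coordinates in mm.
    """
    return np.linalg.norm(obs1 - obs2, axis=1)

# Two landmarks annotated by two observers (made-up coordinates, in mm).
obs1 = np.array([[0.0, 0.0, 0.0], [10.0, 5.0, 2.0]])
obs2 = np.array([[1.0, 0.0, 0.0], [10.0, 5.0, 4.0]])
errors = landmark_errors_mm(obs1, obs2)  # [1.0, 2.0]
mean_error = errors.mean()               # 1.5 mm
```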
Look-alike humans identified by facial recognition algorithms show genetic similarities

R.S. Joshi, M. Rigau, C.A. García-Prieto, M. Castro de Moura, D. Piñeyro, S. Moran, V. Davalos, P. Carrión, M. Ferrando-Bernal, I. Olalde, C. Lalueza-Fox, A. Navarro, C. Fernández-Tena, D. Aspandi, F.M. Sukno, X. Binefa, A. Valencia and M. Esteller
Cell Reports, 40(8): 111257, 2022.
The human face is one of the most visible features
of our unique identity as individuals. Interestingly, monozygotic twins share
almost identical facial traits and the same DNA sequence but could exhibit
differences in other biometrical parameters. The expansion of the world wide
web and the possibility to exchange pictures of humans across the planet has
increased the number of people identified online as virtual twins or doubles
that are not family related. Herein, we have characterized in detail a set of
“look-alike” humans, defined by facial recognition algorithms, for their
multiomics landscape. We report that these individuals share similar
genotypes and differ in their DNA methylation and microbiome landscape. These
results not only provide insights about the genetics that determine our face
but also might have implications for the establishment of other human
anthropometric properties and even personality characteristics.
End-to-end lip-reading without large-scale data

A. Fernandez-Lopez and F.M. Sukno
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30: 2076-2090, 2022.
The development of Automatic Lip-Reading (ALR)
systems for continuous speech recognition has so far limited their
applicability to English since this is the only language with large-scale
datasets sufficient to train end-to-end ALR systems. In this work, we show
that it is possible to train competitive end-to-end ALR systems in
alternative languages with challenging small-scale data as long as the
appropriate restrictions are made to the learning process of the visual
front-end objective. To this end, we hypothesize that the visual front-end
should be trained in a self-supervised setting, allowing it to target its own
visual units. We specifically define visual units as a collection of
visually similar images constrained by linguistics and provide an algorithmic
implementation to automatically generate them. We show that visual units can
be used to add an intermediate classification task between the visual and
temporal modules that facilitates meaningful learning of visual features and,
as a consequence, reduces the amount of data required to train an end-to-end
ALR system. Additionally, we present a data augmentation strategy for
enriching the temporal context. We synthesize realistic video sequences by
appropriately combining character-like sub-sequences from existing videos.
We test the proposed ALR system on i) the VLRF dataset, a small-scale
database that is one of the largest in Spanish, and achieve 44.77% CER and
72.90% WER, which are competitive with the state-of-the-art and significant
for this volume of training material; ii) the TCD-TIMIT dataset, a comparable
medium-scale database in English, where we achieve 36.58% CER and 56.29% WER,
which are also state-of-the-art results on speaker-dependent experiments.
Reconstruction of the fetus face from three-dimensional ultrasound using a newborn face statistical shape model

A. Alomar, A. Morales, K. Vellvé, A.R. Porras, F. Crispi, M.G. Linguraru, G. Piella and F.M. Sukno
Computer Methods and Programs in Biomedicine, 221: 106893, 2022.
Background and objective: The fetal face is an essential source of information in the assessment of congenital malformations and neurological anomalies. Disturbances in the early stages of development can lead to a wide range of effects, from subtle changes in facial and neurological features to the characteristic facial shapes observed in craniofacial syndromes. Three-dimensional ultrasound (3D US) can provide more detailed information about the facial morphology of the fetus than conventional 2D US, but its use for prenatal diagnosis is challenging due to imaging noise, fetal movements, limited field of view, low soft-tissue contrast, and occlusions. Methods: In this paper, we propose the use of a novel statistical morphable model of newborn faces, the BabyFM, for fetal face reconstruction from 3D US images. We test the feasibility of using newborn statistics to accurately reconstruct fetal faces by fitting the regularized morphable model to the noisy 3D US images. Results: The results indicate that the reconstructions are quite accurate in the central face and less reliable in the lateral regions (mean point-to-surface error of 2.35 mm vs 4.86 mm). The algorithm is able to reconstruct the whole facial morphology of babies from US scans while handling adverse conditions (e.g., missing parts, noisy data). Conclusions: The proposed algorithm has the potential to aid in-utero diagnosis for conditions that involve facial dysmorphology.
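The core fitting step, a regularized morphable model matched to noisy, partially observed 3D points, can be sketched in a few lines. This is a toy illustration under stated assumptions, not the actual BabyFM pipeline: the tiny linear model, the function `fit_morphable_model`, and its parameters are all hypothetical, and correspondences between observations and model vertices are assumed known.

```python
import numpy as np

def fit_morphable_model(points, mean, basis, visible, lam=1.0):
    """Fit linear shape coefficients to noisy, partially observed points.

    points  : (m, 3) noisy observed vertex positions
    mean    : (n, 3) model mean shape
    basis   : (n*3, k) linear shape basis (row-major per vertex: x, y, z)
    visible : indices of the m observed vertices in the model
    lam     : Tikhonov regularization weight (larger -> closer to the mean)
    """
    k = basis.shape[1]
    # Rows of the basis corresponding to the observed vertices' x, y, z coords.
    rows = np.repeat(visible * 3, 3) + np.tile([0, 1, 2], len(visible))
    B = basis[rows]
    r = (points - mean[visible]).ravel()   # residual w.r.t. the mean shape
    # Regularized least squares: (B^T B + lam I) a = B^T r
    coeffs = np.linalg.solve(B.T @ B + lam * np.eye(k), B.T @ r)
    recon = mean + (basis @ coeffs).reshape(-1, 3)   # full-face reconstruction
    return coeffs, recon
```

The regularization term is what makes reconstruction possible from incomplete, noisy scans: missing regions are filled in by the statistics encoded in the basis rather than by the data.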
Audio-visual gated-sequenced neural networks for affect recognition
D. Aspandi, F.M. Sukno, B.W. Schuller and X. Binefa. IEEE Transactions on Affective Computing 14 (3), 2193-2208, 2022.
Interest in automatic emotion recognition, and in the larger field of Affective Computing, has recently gained momentum. The current emergence of large, video-based affect datasets offering rich multi-modal inputs facilitates the development of the deep learning-based models for automatic affect analysis that currently hold the state of the art. However, recent approaches cannot fully exploit these modalities due to the use of oversimplified fusion schemes. Furthermore, the efficient use of the temporal information inherent in these large datasets remains largely unexplored, hindering further progress. In this work, we propose a multi-modal, sequence-based neural network with gating mechanisms for valence- and arousal-based affect recognition. Our model consists of three major networks: firstly, a latent-feature generator that extracts compact representations from both modalities, which have been artificially degraded to add robustness; secondly, a multi-task discriminator that estimates the input identity along with a first-step emotion-quadrant estimation; and thirdly, a sequence-based predictor with attention and gating mechanisms that effectively merges both modalities and exploits this information through sequence modelling. In our experiments on the SEMAINE and SEWA affect datasets, we observe the impact of both proposed methods as a progressive increase in accuracy. We further show in our ablation studies how the internal attention weights and gating coefficients impact the quality of our models' estimates. Finally, we demonstrate state-of-the-art accuracy through comparisons with current alternatives on both datasets.
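The gating idea behind the fusion step can be illustrated with a minimal NumPy sketch. This is not the published network: the function `gated_fusion`, the weight shapes, and the per-frame feature layout are all hypothetical, showing only the principle of a learned, per-dimension soft switch between modalities.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(audio, video, Wg, bg):
    """Merge per-frame audio and video features with a learned gate.

    audio, video : (T, d) sequences of latent features, one row per frame
    Wg           : (2d, d) gate weights; bg : (d,) gate bias
    The gate decides, per frame and per feature dimension, how much each
    modality contributes to the fused representation.
    """
    g = sigmoid(np.concatenate([audio, video], axis=1) @ Wg + bg)  # (T, d), in (0, 1)
    return g * audio + (1.0 - g) * video
```

Because the gate is computed from both inputs jointly, a modality that is noisy or degraded at a given frame can be down-weighted there without discarding it for the whole sequence, which is what oversimplified fusion schemes (e.g., plain concatenation or averaging) cannot do.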
A. Alomar, A. Morales, A.R. Porras, M.G. Linguraru, G. Piella and F.M. Sukno. Proc. 30th International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision, Plzen, Czech Republic, pp. 109-118, 2022.
Diagnosis of craniofacial conditions is shifting towards pre- and peri-natal stages, since early assessment has been shown to be crucial for the effective treatment of functional and developmental aspects in children. 3D Morphable Models are a valuable tool for such evaluation. However, limited data availability on 3D newborn geometry and highly variable imaging environments challenge the construction of 3D baby face models. Our hypothesis is that constructing a bi-linear baby face model that decouples identity and expression enables improved craniofacial and brain function assessments. Thus, given that adult and infant facial expression configurations are very similar, and that 3D facial expressions in babies are difficult to scan in a controlled manner, we propose transferring the facial expressions from the available FaceWarehouse (FW) database to baby scans to construct a baby-specific bi-linear expression model. First, we define a spatial mapping between the BabyFM and the FW. Then, we propose an automatic neutralization to remove the expressions from the facial scans. Finally, we apply expression transfer to obtain a complete data tensor. We test the performance and generalization of the resulting bi-linear model on a test set. Results show that the obtained model allows us to successfully and realistically manipulate the facial expressions of babies while keeping them decoupled from identity variations.
Efficient remote photoplethysmography with temporal derivative modules and time-shift invariant loss
J. Comas, A. Ruiz and F.M. Sukno. Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, pp. 2182-2191, New Orleans, Louisiana, USA, 2022.
We present a lightweight neural model for remote
heart rate estimation focused on the efficient spatio-temporal learning of
facial photoplethysmography (PPG) based on i) modelling of PPG dynamics by
combinations of multiple convolutional derivatives, and ii) increased
flexibility of the model to learn possible offsets between the facial video
PPG and the ground truth. PPG dynamics are modelled by a Temporal Derivative
Module (TDM) constructed by the incremental aggregation of multiple
convolutional derivatives, emulating a Taylor series expansion up to the
desired order. Robustness to ground truth offsets is handled by the
introduction of TALOS (Temporal Adaptive LOcation Shift), a new temporal loss
to train learning-based models. We verify the effectiveness of our model by
reporting accuracy and efficiency metrics on the public PURE and UBFC-rPPG
datasets. Compared to existing models, our approach shows competitive heart
rate estimation accuracy with a much lower number of parameters and lower
computational cost.
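The two ingredients, derivative features and a time-shift tolerant loss, can be sketched with simple stand-ins. These are not the published TDM or TALOS implementations: `temporal_derivatives` uses plain finite differences instead of learned convolutional derivative filters, and `shift_tolerant_mse` replaces the trainable loss with a minimum over discrete shifts; both names and parameters are hypothetical.

```python
import numpy as np

def temporal_derivatives(x, order=2):
    """Stack a signal with its first `order` finite-difference derivatives,
    a simple stand-in for convolutional derivative filters (truncated
    Taylor-like description of the signal's local dynamics).

    x : (T,) input signal; returns (order + 1, T), each difference
    zero-padded at the start to keep the original length.
    """
    feats = [x]
    d = x
    for _ in range(order):
        d = np.diff(d, prepend=d[0])   # first-order difference, same length
        feats.append(d)
    return np.stack(feats)

def shift_tolerant_mse(pred, target, max_shift=3):
    """Minimum MSE over small temporal shifts of the prediction, so the
    model is not penalized for a constant offset between the video PPG
    and the ground-truth signal."""
    best = np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            p, t = pred[s:], target[:len(target) - s]
        else:
            p, t = pred[:s], target[-s:]
        best = min(best, float(np.mean((p - t) ** 2)))
    return best
```

The shift-tolerant loss matters in practice because contact-sensor ground truth and facial video are rarely perfectly synchronized, and a plain MSE would force the network to learn that offset instead of the pulse waveform.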
A. Morales, G. Piella and F.M. Sukno. Computer Science Review, 40(5): 100400, 2021.
Recently, much attention has focused on the incorporation of 3D data into face analysis and its applications. Despite providing a more accurate representation of the face, 3D facial images are more complex to acquire than 2D pictures. As a consequence, great effort has been invested in developing systems that reconstruct 3D faces from an uncalibrated 2D image. However, the 3D-from-2D face reconstruction problem is ill-posed, so prior knowledge is needed to restrict the solution space. In this work, we review 3D face reconstruction methods proposed in the last decade, focusing on those that only use 2D pictures captured under uncontrolled conditions. We classify the proposed methods according to the technique used to add prior knowledge, considering three main strategies, namely statistical model fitting, photometry, and deep learning, and review each of them separately. In addition, given the relevance of statistical 3D facial models as prior knowledge, we explain their construction procedure and provide a list of the most popular publicly available 3D facial models. After this exhaustive study of 3D-from-2D face reconstruction approaches, we observe that the deep learning strategy has grown rapidly over the last few years, becoming the standard choice in place of the once-widespread statistical model fitting. Unlike the other two strategies, photometry-based methods have decreased in number due to the strong underlying assumptions they require, which limit the quality of their reconstructions compared to statistical model fitting and deep learning methods. The review also identifies current challenges and suggests avenues for future research.
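The statistical model fitting strategy discussed above can be illustrated with a minimal landmark-based sketch. This is a simplified toy under stated assumptions, not any method from the review: the camera is a fixed scaled-orthographic projection with known scale and no rotation, landmark-to-vertex correspondences are given, and `fit_shape_to_landmarks` and its parameters are hypothetical.

```python
import numpy as np

def fit_shape_to_landmarks(lm2d, mean, basis, scale=1.0, lam=10.0):
    """Recover 3DMM shape coefficients from 2D landmarks under a fixed
    scaled-orthographic camera (projection keeps x, y and drops z).

    lm2d  : (n, 2) detected 2D landmark positions
    mean  : (n, 3) mean positions of the corresponding model vertices
    basis : (n*3, k) shape basis restricted to the landmark vertices
    The statistical prior enters twice: through the low-dimensional
    basis and through the Tikhonov term keeping coefficients plausible.
    """
    n, k = lm2d.shape[0], basis.shape[1]
    B3 = basis.reshape(n, 3, k)
    A = scale * B3[:, :2, :].reshape(2 * n, k)   # projected basis (x, y rows)
    r = (lm2d - scale * mean[:, :2]).ravel()     # 2D residual w.r.t. the mean
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ r)
```

This makes the ill-posedness concrete: depth is lost by the projection, and only the low-dimensional statistical basis allows a full 3D shape to be recovered from 2D observations. In a full fitting pipeline the camera pose would be estimated jointly, typically by alternating pose and shape updates.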
A. Alomar, A. Morales, K. Vellvé, A.R. Porras, F. Crispi, M.G. Linguraru, G. Piella and F.M. Sukno. Proc. 18th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), Vol. 4, pp. 615-624, 2021.
The fetal face contains essential information for the evaluation of congenital malformations and fetal brain function, as its development is driven by genetic factors at early stages of embryogenesis. Three-dimensional ultrasound (3DUS) can provide information about the facial morphology of the fetus, but its use for prenatal diagnosis is challenging due to imaging noise, fetal movements, limited field of view, low soft-tissue contrast, and occlusions. In this paper, we propose a fetal face reconstruction algorithm from 3DUS images based on a novel statistical morphable model of newborn faces, the BabyFM. We test the feasibility of using newborn statistics to accurately reconstruct fetal faces by fitting the regularized morphable model to the noisy 3DUS images. The algorithm is capable of reconstructing the whole facial morphology of babies from one or several ultrasound scans while handling adverse conditions (e.g., missing parts, noisy data), and it has the potential to aid in-utero diagnosis for conditions that involve facial dysmorphology.
Further activities related to the project
The research activities in this project have contributed, directly or indirectly, to the training of undergraduate and graduate students, as briefly summarized below:
Antonia Alomar Adrover. Perinatal 3D face reconstruction and analysis for early diagnosis of craniofacial anomalies. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (in progress, 2021 – 2025). Supervisors: F.M. Sukno and G. Piella.

Joaquim Comas Martinez. Deep facial analysis for remote emotion estimation in real world scenarios. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (in progress, 2021 – 2025). Supervisors: F.M. Sukno and A. Ruiz.

Mireia Masías Bruns. Normalizing flows for neuroimaging research. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (04-03-2021). Supervisors: G. Piella and M.A. Gonzalez Ballester.

María Araceli Morales. Statistical modelling of the baby face for 3D face reconstruction. PhD Thesis, Universitat Pompeu Fabra, Departament de Tecnologies de la Informació i les Comunicacions (in progress, 2018 – 2022). Supervisors: F.M. Sukno and G. Piella.

Marc Aguilar Velazquez. Baby face generation and editing through text-guided diffusion models. Bachelor Thesis, Mathematical Engineering in Data Science, Universitat Pompeu Fabra (2024). Supervisors: A. Alomar, G. Piella and F.M. Sukno.

Xavier Vives i Sanchez. Multimodal database for emotion recognition from facial expressions and physiological signals. Bachelor Thesis, Mathematical Engineering in Data Science, Universitat Pompeu Fabra (2024). Supervisors: J. Comas and F.M. Sukno.

Judith Recober Martín. Biomechanical regularization in a deep learning network for a fetal MRI registration and segmentation pipeline. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2023). Supervisors: V. Comte, G. Piella and M.A. Gonzalez.

Sofia Cárdenas Linares. Expression recognition in ancient masks. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2023). Supervisors: F.M. Sukno and A. Orsingher.

Alexander Joel Vera Moncayo. A high-quality dataset for emotion recognition based on contactless physiological signals. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2023). Supervisors: J. Comas and F.M. Sukno.

Alba Puyuelo Citoler. The face as a window to the brain: development of a pipeline for facial segmentation from magnetic resonance scans. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: G. Piella and F.M. Sukno.

Nerea González Aranceta. On the basis of impaired functional connectivity in psychosis: the role of the structural substrate and the contribution of neurometabolites. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: M. Masias and G. Piella.

Pablo Mesa. Fetus facial landmark detection on 3D ultrasounds with deep learning. Master Thesis, Computational Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: A. Alomar, G. Piella and F.M. Sukno.

Yadira Ronquillo. Continuous lip reading in Spanish. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: F.M. Sukno and A. Fernandez-Lopez.

Maria Pujol Gil. Cerebral maturation measures in autism spectrum disorder and attention deficit/hyperactivity disorder patients based on MRI scans. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2022). Supervisors: G. Piella and D. Paretto.

Daniel Mateos Manjón. Image processing in Atari games to facilitate reinforcement learning. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2021). Supervisors: A. Jonsson and F.M. Sukno.

Adrian Blanco Barco. Image processing for Artificial Intelligence. Bachelor Thesis, Audiovisual Systems Engineering, Universitat Pompeu Fabra (2021). Supervisors: A. Jonsson and F.M. Sukno.

Anna Harris Martinez. Brain alterations between attention deficit/hyperactivity disorder and autism spectrum disorder: a volumetric, structural and functional study. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2021). Supervisors: G. Piella and D. Paretto.

Angel Bazan. Use of unsupervised techniques for brain MRI stratification. Bachelor Thesis, Biomedical Engineering, Universitat Pompeu Fabra (2021). Supervisors: M. Masias and G. Piella.

Antonia Alomar Adrover. Automatic 3D segmentation algorithm of the fetal face using 3D ultrasound imaging. Master Thesis, Computational Biomedical Engineering, Universitat Pompeu Fabra (2021). Supervisors: A. Morales, G. Piella and F.M. Sukno.
Acknowledgements
The Principal Investigators, Prof. Gemma Piella and Dr. Federico Sukno, would like to thank all those involved in the activities listed above, as well as the Ministry of Science, Innovation and Universities, which funded this project through grant PID2020-114083GB-I00.