Code & Data













A multichannel audio-visual database that includes recordings in driving conditions. The database is free and includes manual annotations for the in-studio images under various head poses.




The Visual Lip-Reading Feasibility (VLRF) Database




The Visual Lip-Reading Feasibility (VLRF) database is designed with the aim of contributing to research in visual-only speech recognition. A key difference between the VLRF database and existing corpora is that it was designed from a novel point of view: instead of trying to lip-read from people who are speaking naturally (normal speed, normal intonation, ...), we propose to lip-read from people who strive to be understood.

We recruited 24 adult volunteers (3 male and 21 female). Each participant was asked to read 25 different sentences from a total pool of 500 sentences containing between 3 and 12 words each. The sentences were unrelated to one another, so that lip-readers could not benefit from conversational context. The camera recorded a close-up shot at 50 fps with a resolution of 1280x720 pixels, and audio was captured at 48 kHz mono with 16-bit resolution.
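As a rough illustration of the corpus size implied by the figures above, the following sketch (plain Python) computes the number of recorded sentences and the per-utterance frame and audio-sample counts; the 4-second utterance length is a hypothetical value used only for illustration:

```python
# Figures taken from the VLRF specifications above.
participants = 24
sentences_each = 25
utterances = participants * sentences_each  # 600 recorded sentences in total

fps = 50            # video frame rate
audio_hz = 48_000   # audio sampling rate (mono, 16-bit)

# Hypothetical utterance length (assumption, for illustration only).
duration_s = 4.0
video_frames = int(fps * duration_s)        # 200 frames
audio_samples = int(audio_hz * duration_s)  # 192000 samples (~375 KiB at 16 bits)

print(utterances, video_frames, audio_samples)
```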

The database is freely available for research purposes. It includes the following: a) the audio-visual recordings; b) the text of the uttered sentences; c) the phonetic transcription of the uttered sentences. To obtain a copy of the database, please download the License Agreement listed below and send a signed copy to the following e-mail: (vlrf “dot” database “at” upf “dot” edu).

For additional information, please refer to the following publication:



·         A. Fernandez-Lopez, O. Martinez and F.M. Sukno. Towards estimating the upper bound of visual-speech recognition: The Visual Lip-Reading Feasibility Database. In Proc. 12th IEEE Conference on Automatic Face and Gesture Recognition, Washington DC, USA, 2017.



VLRF License Agreement





Manual annotations for first 100 scans of FRGC database



If you use these annotations, please cite the following work, in which we also analyze the consistency of the new annotations relative to previously available ones:


·         F.M. Sukno, J.L. Waddington and P.F. Whelan. Compensating inaccurate annotations to train 3D facial landmark localization models. In Proc. FG Workshop on 3D Face Biometrics, Shanghai, China, 2013.


Details about the dataset


Manual annotations





SRILF 3D Face Landmarker



Free 3D face landmarking software (Windows binaries). The goal of this command-line application is to locate facial landmarks on a 3D scan in a fully automatic manner. The implementation, based on the SRILF algorithm, uses shape and descriptor statistics encoded in a configuration model that controls the behavior of the landmarker.



Download SRILF 3D Face Landmarker v1.0

User guide of the landmarker

Description of the SRILF algorithm

Some examples of landmark localization using SRILF v1.0:

More examples of SRILF (on FRGC and Bosphorus databases)

Demonstration video of scanning (with Hand-held laser device) and landmarking





Configuration models for SRILF 3D Face Landmarker



The configuration model controls several aspects of the SRILF 3D Face Landmarker software, including which (and how many) landmarks are targeted and the expected type of input surface.

The configuration model srilf3dFL_HHLaser_HQ.cfg is provided with the software by default. It was created from a training set of facial surfaces scanned with a hand-held laser device, with a resolution between 1 mm and 2 mm and no significant holes or missing parts. The subjects who were scanned were mainly Western European, had their eyes closed (because of laser safety) and were instructed to pose with a neutral facial expression.

For optimal performance, input surfaces should meet the above specifications. If the input surfaces do not meet the requirements of the provided model, it is recommended to either: i) pre-process the input surfaces appropriately (if possible); or ii) contact us to check whether there is a more suitable model for your needs. Models for the FRGC and Bosphorus databases are also available.
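Since the landmarker is distributed as a binary, there is no documented API for validating inputs beforehand. As a hedged illustration only, the sketch below estimates a scan's point-cloud resolution (median nearest-neighbour distance, in mm), the kind of pre-flight check one might run before choosing a model. The synthetic grid standing in for a real scan and the 1-2 mm acceptance band are assumptions based on the description above:

```python
import math
import random

# Synthetic stand-in for a facial scan (assumption): a 20x20 grid with
# 1.5 mm spacing and small height jitter, i.e. within the 1-2 mm
# resolution range the default model was trained on.
random.seed(0)
step_mm = 1.5
points = [(x * step_mm, y * step_mm, random.gauss(0.0, 0.05))
          for x in range(20) for y in range(20)]

def nearest_neighbour_mm(p, cloud):
    """Distance from p to its closest other point in the cloud."""
    return min(math.dist(p, q) for q in cloud if q is not p)

# Median nearest-neighbour distance as a crude resolution estimate.
distances = sorted(nearest_neighbour_mm(p, points) for p in points)
resolution_mm = distances[len(distances) // 2]

print(f"estimated resolution: {resolution_mm:.2f} mm")
if not (1.0 <= resolution_mm <= 2.0):
    print("scan resolution is outside the default model's expected range")
```

For a real scan one would load the vertices from the mesh file instead of generating them; surfaces falling outside the expected range should be pre-processed (e.g. resampled) as recommended above.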


Default model: Configuration model trained on hand-held laser data, targeting 12 facial landmarks; see User Guide for details.

FRGCv1 model: Configuration model targeting 14 landmarks, trained with facial scans from the FRGCv1 database. See video examples here.

FRGCv2 model: Configuration model targeting 14 landmarks, trained with facial scans from the FRGCv2 database. See video examples here.

Bosphorus model: Configuration model targeting 14 landmarks, trained with facial scans from the Bosphorus database. See snapshot examples here.