
Pools of AAMs: Towards Automatically Fitting any Face Image

Julien Peyras Adrien Bartoli Samir Khoualed

LASMEA, Clermont-Ferrand, France

name.surname@gmail.com

Abstract

Fitting a single generic AAM to an unseen face (one that is not in the training set) under any pose and expression is very difficult. The variability of the data is so high that the fitting process usually gets stuck in one of the numerous local minima. We show that one solution to this problem is to separate the sources of variability. We build a pool of specialized AAMs. Each AAM is trained over multiple identities, all shown under the same pose and expression. We then retain the AAM that shows the smallest residual error when fitted to the input image. The fitting obtained in this manner is very accurate on unseen faces. The ultimate goal is to automatically train a person-specific AAM. In addition, the pool of specialized AAMs allows us to recognize the face pose and expression at each frame of a video with good performance. The proposed method has potential applications in Human Computer Interaction and driving surveillance, to name but a few.

1 Introduction

The problem of face analysis in still images and videos has been extensively studied for years. This intense research activity is motivated by the wide range of possible applications in the medical, psychological and linguistic fields (cognitive studies, expression transfer to an avatar, etc.). Face analysis is a difficult topic since face images vary in identity, pose and expression. The sought-after model should be able to automatically and reliably describe previously unseen faces under any pose and expression. We describe the two most promising approaches.

The first one is Bartlett et al.’s machine learning based expression analysis solution proposed in [1]. Several classifiers are trained for face and eye detection, as well as for the presence and intensity of particular Action Units. These are the elementary deformations occurring on a face, as described by Ekman’s Facial Action Coding System [6]. This method is probably the best performing one in the literature for expression analysis on unseen faces (faces that are not explicitly learnt by the classifiers). The method is not model-based. This makes it difficult to retrieve the shape, which restricts the range of possible applications.

The second established approach is the Active Appearance Model (AAM) proposed by Cootes et al. [3]. An ad hoc face AAM is trained on manually labeled images, so as to learn the shape and appearance bases. An optimization process is used to fit the AAM on an input image: the shape and appearance coefficients of the model are tuned until the model instance matches the input picture. Retrieving the face shape is important for many video post processing systems. Obviously the performance of such systems is directly related to the quality of the face shape description, i.e., the fitting accuracy is crucial.

As Gross et al. [7] first pointed out, it is important to distinguish between two situations, providing two different kinds of achievable fitting accuracy:

• the person-specific context, where the fitted face has been explicitly learnt by the model. The fitting accuracy is usually very good in this context, and reliable for post processing systems. In [8], Lucey et al. use person-specific AAMs to retrieve the face shape and successfully classify facial deformations into Action Units.

• the person-generic context, where the fitted face is not in the training set. As first shown by Gross et al. in [7], the fitting process is much harder than in the person-specific context. In [10], Peyras et al. showed with carefully chosen experiments that fitting an unseen face with an AAM is much less accurate than fitting a face that belongs to the set of images used to train the model. They explained the reason for this: in the generic context, the appearance counterpart of the model cannot fully explain the appearance of the face in the input image. As an unfortunate consequence, the minimum error of the cost function corresponds to a biased position of the model. Even when initialised in the best possible position (the ground-truth shape), the AAM drifts away.

The problem of fitting unseen faces is a cornerstone for a wide range of applications. As of today, no method has proven able to accurately fit previously unseen faces under a wide range of poses and expressions. AAMs appear to provide an interesting basis for tackling this problem. One could think that adding more training data would increase the ability of the model to generalize to unseen faces. Indeed, this ability increases with the amount of training data. In practice, however, the higher complexity of the AAM makes its fitting unreliable because it induces numerous local minima in the cost function. In other words, the model is so flexible that it ‘explains’ spurious non-face solutions in the image. As a consequence, the solution for reliable and accurate fitting must combine these two contradictory conditions:

• the complexity of an AAM must be kept as low as possible so as to preserve a large convergence basin and be able to find the global cost minimum,





• the range of face images that the AAM can explain must be large, so that the global cost minimum matches the sought after solution.

The first condition is satisfied by limiting the size of the training set, while the second one requires expanding it. To bypass this contradiction, we propose to separate the sources of variability within the training data. Instead of considering the face as an object that varies in identity, pose and expression, we see it as a collection of objects that vary in identity only: each object has a constant pose and expression. In this view, an AAM must model only one of the three sources of variability, namely identity, so as to fit a variety of unseen faces under the same pose and expression. We say that such an AAM is specialized to a particular pose and expression pair. To deal with many poses and facial deformations, we train a pool of specialized AAMs.

Contribution. We showed in [10] that fitting an unseen face with local models increases the generalization ability and the fitting accuracy in comparison to global models covering all facial features. The fitting bias is reduced to a point where the fitting accuracy on unseen faces is equivalent to the accuracy of manual labels. Following this insight, we design two categories of specialized AAMs that locally model the face: the upper AAMs, built to fit the eyes and eyebrows, and the lower AAMs, designed to fit the mouth. This also has the advantage of modeling the possible combinations of facial deformations separately. Our strategy consists in running all upper and lower AAMs on one input picture. For each category we keep the AAM presenting the smallest residual error. This AAM is expected to be the most accurately fitted on the face, and should represent the current pose and expression of this face. Consequently, we expect our method to automatically provide accurate labels on unseen faces under varying expression and pose, and also to correctly classify the pose and expression at any frame of a video. The process is presumably slow and costly. This is often not a limitation: the long off-line training is performed only once, on a video of a person who frequently uses the device at hand. As an example, personal computer interfaces and car driver monitoring systems can be equipped with this technology. As two important contributions, we show that:

• good fitting accuracy, good robustness to position perturbation and high classification rates are obtained,

• the obtained labels can be used to automatically train a person-specific AAM, which is able to fit the face and classify its expression in real-time.

Organization. Section 2 reviews the literature and introduces the AAMs. Section 3 presents the specialized AAMs and the pose and expression database we have used to perform our experiments. In section 4 we show experimental results on still images in a leave-one-identity-out fashion, and on a video where an unseen person displays a series of poses and expressions. We compare the performance of the specialized AAMs against the classical AAM learning all data. Section 5 gives a conclusion and our perspectives. The good fitting results of the specialized models will allow us to build a person-specific AAM for real-time tracking and pose and expression classification on the just-learnt person.

2 Background

2.1 Previous Work

The concept of fitting several models is not new: Cootes et al. used one model for each face pose in [4]. However, despite the advantages it presents, this solution was not pursued afterward.

The AAM is not the only face fitting solution in the literature. We review some others. Cristinacce et al. proposed a competitive template matching solution called Constrained Local Models in [5], which was further studied by Wang et al. in [11]. This solution exhibits better fitting results than AAMs. Note that these methods can be embedded as the specialized models in our framework. Indeed, pools would increase the discriminability between correct and wrong alignments, which is an important ability when aligning objects with a very high and complex range of variability.

The 3DMM (3D Morphable Model) presented by Vetter et al. in [2] can recover the 3D structure of a face from a single picture. This model is too heavy to automatically and reliably fit faces under any pose and expression. Here too, the specialization of multiple 3DMMs could help to improve the results.

2.2 Background on the AAM

An AAM combines two linear subspaces, one for the shape and one for the appearance. They are learnt from a labeled set of training images [3]. A certain percentage of the shape and appearance variance of the whole training set is kept. As a rule of thumb, [10] showed that keeping 60% shape variance and 100% appearance variance is ‘optimal’ in the person-generic context. We therefore keep 60% shape and 95% appearance variance, so as to keep the AAM size reasonable.
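The variance-retention rule above can be illustrated with a minimal sketch. It assumes the shape and appearance bases come from a PCA whose eigenvalues are available. The function name `modes_for_variance` and the eigenvalue spectra are hypothetical and chosen for illustration only; they are not taken from the paper.

```python
import numpy as np

def modes_for_variance(eigenvalues, fraction):
    """Smallest number of leading PCA modes whose eigenvalues
    account for at least `fraction` of the total variance."""
    ev = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(ev) / ev.sum()
    # First index whose cumulative share reaches the target
    # (small tolerance guards against floating-point round-off).
    index = int(np.searchsorted(cumulative, fraction - 1e-12))
    return min(index + 1, len(ev))

# Hypothetical eigenvalue spectra (illustrative values, not from the paper).
shape_eigenvalues = [5.0, 3.0, 1.0, 0.5, 0.5]
appearance_eigenvalues = [40.0, 30.0, 15.0, 10.0, 3.0, 2.0]

n_shape = modes_for_variance(shape_eigenvalues, 0.60)            # 60% shape variance
n_appearance = modes_for_variance(appearance_eigenvalues, 0.95)  # 95% appearance variance
print(n_shape, n_appearance)
```

Lowering the retained fraction directly reduces the number of coefficients to optimize, which is how the AAM size is kept reasonable.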

Fitting an AAM consists in finding the shape and appearance instances that make the residual error between the image and the synthesized model as small as possible. We use Baker and Matthews’ optimization framework [9] with the Simultaneous Inverse Compositional Algorithm.
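For reference, the residual error can be written in the notation commonly used with the inverse compositional framework. The symbols below are assumptions introduced here and do not appear in this text: $s_0$ and $A_0$ are the mean shape and mean appearance, $s_i$ and $A_j$ the shape and appearance modes, $\mathbf{p}$ and $\boldsymbol{\lambda}$ their coefficients, $I$ the input image and $W(\mathbf{x};\mathbf{p})$ the warp from the mean shape to the current shape.

```latex
s(\mathbf{p}) = s_0 + \sum_{i=1}^{n} p_i \, s_i ,
\qquad
A(\mathbf{x};\boldsymbol{\lambda}) = A_0(\mathbf{x}) + \sum_{j=1}^{m} \lambda_j \, A_j(\mathbf{x}) ,

E(\mathbf{p},\boldsymbol{\lambda}) =
\sum_{\mathbf{x} \in s_0}
\Big[ A_0(\mathbf{x}) + \sum_{j=1}^{m} \lambda_j \, A_j(\mathbf{x})
      - I\big(W(\mathbf{x};\mathbf{p})\big) \Big]^2 .
```

Fitting then amounts to minimizing $E$ jointly over $\mathbf{p}$ and $\boldsymbol{\lambda}$; the value of $E$ at convergence is the residual error used to compare the models.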

3 A Pool of Specialized AAMs

3.1 The Concept

In [10], both global and local models are specialized on the frontal pose and neutral expression. Since stuffing various poses and expressions into a single AAM spoils its fitting performance, we extend here the concept of specialized AAM. The idea is to build a pool of AAMs, each being specialized on a particular pose and expression pair. The whole pool would then encompass a continuum of poses and expressions.
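The partition of the training data behind such a pool can be sketched as follows. This is only an illustration under assumed names: the `Sample` record and `group_by_specialization` are hypothetical, image and landmark data are omitted, and the split into upper and lower AAMs is ignored for brevity.

```python
from collections import defaultdict
from typing import NamedTuple

class Sample(NamedTuple):
    """One labeled training image (image and landmark data omitted)."""
    identity: str
    pose: int          # azimuth in degrees
    deformation: int   # facial deformation number

def group_by_specialization(samples):
    """Group samples by (pose, deformation) so that each group
    varies in identity only, as required for a specialized AAM."""
    groups = defaultdict(list)
    for sample in samples:
        groups[(sample.pose, sample.deformation)].append(sample)
    return dict(groups)

# Toy data set: 3 identities, 3 poses, 2 deformations (illustrative only).
samples = [
    Sample(f"id{i}", pose, deformation)
    for i in range(3)
    for pose in (0, 10, 20)
    for deformation in (1, 2)
]
groups = group_by_specialization(samples)
print(len(groups))  # one training set per specialized AAM
```

Each group would then be used to train one specialized AAM, so the number of models in the pool equals the number of pose and deformation pairs.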

Each specialized AAM is built over N different identities, giving the AAM a certain ability to explain unseen faces. Unfortunately, none of the publicly available face databases presents a large range of facial deformations under several head poses and a homogeneous illumination. For this reason, we had to build our own pose and expression database, which we present in the rest of this section.

3.2 The Pose and Expression Database

Our current database has 15 identities taken under 3 views (frontal, 10° and 20° in azimuth) displaying 21 facial (upper or lower) deformations. We kept the illumination homogeneous. All pictures (63 per identity) were manually labeled thoroughly to maximize the label accuracy. Taking pictures and labeling them represents about 3 hours of work per identity. The facial deformations we use are shown in figure 1. Figure 2 shows a sample of people from the database.
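The database figures quoted above can be checked with a few lines of arithmetic. The totals (number of images and overall labeling effort) are derived here from the stated per-identity figures; they are not stated in the text.

```python
# Figures quoted in the text.
identities = 15
views = 3                  # frontal, 10 and 20 degrees in azimuth
deformations = 21          # upper or lower facial deformations
hours_per_identity = 3     # approximate acquisition and labeling effort

# Derived quantities.
images_per_identity = views * deformations        # matches the 63 quoted
total_images = identities * images_per_identity
total_hours = identities * hours_per_identity     # approximate

print(images_per_identity, total_images, total_hours)
```

Under these figures the database holds 945 labeled images for roughly 45 hours of manual work, which gives a sense of the cost of extending it.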

More people, more poses and more deformations could obviously be included in the database in order to fit more unseen people under a wider range of poses and expressions. However, one faces several difficulties:

• it is time consuming and tedious to label images with the high accuracy that the present study requires.

Figure 1: Facial deformations represented in the database. The manually placed landmarks represent the vertices used for training or fitting (for testing purposes). The deformation number is indicated on top of each of the thumbnails. Each deformation is meant to represent some Action Units or a particular combination of them [6].

Figure 2: 5 of the 15 identities of the database, shown for all poses (0°, 10°, 20°) and deformation no. 5.

• the appearance and deformation of faces are wide-ranging. The set of people forming the database must capture this diversity, in quality and quantity.

• the quality of the deformations is very important to avoid badly defined deformation classes and possible overlaps between them. The people selected for the database should therefore be actors, or have a particular talent for performing facial deformations on demand.

4 Evaluation and Tests

4.1 Leave-One-Identity-Out Test

The test consists in training a pool of specialized AAMs on N identities and fitting one of the remaining faces. In this way, the identity we fit is unknown to the AAMs. We perform this leave-one-identity-out test 15 times. N can at most be 14. For each test identity, 63 images (21 expressions under 3 poses) must be fitted with all upper or all lower specialized AAMs. For each image, we run all AAMs and keep as the winner the one that gives the smallest residual error at convergence, after 30 iterations. Our goal is to assess the following three points:

• the fitting accuracy, i.e., the quality of each label position on the face at convergence: we measure it by comparison with manual labels taken as a reference,

• the basin of convergence, i.e., the ability to cope with perturbed initializations,

• the classification rate, i.e., the frequency of correct correspondence between the pose and expression of the winning AAM and the true pose and expression.
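The protocol above (hold out one identity, keep the AAM with the smallest residual error, then score the classification) can be sketched as follows. All function names and residual values are hypothetical and for illustration only; the AAM training and fitting steps themselves are not shown.

```python
def leave_one_identity_out(identities):
    """Yield (training identities, held-out identity), one pair per identity."""
    identities = list(identities)
    for held_out in identities:
        yield [i for i in identities if i != held_out], held_out

def select_winner(residuals):
    """Return the (pose, deformation) key of the AAM with the
    smallest residual error at convergence."""
    return min(residuals, key=residuals.get)

def classification_rate(predicted, truth):
    """Fraction of images whose winning AAM matches the true
    pose and deformation."""
    if not truth or len(predicted) != len(truth):
        raise ValueError("predicted and truth must be non-empty and of equal length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# 15 identities give 15 folds, each training on the 14 remaining identities.
folds = list(leave_one_identity_out(range(15)))

# Hypothetical residual errors of three specialized AAMs on one image.
residuals = {(0, 5): 12.4, (10, 5): 7.9, (20, 5): 15.1}
winner = select_winner(residuals)

# Hypothetical winners versus ground truth for three test images.
rate = classification_rate(
    [(10, 5), (0, 1), (20, 3)],
    [(10, 5), (0, 2), (20, 3)],
)
print(len(folds), winner, rate)
```

Since every AAM in a pool is run on every test image, the cost of the test grows with the size of the pool, which is consistent with the process being described as slow.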


