Multi-classifier Fusion Based Facial Expression Recognition Approach
KSII Transactions on Internet and Information Systems (TIIS). 2014. Jan, 8(1): 196-212
Copyright © 2014, Korean Society For Internet Information
  • Received : October 30, 2013
  • Accepted : December 23, 2013
  • Published : January 30, 2014
About the Authors
Xibin Jia
Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
Yanhua Zhang
Beijing Municipal Key Laboratory of Multimedia and Intelligent Software Technology, Beijing University of Technology, Beijing, 100124, China
David Powers
School of Computer Science, Engineering and Mathematics, Flinders University of South Australia, Adelaide, Australia
Humayra Binte Ali
School of Computer Science, Engineering and Mathematics, Flinders University of South Australia, Adelaide, Australia

Abstract
Facial expression recognition is an important part of emotional interaction between humans and machines. This paper proposes a facial expression recognition approach based on multi-classifier fusion with the stacking algorithm. The kappa-error diagram is employed to select the base-level classifiers: it shows which individual classifiers recognize expressions best and how diverse they are, so that fusing their complementary functions can improve the recognition accuracy. To avoid the influence of chance (correct guesses) on algorithm evaluation and to obtain a more reliable picture of performance, kappa and informedness are used as evaluation criteria alongside accuracy in the comparison experiments. To verify the effectiveness of our approach, two public databases are used in the experiments. The experimental results show that, compared with the individual classifiers and two other typical ensemble methods, our proposed stacked ensemble system recognizes facial expressions more accurately and with a smaller standard deviation. It overcomes the bias of the individual classifiers and achieves more reliable recognition results.
Keywords
1. Introduction
Facial expression is a primary means of conveying social information between humans, and is putatively independent of race, gender and age [1]. Human-Computer Interaction (HCI) is most effective when it resembles natural face-to-face interaction between humans, and good facial expression recognition aims to bring HCI to a similar level of effectiveness. Facial expression therefore plays an important role in interpersonal communication and is studied using techniques from pattern recognition, computer vision, psychology and linguistics.
In 1971, Ekman and Friesen proposed six basic facial expressions [2]: anger, disgust, fear, happiness, sadness and surprise. Recognizing them can be viewed as a K-class classification problem with K = 6 (or 7 if neutral is included), and most researchers classify facial expressions into these K classes. Since then, much effort has gone into building more reliable facial expression recognition. Ekman et al. proposed the Facial Action Coding System (FACS) in 1978 and revised it in 2002 [3]. In 1997, Lanitis et al. used active appearance models (AAM) to interpret face images [4].
Zhang et al. used a dynamic Bayesian network with FACS and largely achieved real-time facial expression recognition [5]. Shan et al. used an SVM (Support Vector Machine) with Boosted-LBP (Local Binary Patterns) features for 7-class facial expression recognition, obtaining the highest accuracy of 97.5% for both happiness and disgust and the lowest accuracy of 74.7% for sadness [6]. Peng Yang et al. divided the face image into local patches according to AUs (Action Units) and extracted appearance features from each patch; experimenting on the Cohn-Kanade database with Adaboost, they obtained an accuracy of 92.3% on the testing set and 80.0% on the extended testing set [7].
Recently, many researchers have applied ensemble techniques that fuse the results of multiple classifiers instead of using a single classifier. Bartlett et al. used Adaboost and SVM to reach 89.1% on the Cohn-Kanade database in Exp. II [8]. Sander Koelstra proposed a dynamic texture-based approach to the recognition of facial Action Units and their temporal models using the GentleBoost ensemble algorithm with a Hidden Markov Model. This work was tested on the Cohn-Kanade and MMI databases, and obtained the highest accuracy of 95.80% for AU27 and the lowest accuracy of 71.33% for AU7 on the Cohn-Kanade database [9]. Thiago et al. used ensemble classifiers with Gabor and LBP features to recognize facial expressions, and obtained 88.9% on the Cohn-Kanade database in Exp. II [10]. These ensemble approaches aim to improve classification by combining, with a certain strategy, several classification results obtained on partially selected datasets. The member classifiers are normally of the same type and are combined by boosting, so the classification results are still largely determined by the performance of the underlying classifier. However, classifiers built on different mechanisms perform differently in facial expression recognition depending on the case, for example the dataset adopted or the features used. Integrating the contributions of several different classifiers is therefore one possible way to improve the overall classification results, and this paper explores such multi-classifier fusion.
Stacking [11 , 12] is an advanced form of ensemble classifier that seeks to learn the best way of fusing several classifiers to optimize classification performance, so we propose a new emotion recognition system based on stacking in this paper. We also introduce an approach to base classifier selection that builds on Kuncheva's work on algorithm evaluation, which trades off recognition error against algorithm diversity [13]. Comprehensive comparison experiments are carried out to test the performance of the proposed stacking ensemble facial expression recognition system.
The rest of this paper is organized as follows. Section 2 introduces the principle of our stacking ensemble method. Section 3 discusses the selection of the base classifiers based on the kappa-error diagram. Section 4 presents the tests on the two public databases, JAFFE and Cohn-Kanade, with a detailed analysis of the results and a comprehensive comparison with existing methods. The final section summarizes the present work and discusses future work.
2. Principle of Stacking Ensemble Expression Recognition Approach
Stacking is a technique for fusing multiple classifiers applied to a specific classification problem [14], and aims to improve on the results of the individual classifiers. It can outperform fusion by simple voting or linear combination because it learns from training samples how to integrate the individual classifiers' outputs. Ensemble techniques that use a fixed rule, such as simple majority voting, need no additional training data, but those that use a trained rule, which is what characterizes stacking, can potentially obtain better classification results [15]. We therefore propose to employ a stacked ensemble in facial expression recognition to take full advantage of each individual classifier and obtain a better understanding of the emotion shown on the face.
The stacked fusion system is illustrated in Fig. 1. The lower level in Fig. 1 is called the base level, where several base classifiers each process the input. The upper level is called the meta level, where an additional classifier, the so-called meta-classifier, relearns from the results of the base classifiers. The detailed stacking procedure is as follows.
Fig. 1. Fusion system based on stacking
Suppose there are n base classifiers F1 , F2 , ..., Fn , one meta-classifier M, and m classes C1 , C2 , ..., Cm . Each sample S is processed by the following procedure.
Note that the meta-classifier sees only the probability estimates for each classifier and class, across the set of fusion samples. Separate data partitions should be used for training the base classifiers, for validating the base classifiers (which yields the training data for the meta-classifier), and for testing the combined classifier. Given the increased data requirements implied by these additional partitions, this is usually done using cross-validation.
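The data flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each base classifier is modelled as a function returning m class probabilities, the names `meta_features`, `stacked_predict`, `f1`, `f2` and `toy_meta` are hypothetical, and the toy meta-classifier is a fixed rule standing in for the trained meta-classifier M.

```python
from typing import Callable, List, Sequence

# Assumption: a base classifier maps a sample to a list of m class probabilities.
BaseClassifier = Callable[[Sequence[float]], List[float]]

def meta_features(base_classifiers: List[BaseClassifier],
                  sample: Sequence[float]) -> List[float]:
    """Concatenate the probability outputs of all n base classifiers (n * m values)."""
    features: List[float] = []
    for clf in base_classifiers:
        features.extend(clf(sample))
    return features

def stacked_predict(base_classifiers, meta_classifier, sample) -> int:
    """The meta-classifier sees only the base-level probabilities, never the raw sample."""
    return meta_classifier(meta_features(base_classifiers, sample))

# Hypothetical stand-ins: two base classifiers over m = 3 classes.
f1 = lambda s: [0.7, 0.2, 0.1]
f2 = lambda s: [0.2, 0.5, 0.3]

def toy_meta(z: List[float], m: int = 3) -> int:
    """Toy rule (not a trained model): sum the probabilities per class, take the argmax."""
    totals = [sum(z[k::m]) for k in range(m)]
    return totals.index(max(totals))

print(stacked_predict([f1, f2], toy_meta, [0.0]))
```

In a real system the meta-classifier would be trained on meta-feature vectors produced from data that the base classifiers did not see during their own training, as the paragraph above requires.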
3. Selection of Classifiers According to Kappa-error Diagram
In the field of facial expression recognition, the ability of a recognition system depends strongly on the classifiers selected as well as on the features used. Shan et al. used SVM and Boosted-LBP [6]. Koutlas et al. applied an ANN (Artificial Neural Network) and Gabor filters [16]. Zhang et al. adopted a dynamic Bayesian network to track emotion at run time [5]. Xu et al. employed KNN (K Nearest Neighbor) [17]. According to this research, the typical classification techniques KNN, SVM, ANN and Bayesian classifiers have all achieved fairly good results in certain contexts. However, these algorithms classify on very different principles: KNN is based on minimizing risk, ANN relies on associative memory, SVM maximizes the margin, and Bayesian classifiers are based on posterior probability. They therefore perform differently in different cases. Fusing their complementary functions is one effective way to improve the effectiveness and robustness of a recognition system, so we explore the feasibility of integrating these typical classification algorithms.
For fusion to be effective, diversity among the classifiers is a key point. Kuncheva pointed out that two main factors determine successful fusion: individual accuracy and pairwise diversity. By analyzing the ensemble performance of every pair of techniques, through mathematical proof and extensive experiments, she proposes a bound that indicates which ensembles are possible when the two factors are traded off [18]. Building on these published conclusions, this paper adopts the kappa-error diagram and the bound in selecting the base-level classifiers for the stacking ensemble facial expression recognition approach.
- 3.1 Theory of Classifier Ensemble Pruning Using Kappa-error Diagram
The kappa-error diagram is a popular tool for analyzing ensemble methods, proposed by Margineantu and Dietterich [13]. Kappa-error diagrams visualize individual accuracy and diversity in a 2D plot, and have been used to decide which ensemble members can be pruned without much harm to the overall performance [13 , 18]. The kappa and error of two classifiers under evaluation are computed as follows. Suppose F1 and F2 are a pair of classifiers under evaluation. Each classifier is applied to the same dataset, and the corresponding pairwise contingency table is counted as shown in Table 1. In the table, ‘a’ is the number of samples that both classifiers classify correctly, ‘b’ and ‘c’ are the numbers of samples that one classifier classifies correctly and the other wrongly, and ‘d’ is the number of samples that both classifiers classify wrongly. The error ‘e’ and the kappa value are then computed as in Eq. (1) and Eq. (2) [18].
Table 1. Contingency Table of Two Classifiers
Here N is the number of samples in the dataset, that is, N = a + b + c + d. OA and AC are computed as in Eq. (3) and Eq. (4).
Here ‘OA’ is the proportion of samples on which both classifiers give the same classification result, so it measures how far the two classifiers agree. When the agreement is high, the two classifiers function similarly and one of them can be pruned without much harm to the ensemble. When it is low, their functions differ and both should be kept so that their complementary contributions improve the ensemble result. Kappa is a derived parameter that removes the chance factor by subtracting ‘AC’, the agreement expected by chance on both-right and both-wrong samples. Kappa therefore provides a more objective criterion than using accuracy directly. We can conclude that, under this definition, a lower kappa indicates higher diversity.
The error in Eq. (1) is the average proportion of samples that each classifier misclassifies. Clearly, a classifier with a higher error reduces the performance of the whole ensemble.
High diversity and high accuracy are therefore both desirable when determining the base classifiers for the stacking ensemble system.
Further, Kuncheva examines the bound on the region where feasible kappa-error tradeoffs are found for the dichotomous case. The paper derives a bound k_min on kappa in terms of the error ‘e’, as in Eq. (5). Classifier pairs that lie close to the bound offer good performance benefits for fusion [18].
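As a worked illustration of the quantities discussed above, the following sketch computes the pairwise error and kappa from the four contingency counts. The formulas are assumptions based on the standard definitions that match the description in the text: the error is the average of the two individual error rates, OA = (a + d) / N, AC is the agreement expected from the marginal totals, and kappa = (OA - AC) / (1 - AC). The function name is hypothetical.

```python
def pairwise_error_kappa(a: int, b: int, c: int, d: int):
    """Return (error, kappa) for two classifiers from their contingency counts.

    a: both right, b and c: exactly one right, d: both wrong.
    """
    n = a + b + c + d
    # One classifier errs on c + d samples, the other on b + d; average the two rates.
    error = (b + c + 2 * d) / (2 * n)
    # Observed agreement: both right or both wrong.
    oa = (a + d) / n
    # Agreement expected by chance, from the marginal totals (assumed standard form).
    ac = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    # Undefined when ac == 1 (both classifiers always right or always wrong).
    kappa = (oa - ac) / (1 - ac)
    return error, kappa

# Two classifiers that never fail on the same sample: diverse, hence negative kappa.
print(pairwise_error_kappa(50, 25, 25, 0))
# Two classifiers with identical behaviour: kappa of 1, no diversity.
print(pairwise_error_kappa(80, 0, 0, 20))
```

Plotting one (kappa, error) point per classifier pair gives the kappa-error diagram; points towards the lower left combine low error with high diversity.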
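The bound can be illustrated with a short sketch. The piecewise form below is an assumption, written as the bound is commonly stated for this result: for two classifiers with the same individual error e of at most 0.5, kappa is lowest when they never fail on the same sample (d = 0, b = c), and substituting into kappa = (OA - AC) / (1 - AC) gives 1 - 1/(1 - e); the branch for e above 0.5 is the mirrored case. The function name `kappa_min` is hypothetical.

```python
def kappa_min(e: float) -> float:
    """Lower bound on pairwise kappa for average individual error e (assumed form)."""
    if not 0.0 < e < 1.0:
        raise ValueError("e must lie strictly between 0 and 1")
    if e <= 0.5:
        return 1.0 - 1.0 / (1.0 - e)
    return 1.0 - 1.0 / e

# A few points on the bound curve of the kappa-error diagram.
for e in (0.1, 0.25, 0.5):
    print(e, kappa_min(e))
```

A pair of classifiers whose (kappa, error) point lies near this curve is about as diverse as its error level allows.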
- 3.2 Base Classifiers Determination
To determine the final base classifiers from the four typical classifiers above, namely KNN, SVM, ANN and Bayesian, we fully analyzed the fusion performance of each classifier pair. The fusion performance criteria proposed by Kuncheva, which are based on the kappa-error diagram, are employed as the evaluation rule. To make the results more general, we carried out the analysis on two public databases and with several different features.
- 3.2.1 Databases
In this paper, we use two public databases: JAFFE and Cohn-Kanade. The JAFFE database contains 183 images of 10 different Japanese women. The Cohn-Kanade database contains 355 samples from 97 subjects. Each sample is a sequence of frames from a video of the subject making an expression; we use it to test the fusion performance when the input data are represented by dynamic features.
- 3.2.2 Solution of Kappa-error Diagram Based Base Classifier Determination
The facial expression images in the two public databases are preprocessed and represented by different features: the Gabor feature, the static geometric feature and the dynamic feature. The four classifiers are applied separately to expression recognition. Because the base-level classifiers should be as simple as possible, the typical algorithms 1-NN (1-nearest neighbor), SMO (Sequential Minimal Optimization) [19], MLP (Multilayer Perceptron) and NB (Naïve Bayes) are chosen as the four classifiers mentioned above. For each of the 6 pairs formed from the 4 classifiers, the contingency table of Table 1 is counted. From the counted recognition results of each pair, the pairwise error and kappa are computed according to Eq. (1) and Eq. (2). The pairwise kappa and pairwise error values among 1-NN, SMO, NB and MLP are shown in Table 2, and the corresponding kappa-error diagram is shown in Fig. 2.
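The pairwise counting step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `contingency` is hypothetical, and which of the two off-diagonal counts is called b or c is an assumption, since the two are interchangeable in the pairwise error and kappa.

```python
from itertools import combinations

def contingency(truth, pred1, pred2):
    """Count (a, b, c, d) for two classifiers' predictions against the true labels."""
    a = b = c = d = 0
    for t, p1, p2 in zip(truth, pred1, pred2):
        right1, right2 = p1 == t, p2 == t
        if right1 and right2:
            a += 1      # both right
        elif right1:
            b += 1      # only the first classifier right (assumed labelling)
        elif right2:
            c += 1      # only the second classifier right (assumed labelling)
        else:
            d += 1      # both wrong
    return a, b, c, d

# The 4 candidate classifiers yield 6 unordered pairs.
names = ["1-NN", "SMO", "MLP", "NB"]
pairs = list(combinations(names, 2))
print(len(pairs), pairs)

# Toy labels and predictions (made up purely for illustration).
print(contingency([0, 1, 2, 0], [0, 1, 0, 1], [0, 2, 2, 1]))
```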
Fig. 2. Kappa-error diagram of the six pairs of classifiers. Points obtained on the Gabor feature of JAFFE are shown in cyan, points obtained on the Gabor feature of Cohn-Kanade in red, points obtained on the static geometry feature of Cohn-Kanade in blue, and points obtained on the dynamic geometry feature of Cohn-Kanade in green.
Table 2. Information of Six Pairwise Classifiers
To determine the base classifiers for stacking ensemble expression recognition, we carry out the analysis from two angles according to ensemble pruning theory. Following Kuncheva's experimental conclusions, we first evaluate the accuracy of each classifier in facial expression recognition, which is the leading factor in successful fusion. An individual classifier that performs badly will cause catastrophic fusion, so we remove such classifiers from the ensemble directly. We then analyze the pairwise diversity, taking Kuncheva's bound as the reference: classifier pairs that lie closer to the bound curve in the kappa-error diagram are expected to fare better in an ensemble than those far from it.
Based on the experimental results in Table 2, we analyze the error first. The pairwise error of SMO and MLP is the smallest in all four cases, across the different features and databases. This indicates that SMO and MLP perform comparatively well in facial expression recognition and can be considered for adoption. The same result is shown visually in Fig. 2, where the points of the MLP-SMO pair are marked with the rectangle symbol ‘□’.
From the perspective of diversity, the pair MLP and 1-NN has the lowest kappa in three of the four cases shown in Table 2. These are the points marked with the small star symbol ‘*’ in Fig. 2; they lie closer to the bound and thus offer better ensemble performance, trading off lower pairwise error against diversity.
On the other hand, the pair 1-NN and NB has the highest error in three of the four cases, as shown in the corresponding column of Table 2. The performance of NB drops in facial expression recognition because the problem of "high dimensionality and small sample size" is widely encountered there, whereas Bayesian classifiers depend on training with large samples to work well. To avoid catastrophic fusion, we therefore do not use NB as a base classifier.
According to the above analysis, 1-NN, MLP and SMO are selected as candidate base classifiers for the stacking ensemble. In making the final determination, we consider how SMO pairs with the related MLP and with the unrelated 1-NN. Table 2 shows that the error and kappa of these pairs lie in the middle, with moderate average error and fairly low kappa. From this perspective, fusing SMO and 1-NN would not cause unacceptable error, and their diversity is large enough for the ensemble to enhance expression recognition. Therefore MLP, 1-NN and the SMO implementation of SVM are selected as base classifiers in this paper.
We also give some further explanation of particular points in Fig. 2. All points in red in Fig. 2, which represent the performance of all classifier pairs on the Gabor feature of the Cohn-Kanade database, are far from the bound curve. This is because we do not use elaborate preprocessing filters such as illumination normalization, so the results are affected by factors such as unstable illumination. We prefer to make relative comparisons across all filters, features and databases.
- 3.3 Determination of Meta-classifier
The role of the meta-classifier is to fuse the recognition results of the individual classifiers, through retraining, to obtain a more robust decision. The input of the meta-classifier is the result of each individual classifier, and its output is the fused recognition result over the six basic expressions. We have no clear idea of the distribution of these inputs from which to choose a discriminant form. As the sample size in our system is not large, we choose SMO as our meta-classifier, since it can handle small-sample problems and small training datasets. It also solves non-linear classification problems well by transforming the data into a high-dimensional feature space and constructing a hyperplane that performs linear classification in that space.
4. Experiments and Analysis
To verify the robustness and effectiveness of the stacking fusion strategy, we ran experiments on two public databases with three different expression features. Our proposed stacked ensemble system was fully compared with KNN, MLP and SMO. To further understand stacking-based multi-classifier ensembles in facial expression recognition, we also compared the stacking-based approach with the vote-based [20] and bagging-based [21] ensemble methods.
- 4.1 Preprocessing of Facial Expression Images in Databases
The proposed multi-classifier fusion approach is evaluated on the public databases JAFFE, with 183 samples, and Cohn-Kanade, with 355 samples. Samples in JAFFE are independent static expression images, while samples in the Cohn-Kanade database are sequences of frames of the six basic expressions. In every expression sequence, the first frame is normally the neutral state and the last frame, which we call the peak frame, is the most extreme state of the expression. We therefore use this database to extract the dynamic feature for verifying our stacking fusion strategy. The images in the two databases are preprocessed to extract the corresponding features as follows.
Because the images in the JAFFE database are only static facial expression images, we extract only the holistic Gabor feature as the representation. First, we use the Viola-Jones face detection algorithm to locate and crop the rectangular face area, as shown in Fig. 3 (a), and resize the face images to 80×100 pixels. Second, we process the resized image with the Gabor wavelet filter. To obtain a low-dimensional feature, PCA is used to reduce the dimension, and we select the first 300 dimensions as features. In the experiments below, we denote the preprocessed result as “JAFFE-Gabor Feature”.
Fig. 3. (a) Face sub-image obtained by using Haar-wavelet. (b) 70 key points on the peak frame. (c) Normalized face. (d) 47 key points used as geometry feature.
On the Cohn-Kanade database, we used AAM to locate points, selecting 70 points on the peak frame with reference to the FDP in MPEG-4, as shown in Fig. 3 (b). Based on the point locations, the face part is segmented as shown in Fig. 3 (c) and resized to 90×96. As for the “JAFFE-Gabor Feature”, we used Gabor and PCA to extract features with a total of 300 dimensions. The result is named “CK-Gabor Feature”.
Based on the 70 key points extracted on the Cohn-Kanade database, we generate a geometric feature. Considering that the areas of the mouth, nose and eyes contribute the most information to facial expression [22], we select 46 of the 70 points around these areas as feature points. In addition, we select the point on the lower jaw to keep global information. This gives 47 feature points, shown in Fig. 3 (d), which are used as key points for the geometric features. To alleviate the influence of head movement, we normalize the coordinates of these key points by taking the point at the tip of the nose as the reference. The locations of the 47 points are then concatenated as the static geometry feature, called “CK-Static Geometry Feature” in the following experiments, with a total of 94 dimensions.
Since the Cohn-Kanade database contains multi-frame sequences and hence dynamic information, we also take dynamic features into account. The dynamic feature is the difference between the locations of the 47 key points in the first frame (the neutral expression) and in the last frame (the peak expression). The derived feature is called “CK-Dynamic Geometry Feature” in the following experiments, with a total of 94 dimensions.
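The two geometric features can be sketched as below. Several details are assumptions made for illustration only: key points are given as (x, y) tuples, the nose-tip point is identified by a hypothetical index `nose_index`, normalization is a simple translation to the nose tip, the dynamic feature is taken as peak minus neutral, and both frames are normalized before differencing.

```python
def static_geometry(points, nose_index):
    """Translate all key points so the nose tip is the origin, then flatten to x, y, x, y, ..."""
    nose_x, nose_y = points[nose_index]
    feature = []
    for x, y in points:
        feature.extend((x - nose_x, y - nose_y))
    return feature  # 47 points give 94 dimensions

def dynamic_geometry(first_points, peak_points, nose_index):
    """Displacement of each normalized coordinate from the first frame to the peak frame."""
    first = static_geometry(first_points, nose_index)
    peak = static_geometry(peak_points, nose_index)
    return [p - f for p, f in zip(peak, first)]

# Toy data: 47 made-up key points; only point 5 moves between the two frames.
neutral = [(i, i) for i in range(47)]
peak = list(neutral)
peak[5] = (5, 7)
dyn = dynamic_geometry(neutral, peak, nose_index=0)
print(len(dyn), dyn[10], dyn[11])
```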
- 4.2 Evaluation Metric
Accuracy is the usual criterion for evaluating a classifier's effectiveness. However, a classifier is highly likely to get a significant proportion of its classifications correct by chance. Therefore, to gain a more objective and comprehensive view of classifier performance, we adopt three common evaluation criteria: accuracy, Cohen’s kappa and informedness.
Table 3. The Statistic Result for Recognition
To introduce accuracy, kappa and informedness conveniently, Table 3 shows the statistical result of recognition. Ci represents one expression set (six basic expressions in all are used in this paper), N is the total number of samples in a database, and aij is the number of samples that belong to expression set Ci and were classified into expression set Cj. The sums of aij over j and over i are derived values indicating the total number of samples of each expression in the original and predicted sets, respectively.
Accuracy is obtained as in Eq. (6), and indicates the proportion of correct predictions among all samples.
Cohen’s kappa [23] is a more conservative metric since it cancels out the chance component (and renormalizes to the form of a probability). Eq. (7) gives the method for calculating kappa, where Po is the observed probability and Pc is the hypothetical probability of chance.
The last metric is informedness [24], which corresponds to the probability that an informed decision is being made rather than a guess. Informedness is calculated by Eq. (8).
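The three criteria can be illustrated on a confusion matrix whose entry cm[i][j] counts samples of class Ci classified as Cj. The sketch below rests on assumptions: accuracy and Cohen's kappa follow their standard definitions, with Po the observed agreement and Pc computed from the row and column totals; informedness is shown only in its one-vs-rest form per class (recall minus false-positive rate), and how the per-class values are aggregated over the six expressions is not assumed here. All function names are hypothetical.

```python
def accuracy(cm):
    """Proportion of samples on the diagonal of the confusion matrix."""
    n = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / n

def cohen_kappa(cm):
    """(Po - Pc) / (1 - Pc), with Pc from the row (actual) and column (predicted) totals."""
    n = sum(map(sum, cm))
    k = len(cm)
    po = sum(cm[i][i] for i in range(k)) / n
    pc = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(k)) / (n * n)
    return (po - pc) / (1 - pc)

def class_informedness(cm, i):
    """One-vs-rest informedness of class i: recall minus false-positive rate."""
    n = sum(map(sum, cm))
    true_pos = cm[i][i]
    actual_pos = sum(cm[i])
    predicted_pos = sum(row[i] for row in cm)
    recall = true_pos / actual_pos
    false_pos_rate = (predicted_pos - true_pos) / (n - actual_pos)
    return recall - false_pos_rate

# Toy 2-class confusion matrix (made-up counts).
cm = [[8, 2],
      [1, 9]]
print(accuracy(cm), cohen_kappa(cm), class_informedness(cm, 0))
```

On this toy matrix accuracy is higher than kappa and informedness, which reflects that part of the raw accuracy is attributable to chance.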
- 4.3 Experiment Results and Analysis
After preprocessing the facial expression images in the JAFFE and Cohn-Kanade databases, we train and test our stacking fusion expression recognition approach using 10-fold cross-validation (CV). This compensates for the shortage of samples: the data are partitioned into 10 folds and the classifiers are run 10 times in rotation, with one fold containing 10% of the data reserved for testing and the remaining 90% used for training in each cycle.
To further understand the performance of our approach, we make a comprehensive comparison with several individual classifiers and with two other common multi-classifier fusion methods, vote and bagging, applied to expression recognition on the same samples. Comparisons with existing facial expression recognition methods are also given.
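A minimal sketch of the 10-fold partitioning is given below. It is illustrative only: the function name, the shuffling and the seed are assumptions, and no stratification by expression class is applied.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)          # assumed: random assignment to folds
    folds = [indices[i::k] for i in range(k)]     # k folds of nearly equal size
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test

# Example with the 183 JAFFE samples: every sample is tested exactly once.
splits = list(k_fold_indices(183))
print(len(splits), sorted(len(test) for _, test in splits))
```

For stacking, the training part of each round would be partitioned again, so that the meta-classifier is trained on base-classifier outputs for samples the base classifiers were not trained on.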
- 4.3.1 Comparison with Each Individual Base Classifier
To verify whether the fusion approach improves the recognition result compared with the individual classifiers, we apply the three base classifiers and our proposed stacking fusion approach to facial expression recognition. The results are summarized by computing the evaluation metrics accuracy, kappa and informedness on the two databases and with different features, and are shown in Table 4. The best recognition result is emphasized in boldface. The numerals in brackets are the standard error (SE) of the mean.
Table 4. Evaluation of Stacking Ensemble Approach Comparing with a Single Base Classifier
The experimental results show that our proposed stacking fusion expression recognition approach performs best in all cases and under every evaluation metric. The three evaluation metrics rank the classification approaches similarly, which further supports the reliability of our experimental results. Setting the fusion results aside and analyzing only the individual classifiers, we find that their performance varies across databases and features. As Table 4 shows, 1-NN performs best in the first case, MLP is best in the second and third cases, and SMO outperforms the other two in the last case. By fusing their complementary contributions in the different cases, the stacked ensemble system outperforms the individual classifiers. The SE of the mean, shown in brackets in Table 4, is also the lowest or close to the lowest for the stacking approach. This verifies the effectiveness of our proposed stacking fusion facial expression recognition approach from another perspective.
Nevertheless, the stacking ensemble system scores higher than, but does not greatly surpass, the best single classifier on each evaluation metric in the different cases of Table 4. For example, all the classifiers perform well on the JAFFE database, and with the dynamic feature the recognition results are better than the corresponding ones with the static feature. The performance of a multi-classifier ensemble system thus relies on the individual classifiers. It is important to select base classifiers that have good individual performance and diverse contributions in order to improve the fusion result.
- 4.3.2 Comparison with Bagging and Vote Fusion Approaches
Bagging, that is, bootstrap aggregating, has each model in the ensemble vote with equal weight. To promote model variance, bagging trains each model in the ensemble on a randomly drawn subset of the training set. According to existing studies, bagging with a decision tree can achieve good classification accuracy [25]. Our experiments therefore adopt REPTree as one of the base classifiers for bagging in the comparison, in addition to 1-NN, SMO and MLP, the base classifiers of our stacking fusion approach. Table 5 shows the experimental results of bagging with the different classifiers in the corresponding cases. On the Cohn-Kanade database, bagging with SMO as the base classifier gives the best recognition results, and on the JAFFE database bagging with 1-NN performs best.
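The two ingredients of bagging described above can be sketched as follows. This is an illustration under assumptions: the random subset is drawn as a bootstrap sample (with replacement, same size as the training set), models are represented as functions returning a class label, and the names `bootstrap_sample` and `bagging_predict` are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement (assumed form of the random subset)."""
    return [rng.choice(data) for _ in data]

def bagging_predict(models, sample):
    """Each model votes with equal weight; the most frequent label wins."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
training_set = list(range(10))          # stand-in for training samples
subset = bootstrap_sample(training_set, rng)
print(len(subset))

# Toy models (hypothetical): two always answer class 1, one answers class 0.
models = [lambda s: 1, lambda s: 1, lambda s: 0]
print(bagging_predict(models, None))
```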
Table 5. Recognition of Bagging
Vote is another ensemble method, which fuses several classifiers by a voting algorithm. In our experiments we use four common combining rules: “average probabilities”, “product probabilities”, “maximum probability” and “majority voting”. The results of vote fusing 1-NN, SMO and MLP are shown in Table 6. The majority voting rule outperforms the others.
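The four combining rules can be sketched on per-classifier probability vectors. This is a minimal illustration, assuming each classifier supplies one probability per class and that ties are broken in favour of the lowest class index; the function names and the rule keywords are hypothetical.

```python
from collections import Counter
from math import prod

def _argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def vote(probs, rule):
    """Fuse per-classifier class probabilities with one of four fixed rules."""
    m = len(probs[0])
    if rule == "average":
        return _argmax([sum(p[k] for p in probs) / len(probs) for k in range(m)])
    if rule == "product":
        return _argmax([prod(p[k] for p in probs) for k in range(m)])
    if rule == "maximum":
        return _argmax([max(p[k] for p in probs) for k in range(m)])
    if rule == "majority":
        # Each classifier casts one vote for its own most probable class.
        return Counter(_argmax(p) for p in probs).most_common(1)[0][0]
    raise ValueError(f"unknown rule: {rule}")

# Toy outputs of three classifiers over three classes (made-up numbers).
probs = [[0.60, 0.40, 0.00],
         [0.55, 0.45, 0.00],
         [0.05, 0.05, 0.90]]
for rule in ("average", "product", "maximum", "majority"):
    print(rule, vote(probs, rule))
```

On this toy input the "maximum" rule follows the single very confident classifier, while the other three rules side with the two classifiers that agree, which shows how the fixed rules can disagree.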
Table 6. Recognition of Vote
Based on the results in Table 5 and Table 6, we compare our stacking approach with the best variant of each method in each case. As shown in Table 7, our stacking fusion approach outperforms the others in the first three cases. In the last case its value is slightly lower than that of vote (majority voting), but the gap is less than 0.35%. This demonstrates that dynamic information plays an important role in representing emotional expression: most classifiers achieve good recognition results with the dynamic feature, so majority voting performs well. Even so, the performance of our approach is close to the best one in this case. In general, our proposed approach shows robust performance in facial expression recognition with several different features and on two common databases.
Table 7. Recognition of Ensemble Method
Fig. 4. Comparison among all the methods
To give a clearer overview of the above comparison, we draw the bar graph in Fig. 4. Stacking outperforms the others, both individual and ensemble classification approaches, on both standard databases using either Gabor or simple geometric features. Although its evaluation values are slightly lower for the CK database with the dynamic geometric features, stacking is not significantly worse than the best method. Overall, our proposed stacking approach provides stable facial expression recognition.
- 4.4 Comparison with Existing Expression Recognition Methods
We compared our method with existing works that use either the Cohn-Kanade or the JAFFE database; the recognition accuracies are shown in Table 8 and Table 9. Note that all the results are taken from the original published papers. Compared with the available results, our proposed stacking ensemble fusion method achieves relatively good results, reaching the highest recognition accuracy with the dynamic geometric feature on the CK database in Table 8 and with the Gabor feature on the JAFFE database in Table 9. We do not emphasize feature extraction; instead we focus on multi-classifier integration, using the complementary contributions of the classifiers to improve overall effectiveness. The performance could therefore potentially be improved with more discriminative features and by normalizing for factors such as unstable illumination.
Comparison with several existing Methods on Cohn-Kanade Database
Comparison with several existing Methods on JAFFE Database
5. Conclusion and Future Work
The classifier plays an important role in facial expression recognition, but an individual classifier shows some degree of bias across different databases or feature representations. To overcome this shortcoming, a stacking ensemble system is employed in our paper to integrate the performance of multiple classifiers. In particular, we propose using the kappa-error diagram to select base classifiers from frequently used classifiers with different mechanisms. The experimental results show that stacking outperforms the other methods in most cases, across different databases and different features. Our proposed ensemble stacking overcomes the bias of individual classifiers. Learning to estimate and weight the contribution of the base classifiers outperforms the voting-based multi-classifier fusion algorithm and the bagging ensemble method in most cases.
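The points of a kappa-error diagram can be computed as in the following minimal sketch. It assumes the usual construction: for each pair of classifiers, Cohen's kappa between their predictions (agreement, so lower means more diverse) is plotted against the mean error of the pair. The function names and the toy labels are hypothetical and are not taken from the experiments.

```python
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two label sequences of equal length."""
    n = len(a)
    labels = set(a) | set(b)
    # Observed agreement: fraction of samples given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from the two label distributions.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both sequences are constant and identical
    return (observed - expected) / (1 - expected)


def error_rate(pred, truth):
    """Fraction of samples whose predicted label differs from the truth."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


def kappa_error_points(predictions, truth):
    """Return one (pair, kappa, mean error) tuple per classifier pair.

    predictions: dict mapping a classifier name to its predicted labels.
    """
    points = []
    for (n1, p1), (n2, p2) in combinations(predictions.items(), 2):
        mean_err = (error_rate(p1, truth) + error_rate(p2, truth)) / 2
        points.append(((n1, n2), cohen_kappa(p1, p2), mean_err))
    return points


# Hypothetical predictions of two classifiers on four samples.
truth = [0, 0, 1, 1]
preds = {"A": [0, 0, 1, 1], "B": [0, 1, 1, 1]}
print(kappa_error_points(preds, truth))
```

Pairs that fall toward the lower left of the diagram (low kappa, low mean error) are both accurate and diverse, which makes them natural candidates for base-level classifiers.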
In this paper, we used only one kind of feature in each case. However, combining several kinds of features may describe expressions better. Future work is therefore to fuse different sets of features, in particular the static and dynamic features of Cohn-Kanade, and potentially to fuse classifiers over the cross product of feature space and classifier choice.
BIO
Xibin Jia is currently an Associate Professor in the computer college, Beijing University of Technology, and a member of IEEE and CCF. She received the Ph.D. in computer science and technology from Beijing University of Technology in 2007. Her main research interests are visual information perception and multi-source fusion. She currently works on expression recognition and behavior cognition based on multi-information.
Yanhua Zhang is a Master's candidate at Beijing University of Technology. Her main research direction is visual facial expression recognition.
David M W Powers is currently Professor of Cognitive and Computer Science, Associate Dean (International) and Director of the Centre of Knowledge and Interaction Technologies in the School of Computer Science, Engineering and Mathematics, Flinders University, Adelaide, South Australia. He is also a Visiting Professor at the Beijing University of Technology, with support from the Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions.
Humayra Binte Ali is a PhD candidate at Flinders University, South Australia. Her broad research area is machine learning and pattern recognition, and her research interests also include computer vision, image analysis and 3D image analysis. She has research experience with a mobile autonomous ground robotic vehicle (DSTO project) and with early heart rate detection using machine learning in different university projects.