Gender Classification of Low-Resolution Facial Image Based on Pixel Classifier Boosting
ETRI Journal. 2016. Apr, 38(2): 347-355
Copyright © 2016, Electronics and Telecommunications Research Institute (ETRI)
  • Received : August 10, 2014
  • Accepted : December 09, 2015
  • Published : April 01, 2016
Kyu-Dae Ban
Jaehong Kim
Hosub Yoon

Abstract
In face analysis, gender classification (GC) is one of the fundamental tasks. Recent literature on GC primarily utilizes datasets containing high-resolution images of faces captured in uncontrolled real-world settings. In contrast, there have been few efforts that focus on utilizing low-resolution facial images in GC. We propose a GC method based on pixel classifier boosting with modified census transform features. Experiments are conducted using large datasets, such as Labeled Faces in the Wild and The Images of Groups, and the standard protocols of GC communities. Experimental results show that, despite using low-resolution facial images that have a 15-pixel inter-ocular distance, the proposed method records a higher classification rate compared to current state-of-the-art GC algorithms.
I. Introduction
Gender classification (GC) using facial images has been an important issue in the face analysis community in recent years. Some GC approaches may use biometric information, such as irises, fingerprints, and voice. However, these approaches may require the touch or attention of users and may present a potential privacy problem. Many applications of GC prefer non-intrusive and non-cooperative methods. In this paper, we consider GC methods based on image processing and pattern classification using facial images.
Gender classification methods can be categorized into geometric-based and appearance-based methods. Conventional geometric-based GC methods utilize distances between fiducial points, such as the eyes, nose, and mouth. Fellous [1] used 24 facial distance features and reported a classification rate of 90% on 100 FERET images. However, it is difficult to extract exact fiducial points from facial images under unconstrained real-world conditions.
Appearance-based GC methods are based on global or local features extracted from pixel intensities. Moghaddam and Yang [2] used a support vector machine (SVM) with radial basis function (RBF) kernels for GC on thumbnail faces (12 pixels × 21 pixels). They evaluated the classifier on 1,755 FERET images and achieved an accuracy of 96.6%. Baluja and Rowley [3] introduced a fast GC system that uses simple pixel comparisons in a face image. On the FERET dataset, their approach (500 comparison operations on 20 pixel × 20 pixel images) achieved 94.3% accuracy, matching the performance of their own SVM baseline [3].
Most of the above GC methods were evaluated using a controlled laboratory dataset such as the FERET dataset [4] . Some researchers have made their own datasets to avoid limited conditions. Shakhnarovich and others [5] collected over 3,500 face images from the Internet. The classification rate of their AdaBoost classifier with 500 Haar-like features was 79.0%. Gao and Ai [6] collected 10,100 Mongoloid faces in a real environment. They obtained a 95.51% accuracy using a probabilistic boosting tree with simple Haar-like features.
Large datasets of uncontrolled real-world conditions have recently been in the spotlight. Labeled Faces in the Wild (LFW) [7] is a representative dataset for real-world conditions. The Images of Groups (TIOG) dataset [8] is also from real-world settings. These datasets contain various characteristics of a person, such as age, ethnicity, facial expression, and accessories, in various capture environments (including lighting conditions, low resolution, and noise).
Shan [9] used LFW for a GC evaluation. A total of 7,443 face images were aligned using commercial face alignment software, and all aligned images were normalized into 127 pixels × 91 pixels. The best classification rate was 94.81% when using local binary pattern (LBP) features and an SVM classifier. This is a very high classification rate, considering the fact that a real-world dataset has been used, but the evaluation protocol (train and test set information) of the 7,443 selected face images is not open to the public.
Dago-Casas and others [10] evaluated classifiers including principal component analysis (PCA), linear discriminant analysis (LDA), and SVM with several features, including pixels, Gabor-Jets, and LBPs, on the LFW and TIOG datasets. They followed the standard protocols for an evaluation of the LFW dataset and proposed evaluation protocols for the TIOG dataset. The best results (accuracy) were 94.01% for LFW and 86.61% for TIOG.
In the field of face recognition, a minimum face image resolution between 32 pixels × 32 pixels and 64 pixels × 64 pixels is required; otherwise, the recognition performance degrades dramatically [11]. Recent GC methods with state-of-the-art performance use a face width and height of about 100 pixels. In applications using a typical surveillance camera, it is difficult to obtain such large facial images. It is therefore important to reduce the base size of a face in order to apply GC in the real world.
We propose a facial GC method using an AdaBoost classifier with modified census transform (MCT) [12] features. An MCT feature can represent twice as many descriptions as a conventional LBP [13] feature, which is the current state-of-the-art feature of GC [9] , [10] . Usually, the occurrences of LBP codes in image blocks are collected into a histogram. A classification is then performed using an SVM or by computing the histogram similarities. However, the process of making a histogram may potentially cause detailed spatial information to be lost. Furthermore, a large face image is required compared with a pixel-based method. We extract the MCT features of specific pixel positions, which have a discriminant ability to classify gender, using AdaBoost learning. This process helps determine the proper region of a face for GC. The proposed method can achieve a state-of-the-art performance on the LFW and TIOG datasets, in spite of the low resolution of facial images.
The remainder of this paper is organized as follows. In Sections II and III, GC benchmark datasets, the evaluation protocol, and the metrics used in the proposed method are described. The proposed MCT-AdaBoost GC method and its experimental results are provided in Sections IV and V. Finally, Section VI offers some concluding remarks regarding this research.
II. Datasets and Evaluation Protocols
Early GC studies [1]–[3] often used the FERET dataset. FERET is a standard dataset used for face recognition evaluations. However, in early studies, researchers used only a set of frontal faces with regular facial expressions. This subset (the FA part of FERET: 1,759 images with 1,150 male and 609 female faces from 1,010 individuals) is not realistic and contains a small number of samples; it is therefore hard to argue that it reflects real-world environments. The use of large and realistic datasets is a growing trend in recent facial analyses.
The LFW and TIOG datasets are good examples. The LFW dataset was designed for studying the unconstrained face recognition problem. This dataset contains more than 13,000 face images of 5,749 subjects (4,263 male, 1,486 female) collected from the Internet. Some individuals appear more than once in the dataset. The TIOG dataset was gathered from Flickr images. This dataset is more balanced in terms of gender compared with the LFW dataset. A total of 5,080 images containing 28,231 faces are labeled with age and gender. Compared with datasets with constrained conditions, there is a great deal of variety in these datasets, including makeup; accessories such as glasses and hats; the occlusion of faces; unusual facial expressions; poses; and various backgrounds.
Sample images of the above two large datasets are shown in Fig. 1 ; none of the images are cropped. In the case of the TIOG dataset, there are groups of people in the images. For an evaluation comparison, not only the name of the image file but also the coordinates of the target subjects must therefore be specified.
Fig. 1. Sample images of LFW (top) and TIOG (bottom) datasets.
To compare the evaluation results of different gender classifiers, it is very important not only to select the same dataset but also to follow the same protocol configuration in GC benchmarks or challenges. A protocol simply specifies which training and testing sets are drawn from the whole dataset, along with their fold-division information.
There are standard protocols for a GC evaluation of the LFW dataset. Gehrig and others [14] proposed an evaluation protocol for GC and age estimation in a facial image analysis conducted at the Benchmarking Facial Image Analysis Technologies (BeFIT) challenge [15]. The evaluation datasets were organized around two conditions: one is a controlled laboratory environment, and the other is unconstrained real-world conditions. These datasets were chosen on the criterion that they be available for research purposes. The BeFIT website provides file lists for five-fold cross validation. We followed these protocols in our evaluation.
There is also an evaluation protocol for a TIOG dataset. Dago-Casas and others [10] proposed a GC protocol for this dataset, and conducted single- and cross-validation experiments on the LFW and TIOG datasets.
We followed the protocols of GC benchmarks [14] for the LFW dataset, and the protocol proposed by [10] for the TIOG dataset.
In the case of the LFW dataset, the authors of [10] modified the fold information of [14] because the OpenCV [16] face detector cannot detect all of the faces. They therefore used 13,088 of the 13,233 facial images in the original protocol. We did not modify the original fold information, however, because there is no specialized or standard face detector for GC. Other authors [9], [10] have stated that they used an OpenCV face detector, but this detector has many tunable parameters; even with the same OpenCV version, different face detection results are possible. We therefore consider that the ground truth of the eye coordinates should be used to detect and align faces. This allows GC results to be compared fairly, independent of the face alignment method. Note that the benchmark protocols described in [10] and [14] place no restrictions on face detection, face alignment, or face region of interest (ROI) selection. Detecting and aligning faces through the eye ground truth may thus serve as a good baseline for future GC methods.
III. Evaluation Metrics
Accuracy (ACC) is a common metric used to evaluate GC results.
(1) $\mathrm{ACC} = (TP + TN)/(P + N)$,
where P is the total number of positive samples; N is the total number of negative samples; and TP and TN are the numbers of correctly classified positive and negative samples, respectively. This metric represents the whole accuracy of the classification. However, the ACC metric occasionally becomes useless, especially when conducting an evaluation using imbalanced data. For example, in a dataset including 90% males and 10% females, the classifier classifying all subjects as male will have a 90% ACC. Therefore, metrics enabling the performance of the separated classes to be measured are needed; that is, the true positive rate (TPR) and true negative rate (TNR).
(2) $\mathrm{TPR} = TP/P$,
(3) $\mathrm{TNR} = TN/N$.
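As a concrete illustration of (1)–(3), the following minimal Python sketch computes all three metrics and reproduces the imbalanced-data example above; the function and variable names are ours, not from the paper.

```python
import numpy as np

def gc_metrics(y_true, y_pred):
    """Compute ACC, TPR, and TNR for binary labels (True = positive class)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    p = y_true.sum()                # total positive samples
    n = (~y_true).sum()             # total negative samples
    tp = (y_true & y_pred).sum()    # correctly classified positives
    tn = (~y_true & ~y_pred).sum()  # correctly classified negatives
    return (tp + tn) / (p + n), tp / p, tn / n

# The imbalance example from the text: 90% males, and a classifier
# that labels every subject as male.
y_true = np.array([1] * 90 + [0] * 10)  # 1 = male, 0 = female
y_pred = np.ones_like(y_true)
acc, tpr, tnr = gc_metrics(y_true, y_pred)
print(acc, tpr, tnr)  # 0.9, 1.0, 0.0 -- the high ACC hides total failure on females
```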
IV. Proposed Method
- 1. MCT Features
MCT [12] features are local structure features that are robust to variations in illumination. The LBP [13] is the current state-of-the-art feature of GC, and has many similarities with the MCT. Therefore, we will describe the LBP briefly before embarking on a discussion on the MCT.
The LBP is a simple but efficient texture operator. The LBP operator forms codes for the image pixels by thresholding the neighborhood of each pixel with the center value. For 3 × 3 neighborhoods, $2^8 = 256$ different codes can be extracted, each considered as a binary number. Commonly, the occurrences of LBP codes in an image are collected into a histogram. The classification is then performed using an SVM or by computing simple histogram similarities.
Similarly, an MCT can produce $2^9 - 1 = 511$ structure kernels defined on a 3 × 3 neighborhood. The equations for the MCT are
(4) $\Gamma(x) = \sum_{n=0}^{8} \zeta(i_n - \bar{i})\, 2^n$,
(5) $\zeta(\alpha) = \begin{cases} 1 & \text{if } \alpha > 0, \\ 0 & \text{otherwise}, \end{cases}$
where $i_n$ is one of the nine pixel intensities in a local spatial 3 × 3 kernel (whose center position is $x$), and $\bar{i}$ denotes the mean value of all pixel intensities in the kernel. There can be $(n-2) \times (n-2)$ MCT codes in an $n \times n$ analytic window, excluding the outermost pixels. Figure 2 shows an example illustration of an MCT.
Fig. 2. Example of MCT computation: (a) example face and one arbitrary location, (b) pixel values of 3 × 3 rectangle, and (c) local structure kernel; its MCT value is 63.
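To make (4) and (5) concrete, here is a minimal NumPy sketch of the MCT operator. The row-major bit ordering over the 3 × 3 kernel is our assumption; any fixed ordering works, provided training and testing use the same one (the value of 63 in Fig. 2 depends on the paper's own ordering).

```python
import numpy as np

def mct(image):
    """Compute MCT codes (Eqs. (4)-(5)) for every non-border pixel.

    Returns an (h-2) x (w-2) array of integer codes in [0, 511]; each pixel
    of the 3 x 3 kernel contributes one bit, set when its intensity exceeds
    the kernel mean (the zeta test of Eq. (5)).
    """
    img = np.asarray(image, dtype=np.float64)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Nine shifted views of the image give the 3 x 3 neighborhood of each
    # pixel (row-major order over the kernel: n = 0..8).
    neigh = np.stack([img[dy:h - 2 + dy, dx:w - 2 + dx]
                      for dy in range(3) for dx in range(3)])
    mean = neigh.mean(axis=0)  # i-bar: mean intensity of each kernel
    for n in range(9):
        # zeta(i_n - i_bar) * 2^n, accumulated bitwise into the code
        codes |= (neigh[n] > mean).astype(np.int32) << n
    return codes

# A single 3 x 3 patch yields one MCT code for its center pixel.
patch = np.array([[12, 15, 200], [11, 14, 190], [10, 13, 180]])
print(mct(patch))  # 1 x 1 array holding a code in [0, 511]
```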
- 2. Learning Method
AdaBoost is an algorithm used to construct one strong classifier as a linear combination of several weak classifiers. We use the AdaBoost procedure of Fröba and Ernst [12] to train the classifier. We briefly summarize their boosting procedure in this subsection.
As can be seen from Fig. 2, an MCT value is an integer lying within the range [0, 511]. This value is used as a lookup table (LUT) index. Each LUT holds a weight for each kernel index and is used to construct weak classifiers. The pixel classifier $h_x$ is the weighted sum of all weak classifiers at location $x$. The final classifier $H(\Gamma)$ is the sum of all pixel classifiers:
(6) $H(\Gamma) = \sum_{x \in W'} h_x(\Gamma(x))$,
where $W'$ is the set of pixel locations with an associated pixel classifier $h_x$. The number of pixel classifiers must be predetermined before AdaBoost learning. The threshold for the final classifier $H(\Gamma)$ is determined in the training stage. When the positive and negative sample sizes are balanced (as in the TIOG dataset), we use the mean value of the classification scores of the training samples as the threshold. However, when the class sizes are very different (as in the LFW dataset), it is difficult to determine an optimal threshold. We therefore produce two kinds of thresholds using the scores of validation samples (5% of the training samples). The first is the threshold that maximizes the ACC of the validation sets, and the second is the threshold that minimizes the difference between the TPR and TNR of the validation sets.
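The sketch below shows how the trained pixel classifiers of (6) could be evaluated as LUTs, and how the two thresholds described above could be chosen from validation scores. The LUT weights themselves come from the boosting procedure of [12], which is not reproduced here; all names are illustrative.

```python
import numpy as np

def strong_classifier_scores(mct_codes, luts, positions):
    """Evaluate Eq. (6): H(Gamma) = sum over x in W' of h_x(Gamma(x)).

    mct_codes : (N, h, w) int array of per-image MCT codes
    luts      : (K, 512) array; luts[k][c] is the weight that the k-th pixel
                classifier assigns to MCT code c (filled in by boosting)
    positions : K (row, col) pixel locations selected by AdaBoost
    """
    scores = np.zeros(len(mct_codes))
    for k, (r, c) in enumerate(positions):
        scores += luts[k][mct_codes[:, r, c]]  # LUT lookup at location x
    return scores

def pick_thresholds(scores, labels):
    """The two threshold rules from the text, applied to validation scores.

    labels is a boolean array (True = positive class).
    Returns (threshold maximizing ACC, threshold balancing TPR and TNR).
    """
    p, n = labels.sum(), (~labels).sum()
    best_acc, best_gap = -1.0, np.inf
    t_acc = t_bal = scores.min()
    for t in np.unique(scores):
        pred = scores >= t
        tpr = (pred & labels).sum() / p
        tnr = (~pred & ~labels).sum() / n
        acc = (tpr * p + tnr * n) / (p + n)
        if acc > best_acc:
            best_acc, t_acc = acc, t             # rule 1: maximize validation ACC
        if abs(tpr - tnr) < best_gap:
            best_gap, t_bal = abs(tpr - tnr), t  # rule 2: balance TPR and TNR
    return t_acc, t_bal
```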
V. Experiments and Results
- 1. Face Detection and Alignment
At the system level, a gender classifier must first be able to localize the face region in an input image. As in a typical image processing pipeline, a face image then needs preprocessing, including pose and illumination compensation, size normalization, and denoising, before features are extracted. Among these preprocessing steps, face alignment is critical, especially for feature-based GC. Recent feature-based GC methods are usually based on local descriptor features such as LBPs, HOG, and Gabor jets. The proposed MCT feature is also a local feature, and face alignment is therefore equally important here.
The most common method of face alignment uses the locations of fiducial points on the face (eyes, mouth, and so on), aligning the face based on the geometrical relationships between facial features. In GC benchmarks or challenges, there are two different evaluation methods. The first derives the results from the whole process, including face localization, feature extraction, and classification; the second focuses only on the features and classification methods, without face localization. In certain face datasets, such as the TIOG dataset, images contain multiple faces. This means that any classifier evaluated on such a dataset has to be able to find or localize the specific faces listed in the relevant evaluation protocol. Empirically, however, it is difficult to develop a face detector that can find all of the faces within a large face dataset, because such datasets reflect uncontrolled conditions. We therefore consider that an evaluation method including a face detection process increases the burden of a fair evaluation among gender classifiers. We thus use the eye points to align the faces in our experiments. Fortunately, the ground truth of the eye positions is provided by the dataset websites and benchmarks, or can easily be found on the Internet (in the case of the LFW, we obtained the eye ground truth from [17]).
To obtain aligned images, an original image is scaled based on the inter-ocular distance and rotated around the center point of the eyes so that the line connecting them is parallel to the x-axis. In this way, the cropped images have the eyes at the same predetermined coordinates.
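A minimal OpenCV sketch of this alignment follows, assuming the 250 × 250 target geometry with eyes at (103, 114) and (147, 114) used in the next subsection; the function name and defaults are ours.

```python
import cv2
import numpy as np

def align_face(image, eye_l, eye_r,
               out_size=(250, 250), dst_l=(103.0, 114.0), dst_r=(147.0, 114.0)):
    """Similarity-transform a face so both eyes land on fixed target coordinates.

    eye_l, eye_r: ground-truth (x, y) eye centers in the source image.
    dst_l, dst_r: target eye coordinates in the normalized crop.
    """
    (x1, y1), (x2, y2) = eye_l, eye_r
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))  # in-plane tilt of the eye line
    scale = (dst_r[0] - dst_l[0]) / np.hypot(x2 - x1, y2 - y1)  # match inter-ocular dist.
    center = ((x1 + x2) / 2.0, (y1 + y2) / 2.0)  # rotate about the eye midpoint
    m = cv2.getRotationMatrix2D(center, angle, scale)
    # Translate so the eye midpoint moves onto the target midpoint.
    m[0, 2] += (dst_l[0] + dst_r[0]) / 2.0 - center[0]
    m[1, 2] += dst_l[1] - center[1]
    return cv2.warpAffine(image, m, out_size)
```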
- 2. Defining ROI of Face
Although some studies [9], [10] have stated that they did not include the hair region, there is still no common agreement on how to set the face ROI, such as the proportion of the face area within the analytic window or the inter-ocular distance relative to the image width. Each GC study sets a different face region.
The following experiment shows that the region outside the inner face should also be included, and defines the face ROI through a systematic approach. First, we trained the proposed gender classifier on facial images that include a wide background area. Pixel positions with gender-discriminating ability were then identified. On this basis, we finally determined the face region used in further experiments.
To check that the ROI selection does not depend solely on the LFW and TIOG datasets, we gathered another face dataset of 8,287 face images (3,407 female and 4,880 male) from the Internet. In this dataset, the eye coordinates were manually annotated for exact face alignment. For the LFW, TIOG, and our own datasets, all images were given dimensions of 250 pixels × 250 pixels, with eye coordinates of (103, 114) and (147, 114), through rotation, resizing, and cropping. All aligned images were then resized to 57 pixels × 57 pixels for training. These processes are illustrated in Fig. 3.
Fig. 3. Face image preparation before training classifier to determine face region that has impact on GC: (a) original image, (b) eye-normalized image, and (c) resized image for training.
In MCT-AdaBoost learning, we extracted 1,500 pixel classifiers from each of the three datasets. The results are shown in Fig. 4; the positions of the pixel classifiers depend on the dataset. The small blue and red rectangles in each image indicate the positions of pixel classifiers with gender discrimination capability. In particular, the red rectangles indicate positions extracted early in the training process, meaning that these locations have greater discrimination ability. The yellow rectangles represent the optimized face ROI for each dataset: we fixed the center at the mean of the pixel classifier positions and reduced the rectangle size until it still included 90% of the pixel classifiers.
The pixel classifiers in the inner-face regions, which include the forehead, eyebrows, eyes, glabella, and nasolabial folds, are commonly extracted across datasets. In addition, the outer part of the face covering the hair and shoulder regions also helps classify gender. Although the best proportion of the face region will differ for each dataset, we fixed a single face ROI for evaluating the LFW and TIOG datasets, selecting the mean position of the yellow rectangles in Fig. 4. The final face ROI, expressed with respect to the inter-ocular distance, is shown in Fig. 5.
Fig. 4. Positions of pixel classifiers that have gender discrimination capability from (a) LFW, (b) TIOG, and (c) our gender datasets.
Fig. 5. Selection of face ROI for GC evaluation.
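The 90% shrinking step can be sketched as follows: start from a window centered on the mean pixel-classifier position that covers all selected positions, then shrink it until just before coverage drops below 90%. We assume a square window for simplicity; the paper does not specify the exact shrinking schedule, and its ROIs need not be square.

```python
import numpy as np

def roi_from_positions(positions, keep=0.90):
    """Shrink a square window around the mean pixel-classifier position until
    it retains just `keep` of the positions. Returns (top, left, bottom, right).
    """
    pts = np.asarray(positions, dtype=float)  # (K, 2) array of (row, col)
    center = pts.mean(axis=0)                 # fix the center at the mean position
    half = np.abs(pts - center).max()         # start large enough to cover everything
    while half > 1:
        inside = (np.abs(pts - center) <= half - 1).all(axis=1)
        if inside.mean() < keep:              # shrinking further would lose too many
            break
        half -= 1
    top, left = (center - half).astype(int)
    bottom, right = (center + half).astype(int)
    return top, left, bottom, right
```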
- 3. Determining Number of Pixel Classifiers
In this subsection, we look at the relationship between the number of pixel classifiers and the GC performance. We used our own gender dataset from the above subsection. The images were cropped to 193 pixels × 193 pixels and resized to 60 pixels × 60 pixels. The number of pixel classifiers is limited to $(n-2) \times (n-2)$ in an n pixel × n pixel analytic window; however, not all pixel positions help discriminate gender. In particular, pixel classifiers in the cheek, chin, and some background regions cannot distinguish between males and females. Figure 6 illustrates how the GC rate increases with the number of features. At most, 3,364 pixel positions can be extracted from the 60 pixel × 60 pixel images, but only about 70% of these positions actually have discriminating capability. An interesting point is that the GC rate using only ten MCT features is 80.96%; with 2,400 locations, the classification rate is 92.49%. On a PC with an Intel Core i7 930 CPU and 4 GB of RAM, it took 23 h to extract 2,400 locations, but 1 h for 600 locations and 1 min for 10 locations. Extracting more than 2,400 locations was expected to take quite a long time, and these extra features did not significantly affect the final classification rate.
Fig. 6. Increase in GC rates with number of MCT features.
- 4. Dataset Evaluation
The evaluation results on the LFW and TIOG datasets are described below. We followed the protocols in [14] for the LFW dataset. The fold information of the LFW dataset contains 13,233 images (10,256 male and 2,977 female images of 4,263 males and 1,486 females), which is the full size of the LFW dataset. In the case of the TIOG dataset, the fold information [10] contains 14,760 subsamples (7,380 male and 7,380 female images of different individuals) out of a total of 28,231 images. Both evaluation protocols use five-fold cross validation; we therefore trained five AdaBoost classifiers and averaged the results for the final evaluation.
Before the AdaBoost training, we prepared positive and negative data samples of the gender database. First, 193 pixel × 193 pixel eye-normalized images were obtained from the original images; in these images, the eyes are located at (74, 70) and (118, 70). We then made three sets, resized from the eye-normalized images to 23 pixels × 23 pixels, 45 pixels × 45 pixels, and 67 pixels × 67 pixels, with inter-ocular distances of about 5, 10, and 15 pixels, respectively. Empirically, in existing GC studies that did not use the hair region, the face width is about double the inter-ocular distance; in these terms, our gender classifier uses face widths of 10, 20, and 30 pixels.
We added one mirror image and two alignment variants for every training sample to expand the data. The variant images were generated by aligning the face images based on eye locations randomly perturbed by one or two pixels from the ground-truth positions. This helps the generalization of the GC from a face-alignment perspective.
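A sketch of this expansion follows, reusing the align_face helper sketched in the face-alignment subsection above; the helper name and jitter range are illustrative.

```python
import numpy as np

def perturb_eyes(eye_l, eye_r, rng):
    """Shift each eye coordinate by one or two pixels, as described in the text."""
    shift = lambda v: v + int(rng.choice([-2, -1, 1, 2]))
    return ((shift(eye_l[0]), shift(eye_l[1])),
            (shift(eye_r[0]), shift(eye_r[1])))

rng = np.random.default_rng(0)
# Per training face: the aligned original, its horizontal mirror, and two
# re-alignments from perturbed eye points.
# aligned  = align_face(img, eye_l, eye_r)
# mirrored = aligned[:, ::-1]
# variants = [align_face(img, *perturb_eyes(eye_l, eye_r, rng)) for _ in range(2)]
```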
Table 1 shows the LFW GC results of several methods, including our own. The experiments in [9] and [18] did not use a standard evaluation protocol; therefore, their results are not directly comparable. The proposed MCT/AdaBoost method approached the highest classification rate despite the low resolution of the faces. The fold information of the LFW GC benchmark consists of 10,256 male images and 2,977 female images, so the gender distribution is highly imbalanced. Consequently, when the threshold maximizing the ACC of the validation sets is used, the asymmetry between the TPR and TNR of the proposed method follows a pattern similar to that of [10]. The results on the TIOG dataset are shown in Table 2. All three methods adopt the same evaluation protocol but use different face alignments. The proposed MCT/AdaBoost method obtained a higher ACC than the Gabor jets/PCA and SVM method, despite using an input image only 16% the size of that used in [10]. Some GC results on the LFW dataset are shown in Fig. 7.
Table 1. Gender classification rates (%) of LFW dataset.

| Method (feature/classifier) | Threshold | ACC | TPR | TNR | Description |
| --- | --- | --- | --- | --- | --- |
| (Proposed) MCT/AdaBoost | Max (ACC) | 90.04 | 94.96 | 73.10 | Face alignment: eye ground truth; folds: same as [14]; image size: 23 × 23 pixels; inter-ocular distance: 5 pixels |
| (Proposed) MCT/AdaBoost | Min (abs(TPR − TNR)) | 88.20 | 89.30 | 84.38 | Same as above |
| (Proposed) MCT/AdaBoost | Max (ACC) | 93.83 | 97.31 | 81.83 | Same as above, but image size: 45 × 45 pixels; inter-ocular distance: 10 pixels |
| (Proposed) MCT/AdaBoost | Min (abs(TPR − TNR)) | 92.45 | 92.94 | 90.76 | Same as above (45 × 45 pixels; 10 pixels) |
| (Proposed) MCT/AdaBoost | Max (ACC) | 94.72 | 97.52 | 85.05 | Same as above, but image size: 67 × 67 pixels; inter-ocular distance: 15 pixels |
| (Proposed) MCT/AdaBoost | Min (abs(TPR − TNR)) | 93.02 | 93.20 | 92.41 | Same as above (67 × 67 pixels; 15 pixels) |
| Gabor jets/PCA, SVM [10] | Min (abs(TPR − TNR)) | 93.02 | 93.20 | 92.41 | Face alignment: OpenCV detector; folds: modified [14] (13,088 images); see Section II |
| LBPs/PCA, W-SVM [10] | Min (abs(TPR − TNR)) | 92.60 | 93.84 | 89.96 | Face alignment: OpenCV detector; folds: modified [14] (13,088 images); see Section II |
| LBP/SVM [18] | N/A | 90.60 | N/A | N/A | Face alignment: unknown; folds: non-standard (see text); image size: 75 × 90 pixels; inter-ocular distance: about 25 pixels |
| Boosted LBP/SVM [9] | N/A | 94.81 | 96.94 | 92.02 | Face alignment: commercial software; folds: author's own; image size: 127 × 91 pixels; inter-ocular distance: 40 pixels |
Table 2. Gender classification rates (%) of TIOG dataset.

| Method (feature/classifier) | ACC | TPR | TNR | Description |
| --- | --- | --- | --- | --- |
| (Proposed) MCT/AdaBoost | 84.10 | 83.71 | 84.49 | Face alignment: eye ground truth; folds: same as [10]; image size: 23 × 23 pixels; inter-ocular distance: 5 pixels |
| (Proposed) MCT/AdaBoost | 87.35 | 87.42 | 87.28 | Same as above, but image size: 45 × 45 pixels; inter-ocular distance: 10 pixels |
| (Proposed) MCT/AdaBoost | 87.92 | 87.59 | 88.25 | Same as above, but image size: 67 × 67 pixels; inter-ocular distance: 15 pixels |
| Gabor jets/PCA, SVM [10] | 86.61 | 87.24 | 85.98 | Face alignment: semi-automatic; folds: same as [10]; image size: 120 × 105 pixels; inter-ocular distance: 45 pixels |
| LBP/SVM [18] | 82.65 | N/A | N/A | Face alignment: unknown; folds: same as [10]; image size: 59 × 65 pixels; inter-ocular distance: about 25 pixels |
Fig. 7. Some results of proposed GC on LFW dataset (original images): (a) true positive, (b) true negative, (c) false negative, and (d) false positive.
VI. Conclusion
In this paper, an MCT-AdaBoost method was proposed to classify gender from facial images. For a fair comparison with other methods, we followed the standard protocols of the GC community and focused on the feature extraction and classification methods. The overall performance of the proposed MCT-AdaBoost is comparable to that of state-of-the-art algorithms such as SVM, PCA, and LDA classifiers with LBP and Gabor jet features. In addition, our method requires smaller face images than other algorithms, which is an advantage in long-range facial GC applications.
This research was supported by the ICT R&D program of KEIT (10041610, the development of automatic user information (identification, behavior, location) extraction and recognition technology based on perception sensor network (PSN) under real environment for intelligent robot) and the Robot R&D Program funded by MOTIE and KEIT (10041659).
BIO
Corresponding Author kdban@etri.re.kr
Kyu-Dae Ban received his PhD degree in computer software and engineering from the University of Science and Technology, Daejeon, Rep. of Korea, in 2011. He has been a researcher at ETRI since 2011. His research interests include image processing and pattern recognition.
jhkim504@etri.re.kr
Jaehong Kim received his PhD degree in computer engineering from Kyungpook National University, Daegu, Rep. of Korea, in 1996. He has been a researcher at ETRI since 2001. His research interests include elderly-care robotics and social HRI frameworks.
yoonhs@etri.re.kr
Hosub Yoon received his BS and MS degrees in computer science from Soongsil University, Seoul, Rep. of Korea, in 1989 and 1991, respectively. He received his PhD degree in image processing from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 2003. He joined KIST/SERI, Daejeon, Rep. of Korea, in 1991 and transferred to ETRI in 1999. His research interests include HRI, image processing, audio processing, and pattern recognition.
References
Fellous J. 1997 “Gender Discrimination and Prediction on the Basis of Facial Metric Information,” Vis. Res. 37 (14) 1961 - 1973    DOI : 10.1016/S0042-6989(97)00010-2
Moghaddam B. , Yang M. 2002 “Learning Gender with Support Faces,” IEEE Trans. PAMI 24 (5) 707 - 711    DOI : 10.1109/34.1000244
Baluja S. , Rowley H.A. 2007 “Boosting Sex Identification Performance,” Int. J. Comput. Vis. 71 (1) 111 - 119    DOI : 10.1007/s11263-006-8910-9
Phillips P.J. 1998 “The FERET Database and Evaluation Procedure for Face-Recognition Algorithms,” Image Vis. Comput. 16 (5) 295 - 306    DOI : 10.1016/S0262-8856(97)00070-X
Shakhnarovich G. , Viola P. , Moghaddam B. “A Unified Learning Framework for Real Time Face Detection and Classification,” Proc. IEEE Int. Conf. Autom. Face Gesture Recogn. Washington, DC, USA May 20–21, 2002 14 - 21
Gao W. , Ai H. “Face Gender Classification on Consumer Images in a Multiethnic Environment,” Int. Conf. ICB Alghero, Italy June 2–5, 2009 169 - 178
Huang G.B. 2007 “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments,”, Tech. Rep. Univ. of Massachusetts Amherst, USA
Gallagher A.C. , Chen T. “Understanding Images of Groups of People,” IEEE Conf. Comput. Vis. Pattern Recogn. Miami, FL, USA June 20–25, 2009 256 - 263
Shan C. 2012 “Learning Local Binary Patterns for Gender Classification on Real-World Face Images,” Pattern Recogn. Lett. 33 (4) 431 - 437    DOI : 10.1016/j.patrec.2011.05.016
Dago-Casas P. “Single- and Cross-Database Benchmarks for Gender Classification under Unconstrained Settings,” IEEE Int. Conf. Comput. Vis. Workshops Barcelona, Spain Nov. 6–13, 2011 2152 - 2159
Zou W. , Yuen P. 2012 “Very Low Resolution Face Recognition Problem,” IEEE Trans. Image Process 21 (1) 327 - 340    DOI : 10.1109/TIP.2011.2162423
Fröba B. , Ernst A. “Face Detection with the Modified Census Transform,” IEEE Int. Conf. Autom. Face Gesture Recogn. Seoul, Rep. of Korea May 17–19, 2004 91 - 96
Ojala T. , Pietikäinen M. , Mäenpää T. 2002 “Multi-resolution Grayscale and Rotation Invariant Texture Classification with Local Binary Patterns,” IEEE Trans. PAMI 24 (7) 971 - 987    DOI : 10.1109/TPAMI.2002.1017623
Gehrig T. , Steiner M. , Ekenel H.K. 2011 Draft: Evaluation Guidelines for Gender Classification and Age Estimation KIT – The Research Univ. Helmholtz Association http://fipa.cs.kit.edu/downloads/befitevaluation_guidelines.pdf
Ekenel H.K. 2011 Proposed Benchmarks, Gender Classification KIT – The Research Univ. Helmholtz Association http://fipa.cs.kit.edu/431.php
Itseez 2013 Open Source Comput. Vision (OpenCV) Itseez http://opencv.org/
Mottl V.V. 2010 LFW_EYES The Laboratory of Data Analysis, Tula State Univ. Russia http://lda.tsu.tula.ru/papers/Krasotkina_Akaike-Buenos-Aires.pdf
Ramón-Balmaseda E. , Lorenzo-Navarro J. , Castrillón-Santana M. 2012 “Gender Classification in Large Databases,” Progress Pattern Recogn., Image Anal., Comput. Vis, Appl. 7441 74 - 81