Learning Free Energy Kernel for Image Retrieval
KSII Transactions on Internet and Information Systems (TIIS). 2014. Aug, 8(8): 2895-2912
Copyright © 2014, Korean Society For Internet Information
  • Received : January 12, 2014
  • Accepted : June 19, 2014
  • Published : August 28, 2014
About the Authors
Cungang Wang
School of Computer Science, Liaocheng University Liaocheng 252000, China
Bin Wang
College of Information, Mechanical and Electrical Engineering, Shanghai Normal University Shanghai 200234, China
Liping Zheng
School of Computer Science, Liaocheng University Liaocheng 252000, China

Abstract
Content-based image retrieval has been an important technique for managing huge numbers of images. The fundamental yet highly challenging problem in this field is how to measure content-level similarity based on low-level image features. The primary difficulties lie in the great variance within images, e.g. background, illumination, viewpoint and pose. Intuitively, an ideal similarity measure should be able to adapt to the data distribution, discover and highlight the content-level information, and be robust to those variances. Motivated by these observations, in this paper we propose a probabilistic similarity learning approach. We first model the distribution of low-level image features and derive the free energy kernel (FEK), i.e., the similarity measure, based on the distribution. Then, we propose a learning approach for the derived kernel, under the criterion that the kernel outputs high similarity for images sharing the same class labels and low similarity for those without the same label. The advantages of the proposed approach, in comparison with previous approaches, are threefold. (1) With the ability inherited from probabilistic models, the similarity measure can adapt well to the data distribution. (2) Benefiting from the content-level hidden variables within the probabilistic models, the similarity measure is able to capture content-level cues. (3) It fully exploits class labels in the supervised learning procedure. The proposed approach is extensively evaluated on two well-known databases. It achieves highly competitive performance in most experiments, which validates its advantages.
1. Introduction
Motivated by the rapidly increasing number of digital images, a number of techniques for image storage, search and browsing have been investigated during the past decades [1, 2]. Retrieving images that might be of interest from a huge database is highly challenging, and requires the service providers to annotate the images beforehand. Traditional approaches to image retrieval usually annotate images manually or according to their surrounding text, and then resort to text-based techniques to perform image retrieval. However, these text-based image retrieval approaches are sensitive to the keywords input by the users [3]. To overcome the difficulties of text-based image retrieval, content-based image retrieval (CBIR) was proposed [4]. CBIR systems aim to retrieve the images relevant to a query image from a given image dataset based on content-level similarity. CBIR has attracted increasing attention in recent years and is widely applied in diverse areas, such as advertising, entertainment, fashion design and other industrial applications [5].
CBIR systems typically comprise two main components: (a) image representation and feature extraction; (b) image similarity measures defined on the feature space. Usually, for image representation, CBIR systems exploit low-level visual features, e.g., texture, color and shape of images [6, 7, 8, 9, 10]. Ideally, image features are expected to encode the semantic content of images. With the extracted image features, another important issue is how to measure the similarity between images [4]. Similarity measures aim to merge the cues from low-level image features so as to build the content-level connection among images, which is well known as the problem of "reducing the semantic gap". The authors of [11] categorized similarity measures into four typical classes, including non-parametric test statistics, e.g. χ²-statistics; heuristic distances such as the Minkowski-form distance L_p; and information-theoretic divergences like the Kullback-Leibler divergence. However, these similarity measures do not explicitly consider the broad diversity of different image databases. Consequently, their ability to adapt to data distributions that vary across image databases is limited.
To build the content-level connections among images and adapt to the image distribution, a promising and high-level perspective is to learn the similarity function from the dataset. Motivated by the above observations, a number of learning-based approaches have been proposed to attack the problem, where distance metric learning and similarity metric learning [12] are two representative types of approaches. Because distance metric learning can be converted to similarity metric learning, in this paper we do not distinguish similarity metric learning from distance metric learning, and, for brevity, we refer to both as similarity learning. Depending on the availability of class labels, these approaches fall into two classes: unsupervised similarity learning and supervised similarity learning. The approach proposed in this paper is a supervised similarity learning approach.
Unsupervised similarity learning typically constructs a manifold of low dimensionality, in which the geometric relationships between most observed data are largely preserved [12]. Some unsupervised learning approaches leverage eigen-decomposition to obtain a low-dimensional embedding of data points that lie on a non-linear manifold. Techniques under this criterion include multidimensional scaling (MDS), Laplacian eigenmap [13], principal component analysis (PCA) [14], ISOMAP [15] and locally linear embedding (LLE) [16]. MDS [17] finds a low-rank projection that preserves the dissimilarity defined in a pairwise distance matrix. PCA attempts to find a subspace in which the data variance is preserved. In contrast with MDS and PCA, LLE can find non-linear structure in the data, under the principle of preserving the local rank relation between data in both the intrinsic space and the embedding space. The method of [45] is an unsupervised similarity learning method used for image retrieval, which projects data points into a lower-dimensional space so as to exploit the advantage of multiple kd-trees over low-dimensional data.
Supervised learning approaches learn similarity functions under the criterion of keeping data points within the same class close while separating data points of different classes far apart [17, 18]. The representative approaches include local Fisher discriminant analysis (LFDA) [18], relevant component analysis (RCA) [20] and local linear discriminative analysis (LLDA) [21]. Among these approaches, the most representative work is that of Xing et al. [22]. It formulates distance learning as a constrained convex programming problem, and learns the similarity metric by minimizing the distance between the data points in the equivalence constraints, subject to the constraint that the data points in the inequivalence constraints are well separated [17]. LFDA [19] extends the classical linear discriminant analysis (LDA) to the case where the side information takes the form of pairwise constraints. The large margin nearest neighbor (LMNN) [23] extends the neighborhood component analysis (NCA) [24] via a maximum margin framework. The method of [43] is a supervised learning approach for image retrieval, which learns a linear combination of a set of base kernels by optimising two objective functions that are commonly used in distance metric learning.
The above approaches exhibit a number of advantages in image retrieval. At the same time, probabilistic approaches [25 - 30] show promising performance in a wide range of applications, and are especially well known for their strong adaptation to the data distribution. The so-called probabilistic similarity learning methods derive the middle-level features, and subsequently the similarity measures, from the probabilistic modeling of data. Therefore, they inherit the adaptation abilities of probabilistic models, and are able to exploit hidden information inferred by Bayesian inference. These approaches, including the Fisher kernel [26], the probability product kernel [25], the free energy score space (FESS) [27] and the posterior divergence (PD) [28], can be unsupervised or supervised according to the availability of class labels [27]. Nevertheless, we note that these approaches can be further boosted by exploiting the class labels when learning the probabilistic models as well as the similarity measures.
In this paper, we construct a free energy kernel based on the free energy score space [27], and then propose a similarity learning method for the free energy kernel for CBIR. The framework of the proposed approach is graphically illustrated in Fig. 1. First, we model the probabilistic distribution of low-level image features using a Gaussian mixture model (GMM). Second, based on the GMM, we derive the free energy kernel as a function of image features, mixture indicators and model parameters. Finally, a supervised learning method is proposed for the free energy kernel, so as to exploit class labels. The learned free energy kernel measures the similarity between images. The advantages of the proposed similarity learning approach are threefold: (1) it can fully exploit class labels and hidden information while being adaptive to the data distribution; (2) the learning method for the free energy kernel is computationally efficient because of the form of the free energy kernel; (3) the proposed learning approach shows highly competitive performance over a set of datasets in image retrieval. The kernel similarity learning approach proposed in this paper can be considered a type of "metric learning" approach.
Fig. 1. The framework of the proposed approach
The remainder of this paper is organized as follows. Section 2 presents the proposed approach in detail. In Section 3, we verify the effectiveness of the kernel learning approach in comparison with state-of-the-art similarity learning approaches and image retrieval approaches. Section 4 draws a conclusion.
2. Learning free energy kernels
This section presents the learning approach for the probabilistic kernel derived from the Free Energy Score Space (FESS). First, we employ a Gaussian Mixture Model (GMM) to model the distribution of image features. The reason for using GMM is that its effectiveness for image feature modeling [31] has been extensively verified. Second, we derive the FESS feature mapping based on the GMM. Third, we construct the free energy kernel based on the FESS feature mapping. Fourth, we propose a learning approach for the free energy kernel. The mathematical illustration can be found in Fig. 2. For readability, we summarize the notation involved in Table 1.
Fig. 2. The mathematical illustration of the proposed approach
Table 1. The mathematical notation list
- 2.1. Gaussian Mixture Model: A Generative Perspective
First, we introduce the Gaussian Mixture Model (GMM) from the generative perspective. It is a probabilistic generative model with hidden variables, composed of multiple mixture centers, each of which follows a Gaussian distribution. It assumes a generation procedure in which, to generate a sample, one first randomly chooses a mixture center and then draws the sample from the Gaussian distribution of this mixture center. It is widely used to model fixed-dimension real-valued data. Let $\mathbf{x} \in \mathbb{R}^D$ be the observed variable of dimension $D$. Specifically, $\mathbf{x}$ is the local image feature in this work on image retrieval. Let $\mathbf{z} = (z_1, \cdots, z_K)^T$ be the binary-valued hidden variable indicating which mixture center is selected to generate the sample. That is, $z_k = 1$ if the $k$-th mixture center is selected to generate the sample and $z_k = 0$ otherwise. Typically, the probabilistic distribution over $\mathbf{z}$ is chosen to be a multinomial distribution,
$$P(\mathbf{z}) = \prod_{k=1}^{K} a_k^{z_k} \qquad (1)$$
where $a_k = E_{P(\mathbf{z})}[z_k]$, $a_k \in [0, 1]$, and $\sum_{k=1}^{K} a_k = 1$. Let $\mathbf{u}_k$ be the mean vector and $\Sigma_k$ be the covariance matrix of the Gaussian distribution for the $k$-th mixture center. Then the distribution over $\mathbf{x}$, conditioned on the hidden variable $\mathbf{z}$, can be written as,
$$P(\mathbf{x} | \mathbf{z}) = \prod_{k=1}^{K} \mathcal{N}(\mathbf{x}; \mathbf{u}_k, \Sigma_k)^{z_k} \qquad (2)$$
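Combining $P(\mathbf{z})$ and $P(\mathbf{x} | \mathbf{z})$, the joint distribution of the GMM can be expressed as,
$$P(\mathbf{x}, \mathbf{z} | \theta) = \prod_{k=1}^{K} \left[ a_k \, \mathcal{N}(\mathbf{x}; \mathbf{u}_k, \Sigma_k) \right]^{z_k} \qquad (3)$$
where $\theta = \{\mathbf{u}_k, \Sigma_k, a_k\}_{k=1}^{K}$. For computational efficiency, we assume that the covariance matrices $\Sigma_k$ are diagonal, i.e., $\Sigma_k = \mathrm{diag}(\sigma_{k1}^2, \cdots, \sigma_{kD}^2)$. Note that, in real applications [31], this assumption has not been found to harm the performance of GMM.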
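The generation procedure described above can be illustrated with a short sketch. It is a minimal, standard-library-only illustration under the diagonal-covariance assumption; the function names (`sample_gmm`, `log_gaussian_diag`, `log_joint`) and the toy parameter values are hypothetical and are not taken from the paper.

```python
import math
import random

def sample_gmm(weights, means, variances, rng):
    """Pick a mixture center k with probability a_k, then draw x from N(u_k, diag(var_k))."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(means[k], variances[k])]
    return k, x

def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)): a sum of independent one-dimensional Gaussian log-densities."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def log_joint(x, k, weights, means, variances):
    """log P(x, z | theta) when z selects center k: log a_k + log N(x; u_k, Sigma_k)."""
    return math.log(weights[k]) + log_gaussian_diag(x, means[k], variances[k])

# Toy parameters (illustrative assumptions): K = 2 centers in D = 2 dimensions.
weights = [0.3, 0.7]
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [0.5, 2.0]]

rng = random.Random(0)
k, x = sample_gmm(weights, means, variances, rng)
print(k, x, log_joint(x, k, weights, means, variances))
```

Working with log-densities rather than densities is a common choice here, since products of many small Gaussian values underflow quickly for high-dimensional descriptors.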
- 2.2. Variational Inference and Parameter Estimation
It is worth noting that the log-likelihood of $P(\mathbf{x} | \theta) = \sum_{\mathbf{z}} P(\mathbf{x}, \mathbf{z} | \theta)$ is difficult to maximize directly. A more tractable approach is the Variational Expectation Maximization (VEM) algorithm, which alternately maximizes a lower bound of the log-likelihood over the training set with respect to the posterior distribution of $\mathbf{z}$ (E-step or inference step) and the parameters (M-step or parameter estimation step). Let $Q_c(\mathbf{z})$ be the distribution approximating the posterior $P(\mathbf{z} | \mathbf{x}_c)$; then we have the following variational lower bound,
$$\log P(\mathbf{x}_c | \theta) \ge F_c = \sum_{\mathbf{z}} Q_c(\mathbf{z}) \log \frac{P(\mathbf{x}_c, \mathbf{z} | \theta)}{Q_c(\mathbf{z})} \qquad (4)$$
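Assuming that the posterior for the sample $\mathbf{x}_c$ takes the same form as its prior but with different parameters, $Q_c(\mathbf{z}) = \prod_{k=1}^{K} g_{ck}^{z_k}$ [28], the E-step updates the posterior of the hidden variable for each observed sample $\mathbf{x}_c$ of the training set $X = \{\mathbf{x}_1, \cdots, \mathbf{x}_N\}$. Writing $L(\mathbf{g}_c)$ for the lower bound as a function of $\mathbf{g}_c = (g_{c1}, \cdots, g_{cK})$,
$$\max_{\mathbf{g}_c} L(\mathbf{g}_c), \ \text{s.t.} \ \sum_{k=1}^{K} g_{ck} = 1 \ \Rightarrow \ \max_{\mathbf{g}_c, \lambda} f(\mathbf{g}_c, \lambda) = L(\mathbf{g}_c) + \lambda \Big( \sum_{k=1}^{K} g_{ck} - 1 \Big)$$
$$\frac{\partial f}{\partial g_{ck}} = 0 \ \Rightarrow \ \log \mathcal{N}(\mathbf{x}_c; \mathbf{u}_k, \Sigma_k) + \log \frac{a_k}{g_{ck}} - 1 + \lambda = 0 \ \Rightarrow \ g_{ck} = \frac{a_k \, \mathcal{N}(\mathbf{x}_c; \mathbf{u}_k, \Sigma_k)}{\exp(1 - \lambda)}$$
$$\frac{\partial f}{\partial \lambda} = 0 \ \Rightarrow \ \sum_{k=1}^{K} g_{ck} - 1 = 0 \ \Rightarrow \ \exp(1 - \lambda) = \sum_{k=1}^{K} a_k \, \mathcal{N}(\mathbf{x}_c; \mathbf{u}_k, \Sigma_k)$$
Then we have,
$$g_{ck} = \frac{a_k \, \mathcal{N}(\mathbf{x}_c; \mathbf{u}_k, \Sigma_k)}{\sum_{j=1}^{K} a_j \, \mathcal{N}(\mathbf{x}_c; \mathbf{u}_j, \Sigma_j)}$$
where $\lambda$ is a Lagrange multiplier. The M-step updates the parameters of the GMM,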
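$$a_k = \frac{1}{N} \sum_{c=1}^{N} g_{ck}$$
The expression for $a_k$ is simply the average value of the posterior probabilities $g_{ck}$ across samples. Similarly we have,
$$\mathbf{u}_k = \frac{\sum_{c=1}^{N} g_{ck} \, \mathbf{x}_c}{\sum_{c=1}^{N} g_{ck}}, \qquad \sigma_{kd}^2 = \frac{\sum_{c=1}^{N} g_{ck} \, (x_{cd} - u_{kd})^2}{\sum_{c=1}^{N} g_{ck}}$$
Here, $\mathbf{u}_k$ and $\sigma_{kd}^2$ are the weighted mean and variance, where $g_{ck}$ weights the contribution of the sample $\mathbf{x}_c$ to the $k$-th mixture center. The learning algorithm for GMM iterates the E-step and the M-step, and is summarized in Algorithm 1.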
Algorithm 1. The learning algorithm for GMM
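The iteration of the E-step and M-step can be sketched as follows. This is a minimal standard-library illustration for a diagonal-covariance GMM, not the authors' implementation; the function names, the fixed iteration count, the variance floor and the toy data are all assumptions made for the example.

```python
import math

def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)) for a diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def e_step(X, weights, means, variances):
    """Posterior g_ck, proportional to a_k N(x_c; u_k, Sigma_k) and normalised over k."""
    G = []
    for x in X:
        logs = [math.log(a) + log_gaussian_diag(x, m, v)
                for a, m, v in zip(weights, means, variances)]
        top = max(logs)  # subtract the maximum before exponentiating, for stability
        unnorm = [math.exp(l - top) for l in logs]
        total = sum(unnorm)
        G.append([u / total for u in unnorm])
    return G

def m_step(X, G, var_floor=1e-6):
    """Weighted updates of a_k, u_k and sigma^2_kd; var_floor is an added safeguard."""
    N, D, K = len(X), len(X[0]), len(G[0])
    weights, means, variances = [], [], []
    for k in range(K):
        nk = sum(G[c][k] for c in range(N))  # assumes every center keeps some mass
        weights.append(nk / N)
        mean = [sum(G[c][k] * X[c][d] for c in range(N)) / nk for d in range(D)]
        var = [max(sum(G[c][k] * (X[c][d] - mean[d]) ** 2 for c in range(N)) / nk,
                   var_floor) for d in range(D)]
        means.append(mean)
        variances.append(var)
    return weights, means, variances

def fit_gmm(X, weights, means, variances, n_iter=50):
    """Alternate the E-step and M-step for a fixed number of iterations."""
    for _ in range(n_iter):
        G = e_step(X, weights, means, variances)
        weights, means, variances = m_step(X, G)
    return weights, means, variances

# Toy one-dimensional data with two well-separated groups.
X = [[0.0], [0.2], [-0.1], [5.0], [5.1], [4.9]]
weights, means, variances = fit_gmm(X, [0.5, 0.5], [[1.0], [4.0]], [[1.0], [1.0]])
print(weights, means, variances)
```

A practical implementation would normally stop when the lower bound no longer improves rather than after a fixed number of iterations, and would handle centers that lose all of their responsibility mass.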
- 2.3. Free energy feature mapping
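We now proceed to derive the free energy score function [27] based on the GMM, and then the kernel based on the score function. Having the lower bound $F_c$ for $\log P(\mathbf{x}_c | \theta)$ in Eq. (4), we have the following decomposition,
$$F_c = \sum_{k=1}^{K} g_{ck} \log \mathcal{N}(\mathbf{x}_c; \mathbf{u}_k, \Sigma_k) + \sum_{k=1}^{K} g_{ck} \log a_k - \sum_{k=1}^{K} g_{ck} \log g_{ck}$$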
The elements of the free energy score function are the summation terms of the above variational lower bound, and can be divided into three groups [27],
[Equation image: the three groups of elements of the free energy score function]
where the fit group measures how well the sample fits the model, and the ent group measures the uncertainty in the fitting. We note that the elements of the free energy score function are expectations of functions over the observed variable $\mathbf{x}$, the hidden variables $\mathbf{z}$ and the model parameters $\theta$. The hidden variables give the free energy kernel the ability to exploit hidden information, and the model parameters give it the ability to adapt to the data distribution. The free energy score function is the combination of the above functions,
[Equation image: the free energy score function]
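To make the idea of score elements concrete, the sketch below computes the summation terms of the lower bound $F_c$ for a single patch and stacks them into a feature vector. It is only an illustration: the per-center grouping used here (a Gaussian fit term, a mixing-weight term and an entropy term per center) is an assumption that follows directly from expanding the bound, and the exact grouping and ordering in [27] may differ. All function names are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """log N(x; mean, diag(var)) for a diagonal covariance."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def component_logs(x, weights, means, variances):
    """Per-center pairs (log a_k, log N(x; u_k, Sigma_k))."""
    return [(math.log(a), log_gaussian_diag(x, m, v))
            for a, m, v in zip(weights, means, variances)]

def log_likelihood(x, weights, means, variances):
    """log P(x | theta), computed with a stable log-sum-exp over the centers."""
    logs = [la + ln for la, ln in component_logs(x, weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def free_energy_features(x, weights, means, variances):
    """Stack the summation terms of the bound: K fit terms, K weight terms, K entropy terms."""
    pairs = component_logs(x, weights, means, variances)
    ll = log_likelihood(x, weights, means, variances)
    g = [math.exp(la + ln - ll) for la, ln in pairs]               # exact posterior g_ck
    fit = [gk * ln for gk, (la, ln) in zip(g, pairs)]               # g_ck log N(x; u_k, Sigma_k)
    prior = [gk * la for gk, (la, ln) in zip(g, pairs)]             # g_ck log a_k
    ent = [-gk * math.log(gk) if gk > 0.0 else 0.0 for gk in g]     # -g_ck log g_ck
    return fit + prior + ent

# Toy model and patch (illustrative values only).
weights = [0.4, 0.6]
means = [[0.0], [3.0]]
variances = [[1.0], [2.0]]
x = [1.0]

phi = free_energy_features(x, weights, means, variances)
ll = log_likelihood(x, weights, means, variances)
print(phi, sum(phi), ll)
```

Because the exact posterior is used for $g_{ck}$, the terms sum to the log-likelihood itself, which gives a convenient sanity check that the bound is tight in this case.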
- 2.4. Learning free energy kernel
Having the score functions, or feature mappings, for image patches, we now proceed to define the kernel similarity function for images. The score function above is defined on image patches. Note that each image contains a set of image patches, each of which has a corresponding free energy score feature. The distribution of these score features for an image encodes the information of the image and is able to identify the image. We follow an effective and widely used strategy [31] that uses the first-order statistics, i.e., the mean of these score features, as the feature of an image,
$$\Phi(I_i) = \frac{1}{N_i} \sum_{c=1}^{N_i} \Phi(\mathbf{x}_{ic})$$
where $\Phi(\mathbf{x}_{ic})$ is the feature mapping for the $c$-th patch of the $i$-th image and $N_i$ is the number of patches in image $I_i$. Let $\mathbf{y}_i = (y_{i1}, \cdots, y_{iC})^T$ be the label vector for the image $I_i$, where $y_{ic} = 1$ if the $c$-th of the $C$ labels belongs to the image $I_i$ and $y_{ic} = 0$ otherwise. Then the kernel similarity of two images, simultaneously considering each image and its corresponding label, can be defined as follows,
[Equation image: the label-aware kernel similarity, defined in terms of $K_I(I_i, I_j)$ and $w(\mathbf{y}_i, \mathbf{y}_j)$]
where $K_I(I_i, I_j)$ is the kernel similarity without taking class labels into account; $w(\mathbf{y}_i, \mathbf{y}_j)$ is a weight function depending on the similarity of the two label vectors, and is expected to take a positive value if they have shared labels and a negative value if they have no shared label. Here we choose the following sigmoid-based function:
[Equation image: the sigmoid-based weight function $w(\mathbf{y}_i, \mathbf{y}_j)$ with parameters $a$, $b$, $u$, $v$]
where $\mathbf{y}_i^T \mathbf{y}_j$ is the number of labels shared by images $I_i$ and $I_j$; $a$, $b$, $u$, $v$ are parameters to be determined. The function is illustrated in Fig. 3. In the following part, we show how to determine these parameters.
Fig. 3. Illustration of the weight function $w(\mathbf{y}_i, \mathbf{y}_j)$, where $a = 1.5$, $b = 1$, $u = 2$, $v = 1$.
We consider the 1-nearest neighbor criterion [32], which favors higher similarity for images with more shared class labels and lower similarity for images with fewer shared labels. Then the objective function can be expressed as,
[Equation image: the objective function $O$]
The above objective function can be maximized using gradient-based algorithms. The gradients of $O$ with respect to $a$, $b$, $u$, $v$ and $\theta$ are as follows,
[Equation images: the gradients of $O$ with respect to $a$, $b$, $u$, $v$ and $\theta$]
The learning approach is an iterative procedure, which is summarized in Algorithm 2.
Algorithm 2. The learning algorithm for the free energy kernel
To summarize: in the training step, the label vectors of the training samples are available, and thus $w(\mathbf{y}_i, \mathbf{y}_j)$ is available. The model can be trained using the approach described above. In the test step, the label vectors of the test samples are no longer available, and thus $w(\mathbf{y}_i, \mathbf{y}_j)$ is unavailable. In this situation, we set $w(\mathbf{y}_i, \mathbf{y}_j) = 1$ and run the regular retrieval using the parameters learned in the training procedure. It is worth noting that the reason for introducing $w(\mathbf{y}_i, \mathbf{y}_j)$ is to exploit label information when learning the generative model $\theta$. This is essentially a discriminative (supervised) learning approach for the generative model as well as for FESS, and it differs from the original FESS, where the label information is absent (unsupervised). Namely, $\theta$ is determined by $\mathbf{x}$ and $\mathbf{y}$ in our proposed approach, and by $\mathbf{x}$ alone in FESS.
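The test-time retrieval described above can be sketched as follows. The snippet averages patch-level score features into an image feature and ranks database images by kernel similarity with the weight fixed to 1. Two points are assumptions made only for this illustration and are not confirmed by the text: the label-free kernel $K_I$ is taken to be a plain inner product of image features, and the weight is taken to multiply $K_I$. All function names are hypothetical.

```python
def image_feature(patch_features):
    """First-order statistic: the mean of the patch-level score features of one image."""
    n = len(patch_features)
    dim = len(patch_features[0])
    return [sum(p[d] for p in patch_features) / n for d in range(dim)]

def kernel(feat_i, feat_j, w=1.0):
    """Assumed form w * K_I, with K_I an inner product; w = 1 when labels are unavailable."""
    return w * sum(a * b for a, b in zip(feat_i, feat_j))

def rank_database(query_feature, database_features):
    """Return database indices sorted from most to least similar to the query."""
    scores = [kernel(query_feature, f) for f in database_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy example: one query image and three database images with 2-dimensional features.
query = image_feature([[1.0, 0.0], [1.0, 0.0]])
database = [[0.0, 1.0], [2.0, 0.0], [1.0, 0.0]]
print(rank_database(query, database))
```

With an inner-product kernel, retrieval reduces to sorting dot products, which is consistent with the computational efficiency the paper attributes to the form of the free energy kernel.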
3. Experiments
This section will experimentally validate the effectiveness of our proposed similarity learning approach, by comparing our approach with state-of-the-art approaches for CBIR over two popular databases across different evaluation criteria.
- 3.1. Databases
Two popular databases, Wang’s [33 , 34] and Caltech 101 [35 , 36] , are chosen for experimental evaluation.
Wang’s database¹ [33, 34] contains 1,000 challenging images selected from the Corel database. The database is composed of images with various contents, ranging from natural images to animals. It contains images of size 256 × 384 and 384 × 256. The database is divided into 10 groups, each of which contains 100 images. The images in the same group are considered to be similar. The group names are, respectively, African people and villages, beaches, buildings, buses, dinosaurs, elephants, flowers, horses, mountains and glaciers, and food. Some sample images from all 10 categories in Wang’s database are shown in Fig. 4.
Fig. 4. Sample images of Wang’s database.
The Caltech-101 database [35] is composed of 9,196 images and is often used for larger-scale experiments. The images in the database are grouped into 101 categories, for example, beaver, ant, crayfish, dolphin and llama. The number of images per category varies from 31 to 800. Most of the images are of medium resolution, about 300×300 pixels [37]. The Caltech-101 database is probably the most diverse database available today. Some sample images from certain categories of the Caltech-101 database are shown in Fig. 5.
Fig. 5. Sample images of Caltech-101 database.
It is worth noting that, in Wang’s dataset, each image has multiple labels. For this dataset, images belonging to a certain category do not necessarily have the same label vector, and $w(\mathbf{y}_i, \mathbf{y}_j)$ does not necessarily equal 1. In contrast, in Caltech 101, each image has only one label. For this dataset, images belonging to a certain category have the same label vector and $w(\mathbf{y}_i, \mathbf{y}_j) = 1$.
The most important parameter in the proposed approach is the number of mixture centers of the GMM. In general, a GMM with a small number of mixture centers tends to lose information and discrimination ability, because in this case it is not capable of modeling the distribution of the data. On the other hand, a GMM with a large number of mixture centers tends to produce a high-dimensional feature space, which leads to poor generalization ability according to generalization theory, and therefore suffers from the so-called "curse of dimensionality". Consequently, it is of great importance to determine an appropriate number of mixture centers. In this paper, we use cross validation over the range [20, 260] with a step of 20 to choose the parameter, and find that a wide range of about [60, 160] produces satisfactory results. Moreover, we also found that two primary factors dominate the number of mixture centers: (1) the number of categories of the dataset, to which the number of mixture centers is generally proportional within a certain range; (2) the number of training samples in the feature space.
- 3.2. Image Representation
To cover the diverse visual attributes within images, we use a set of comprehensive features to represent the images. More specifically, we use multiple color SIFT descriptors for representation, due to their state-of-the-art performance in image retrieval and recognition [38] . Following the recommendation by [38] , we use OpponentSIFT, C-SIFT, rgSIFT and RGB-SIFT. These color SIFT descriptors are extracted from the patches sampled by dense sampling and Harris-Laplace point sampling, followed by spatial pyramid. For dense sampling, descriptors are extracted around the points of a grid with the step size of 4 pixels. These descriptors are computed from three different scales: 16×16, 24×24 and 32×32.
- 3.3 Evaluation criteria
To comprehensively evaluate the proposed approach, we use the following criteria to measure image retrieval approaches:
Average Precision (AP) [33, 39]: the average of the precision values at the ranks where relevant images appear. Specifically, for a query image $I_q$, the precision (P) and recall (R), the two most commonly used criteria in image retrieval systems, can be defined as follows: $P(I_q) = n_q / L$ and $R(I_q) = n_q / N$, where $L$ is the number of retrieved images; $n_q$ is the number of images relevant to the query image among the retrieved images; $N$ is the number of relevant images in the database. Finally, the average precision (AP) and average recall (AR) are computed over all reference images.
Average Retrieval Precision (ARP) [33]: the average precision of the retrieval results over the query images as a function of the number of returned images. That is, to obtain the ARP graph, we calculate the precision for different numbers of retrieved images [34, 40].
Average Retrieval Rate (ARR) [33]: the average recall of the retrieval results over the query images as a function of the number of returned images. Similar to the ARP graph, to obtain the ARR graph, recall values are calculated for varying numbers of retrieved images [34, 40].
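The precision and recall definitions above, and the way ARP and ARR curves are obtained by varying the number of retrieved images, can be sketched as follows. This is a minimal illustration that assumes single-label relevance (a retrieved image is relevant when its label equals the query label) and the same number of relevant images for every query; the function names and toy rankings are hypothetical.

```python
def precision_recall_at(ranked_labels, query_label, L, n_relevant):
    """P = n_q / L and R = n_q / N for the top-L retrieved images of one query."""
    hits = sum(1 for label in ranked_labels[:L] if label == query_label)
    return hits / L, hits / n_relevant

def average_curves(rankings, query_labels, n_relevant, cutoffs):
    """Average precision and recall over all queries at each cutoff L (ARP and ARR curves)."""
    arp, arr = [], []
    for L in cutoffs:
        pairs = [precision_recall_at(r, q, L, n_relevant)
                 for r, q in zip(rankings, query_labels)]
        arp.append(sum(p for p, _ in pairs) / len(pairs))
        arr.append(sum(r for _, r in pairs) / len(pairs))
    return arp, arr

# Toy example: two queries, each with three relevant images in the database.
rankings = [["a", "b", "a", "a"], ["b", "b", "a", "b"]]
query_labels = ["a", "b"]
arp, arr = average_curves(rankings, query_labels, n_relevant=3, cutoffs=[2, 4])
print(arp, arr)
```

Plotting `arp` and `arr` against the cutoffs gives curves of the kind the paper reports as ARP and ARR graphs.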
- 3.4 Experimental results
- 3.4.1. Experiments on Wang’s database
The first experiment is performed on Wang’s database. This database is thought to meet the requirements for evaluating image retrieval systems because of its diversity in content. The performance criteria in this experiment include average precision, average recall and average retrieval rate. The detailed definitions can be found in [33].
In each round of the experiment, 20% of the samples are randomly chosen from the database to form the training set and the remaining 80% form the test set. In our experiment, each image is used as a query image for evaluation. We first compute the precision P of every query image with the number L of returned images set to 20, and then obtain the average precision. The total average recall is obtained in the same manner with the number of returned images set to 100. In this experiment, the Euclidean distance serves as the baseline method. Other comparison approaches include the motif co-occurrence matrix (MCM) [41], large margin nearest neighbor classification (LMNN) [23], CTCHIRS [34], semi-supervised distance metric learning (denoted SS) [42], multiple kernel learning via distance metric learning (denoted MKL) [43], and FESS [27]. Among these approaches, MCM and CTCHIRS are two state-of-the-art image retrieval methods, SS and MKL are two distance metric learning-based approaches used for image retrieval, LMNN is a supervised similarity learning approach, and FESS is the probabilistic similarity learning method most closely related to our approach. The experimental results are summarized in Table 2 and Table 3. We find that MCM and LMNN gain significant improvement over the baseline method. Owing to the adoption of an optimal feature selection technique, CTCHIRS obtains better retrieval performance than MCM and LMNN. Of the two distance metric learning approaches, MKL performs better than SS. The reason is that MKL learns a linear combination of a set of base kernels by optimising two objective functions that are commonly used in distance metric learning [43]. From Table 2 and Table 3, we can observe that the proposed approach obtains a higher average precision than the other comparison methods.
Specifically, compared with SS, which is a representative distance metric learning approach used for image retrieval, the proposed method achieves a 5.2% improvement in AP and a 3.5% improvement in AR. Moreover, in comparison with FESS, which is most closely related to our approach, we achieve a 1.6% improvement in AP and a 1.7% improvement in AR. The reason for this observation is that our approach fully exploits class labels, which are very informative for the image retrieval task. Fig. 6 illustrates the average precision and the average recall of the retrieval results as functions of the number of retrieved images. The experimental results clearly show that, for the first 20 to 100 retrieved images of the 1,000-image, ten-category database, our approach consistently outperforms the other methods. In the average retrieval rate (ARR) experiment, the recall increases with the number of retrieved images, and our approach remains superior to the other models.
Table 2. The average precision (%) of comparison approaches on Wang’s database
Table 3. The average recall (%) of comparison approaches on Wang’s database
Fig. 6. The average retrieval precision (top) and average retrieval rate (bottom) of these approaches on Wang’s dataset.
- 3.4.2. Experiments on Caltech-101 database
To further validate the ability of our approach to adapt to different databases and to scale to larger databases, we further evaluate the proposed approach on the Caltech-101 database. To reach a reliable conclusion, we run the experiment repeatedly and report the average results. In each round of the experiment, 20% of the samples are randomly chosen from the database to form the training set and the remaining 80% form the test set. It is worth noting that the training set is used to learn the GMM as well as the free energy kernel.
The experiment is performed using each image of each category as a query image. We set the number of returned images to 20 to calculate the precision P for each query, and finally obtain the average precision $P / N_c$ ($N_c$ images per category). The experimental results on the Caltech-101 database are reported in Table 4. Differently from the experiment on Wang’s database, we compare with Xing’s method [22], DML-eig [44], large margin nearest neighbor (LMNN) [23], semi-supervised distance metric learning (denoted SS) [42], multiple kernel learning via distance metric learning (denoted MKL) [43], and free energy score space (FESS) [27]. Euclidean distance is again included as a baseline method. It is worth noting that FESS is the basis of our approach. Interestingly, the relative comparison results are close to those on Wang’s database, which indicates that the results are stable across the two databases. As shown in Table 4, FESS, an unsupervised similarity learning approach derived from probabilistic models, shows highly competitive performance against the other comparison methods. Our proposed approach again achieves an improvement over all the other compared approaches, which are based on distinct methodologies. One reason for these results is that it incorporates different content-level information to form a comprehensive similarity for image retrieval.
Table 4. The average precision of these approaches on Caltech-101 dataset.
¹ Available at http://wang.ist.psu.edu/docs/related/
4. Conclusions
In this paper, we propose a free energy kernel based on the well-known free energy score space (FESS), and then learn the derived kernel in a supervised manner. Specifically, we first model the distribution of image features using GMM. Second, we derive a free energy kernel from GMM, which is a function of image feature, mixture indicator and model parameter. Third, we propose a supervised learning approach for the free energy kernel to exploit label information. The experimental results on two databases demonstrate that the proposed approach is superior to other comparison approaches for the content-based image retrieval task.
BIO
Cungang Wang received his B.E. degree in educational technology from Liaocheng University, Liaocheng, China, in 2000, and his M.S. degree in computer science and technology from Ocean University of China, Qingdao, China, in 2006. He has worked as a lecturer in the School of Computer Science at Liaocheng University since 2008. His research interests include machine learning, image processing and pattern recognition.
Bin Wang received her B.E. degree in information and computational science from Shandong Normal University, Ji’nan, China, in 2006, and her M.S. degree in applied mathematics from University of Science and Technology Beijing, Beijing, China. She received her Ph.D. degree in 2014 from the Department of Automation, Shanghai Jiao Tong University, Shanghai, China. She is currently an assistant professor in the College of Information, Mechanical and Electrical Engineering, Shanghai Normal University. Her research interests include computer vision, machine learning, image processing and multimedia analysis.
Liping Zheng received her Ph.D. degree from Tongji University, Shanghai, China. She is now an associate professor in the School of Computer Science, Liaocheng University, Shandong, China. Her research directions include image processing and artificial intelligence.