It is a challenging problem to search the intended images from a large number of candidates. Content based image retrieval (CBIR) is the most promising way to tackle this problem, where the most important topic is to measure the similarity of images so as to cover the variance of shape, color, pose, illumination etc. While previous works made significant progresses, their adaption ability to dataset is not fully explored. In this paper, we propose a similarity learning method on the basis of probabilistic generative model, i.e., probabilistic latent semantic analysis (PLSA). It first derives Fisher kernel, a function over the parameters and variables, based on PLSA. Then, the parameters are determined through simultaneously maximizing the log likelihood function of PLSA and the retrieval performance over the training dataset. The main advantages of this work are twofold: (1) deriving similarity measure based on PLSA which fully exploits the data distribution and Bayes inference; (2) learning model parameters by maximizing the fitting of model to data and the retrieval performance simultaneously. The proposed method (PLSAFK) is empirically evaluated over three datasets, and the results exhibit promising performance.
1. Introduction
T
he last decade has seen the increasing popularity of digital images, especially along with the development of Internet. How to search images according to users’ intention from a large number of candidates has been an important yet challenging problem
[1

9]
. In real application, there are two main cases. The first one is “retrieval by text”, i.e., the users describe their intention through keywords and demand images matching these keyworks. This is trackled by image annotation tecniques, i.e., assigning keyworks to candidate images and thus convert the problem to text matching, which is popular in commercial search engines. The second is “retrieval by image”
[1
,
3
,
4
,
5]
, i.e., the useres input a sample image and expect to get those images like the input one. The methods for this case is mainly so called image similarity or distance learning. In this work, we focus on the second case since it is more descriptive and requires no keyword or textural metadata to describe the image. This case is also referred to as content based image retrieval (CBIR). Note that, although user click of search engine can significantly improve the retrival performance
[8

10]
, we in this work foucs on the visual channel and the proposed method can be extended to work with user click information.
As the most important component of CBIR systems, image similarity or distance measure greatly determines the retrieval performance
[1
,
5]
. Noticing the fact that the similarity measure and distance measure are technically convertible and functionally equal, we in this work stick to similarity measure, but the proposed method could be directly applied to distance measure learning. In the technical perspective, similariyt measure has two important factors. The first factor is the image feature representation. A satisfied image feature is expected being robust to illumination, pose, shape and other variances, and simutaniouly being effective in capturing useful information, i.e., reaching balance between robustness and selectivity. It is worth noting that the evaluation of feature representation is taskspecific. Therefore, it would be important to select or learn feature representation under the criterion of the task. The second factor is the similarity function which is a function defined over the feautre space, outputing large value for a similar pair and small value for a distinct pair. A typical kind of similarity measure is the predefined similarity measures
[11]
, such as L1 distance and Euclidean distance. This kind of predefined similarity measures, however, are usually not adaptive enough to data distribution
[1]
. Alternatively, similarity learning methods
[12

17]
have been proposed, to improve the adaption ability to data distribution. The method proposed in this work belong to this branch.
Similarity learning methods in general fall to three typical categories, unsupervised method, semisupervised method and supervised method. Unsupervsed learning methods
[18
,
19
,
20]
seek to find a feature space and a similarity measure for the given data, without taking class label into account. The typical methods include subspace based methods, e.g. locally linear embedding (LLE)
[18]
, nonnegative matrix factorization (NNMF)
[19]
, probabilistic model based methods
[20
,
21
,
22
,
23
,
24
,
25]
. Among them, probablistic model based similarity shown promsing performance and received increasing attention. These methods formulated the feature mapping
[24
,
25]
or similarity measure
[20
,
21
,
22]
based on the probabilistic model. Thus, they inherited the abilities of probabilistic model and exhibited adaption ability to data distribution. These methods include probability product kernels
[21]
, Kullback Leibler divergence based similarity
[22]
, Fisher kernel
[20]
, free energy score space
[24]
. These methods are useful when the class label is missed or is expensive to otain.
Semisupervised learning methods
[11
,
14]
make use of both labeled data and unlabeled data, laying somewhere between unsupervised learning and supervised learning. They are highly effecitive when the number of labeled data is limited and the number of unlabeled data is easy to access. On the other hand, supervised learning methods learn similarity measure by exploiting class label, seeking to find a similarity function that outputs large value for images with the same labels and outputs small value for images with distinct labels. Popular methods include large margin nearest neighborhood (LMNN)
[26]
, local distance metric learning (LDML)
[12]
, linear transformation based metric learning (LTML)
[27]
and discriminative Fisher kernel learning
[28]
etc. These methods however not fully exploit data distribution and hidden information which will potentially improve the adaption ability and effectiveness of similarity measure.
To fully exploit data distribution, hidden information and class label for similarity learning, we in this paper propose an approach based on probabilistic latent semantic analysis (PLSA)
[29]
and Fisher kernel
[20]
. The proposed approach, referred to as PLSAFK, uses bagofwords feature represention for images, where the visual words are quantified from local descriptor, and then leverage PLSA to model the distribution of the visual words. On the basis of PLSA model, it then derives the Fisher kernel which is a function over the parameters and variables of PLSA. To exploit class label, i.e., tuning the similarity measure as well as the PLSA model to have good retreival performance, we developed a supervised learning approach for the PLSA based Fisher kernel. The motivations of the proposed method are twofold. First, exploit the semantic information by means of coupling with PLSA which is able to infer topic. Second, exploit the label information which is informative for retrieval. The proposed method has three main advantages. First, probabilistic models could well adapt to data distribution. Second, PLSA can exploit the ability of Bayes inference. Third, the supervised learning method can tune the similarity and model according to the retrieval performance.
The remaining part of this paper is organized as follows. Section 2 revisits the related works of similairty learning. Section 3 proposes our approach, PLSA based similarity learning. Section 4 experimentally evaluates the proposed approach over the real databases. Section 5 draws a conclusion.
2. Related Works
There are a number of works have contributed to similarity learning
[30

33
,
14

17]
and to content based image retrieval
[4
,
5
,
13
,
11
,
1]
. We in this work attempt to make a progress on the adaption ability, and thus naturally focus on supervised similarity learning approaches, and probabilistic model based approaches. For other related approaches, see references for a details. It is worth noting that, similarity learning and distance learning are essentially equal because they are convertable. Thus, we in this work treat them as the same notation.
Supervised similarity measure learning attempts to learn a similarity measure from a set of equivalence constraints for image pair within the same class, and inequivalence constraints for image pair of the different classes. The similarity measure is determined under the criterion that keeps images in equivalence constraints close and images in inequivalence constraints separated. A number of recent works attempted to cooperate relevance feedback
[3
,
30]
, dimensionality reduction
[31]
, Bayesian inference
[34]
and kernel method
[27]
.
[32]
casted the problem into a constrained convex optimization problem by minimizing the pairwise distance in the same classes so that images of different classes are well separated. Discriminative component analysis (DCA)
[2]
incorporated equivalence constraints for similarity learning. Largemargin nearest neighbor (LMNN)
[26]
took the class margin into account. SDPM
[35]
formulated Mahalanobis distance learning as a convex optimization problem. Distance metric learning with eigenvalue optimization (DMLeig.)
[36]
casted distance learning problem as a eigenvalue optimization problem.
[33]
learnt local perceptual distance function which is a combination of a set of local distance functions.
[37]
proposed to learn the Mahalanobis distance function subject to a set of pairwise constraints, i.e., mustlinks that associate images which must be in the same class and cannotlinks that associate images which must be in different classes.
[38]
made ues of context information to learn similarity measures.
[39
,
16]
leveraged discriminative learning techniques to learn similarity measure.
Probabilistic similarity methods formulate explicit feautre space or similarity measure based on the quantities of adopted probabilistic models. Probability product kernels
[21]
used the posterior distributions of hidden variables to characterize the samples, and define the similarity measure as the expectation of the inner product of the hidden variables, with respect to the posterior distribtuions.
[22]
used distributions to characterize the samples and uses Kullback–Leibler divergence over those distributions to measure the distance between samples.
[23]
developed a hierarchical probabilistic model to learn image representation and similarity. Fisher score (FS)
[20]
derived feature mapping by considering how the samples affect the model parameters, and defined the similarity, i.e., Fisher kernel, as the inner product of the feature mappings of samples. Free energy score space (FESS)
[24]
and posterior divergence (PD)
[25]
extended Fisher score by exploring more informative measures. These approaches are able to exploit information from probabilistic models, they however can be further boosted through fully exploiting the class label, by fitting the similarity measure to the retrieval performance. Dsicriminative Fisher kernel learning (DFK)
[28]
extents Fisher kernel to cooperate class label, where Gaussian mixture model is used to model the distribution of visual features. It does not utiliize semantic level information in an explicit way.
Recently, numerous works introduced a variety of techniques for similarity learning.
[40
,
6]
used graph to represent data and formulated similarity learning problem as a graph learning problem.
[9
,
8]
exploited user click information in real retrieval systems to boost the retrieval performance.
[15]
learned distance subject to the criterion that the semantic information is preserved.
[17]
proposed to learn the similarity by considering the neighborhood structure.
The work in this paper is based on the score space methods which have been validated to be very competitive in a wide range of applications
[24
,
25
,
28]
. The advantages of the proposed method are mainly twofold: (1) compared with deterministic similarity learning methods, our method fully exploits data distribution information and semantic level hidden variables by means of Bayesian inference; (2) compared with probabilistic similarity learning method, our method provides a sophisticated way to utilize class label.
3. Learning Fisher Kernel with PLSA
In this section, we will proceed to derive the Fisher kernel based on PLSA and propose a supervised learning approach for the derived kernel. First, we use PLSA to model the distribution of visual words, for its popularity and effectiveness in image modeling
[41]
. Then we derive the Fisher kernel
[20]
based on PLSA. At last, we propose a supervised learning method for Fisher kernel. See
Fig. 1
for the illustration and
Table 1
for the notations, of the proposed method.
The framework of our proposed approach PLSAFK.
Mathematical notations involved in this work
Mathematical notations involved in this work
 3.1. Probabilistic Latent Semantic Analysis
In this work, we utilize Probabilistic Latent Semantic Analysis (PLSA)
[29]
to model the distribution of images represented in bag of visual words quantized from image features. The effectiveness of PLSA in image and text representation has been extensively verified
[41]
.
PLSA is a probabilistic generative model orignally developped for text analysis
[29]
. Sepecifically, it could discover the semantic topics hidden in documents using the bag of words representation. PLSA can be also applied to image analysis, since images can also use bag of words representation where an image patch is quantified to a visual word.
Let
D
= {
d
_{1}
,…,
d_{N}
} be a collection of
N
documents formed by words from a dictionary
W
= {
w
_{1}
,…,
w_{V}
} of
V
terms. The data can be denoted by a
V
×
N
cooccurrence matrix where the element
n
(
w_{l}
,
d_{i}
) is the frequency of a term
w_{l}
appeared in a document
d_{i}
. Let
z
∈
Z
= {
z
_{1}
,…,
z_{K}
} be the hidden topic variable associating with each observation which is actually the occurrence of a word in a certain document. Let
P
(
d_{i}
) denote the probability of a certain document
d_{i}
;
P
(
w_{l}

z_{k}
) denote the conditional probability of a specific word
w_{l}
conditioned on the hidden topic
z_{k}
;
P
(
z_{k}

d_{i}
) denote the conditional probability of a hidden topic
z_{k}
conditioned on the document
d_{i}
. Then PLSA can be expressed as follows:

(1) Choose a documentdiaccording to a probabilistic distributionP(di);

(2) Choose a hidden topic variablezkaccording to a probabilistic distributionP(zkdi);

(3) Generate a wordwlaccording to a probabilistic distributionP(wlzk). Then we have an paired observation (wl,di).
The joint probability of the above model can be expressed as:
By marginalizing over the hidden variable
z
, it gives the marginal distribution,
Note that
P
(
w
,
d
) =
P
(
d
)P(
w

d
), we have the condition distribution
P
(
w

d
) as follows,
In this model, each document is modeled as a mixture of topics, and the word histogram for a certain document is composed of a mixture of the histograms corresponding to each topic. Especially, each document is a combination of the topic vector.
Learning this model involves the determination of the mixture coefficients which are specific for each document, and involves the determination of the topic parameters shared by all documents. To learn the PLSA model, we determine the conditional probabilities
P
(
z

d
) and
P
(
w

z
) by maximizing the following log likelihood function:
Maximizing this log likelihood function is equivalent to minimizing the KullbackLeibler divergence between the empirical distribution and the parameterized model. This model can be effectively learned using Expectation Maximization (EM) algorithm, as described in
[29]
.
 3.2. Fisher kernel based on PLSA
Having the PLSA model, to derive the Fisher kernel, we first give the variational lower bound
[42]
of the log likelihood function log
P
(
d
,
w
,
θ
), on which the derivation will be simple,
where
Q
(
z
) is the approximation of the real posterior
P
(
z

d
,
w
) and usually shares the same parameterizations with
P
(
z
). Then we have,
Given the variational lower bound 
F
(
θ
) of the log likelihood function log
P
(
d
,
w
,
θ
), the elements of Fisher score is its gradient with respect to the model parameters
[20]
,
Note that the elements of Fisher score is the expectation over a function of the observed variable
d
, hidden variables
z
and model parameters
θ
, where the hidden variables allow Fisher kernel to exploit hidden information and the model parameters make it adaptive to data distribution. The complete Fisher score is the combination of those gradients,
The Fisher kernel then can be defined as
[20]
,
where I=E
_{d}
[Φ(
d
)Φ(
d
)]
^{T}
is the Fisher information matrix. In order to exploit class label for similiarity learning, we here extend the kernel to the following parameterized form:
where
U
=diag(
u
_{1}
,
L
,
u_{M}
) is a diagonal matrix to be learnt, and
u_{m}
weights the importance of Φ
_{m}
, i.e. the
m
th element of Φ, to the similarity. In particular,
u_{m}
= 0 indicates that Φ
_{m}
is completely noninformative. Given the above parameterized form of Fisher kernel, we will show how to determine
U
in next section.
 3.3. Learning PLSA based Fisher kernel
Let
y_{i}
= (
y
_{1i}
,
L
,
y_{Ci}
) be the label vector of a sample
d_{i}
, where
y_{ci}
= 1 indicates that the
c
th label of all
C
ones is assigned to the sample
d_{i}
and
y_{ci}
= 0 otherwise. Here we consider the criterion that sample pairs take high similarity for sample pairs with the same class label, and takes low similarity for sample pairs with different class label,
where
measures the similarity of two label vectors.
Given the approximate posterior
Q
(
z
) over the hidden variable, we seek to minimize the objective function
J
(
θ
,
U
) using gradient descent,
where
m
' indexes the element of feature mapping Φ for
β_{kw}
(Eq. (4)) while
m
" indexes the element of feature mapping Φ for
α_{kd}
(Eq. (5)).
The learning procedure of the proposed approach is the iteration of the Estep and Mstep (Eq. (911)), which is summarized in
Algorithm 1
.
The learnt Fisher kernel can be embeded to kernelcompatible classifier for classification, and the kernel similarity of a pair of samples
d_{i}
,
d_{j}
can be computed following
Algorithm 2
.
4. Experiments
In this section, we will apply the proposed method, i.e., Probabilistic Latent Semantic Analysis based Fisher Kernel (PLSAFK) for image retrieval. The proposed method will be compared with several stateoftheart methods on three real datasets, Corel5K
[43]
, MIRFlickr 25,000
[44]
and Corel30K
[45]
.
 4.1. Image Representation
Image feature representation is crucial for CBIR systems due to the great variance of visual contents across image datasets. In this work, we ues color SIFT descriptors as the feature for its excellent performance which has been extensively validated
[46]
. Specifically, following the recommendation in
[46]
, four color SIFT descriptors (OpponentSIFT, rgSIFT, CSIFT and RGBSIFT) are adopted. These descriptors are extracted from the image patches given by dense sampling and HarrisLaplace point sampling, with spatial pyramid followed.
 4.2 Performance Measure
Following the previous works
[36
,
13
,
11]
, we evaluate the retrieval performance using leaveoneout manner. First, a query image is chosen from the test set. Then, search similar images from the candidate set according to the adopted similarity or distance measure. Mean average precision (MAP) is used to measure the performance of image retrieval. MAP is the summarization of the precisionrecall curve, where precision is defined as the percentage of returned images that contain the same label with the query image in all returned images.
Let
k
be the rank, the precision at cutoff
k
can be computed as:
Averaging the precision of those relevant returned images gives Average Precision (AP),
where
N
is the number of retrieved images, rel(
k
) is an indicator function outputting 1 if the image at the rank
k
is a relevant image and 0 otherwise. MAP is then given by averaging AP across all the query images,
 4.3 Experiments on Corel5K dataset
To evluate our approach PLSAFK, we first perform an experiment on Corel5K dataset
[43]
. Corel5k dataset is a subset selecting from Corel Photo Gallery, being composed of 50 categories, such as beach, tile, wave, food texture, tigers, France, bears, autumn, and tropical plants, where each category contains 100 images. It contains 371 word vocabulary. The sizes of the images are normlized to 192×128 or 128×192. The sample images are shown in
Fig. 2
. In this experiment, we randomly select 70% samples to form the training set and remain the rest as the test set. The training set is used for PLSAFK learning and the test set is used for performance evlauation. For all compared approaches, we measure the average precision for each category over the top 20 retrieved images.
Sample images from Corel5K dataset
We will compare our PLSAFK with other similarity or distance learning methods. Xing’s approach
[32]
casts the distance measure learning problem to a convex optimization problem. Discriminative components analysis (DCA)
[2]
introduces inequivalence constraints. SDPM
[35]
learns Mahalanobis distance through formulating it as a convex optimization problem. DMLeig.
[36]
learns distance by means of eigenvalue optimization. Large margin nearest neighbor (LMNN)
[26]
learns Mahalanobis distance under the criterion of nearest neighbor and large margin. Fisher kernel
[20]
and Free energy score space (FESS)
[24]
converts similarity learning to feature mapping learning on the basis of probabilistic generative models. Its similarity measure is the inner product of feature mapping. Discriminative Fisher kernel learning (DFK)
[28]
learns kernel similarity through exploiting class label.
For our approach, the number of topics for PLSA is determined through crossvalidation on the test set. It is set to
K
= 120 in this experiment. For compared approaches, we referred to the results for Xing’s
[32]
, DCA
[2]
, SDPM
[35]
, DMLeig.
[36]
, LMNN
[26]
from literatures, and implemented the algorithms of FK, FESS and DFK on the basis of authors’ implementations and parameter configurations.
The experimental results are reported in
Table 2
. It can be found that, Xing’s approach
[32]
and DMLeig
[36]
show competitive performance. Meanwhile, for distance measure learning approaches, DCA beat SDPM, DMLeig and LMNN. The underlying reason is that DCA introduced negative constraints which capture intrinsic structures within samples. And, the probabilistic similarity measure learning approach FK and FESS get better results due to the exploitation of probabilistic modeling of image distribution. Our approach PLSAFK, as shown in
Table 2
, achieves the best performance against the compared approaches in most cases. Specifically, PLSAFK approach outperforms FESS by about 2.7%. This improvement should be credited to that PLSAFK utilizes the label information while FESS does not. Also, PLSAFK outperforms DFK about 1.4%. The main reason is that, compared with GMM, PLSA can capture the image attributes and infer the semantic level hidden information better. These results demonstrate the effectiveness of the proposed method in image retrieval.
Retrieval performance of all algorithms over Corel5K dataset
Retrieval performance of all algorithms over Corel5K dataset
Fig. 3
shows an example of our approach PLSAFK on Corel5k dataset. Given the query image (lefttop), it returns relevant images and lists the top 11 images in the figure. We could find that most results are relevant, and even incorrect results exhibit similarity in both shape and color, which indicates that our proposed approach could potentially capture multiple kinds of information and comprehensively contributes to the retrieval.
Retrieval results of our approach PLSAFK. The query image is marked by blue box and the incorrect results of top 11 ones are marked by red box.
 4.4 Experiments on MIRFlickr dataset
In real applications, the dataset is usually very large, which requires that the similarity measure (1) is scalable and computationally efficient; (2) is able to characterize the semantic similarity between images, given the large variance. To evaluate performance of our method on large dataset, we experimented on MIRFlickr dataset
[44]
. The MIRFlickr25000 dataset contains 25,000 samples with highresolution images and text annotations, collected from Flickr which is an online photosharing website. The size of the images are normalized to 500×height where height<500 or width×500 where width<500. See
Fig. 4
for sample images. For fair comparison, we follow the typical experimental scheme. The dataset is split into two parts, 15,000 images for training and the rest 10,000 images for test. We randomly chosen 1,000 images from the test dataset as queries and remained the rest 24,000 images as the gallery. In the gallery, 15,000 images are with text annotations.
Sample images from MIRFlickr dataset
We compare our proposed method PLSAFK with several related methods: nonnegative matrix factorization (NNMF)
[19]
, large margin nearest neighbor (LMNN)
[26]
, free energy score space (FESS)
[24]
and posterior divergence (PD)
[25]
. NNMF is a stateoftheart image retrieval method on the basis of matrix factorization. LMNN is a supervised distance learning method under the criterion of large margin. FESS and PD are probabilistic similarity learning methods closely related to our method. For all these compared methods, we used the authors’ suggested settings. For our method, the number of topics of PLSA is set to
K
=160 according to cross validation.
The experimental results are reported in
Table 3
. It can be clearly found that, FK, FESS and PD outperform NNMF and LMNN. The underlying reason is that, compared with NNMF and LMNN, FK, FESS and PD exploit the data distribution more sophisticatedly. Further, our method PLSAFK shows superiority over FESS and PD. The reason accounting for this is that, PLSAFK exploits class label through tuning similarity measure according to performance. Again, PLSAFK outperforms DFK, which benefits from the semantic level information given by PLSA.
Fig. 5
presents the retrieval results of our method for a query image “flower”. For the query image on the lefttop, our method retrieves the relevant images and presents the top 11 ones in the figure. It can be seen that, 9 retrieved images are relevant. It is interesting to find that the 2 unrelevant images are similar to the query image in shape and pattern, which suggests that the results of our approach could give reasonable retrieval results.
The retrieval performance of compared algorithms on MIRFlickr dataset.
The retrieval performance of compared algorithms on MIRFlickr dataset.
Retrieval results of our proposed approach PLSAFK. The query image “flower” is highlighted by blue box and the incorrect results in top 11 relevant images are highlighted by red box.
 4.5 Experiments on Corel30K dataset
To further evaluate the performance of our proposed approach PLSAFK on large dataset, we experimented on Corel30k
[45]
for image retrieval. It contains annotated 31,695 images (28,525 training and 3,170 testing) with annotations from 950 words. It is worth noting that, only a few works, up to now, have experimented on this dataset
[47]
. This experiment compared our proposed approach PLSAFK with PLSAWORD
[48]
and GMPLSA
[47]
on a total of 950 keyword sets. PLSAWORD quantified visual features into discrete words and exploited PLSA model
[29]
for distribution modeling and image retrieving. GMPLSA is also based on PLSA but further cooperate with other models. We refer to the results of PLSAWORD and GMPLSA. For Fisher kernel
[20]
, free energy score space (FESS)
[24]
and posterior divergence (PD)
[25]
, we implemented them according to authors’ suggestions. For our PLSAFK, the number of topics in PLSA is set to
K
=180 according to cross validation.
The overall experimental results are summerized in
Table 4
. It shows that, for two evaluation criterion, our approach PLSAFK outperforms PLSAWORD significantly, and shows competitive performance with GMPLSA. It is worth noting that, both PLSAWORD and GMPLSA are based on PLSA and thus capture some semantic level information. Also, our PLSAFK shows superority over three score methods, FK, FESS and PD. Although they exploit semantic level information through PLSA, they do not benefit from the class label information. These results are firmly consistent with the above two experiments, which support the advantage of our approach that it can adapt to data distribution and fully exploit highlevel semantic information hidden in the images.
The retrieval performance on Corel30K
The retrieval performance on Corel30K
 4.6 Discussions
The learning procedure (
Algorithm 1
) of the proposed method is the iterations of the inference step (Estep) and parameter estimation step (Mstep). This procedure is relatively time consuming to reach convergence. However, this procedure can be greatly sped up by means of pretraining, i.e., train PLSA first and use the trained parameter as the initial value of
Algorithm 1
. The computation procedure (
Algorithm 2
) is highly effective, since it only involves two inference steps, and can be realtime. The limitation of applying the method for larger dataset, e.g. ImageNet, is the learning procedure. That is, to learn over larger dataset, the method should be compatible with incremental learning. In our future work, we will seek to develop the incremental algorithm and apply it for larger dataset.
5. Conclusions
In this paper, we exploited probabilistic latent semantic analysi (PLSA) for similarity measure learning towards content based image retrieval (CBIR). The proposed approach (PLSAFK) derived Fisher kernel based on PLSA and learnt the kernel similarity subject the criterion that image pairs with same label have large value and image pairs with different labels have small value. Because PLSA models the distribution of visual words, our approach can well adapt to data distribution. Further, PLSA infers the topic, thus our method exploits high level semantic information of retrieval. The proposed method is applied to image retrieval. The experimental results over three datasets approve the competitive performance of our method as well as its scaleablity to large datset. The method, however, can be further optimized for large dataset, which remains in the future work.
BIO
Xiong Li received the PhD degree in pattern recognition and intelligence system from Shanghai Jiao Tong University, China, in 2013. He is currently an engineer in National Computer Network Emergency Response Technical Team, China. His research interests include hybrid generative discriminative learning and probabilistic graphical model.
Qi Lv received the BS and MS degrees in Flight Vehicle Propulsion Engineering from Nanjing University of Aeronautics and Astronautics, China, in 2002 and 2005 respectively, and PhD degree in mathematics from Zhengzhou University, China, in 2010. His research interests include pattern recognition, statistics and Internet public opinion.
Wenting Huang received his BS degree in computer science and technology from Jilin University, China, in 2004, and the MS degree in computer science from Computer Network Information Center of the Chinese Academy of Sciences in 2007. He is currently an engineer in National Computer network Emergency Response technical Team of China. His research interests include multimedia
Smeulders A.
,
Worring M.
,
Santini S.
,
Gupta A.
,
Jain R.
2000
“Contentbased image retrieval at the end of the early years”
IEEE Transactions on Pattern Analysis and Machine Intelligence
22
(12)
1349 
1380
DOI : 10.1109/34.895972
Hoi S.
,
Liu W.
,
Lyu M.
,
Ma W.
“Learning distance metrics with contextual constraints for image retrieval”
in Proc. of IEEE Conference on Computer Vision and Pattern Recognition
2006
2072 
2078
Hoi S.
,
Lyu M.
,
Jin R.
2006
“A unified logbased relevance feedback scheme for image retrieval”
inProc. of IEEE Transactions on Knowledge and Data Engineering
18
(4)
509 
524
DOI : 10.1109/TKDE.2006.1599389
Faria F.
,
Veloso A.
,
Almeida H.
,
Valle E.
,
Torres R.
,
Gonc¸alves M.
,
Meira W.
“Learning to rank for contentbased image retrieval”
in Proc. of ACM conference on Multimedia Information Retrieval
2010
285 
294
ArevalilloHerr´aez M.
,
Ferri F.
,
Domingo J.
2010
“A naive relevance feedback model for contentbased image retrieval using multiple similarity measures”
Pattern Recognition
43
(3)
619 
629
DOI : 10.1016/j.patcog.2009.08.010
Wang M.
,
Li H.
,
Tao D.
,
Lu K.
2012
“Multimodal graphbased reranking for web image search”
inProc. of IEEE Transactions on Image Processing
21
(11)
4649 
4661
DOI : 10.1109/TIP.2012.2207397
Wang M.
,
Yang K.
,
Hua X.S.
,
Zhang H.J.
2010
“Towards a relevant and diverse search of social images”
inProc. of IEEE Transactions on Multimedia
12
(8)
829 
842
DOI : 10.1109/TMM.2010.2055045
Yu J.
,
Tao D.
,
Wang M.
,
Rui Y.
“Learning to rank using user clicks and visual features for image retrieval”
in Proc. of IEEE Transactions on Cybernetics
2014
vol. 99
2168 
2267
Yu J.
,
Rui Y.
,
Chen B.
2014
“Exploiting click constraints and multiview features for image reranking”
inProc. of IEEE Transactions on Multimedia
16
(1)
159 
168
DOI : 10.1109/TMM.2013.2284755
Yu J.
,
Rui Y.
,
Tao D.
2014
“Click prediction for web image reranking using multimodal sparse coding”
inProc. of IEEE Transactions on Image Processing
23
(5)
2019 
2032
DOI : 10.1109/TIP.2014.2311377
Hoi S.
,
Liu W.
,
Chang S.
2010
“Semisupervised distance metric learning for collaborative image retrieval and clustering”
inProc. of ACM Transactions on Multimedia Computing, Communications, and Applications.
6
(3)
Yang L.
,
Jin R.
,
Sukthankar R.
,
Liu Y.
“An efficient algorithm for local distance metric learning”
in Proc. of the National Conference on Artificial Intelligence
2006
Yang L.
,
Jin R.
,
Mummert L.
,
Sukthankar R.
,
Goode A.
,
Zheng B.
,
Hoi S.
,
Satyanarayanan M.
2010
“A boosting framework for visualitypreserving distance metric learning and its application to medical image retrieval”
inProc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
32
(1)
30 
44
DOI : 10.1109/TPAMI.2008.273
Yu J.
,
Wang M.
,
Tao D.
2012
“Semisupervised multiview distance metric learning for cartoon synthesis”
inProc. of IEEE Transactions on Image Processing
21
(11)
4636 
4648
DOI : 10.1109/TIP.2012.2207395
Yu J.
,
Tao D.
,
Lic J.
,
Cheng J.
2014
“Semantic preserving distance metric learning and applications”
Information Sciences
281
674 
686
DOI : 10.1016/j.ins.2014.01.025
Liu B.
,
Wang M.
,
Hong R.
,
Zha Z.J.
,
Hua X.S.
2010
“Joint learning of labels and distance metric”
inProc. of IEEE Transactions on Systems, Man and Cybernetics
40
(3)
973 
978
DOI : 10.1109/TSMCB.2009.2034632
Wang M.
,
Hua X.S.
,
Tang J.
,
Hong R.
2009
“Beyond distance measurement: constructing neighborhood similarity for video annotation”
inProc. of IEEE Transactions on Multimedia
11
(3)
465 
476
DOI : 10.1109/TMM.2009.2012919
Caicedo J.C.
,
BenAbdallah J.
,
González F.A.
,
Nasraoui O.
2012
“Multimodal representation, indexing, automated annotation and retrieval of image collections via nonnegative matrix factorization”
Neurocomputing
76
(1)
50 
60
DOI : 10.1016/j.neucom.2011.04.037
Jaakkola T.
,
Haussler D.
1999
“Exploiting generative models in discriminative classifiers”
NIPS
487 
493
Jebara T.
,
Kondor R.
,
Howard A.
2004
“Probability product kernels”
inProc. of Journal of Machine Learning Research
5
819 
844
Vasconcelos N.
2004
“On the efficient evaluation of probabilistic similarity functions for image retrieval”
inProc. of IEEE Transactions on Information Theory
50
(7)
1482 
1496
DOI : 10.1109/TIT.2004.830760
Schmid C.
“Constructing models for contentbased image retrieval”
CVPR
2001
Perina A
,
Cristani M.
,
Castellani U.
,
Murino V.
,
Jojic N.
“Free energy score spaces: using generative information in discriminative classifiers”
in Proc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
2011
Li X.
,
Lee T.S.
,
Liu Y.
“Hybrid generativediscriminative classification using posterior divergence”
CVPR
2011
Weinberger K.
,
Saul L.
2009
“Distance metric learning for large margin nearest neighbor classification”
The Journal of Machine Learning Research
10
207 
244
Jain P.
,
Kulis B.
,
Davis J.
,
Dhillon I.
2012
“Metric and kernel learning using a linear transformation”
The Journal of Machine Learning Research
13
519 
547
Wang B.
,
Li X.
,
Liu Y.
2013
“Learning discriminative Fisher kernel for image retrieval”
inProc. of KSII Transaction on Internet and Information System
7
(3)
Su J.
,
Huang W.
,
Yu P.
,
Tseng V.
2011
“Efficient relevance feedback for contentbased image retrieval by mining user navigation patterns”
inProc. of IEEE Transactions on Knowledge and Data Engineering
23
(3)
360 
372
DOI : 10.1109/TKDE.2010.124
Cai H.
,
Mikolajczyk K.
,
Matas J.
2011
“Learning linear discriminant projections for dimensionality reduction of image descriptors”
inProc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
33
(2)
338 
352
DOI : 10.1109/TPAMI.2010.89
Xing E.
,
Ng A.
,
Jordan M.
,
Russell S.
“Distance metric learning, with application to clustering with sideinformation”
NIPS
2002
505 
512
Frome A.
,
Singer Y.
,
Malik J.
“Image retrieval and classification using local distance functions”
NIPS
2007
Yang L.
,
Jin R.
,
Sukthankar R.
2012
“Bayesian active distance metric learning”
arXiv preprint arXiv
1206
(5283)
Kim J.
,
Shen C.
,
Wang L.
“A scalable algorithm for learning a Mahalanobis Distance Metric”
ACCV
2010
Ying Y.
,
Li P.
2012
“Distance metric learning with eigenvalue optimization”
The Journal of Machine Learning Research
13
1 
26
Xiang S.
,
Nie F.
,
Zhang C.
2008
“Learning a mahalanobis distance metric for data clustering and classification”
Pattern Recognitio
41
(12)
3600 
3612
DOI : 10.1016/j.patcog.2008.05.018
Becker H.
,
Naaman M.
,
Gravano L.
“Learning similarity metrics for event identification in social media”
in Proc. of ACM international conference on Web search and data mining
2010
291 
300
Cao S.
,
Snavely N.
“Learning to match images in largescale collections”
Springer
in Proc. of ECCV Workshops and Demonstrations
2012
259 
270
Wang M.
,
Hua X.S.
,
Hong R.
,
Tang J.
2009
“Unified video annotation via multigraph learning”
inProc. of IEEE Transactions on Circuits and Systems for Video Technology
19
(5)
733 
746
DOI : 10.1109/TCSVT.2009.2017400
Bosch A.
,
Zisserman A.
,
Muoz X.
2008
“Scene classification using a hybrid generative discriminative approach”
inProc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
30
(4)
Jordan M.
,
Ghahramani Z.
,
Jaakkola T.
,
Lawrence S.
1999
“Introduction to variational methods for graphical models”
Machine Learning
37
183 
233
DOI : 10.1023/A:1007665907178
Duygulu P.
,
Barnard K.
,
de Freitas N.
,
Forsyth D.
“Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary”
ECCV
2002
Huiskes M. J.
,
Lew M. S.
“The MIR Flickr retrieval evaluation”
in Proc. of ACM International Conference on Multimedia Information Retrieval
2008
Carneiro G.
,
Chan A. B.
,
Moreno P. J.
,
Vasconcelos N.
2006
“Supervised learning of semantic classes for image annotation and retrieval”
inProc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
29
(3)
394 
410
DOI : 10.1109/TPAMI.2007.61
Van De Sande K.
,
Gevers T.
,
Snoek C.
2010
“Evaluating color descriptors for object and scene recognition”
inProc. of IEEE Transactions on Pattern Analysis and Machine Intelligence
32
(9)
1582 
1596
DOI : 10.1109/TPAMI.2009.154
Li Z.
,
Shi Z.
,
Liu X.
,
Shi Z.
2010
“Modeling continuous visual features for semantic image annotation and retrieval”
Pattern Recognition Letters
32
(3)
516 
523
DOI : 10.1016/j.patrec.2010.11.015
Li Z.
,
Shi Z.
,
Liu X.
,
Li Z.
,
Shi Z.
2010
“Fusing semantic aspects for image annotation and retrieval”
Journal of Visual Communication and Image Representation
21
(8)
798 
805
DOI : 10.1016/j.jvcir.2010.06.004