At present, the Content-Based Image Retrieval (CBIR) system has become a hot research topic in the computer vision field. In a CBIR system, accurate extraction of low-level features can narrow the gap with high-level semantics and improve retrieval precision. This paper puts forward a new retrieval method aimed at the problems of high computational complexity and low precision of global feature extraction algorithms. The new retrieval method is built on an interest point detection algorithm combining SIFT and Harris (IPDSH) and the salient region of interest points (SRIP) algorithm, so as to satisfy users' interest in specific targets within images. First, using the IPDSH and SRIP algorithms, we detected stable interest points and found salient regions; the interest points inside a salient region are called salient interest points. Second, we extracted the pseudo-Zernike moments of each salient interest point's neighborhood as the feature vectors. Finally, we calculated the similarities between the query image and the database images. The experiments were conducted on the Caltech-101 database. The results show that this new retrieval method can decrease the interference of unstable interest points in regions of non-interest and improve both precision and recall.
1. Introduction
With the development of computer networks and multimedia technologies, an increasing number of people search for information on the internet, and this has become an indispensable part of daily life. However, the rapid growth of data volume has restricted the progress of image retrieval [1-3]. To solve this problem, a growing number of scholars have invested in research on Content-Based Image Retrieval (CBIR) systems. A CBIR system extracts characteristics of query images such as color, shape and texture, and retrieves image information by comparing the similarities between images. Although CBIR has achieved remarkable success in some fields, the semantic gap has restricted its development. Therefore, finding accurate low-level visual features has become a key factor in extracting the implicit high-level semantics.
In research on low-level visual features, local-feature methods for image retrieval are commonly used. They fall into two categories: segmentation-based methods and interest-point-based methods [4]. Segmentation-based methods use image segmentation technologies to find the regions of significant objects. Nevertheless, present segmentation technologies retrieve images with very low accuracy, so many scholars began to study interest-point-based methods. For instance, Li Yangxi et al. [5] proposed a query-difficulty-guided image retrieval system, which applies ranking optimization approaches to predict queries' ranking performance according to their difficulty. Wang Xiaoyang et al. [6] developed a robust color image retrieval method using visual interest point features of significant bit-planes, and proved it more effective at retrieving noise-attacked images, including blurred, sharpened and illumination-changed ones. Moreover, Zhu Xianqiang et al. [7] proposed novel local saliency features based on interest extraction algorithms, which improve processing efficiency and fit human visual perceptual characteristics. Finally, Glauco V. Pedrosa et al. [8] defined the saliences of a shape as the higher-curvature points along the shape contour and proposed a shape-based image retrieval technique that uses salience points to describe shapes; it consists of a salience point detector, a salience representation using angular relative position and curvature values analyzed from a multiscale perspective, and a matching algorithm that considers local and global features to calculate the dissimilarity.
These works show that interest-point-based methods have changed the ineffective traditional ways of image retrieval. Specifically, such methods carry the interest points of image matching algorithms directly over to image retrieval, and they take more account of the spatial distribution of the interest points, which can improve retrieval precision. Nevertheless, interest point detection algorithms can be influenced by unstable interest points in regions of non-interest, so the division of the salient region has low accuracy. Aiming at this problem, we propose a novel image retrieval method based on IPDSH and SRIP, which performs better than some other popular algorithms.
Fig. 1 shows the structure of this image retrieval method.
Image retrieval structure
2. IPDSH
Interest points are usually defined as corners, and there are some popular corner detection algorithms such as Harris and SIFT. The Harris algorithm is effective in terms of calculation speed and computational precision, but it is sensitive to scale, which can lead to low extraction accuracy. Specifically, the precision of corner extraction depends only on the threshold used when the response function performs non-maxima suppression to identify local maxima, so the magnitude of the threshold determines the accuracy of detection: corner information is lost with a very large threshold, while too small a threshold results in false corners [9]. To resolve this problem, corner detection algorithms based on scale-invariant features were proposed, among which the Scale-Invariant Feature Transform (SIFT) algorithm is regarded as the most effective. Nevertheless, SIFT has some disadvantages: its computational complexity is high, its calculation speed is low, and the interest points it detects are not salient. Thus, we propose the IPDSH algorithm based on features of both SIFT and Harris. IPDSH can effectively improve the accuracy of interest point detection. The specific procedures are as follows:
First, Koenderink [14] and Lindeberg [15] proved that the Gaussian convolution kernel is the only linear kernel that can realize scale transformation. Convolving the image f(x, y) with the two-dimensional Gaussian kernel G(x, y, σ) yields the multi-scale space sequence S, as shown in equation (1):

S(x, y, σ) = G(x, y, σ) ∗ f(x, y)        (1)

G(x, y, σ) is defined as equation (2):

G(x, y, σ) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²)        (2)

Here ∗ represents convolution, (x, y) represents the pixel position, and σ represents the scale-space factor. The smaller σ is, the less the image is smoothed, and the smaller the corresponding scale.
In order to detect stable interest points in the scale spaces efficiently, Lowe used the difference-of-Gaussian (DoG) function to convolve f(x, y) and obtain the extrema of the multi-scale spaces, as shown in equation (3):

D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ f(x, y) = S(x, y, kσ) − S(x, y, σ)        (3)

where k is a constant. Lindeberg found that the DoG function is very similar to the normalized Laplacian of Gaussian σ²∇²G, so the DoG function can be used to calculate the positions of the extreme points in the scale spaces.
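As a rough illustration of equation (3) (our own sketch, not the authors' code), the DoG response can be computed with NumPy; the kernel truncation at 3σ and the factor k = 1.6 are conventional assumptions, not values stated in this paper:

```python
import numpy as np

def gauss_kernel(sigma):
    # 1-D Gaussian kernel truncated at 3*sigma, normalised to sum 1
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def gauss_blur(img, sigma):
    # separable Gaussian filtering: convolve rows, then columns
    k = gauss_kernel(sigma)
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode="same")

def dog(img, sigma, k=1.6):
    # D(x, y, sigma) = (G(k*sigma) - G(sigma)) * f  -- equation (3)
    return gauss_blur(img, k * sigma) - gauss_blur(img, sigma)
```

For a bright dot, the DoG response is negative at the dot's center and positive on a ring around it, which is exactly the blob-like structure the extrema search exploits.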
Furthermore, the descriptors of SIFT interest points are based on pixel gradients. Thus, in the local neighborhood of an interest point, larger changes of the pixel gradients produce a more distinctive descriptor vector, and the matching of descriptors will probably be more accurate. This inspires the following idea: if Harris corners exist in the local neighborhood of an extreme point, the extreme point is more likely to be a stable interest point in the scale spaces.
The autocorrelation matrix is used to extract features and describe the local structure of images. It is defined as equation (4):

A(x, y, σ_i, σ_d) = σ_d² G(σ_i) ∗ | f_x²(x, y, σ_d)      f_x f_y(x, y, σ_d) |
                                  | f_x f_y(x, y, σ_d)   f_y²(x, y, σ_d)    |        (4)

in which σ_i represents the integration scale, σ_d represents the differentiation scale, G(σ_i) represents Gaussian filtering of f, and f_x and f_y represent the derivatives in the x and y directions.
Suppose the eigenvalues of A are λ_1 and λ_2, which represent the principal curvatures of the autocorrelation function. We can then define the detection formula for the different scale spaces as equation (5):

C = det(A) − α·trace²(A) = λ_1 λ_2 − α(λ_1 + λ_2)²        (5)

Using equations (4) and (5), we can obtain the Harris corners of the scale spaces. Then, according to the magnitudes of λ_1 and λ_2, we calculate the local maxima of C, where α is set to a constant between 0.04 and 0.06. Afterwards, we analyze whether Harris corners exist in the local neighborhood of each extreme point. If a Harris corner falls in the extreme point's neighborhood in the multi-scale spaces, we retain the extreme point and regard it as a stable interest point; if no Harris corner exists there, we delete the extreme point and regard it as an unstable interest point.
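The Harris response of equations (4)-(5) and the subsequent filtering step can be sketched as follows. This is an illustrative NumPy version, not the authors' implementation; the integration scale sigma_i, the neighborhood radius, and the response threshold are assumed parameters:

```python
import numpy as np

def gauss_blur(img, sigma):
    # separable Gaussian filtering used as the integration window G(sigma_i)
    r = max(1, int(3 * sigma))
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    tmp = np.apply_along_axis(np.convolve, 0, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 1, tmp, k, mode="same")

def harris_response(img, sigma_i=1.5, alpha=0.05):
    # autocorrelation (structure tensor) entries as in equation (4),
    # then C = det(A) - alpha*trace(A)^2, equation (5)
    fy, fx = np.gradient(img)
    a = gauss_blur(fx * fx, sigma_i)
    b = gauss_blur(fx * fy, sigma_i)
    c = gauss_blur(fy * fy, sigma_i)
    return (a * c - b * b) - alpha * (a + c) ** 2

def stable_points(extrema, response, radius=3, thresh=1e-6):
    # keep a DoG extremum only if a Harris corner lies in its neighborhood
    keep = []
    for (y, x) in extrema:
        patch = response[max(0, y - radius): y + radius + 1,
                         max(0, x - radius): x + radius + 1]
        if patch.size and patch.max() > thresh:
            keep.append((y, x))
    return keep
```

On a synthetic square, the response is positive near the corners and zero on flat regions, so an extremum on the corner is retained while one in the background is discarded.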
This method effectively decreases the number of interest points and enhances the significance of the interest point set, improving the precision of image retrieval. Fig. 2 shows an experimental result comparing the IPDSH and SIFT algorithms: the total number of interest points detected by IPDSH is lower than that of SIFT because some unstable interest points have been removed.
Comparison of detecting results of IPDSH and SIFT
3. SRIP
The vast majority of CBIR algorithms are based on global information, describing images with global feature descriptors. Global feature algorithms are simple and less sensitive to translation or rotation. However, although these methods can satisfy some users' retrieval purposes, they cannot describe the spatial differences within an image. As a result, retrieval algorithms based on global features are greatly limited in satisfying users' demands. Therefore, CBIR systems based on local features should be analyzed in order to truly cope with users' needs.
For instance, convex hull and annular division are two effective algorithms based on local features. Fig. 3 shows the feature regions of convex hull and annular division. Obviously, the accuracy of the divisions is influenced by interest points located in the background. In general, most interest points of an image are concentrated in the interior or on the edges of objects, while some are freely distributed in the background. When the total number of interest points is small, these freely distributed points can disturb the division results. Thus, their negative effects should be considered in image feature extraction.
Division of feature region
As the convex hull and annular division examples above show, these two algorithms suffer in feature extraction from the disturbance of freely distributed interest points. Therefore, we introduce the SRIP method, which is also based on local features.
In most cases, users just want to find one or more specific targets in the query images. Thus, it is significant to focus on the salient regions that users care about. A salient region expresses the main content of an image well; to some extent, it can eliminate the interference of redundant information, and it is well accepted in the image retrieval domain.
Furthermore, the salient region of an image is regarded as a region where the distribution density of interest points is obviously higher than elsewhere; higher density represents more distinctive information. Within the salient region, the interest points are called salient interest points. The interest points located outside the salient region need to be removed; afterwards, the background information is reduced and the feature extraction becomes more accurate. The detailed procedure of salient region division is as follows:
Suppose the size of the image is m × n. The set of interest points P can be expressed as equation (6):

P = { p_i(x_i, y_i) | 1 ≤ i ≤ N, 0 ≤ x_i ≤ m, 0 ≤ y_i ≤ n }        (6)

in which p_i(x_i, y_i) represents an interest point and (x_i, y_i) its coordinate. Let N represent the total number of interest points; we define the centroid of the interest points O as equation (7):

O = (x_O, y_O) = ( (1/N) Σ_{i=1}^{N} x_i , (1/N) Σ_{i=1}^{N} y_i )        (7)

Next, we calculate the distance between O and each interest point according to equation (8):

R_i = sqrt( (x_i − x_O)² + (y_i − y_O)² )        (8)

where R_i is the distance from the i-th interest point to the centroid O.
Then, we arrange the R_i in descending order and take the k-th value (k ∈ N) as the radius, recorded as R_c. We draw a circle C of radius R_c around the centroid, which serves as the circle center. In Fig. 4, we determine an ergodic region according to R_c and iterate over each pixel in this region. Every pixel in the ergodic region is taken as a circle center, giving a set of circles of radius R_c. In Fig. 4, A’, B’, C’ and D’ represent the four corners of the image; the blue region bounded by A, B, C and D is the ergodic region; and AA’, BB’, CC’ and DD’ are the angle bisectors of the corners of the image. The goal is to cover all pixels in the image while iterating over the smallest possible number of circle centers. Moreover, if R_c is large enough, the ergodic region degenerates to the line EF.
Ergodic region
During the iteration, we record the coordinates of each center and the number of interest points inside its circle. After comparing all circles, we select the one containing the largest number of interest points and define it as the salient region, drawn on the image in green. Finally, we retain the salient interest points inside the salient region and exclude the freely distributed interest points, so as to obtain the feature extraction region. The set of salient interest points is determined by equation (9).
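The salient region search described above can be sketched in a few lines of NumPy. This is a simplified illustration: it scans candidate centers on a coarse square grid rather than the exact ergodic region of Fig. 4, and the grid step is an assumed parameter:

```python
import numpy as np

def srip(points, k, step=2):
    # points: (N, 2) list/array of interest point coordinates
    pts = np.asarray(points, dtype=float)
    centroid = pts.mean(axis=0)                       # equation (7)
    dists = np.linalg.norm(pts - centroid, axis=1)    # equation (8)
    r_c = np.sort(dists)[::-1][k - 1]                 # k-th largest distance
    # scan candidate circle centres around the centroid
    best_center, best_count = centroid, -1
    for cx in np.arange(centroid[0] - r_c, centroid[0] + r_c + step, step):
        for cy in np.arange(centroid[1] - r_c, centroid[1] + r_c + step, step):
            inside = np.linalg.norm(pts - (cx, cy), axis=1) <= r_c
            if inside.sum() > best_count:
                best_center, best_count = np.array([cx, cy]), int(inside.sum())
    # salient interest points: those inside the winning circle, equation (9)
    salient = pts[np.linalg.norm(pts - best_center, axis=1) <= r_c]
    return best_center, r_c, salient
```

For a tight cluster of points plus a few distant background points, the winning circle contains the cluster and the background points are excluded from the feature extraction region.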
P_s = { p(x, y) ∈ P | (x − x_c)² + (y − y_c)² ≤ R_c² }        (9)

where x_c and y_c are the center coordinates of the salient region.
Fig. 5 shows two examples of SRIP extraction (k = 14). It can be seen that the freely distributed points are effectively removed, and the extracted feature regions are more accurate than the results in Fig. 3.
Two examples of SRIP’s extraction (k=14)
Additionally, by comparing the data, we found that the experiment achieved the best results when k = ⌈0.9 × N⌉, in which ⌈·⌉ rounds up to the nearest integer towards plus infinity and N represents the total number of stable interest points.
4. Pseudo-Zernike Moments
Pseudo-Zernike moments [10] can be obtained from a set of basis functions { V_nm(x, y), x² + y² ≤ 1 }, and they perform better than Zernike moments in the anti-noise aspect. V_nm(x, y) is defined as equation (10):

V_nm(x, y) = V_nm(ρ, θ) = R_nm(ρ) e^{jmθ},
R_nm(ρ) = Σ_{s=0}^{n−|m|} (−1)^s · (2n + 1 − s)! / [ s! (n − |m| − s)! (n + |m| + 1 − s)! ] · ρ^{n−s}        (10)

which satisfies the orthogonality condition of equation (11):

∫∫_{x²+y²≤1} V_nm(x, y) V*_pq(x, y) dx dy = ( π / (n + 1) ) δ_np δ_mq        (11)

Here n is a non-negative integer, m is an integer with |m| ≤ n, ρ is the radius, and θ is the angle of the pixel in polar coordinates. We then define the (n, m)-order pseudo-Zernike moment of f(x, y) as equation (12):

P_nm = ( (n + 1) / π ) Σ Σ_{x²+y²≤1} f(x, y) V*_nm(x, y)        (12)
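As an illustration of equations (10) and (12), the following NumPy sketch (our own, not the authors' implementation) evaluates the pseudo-Zernike radial polynomial and a discrete moment over the unit disk; the grid-to-disk mapping and the pixel-area factor are assumptions of this sketch:

```python
import numpy as np
from math import factorial

def pz_radial(n, m, rho):
    # pseudo-Zernike radial polynomial R_nm(rho), equation (10)
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for s in range(n - m + 1):
        c = ((-1) ** s * factorial(2 * n + 1 - s) /
             (factorial(s) * factorial(n - m - s) * factorial(n + m + 1 - s)))
        out += c * rho ** (n - s)
    return out

def pz_moment(img, n, m):
    # discrete (n, m)-order pseudo-Zernike moment, equation (12):
    # the square pixel grid is mapped onto the unit disk before summation
    h, w = img.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    xn = (2 * x - (w - 1)) / (w - 1)
    yn = (2 * y - (h - 1)) / (h - 1)
    rho = np.hypot(xn, yn)
    theta = np.arctan2(yn, xn)
    mask = rho <= 1.0
    v = pz_radial(n, m, rho) * np.exp(1j * m * theta)
    da = 4.0 / ((w - 1) * (h - 1))        # pixel area in normalized coordinates
    return (n + 1) / np.pi * np.sum(img[mask] * np.conj(v[mask])) * da
```

Because a 90° rotation only shifts θ by a constant, the magnitude |P_nm| is unchanged when the image is rotated, which is the rotational invariance exploited in this paper.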
After confirming an interest point's location, we take it as the origin of polar coordinates, map the unit circle into polar coordinates, and calculate the pseudo-Zernike moments. Pixels located outside the unit circle are not calculated.
5. Similarity Measurement
Suppose Q represents the query image with E salient interest points, I represents a database image with F salient interest points, and N represents the number of pseudo-Zernike moments per salient interest point. Then we define the similarity S of the pseudo-Zernike moments in the salient interest points' neighborhoods as equation (13), where ‖P_Q(i)‖ represents the amplitude of the query image's i-th moment and ‖P_I(i)‖ is the corresponding amplitude for the database image.
6. Experiments
The experiments were conducted on a platform with a Core 2 Q8200 CPU (2.33 GHz) and 4.0 GB of memory, running Windows Server 2003 and Matlab 7.1, using the Caltech-101 dataset. The Caltech-101 dataset consists of 9146 images in total, split into 101 object categories plus an additional background/clutter category. There are roughly 40 to 800 images per category, e.g., sunflower, butterfly, schooner, rhino, panda, lamp, cougar face, rooster, piano, watch, cup, lotus, barrel, ant, chairs and others. Fig. 6 shows thumbnails of some categories. In this experiment, the precision is defined as P_T = n/T, in which T represents the total number of images output in the retrieval results and n represents the number of correct images. Some experimental results are shown in Fig. 7.
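The precision measure P_T = n/T used above is straightforward to compute; the helper below is a trivial sketch (label names are illustrative):

```python
def precision_at_t(retrieved_labels, query_label, t=10):
    # P_T = n / T: fraction of the top-T retrieved images whose
    # category label matches the query image's category
    top = retrieved_labels[:t]
    return sum(1 for lab in top if lab == query_label) / len(top)
```

For example, a top-10 list with 7 correct category matches yields P_10 = 0.7, matching how the per-row precisions in Fig. 7 are read off.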
Thumbnails from some categories of the Caltech101 database
Retrieval results
Fig. 7 contains 6 rows, each with 10 images. The leftmost image of each row is the query image, chosen from the categories shown in Fig. 6. The other images in each row are the retrieval results, arranged in descending order of similarity. The images marked with red ellipses are erroneous results. In the first row there is no erroneous result, so the P_10 of sunflower is 100%. In the second row there are 3 erroneous results, so the P_10 of cougar face is 70%. Similarly, the P_10 of schooner, butterfly, panda and grand piano are 70%, 80%, 60% and 60%, respectively. It is also worth noting that the 3 erroneous results in the third row are all ketches, because the system treated these ships the same as schooners; if we regard the query image simply as a ship, P_10 would be 100%. In short, all 6 precisions in these experimental results are over 50%, which satisfies our needs.
Comparing the correct and erroneous results, it is obvious that the erroneous images have shapes similar to those of the correct images. The reason is that pseudo-Zernike moments describe these images well, since they are geometrically invariant to rotation, translation and scaling. However, using this method alone can easily cause interference from similar images. Its advantage is low complexity and fast calculation; nevertheless, integrating algorithms based on additional features such as color and texture could further improve the proposed method.
To further evaluate the performance of our algorithms, we randomly selected 10 query images per category and calculated the average precisions of the 6 categories mentioned above, in order to compare against Chen's algorithm in Ref. [11] and Yu's algorithm in Ref. [12]. Table 1 shows the precisions of the different methods: the average precision of our method is 6.5 percentage points higher than Chen's algorithm and 11.2 percentage points higher than Yu's algorithm. Furthermore, Fig. 8 shows the average precisions of all 102 categories, listed alphabetically from left to right. According to this figure, the soccer ball category has the highest precision and the crab category the lowest, and 60 categories have precisions over 50 percent.
Precision of different methods
Average precision of 102 categories
Additionally, we compared the Precision-Recall (PR) curves and the average retrieval time with Chen's and Yu's algorithms. Fig. 9 shows the comparison of PR curves, and Fig. 10 shows the average retrieval time for 6 categories of the Caltech-101 database. As these figures show, our proposed method is superior to the other methods in both retrieval precision and time complexity. Thus, the combination of IPDSH and SRIP can effectively reduce the amount of calculation and accurately extract the images' visual features.
Comparison result of PR curves
Average retrieval time of 6 categories in the Caltech101 database
7. Conclusions
In this paper, we proposed an image retrieval method based on IPDSH and SRIP. The IPDSH algorithm detects interest points more accurately by eliminating unstable interest points, while the SRIP algorithm reduces the interference of freely distributed interest points in image backgrounds, making the feature extraction algorithm perform better. In addition, once the salient regions are confirmed, pseudo-Zernike moments are extracted as the image features, since they offer noise resistance and rotational invariance. Finally, the retrieval results are output according to the similarities. The experimental results show that the proposed method performs better than some other popular methods.
In the future, building upon other scholars' research, we would like to extend the proposed method to more challenging problems such as video search, large-scale data and mobile search [13]. Furthermore, we will also make an effort to improve detection speed while guaranteeing accuracy.
BIO
Xu Zhang received his B.S. degree in control technology and instrument from Xidian University, Xi’an, P.R. China in 2007. He is currently a Ph.D. student with the School of Aerospace Science and Technology, Xidian University. His research interests include computer vision, pattern recognition, signal processing and image retrieval.
Baolong Guo received his B.S., M.S. and Ph.D. degrees from Xidian University in 1984, 1988 and 1995, respectively, all in communication and electronic system. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently the director of the Institute of Intelligent Control & Image Engineering (ICIE) at Xidian University. His research interests include neural networks, pattern recognition, intelligent information processing, and image communication.
Yunyi Yan received his B.S. degree in automation, M.S. degree in pattern recognition and intelligent system, and Ph.D. degree in electronic science and technology from Xidian University, China, in 2001, 2004 and 2008 respectively. Since 2004, he has been with the ICIE Institute of the School of Aerospace Science and Technology at Xidian University, where he is currently an associate professor. Since 2013, he has served as the director of the Intelligent Detection Department at Xidian University. He worked as a visiting scholar at the Montreal Neurological Institute of McGill University, Canada, from 2010 to 2011. His research interests include system reliability analysis, medical image processing, and swarm intelligence.
Wei Sun was born in 1980 in Anhui province, China. He received his Bachelor degree in Measuring & Control Technology and Instrumentations, and his Master and Ph.D. degrees in Circuit and System, from Xidian University, Xi'an, China, in 2002, 2005 and 2009, respectively. Since 2012, Dr. Wei Sun has been an associate professor in the School of Aerospace Science and Technology at Xidian University, Xi’an, China. His research interests include visual information perception, pattern recognition and embedded video systems. He has published over 20 papers in significant journals and conferences and has been granted 3 patents.
Meng Yi (S’12) received the M.S. degree in Electrical Engineering from Northwestern Polytechnical University, Xi’an, China, in March 2008. Since 2009, he has been a Ph.D. candidate in Circuits and Systems at Xidian University. Currently, he is a visiting doctoral candidate in the Department of Electrical and Computer Engineering and the Center for Automation Research, University of Maryland, College Park, Maryland, USA. His research interests include computer vision, pattern recognition, signal processing and biometrics.
References
[1] Datta R., Joshi D., Li J., Wang J. Z., “Image retrieval: ideas, influences, and trends of the new age,” ACM Computing Surveys, vol. 40, no. 2, Article 5, 2008.
[2] Ji R. R., Duan L. Y., Chen J., Xie L. X., Yao H. X., Gao W., “Learning to distribute vocabulary indexing for scalable visual search,” IEEE Transactions on Multimedia, vol. 15, no. 1, pp. 153-166, 2013. DOI: 10.1109/TMM.2012.2225035
[3] Yan C. L., “Accurate image retrieval algorithm based on color and texture feature,” Journal of Multimedia, vol. 8, no. 3, pp. 277-283, 2013. DOI: 10.4304/jmm.8.3.277-283
[4] Fu X., Zeng J. X., “A novel image retrieval method based on interest points matching and distribution,” Chinese Journal of Lasers, vol. 37, no. 3, pp. 774-778, 2010. DOI: 10.3788/CJL20103703.0774
[5] Li Y. X., Geng B., Zha Z. J., Tao D. C., Yang L. J., Xu C., “Difficulty guided image retrieval using linear multiple feature embedding,” IEEE Transactions on Multimedia, vol. 14, no. 6, pp. 1618-1630, 2012. DOI: 10.1109/TMM.2012.2199292
[6] Wang X. Y., Yang H. Y., Li Y. W., Yang F. Y., “Robust color image retrieval using visual interest point feature of significant bit-planes,” Digital Signal Processing, vol. 23, no. 4, pp. 1136-1153, 2013. DOI: 10.1016/j.dsp.2013.01.013
[7] Zhu X. Q., Huang J. C., Shao Z. F., Cheng G. Q., “A new approach for interesting local saliency features definition and its application to remote sensing imagery retrieval,” Geomatics and Information Science of Wuhan University, vol. 38, no. 6, pp. 652-655, 2013.
[8] Pedrosa G. V., Batista M. A., Barcelos C. A. Z., “Image feature descriptor based on shape salience points,” Neurocomputing, vol. 120, pp. 156-163, 2013. DOI: 10.1016/j.neucom.2012.07.055
[9] Wang Y. T., Chen Y. Q., Li J., Li B. M., “The Harris corner detection method based on three scale invariance spaces,” International Journal of Computer Science Issues, vol. 9, no. 6, pp. 18-22, 2012.
[10] Dai X. B., Liu T. L., Shu H. Z., Luo L. M., “Pseudo-Zernike moment invariants to blur degradation and their use in image recognition,” Lecture Notes in Computer Science, vol. 7551, pp. 90-97, 2013.
[11] Chen M. S., Yang S. Y., Zhao Z. J., Fu P., Sun Y., Li X. N., Sun Y., Qi X. Y., “Multi-points diverse density learning algorithm and its application in image retrieval,” Journal of Jilin University (Engineering and Technology Edition), vol. 41, no. 5, pp. 1456-1460, 2011.
[12] Yu J., Qin Z. C., Wan T., Zhang X., “Feature integration analysis of bag-of-features model for image retrieval,” Neurocomputing, vol. 120, pp. 355-364, 2013. DOI: 10.1016/j.neucom.2012.08.061
[13] Ji R. R., Duan L. Y., Chen J., Yao H. X., Yuan J. S., Rui Y., Gao W., “Location discriminative vocabulary coding for mobile landmark search,” International Journal of Computer Vision, vol. 96, no. 3, pp. 290-314, 2012. DOI: 10.1007/s11263-011-0472-9
[14] Koenderink J. J., “The structure of images,” Biological Cybernetics, vol. 50, pp. 363-396, 1984. DOI: 10.1007/BF00336961
[15] Lindeberg T., “Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention,” International Journal of Computer Vision, vol. 11, no. 3, pp. 283-318, 1993. DOI: 10.1007/BF01469346