Image Retrieval Method Based on IPDSH and SRIP
KSII Transactions on Internet and Information Systems (TIIS). 2014. May, 8(5): 1676-1689
Copyright © 2014, Korean Society For Internet Information
  • Received : December 26, 2013
  • Accepted : May 04, 2014
  • Published : May 28, 2014
About the Authors
Xu Zhang
Baolong Guo
Yunyi Yan
Wei Sun
Meng Yi

Abstract
At present, Content-Based Image Retrieval (CBIR) has become a hot research topic in the computer vision field. In a CBIR system, accurate extraction of low-level features can narrow the gap to high-level semantics and improve retrieval precision. This paper puts forward a new retrieval method aimed at the problems of high computational complexity and low precision of global feature extraction algorithms. The new method is built on the IPDSH algorithm, which detects interest points based on SIFT and Harris, and the salient region of interest points (SRIP) algorithm, in order to satisfy users' interest in specific targets within images. First, using the IPDSH and SRIP algorithms, we detected stable interest points and found salient regions; the interest points within a salient region are called salient interest points. Second, we extracted the pseudo-Zernike moments of the salient interest points' neighborhoods as the feature vectors. Finally, we calculated the similarities between query and database images. We conducted experiments on the Caltech-101 database; the results show that the new retrieval method can decrease the interference of unstable interest points in regions of non-interest and improve precision and recall.
1. Introduction
With the development of computer networks and multimedia technologies, an increasing number of people search for information on the internet, and doing so has become an indispensable part of daily life. However, the rapid growth of data volume has restricted the progress of image retrieval [1 - 3]. To solve this problem, a growing number of scholars have invested in research on Content-Based Image Retrieval (CBIR) systems. A CBIR system extracts characteristics of a query image, such as colors, shapes and textures, and retrieves images by comparing their similarities. Although CBIR has achieved remarkable success in some fields, the semantic gap has restricted its development. Therefore, finding accurate low-level visual features has become a key factor in extracting implicit high-level semantics.
In research on low-level visual features, local-characteristics methods for image retrieval have commonly been used. They fall into two categories: segmentation-based methods and interest-point-based methods [4]. Segmentation-based methods use image segmentation technologies to find the regions of significant objects. Nevertheless, present segmentation technologies have very low accuracy for image retrieval, so many scholars began to study interest-point-based methods. For instance, Li Yangxi et al. [5] proposed a query-difficulty-guided image retrieval system, applying ranking optimization approaches to predict queries' ranking performance in terms of their difficulty. Wang Xiaoyang et al. [6] developed a robust color image retrieval method using visual interest point features of significant bit-planes, and showed that it is more effective at retrieving noise-attacked images, including blurred, sharpened and differently illuminated ones. Moreover, Zhu Xianqiang et al. [7] proposed novel local saliency features via interest extraction algorithms, which improve processing efficiency and fit human visual perceptual characteristics. Finally, Glauco V. Pedrosa et al. [8] defined the saliencies of a shape as the higher-curvature points along the shape contour, and proposed a shape-based image retrieval technique using salience points to describe shapes, which consists of a salience point detector, a salience representation using angular relative position and curvature value analyzed from a multi-scale perspective, and a matching algorithm that considers local and global features to calculate the dissimilarity.
These works show that interest-point-based methods have changed the ineffective traditional ways of image retrieval. Specifically, they transfer the interest points from image matching algorithms directly to image retrieval, while taking more account of the spatial distribution of the interest points, which can improve retrieval precision. Nevertheless, interest point detection algorithms can be influenced by unstable interest points in regions of non-interest, so the division of the salient region has low accuracy. Aiming at this problem, we propose a novel image retrieval method based on IPDSH and SRIP, which performs better than some other popular algorithms. Fig. 1 shows the structure of this image retrieval method:
Fig. 1. Image retrieval structure
2. IPDSH
Interest points are usually defined as corners, and there are some popular corner detection algorithms such as Harris and SIFT. Although the Harris algorithm is effective in terms of high calculation speed and high computational precision, it is sensitive to scale, which can lead to low extraction accuracy. Specifically, the precision of corner extraction depends only on the threshold when the response function undergoes non-maxima suppression to identify local maxima, so the magnitude of the threshold determines the accuracy of detection: a very large threshold loses corner information, while too small a threshold results in false corners [9]. To resolve this problem, corner detection algorithms based on scale-invariant features were proposed, among which the Scale-Invariant Feature Transform (SIFT) algorithm is regarded as the most effective. Nevertheless, the SIFT algorithm has disadvantages: its computational complexity is high, its calculation speed is low, and the interest points it detects are not salient. Thus, we propose the IPDSH algorithm, which detects interest points based on the features of both SIFT and Harris and can effectively improve the accuracy of interest point detection. The specific procedure is as follows:
First, Koenderink [14] and Lindeberg [15] proved that the Gaussian convolution kernel is the only linear kernel that can realize scale transformation. The convolution of the image f(x, y) with the two-dimensional Gaussian kernel G(x, y, σ) yields the multi-scale space sequence S, as shown in equation (1).
S(x, y, σ) = G(x, y, σ) ∗ f(x, y)    (1)
G(x, y, σ) is defined as equation (2), where ∗ represents convolution, (x, y) the pixel position, and σ the scale-space factor. The smaller σ is, the less the image is smoothed and the smaller the corresponding scale.
G(x, y, σ) = 1 / (2πσ²) · exp(−(x² + y²) / (2σ²))    (2)
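Equations (1)-(2) amount to repeated Gaussian blurring of the image. A minimal sketch in Python (the function name and σ values are our illustrative choices, with SciPy's `gaussian_filter` standing in for the explicit kernel):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, sigmas):
    """Build the multi-scale sequence S of Eq. (1): convolve f(x, y)
    with Gaussian kernels G(x, y, sigma) of increasing sigma."""
    img = image.astype(np.float64)
    return [gaussian_filter(img, sigma=s) for s in sigmas]

# Example: four scales of a random test image; larger sigma smooths more.
img = np.random.rand(64, 64)
scales = gaussian_scale_space(img, sigmas=[1.0, 1.6, 2.56, 4.1])
```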
In order to detect stable interest points in the scale spaces efficiently, Lowe used the difference of Gaussians (DoG) function, convolved it with f(x, y), and obtained the maxima of the multi-scale spaces, as shown in equation (3),
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) ∗ f(x, y) = S(x, y, kσ) − S(x, y, σ)    (3)
where k is a constant. Lindeberg found that the DoG function is very similar to the scale-normalized Laplacian of Gaussian σ²∇²G, so the DoG function can be used to calculate the positions of the extreme points in the scale spaces.
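The DoG construction and extremum search of equation (3) can be sketched as follows. Note that this is a simplified illustration that checks extrema within each DoG level only, whereas full SIFT also compares against neighbors in adjacent levels; the function name, thresholds and scale values are our assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter, minimum_filter

def dog_extrema(image, sigma0=1.6, k=2 ** 0.5, n_levels=4, thresh=0.01):
    """Difference-of-Gaussian responses of Eq. (3) and their local extrema
    (per-level 3x3 neighborhoods only -- a simplification of SIFT)."""
    img = image.astype(np.float64)
    blurred = [gaussian_filter(img, sigma0 * k ** i) for i in range(n_levels)]
    dogs = [blurred[i + 1] - blurred[i] for i in range(n_levels - 1)]
    points = []
    for d in dogs:
        is_max = (d == maximum_filter(d, size=3)) & (d > thresh)
        is_min = (d == minimum_filter(d, size=3)) & (d < -thresh)
        ys, xs = np.nonzero(is_max | is_min)
        points.extend(zip(xs.tolist(), ys.tolist()))
    return dogs, points

# A bright square on a dark background yields extrema near the square.
img = np.zeros((64, 64))
img[28:36, 28:36] = 1.0
dogs, points = dog_extrema(img)
```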
Furthermore, the descriptors of SIFT interest points are based on pixel gradients. Thus, in the local neighborhood of an interest point, larger changes in the pixel gradients result in more distinctive descriptor vectors, and the matching of descriptors is likely to be more accurate. This gave us the inspiration that if Harris corners exist in the local neighborhood of an extreme point, the extreme point is more likely to be a stable interest point in the scale spaces.
The autocorrelation matrix is used to extract features and describe the local structure of images. It is defined as equation (4),
A(x, y; δi, δd) = δd² · G(δi) ∗ [ fx²(x, y; δd)   fx·fy(x, y; δd) ; fx·fy(x, y; δd)   fy²(x, y; δd) ]    (4)
where δi represents the integral scale, δd the differential scale, G(δi) the Gaussian filtering of f, and fx and fy the derivatives of f in the x and y directions.
Suppose that the eigenvalues of A are λ1 and λ2, which represent the principal curvatures of the autocorrelation function. We can then define the detection formula for the different scale spaces as equation (5).
C = det(A) − α · trace²(A) = λ1·λ2 − α · (λ1 + λ2)²    (5)
Using equations (4) and (5), we can obtain the Harris corners of the scale spaces. Then, according to the magnitudes of λ1 and λ2, we can calculate C's local maxima; α is set to a constant between 0.04 and 0.06. Afterwards, we analyze whether Harris corners exist in the local neighborhood of each extreme point. If a Harris corner falls in the extreme point's neighborhood in the multi-scale spaces, we retain this extreme point and regard it as a stable interest point. Otherwise, we delete the extreme point and regard it as an unstable interest point.
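The selection rule above can be sketched as follows (a simplified illustration, not the authors' implementation; the scales, α, thresholds and neighborhood radius are assumed values):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def harris_response(image, sigma_d=1.0, sigma_i=2.0, alpha=0.05):
    """Multi-scale Harris response C = det(A) - alpha * trace(A)^2 of Eq. (5),
    built from the autocorrelation matrix entries of Eq. (4)."""
    img = gaussian_filter(image.astype(np.float64), sigma_d)
    fy, fx = np.gradient(img)
    a11 = gaussian_filter(fx * fx, sigma_i)
    a12 = gaussian_filter(fx * fy, sigma_i)
    a22 = gaussian_filter(fy * fy, sigma_i)
    return (a11 * a22 - a12 ** 2) - alpha * (a11 + a22) ** 2

def filter_by_harris(candidates, image, radius=3, c_thresh=1e-6):
    """IPDSH rule: keep a scale-space extremum only if a Harris corner
    falls inside its (2*radius+1)-pixel neighborhood."""
    corners = harris_response(image) > c_thresh
    stable = []
    for x, y in candidates:
        y0, y1 = max(0, y - radius), min(image.shape[0], y + radius + 1)
        x0, x1 = max(0, x - radius), min(image.shape[1], x + radius + 1)
        if corners[y0:y1, x0:x1].any():
            stable.append((x, y))
    return stable

# A white square has Harris corners at its vertices but not in flat areas,
# so a candidate at (20, 20) survives while one at (5, 5) is discarded.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
kept = filter_by_harris([(20, 20), (5, 5)], img)
```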
This method can effectively decrease the number of interest points and enhance the significance of the interest point set, improving the precision of image retrieval. Fig. 2 shows an experimental comparison of the IPDSH and SIFT algorithms: the total number of interest points detected by IPDSH is lower than by SIFT, because some unstable interest points have been removed.
Fig. 2. Comparison of detecting results of IPDSH and SIFT
3. SRIP
The vast majority of CBIR algorithms are based on global information, describing images with global feature descriptors. Global feature algorithms are simple and less sensitive to translation or rotation. However, although these methods can satisfy some users' retrieval purposes, they cannot describe the spatial differences within an image. As a result, retrieval algorithms based on global features are greatly limited in satisfying users' demands. Therefore, CBIR systems based on local features should be studied in order to truly cope with users' needs.
For instance, convex hull and annular division are two effective algorithms based on local features. Fig. 3 shows the feature regions of convex hull and annular division. Obviously, the accuracy of the divisions is influenced by interest points located in the background. In general, most interest points of an image are concentrated in the interior or on the edges of objects, while some are freely distributed in the background. When the total number of interest points is small, the freely distributed points can disturb the division results. Thus, their negative effects should be considered in image feature extraction.
Fig. 3. Division of feature region
As studied above, the convex hull and annular division algorithms have negative effects on feature extraction because of the disturbances of freely distributed interest points. Therefore, we introduce the SRIP method, which is also based on local features.
Most of the time, users just want to find one or more specific targets in query images. Thus, it is important to focus on the salient regions that users are concerned about. A salient region can express the main content of an image well; to some extent, it can eliminate the interference of redundant information, and it is well accepted in the image retrieval domain.
Furthermore, a salient region of an image is a region whose interest point density is obviously higher than that of the others. Higher density represents more distinctive information. The interest points within the salient region are called salient interest points, while those located outside the salient region need to be removed. Afterwards, the background information is reduced and the feature extraction becomes more accurate. The detailed procedure of salient region division is as follows:
Suppose that the size of the image is m × n; the set of interest points P can be expressed as equation (6), in which pi(xi, yi) represents the i-th interest point and (xi, yi) its coordinates.
P = { pi(xi, yi) | i = 1, 2, …, N, 1 ≤ xi ≤ m, 1 ≤ yi ≤ n }    (6)
Let N represent the total number of interest points, and define the centroid O of the interest points as equation (7).
O = (xo, yo) = ( (1/N) Σi xi , (1/N) Σi yi )    (7)
Next, we calculate the distance between O and each interest point according to equation (8), where Ri is the distance from the i-th interest point to the centroid O.
Ri = √((xi − xo)² + (yi − yo)²),  i = 1, 2, …, N    (8)
Then we arrange Ri in descending order and set the k-th (k ≤ N) value as the radius, recorded as Rc. We draw a circle C with the centroid as its center. In Fig. 4, we determine an ergodic region according to Rc and iterate over each pixel in this region. Each pixel in the ergodic region can serve as a circle center, yielding a set of circles with radius Rc. In Fig. 4, A', B', C' and D' represent the four corners of the image; the blue region bounded by A, B, C and D is the ergodic region; AA', BB', CC' and DD' are the angle bisectors of the corners of the image. The goal is to cover all pixels in the image while iterating over the fewest possible circle centers. Moreover, if Rc is large enough, the ergodic region degenerates to the line EF.
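The centroid, radius and circle-search steps above can be sketched as follows. This is a simplified illustration: for brevity it tries each interest point as a candidate centre rather than iterating over every pixel of the ergodic region as the paper does, and the function name is our own:

```python
import numpy as np

def salient_region(points, k):
    """SRIP sketch: centroid of the interest points (Eq. 7), distances to
    it (Eq. 8), the k-th largest distance as radius Rc, then a search for
    the circle of radius Rc containing the most interest points."""
    pts = np.asarray(points, dtype=np.float64)
    centroid = pts.mean(axis=0)                       # Eq. (7)
    dists = np.linalg.norm(pts - centroid, axis=1)    # Eq. (8)
    rc = np.sort(dists)[::-1][k - 1]                  # k-th largest distance
    best_centre, best_count = None, -1
    for c in pts:                                     # candidate centres
        count = int((np.linalg.norm(pts - c, axis=1) <= rc).sum())
        if count > best_count:
            best_centre, best_count = c, count
    inside = np.linalg.norm(pts - best_centre, axis=1) <= rc
    return best_centre, rc, pts[inside]

# Nine clustered points plus two background outliers; k = ceil(0.9 * N).
pts = [(10 + i % 3, 10 + i // 3) for i in range(9)] + [(50, 50), (0, 60)]
centre, rc, salient = salient_region(pts, k=10)
```

With this choice of k, the recovered circle keeps the nine clustered points and drops both outliers.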
Fig. 4. Ergodic region
In the process of iteration, we record the coordinates of each center and the total number of interest points in its circle. After comparing all circles, we select the one containing the largest number of interest points and define it as the salient region; it is drawn on the image in green. Finally, we retain the salient interest points in the salient region and exclude the freely distributed interest points, so as to obtain the feature extraction region. The set of salient interest points is determined by equation (9),
Ps = { p(x, y) ∈ P | (x − xc)² + (y − yc)² ≤ Rc² }    (9)
where xc and yc represent the circle center of the salient region. Fig. 5 shows two examples of SRIP's extraction (k = 14). It can be seen that the freely distributed points have been effectively omitted, and the feature-region extraction is more accurate than the results in Fig. 3.
Fig. 5. Two examples of SRIP’s extraction (k = 14)
Additionally, by comparing the data, we found that the experiment achieves the best results when k = ⌈0.9 × N⌉, in which ⌈⋅⌉ rounds up to the nearest integer and N represents the total number of stable interest points.
4. Pseudo-Zernike Moments
Pseudo-Zernike moments [10] are built from a group of basis functions { Vnm(x, y), x² + y² ≤ 1 }, and perform better than Zernike moments in the anti-noise aspect. Vnm(x, y) is defined as equation (10), and the basis functions satisfy equation (11).
Vnm(ρ, θ) = Rnm(ρ) · e^(jmθ)    (10)

∫∫_{x²+y²≤1} V*nm(x, y) Vpq(x, y) dx dy = (π / (n + 1)) δnp δmq    (11)
where n is a non-negative integer, m an integer with |m| ≤ n, ρ the radius and θ the angle of the pixel in polar coordinates. Then we define the (n, m)-order pseudo-Zernike moment of f(x, y) as equation (12).
Anm = ((n + 1) / π) ∫∫_{x²+y²≤1} f(x, y) V*nm(ρ, θ) dx dy    (12)
After locating an interest point, we take it as the origin of polar coordinates, map its neighborhood onto the unit circle, and calculate the pseudo-Zernike moments; pixels located outside the unit circle are not included in the calculation.
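A direct discretization of equation (12) can be sketched as follows, using the standard pseudo-Zernike radial polynomial; the grid mapping and normalization are our illustrative choices, not necessarily the authors' exact implementation:

```python
import numpy as np
from math import factorial, pi

def pz_radial(n, m, rho):
    """Pseudo-Zernike radial polynomial R_nm(rho) (standard definition)."""
    m = abs(m)
    r = np.zeros_like(rho)
    for s in range(n - m + 1):
        c = ((-1) ** s * factorial(2 * n + 1 - s)
             / (factorial(s) * factorial(n + m + 1 - s) * factorial(n - m - s)))
        r = r + c * rho ** (n - s)
    return r

def pz_moment(patch, n, m):
    """(n, m)-order pseudo-Zernike moment of a square patch mapped onto the
    unit disc (Eq. 12); pixels outside the disc are ignored."""
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    xn = (2 * x - (w - 1)) / (w - 1)       # map columns to [-1, 1]
    yn = (2 * y - (h - 1)) / (h - 1)       # map rows to [-1, 1]
    rho = np.sqrt(xn ** 2 + yn ** 2)
    theta = np.arctan2(yn, xn)
    mask = rho <= 1.0
    v = pz_radial(n, m, rho) * np.exp(-1j * m * theta)   # conjugate basis
    area = 4.0 / ((h - 1) * (w - 1))       # grid-cell area after mapping
    return (n + 1) / pi * np.sum(patch[mask] * v[mask]) * area
```

Because only the phase of the basis changes under rotation, the moment magnitudes are rotation invariant, which is what the similarity measure below relies on.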
5. Similarity Measurement
Suppose that Q represents the query image with E salient interest points, I represents the database image with F salient interest points, and N represents the number of pseudo-Zernike moments per salient interest point. Then we define the similarity S of the pseudo-Zernike moments in the salient interest points' neighborhoods as equation (13),
S = Σ_{i=1..N} | ║PQ(i)║ − ║PI(i)║ |    (13)
where ║PQ(i)║ represents the amplitude of the query image's i-th moment and ║PI(i)║ the corresponding amplitude for the database image.
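One plausible implementation of this amplitude-based comparison is sketched below; the exact form of equation (13) and the greedy point-matching scheme across the E and F salient points are our assumptions:

```python
import numpy as np

def moment_distance(mq, mi):
    """Dissimilarity between two salient interest points as the L1 distance
    of their pseudo-Zernike moment amplitudes; smaller means more similar."""
    return float(np.sum(np.abs(np.abs(np.asarray(mq)) - np.abs(np.asarray(mi)))))

def image_distance(desc_q, desc_i):
    """Match each salient point of the query to its nearest database point
    and average the distances (a greedy sketch, not the paper's scheme)."""
    return float(np.mean([min(moment_distance(q, d) for d in desc_i)
                          for q in desc_q]))

# Toy descriptors: 2 query points and 3 database points, 4 moments each.
dq = [[1.0, 0.5, 0.2, 0.1], [0.3, 0.3, 0.3, 0.3]]
di = [[1.0, 0.5, 0.2, 0.1], [0.9, 0.1, 0.1, 0.1], [0.3, 0.3, 0.3, 0.29]]
```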
6. Experiments
The experiments were run on the following platform: hardware: Core 2 Q8200 CPU, 2.33 GHz, 4.0 GB memory; OS: Windows Server 2003; software: Matlab 7.1. We selected the Caltech-101 dataset, which consists of 9146 images split into 101 object categories plus an additional background/clutter category, with around 40 to 800 images per category, e.g., sunflower, butterfly, schooner, rhino, panda, lamp, cougar face, rooster, piano, watch, cup, lotus, barrel, ant and chair.
Fig. 6 shows thumbnails of some categories. In this experiment, the precision is defined as PT = n / T, in which T represents the total number of images output in the retrieval results and n the number of correct images. Some experimental results are shown in Fig. 7.
Fig. 6. Thumbnails from some categories of the Caltech-101 database
Fig. 7. Retrieval results
In Fig. 7 there are 6 rows, each with 10 images. The leftmost image of each row is the query image, chosen from the categories shown in Fig. 6. The other images in each row are the retrieval results, arranged from the largest to the smallest similarity. The images marked with red ellipses are erroneous results. In the first row there is no erroneous result, so the P10 of sunflower is 100%. In the second row there are 3 erroneous results, so the P10 of cougar face is 70%. Similarly, the P10 of schooner, butterfly, panda and grand piano are 70%, 80%, 60% and 60%, respectively. It is also worth noting that the 3 erroneous results in the third row are all ketches, which were regarded as the same as schooners; if we deem the query image simply as a ship, P10 would be 100%. In short, all 6 precisions exceed 50%, which satisfies our needs.
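The PT measure used in these experiments is straightforward to compute; a trivial sketch with hypothetical image IDs:

```python
def precision_at(results, relevant, T=10):
    """P_T = n / T: the fraction of the top-T retrieved images that belong
    to the query's category (Section 6)."""
    top = results[:T]
    n = sum(1 for r in top if r in relevant)
    return n / T

# 10 retrieved IDs, 7 of which belong to the correct category:
retrieved = ["s1", "s2", "x1", "s3", "s4", "x2", "s5", "s6", "x3", "s7"]
correct = {"s1", "s2", "s3", "s4", "s5", "s6", "s7"}
```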
By comparing the correct and erroneous results, it is obvious that the shapes of the erroneous images are similar to those of the correct ones. The reason is that pseudo-Zernike moments describe these shapes well, since they are geometrically invariant to rotation, translation and scaling. However, adopting this method alone can easily cause disturbances from similar images. Its advantage is low complexity and fast calculation; nevertheless, integrating it with algorithms based on more features, such as colors and textures, could further improve the proposed method.
In order to further evaluate the performance of our algorithm, we randomly selected 10 query images in each category and calculated the average precision of the 6 categories mentioned above, so as to compare against Chen's algorithm in Ref. [11] and Yu's algorithm in Ref. [12]. Table 1 shows the precision of the different methods: the average precision of our method is 6.5 percentage points higher than Chen's algorithm and 11.2 percentage points higher than Yu's. Furthermore, Fig. 8 shows the average precision of all 102 categories, listed alphabetically from left to right. According to this figure, the soccer ball category has the highest precision and the crab category the lowest, and 60 categories have precisions over 50 percent.
Table 1. Precision of different methods
Fig. 8. Average precision of 102 categories
Additionally, we compared the Precision-Recall (PR) curves and the average retrieval time with Chen's and Yu's algorithms. Fig. 9 shows the comparison of the PR curves, and Fig. 10 shows the average retrieval time for 6 categories of the Caltech-101 database. From these figures, our proposed method is superior to the other methods in both retrieval precision and time complexity. Thus, the combination of IPDSH and SRIP can effectively reduce the amount of calculation and accurately extract the images' visual features.
Fig. 9. Comparison result of PR curves
Fig. 10. Average retrieval time of 6 categories in the Caltech-101 database
7. Conclusions
In this paper, we proposed an image retrieval method based on IPDSH and SRIP. The IPDSH algorithm can detect interest points more accurately after eliminating unstable interest points, while the SRIP algorithm reduces the interference of freely distributed interest points in image backgrounds, making the feature extraction algorithm perform better. After the salient regions are confirmed, pseudo-Zernike moments are extracted as the image features, since they have the advantages of noise robustness and rotational invariance. Finally, the retrieval results are output according to the similarities. The experimental results show that the proposed method performs better than some other popular methods.
In the future, building upon other scholars' research, we would like to extend the proposed method to more challenging problems such as video search, large-scale data and mobile search [13]. Furthermore, we are also going to make efforts to improve detection speed while guaranteeing accuracy.
BIO
Xu Zhang received his B.S. degree in control technology and instrument from Xidian University, Xi’an, P.R. China in 2007. He is currently a Ph.D. student with the School of Aerospace Science and Technology, Xidian University. His research interests include computer vision, pattern recognition, signal processing and image retrieval.
Baolong Guo received his B.S., M.S. and Ph.D. degrees from Xidian University in 1984, 1988 and 1995, respectively, all in communication and electronic system. From 1998 to 1999, he was a visiting scientist at Doshisha University, Japan. He is currently the director of the Institute of Intelligent Control & Image Engineering (ICIE) at Xidian University. His research interests include neural networks, pattern recognition, intelligent information processing, and image communication.
Yunyi Yan received his B.S. degree in automation, M.S. degree in pattern recognition and intelligent systems, and Ph.D. degree in electronic science and technology from Xidian University, China, in 2001, 2004 and 2008 respectively. Since 2004, he has been with the ICIE Institute of the School of Aerospace Science and Technology of Xidian University, where he is currently an associate professor. Since 2013, he has served as the director of the Intelligent Detection Department at Xidian University. He worked as a visiting scholar at the Montreal Neurological Institute of McGill University, Canada, from 2010 to 2011. His research interests include system reliability analysis, medical image processing, and swarm intelligence.
Wei Sun was born in 1980 in Anhui province, China. He received his Bachelor degree in Measuring & Control Technology and Instrumentations, and his Master and Ph.D. degrees in Circuits and Systems, from Xidian University, Xi'an, China, in 2002, 2005 and 2009, respectively. Since 2012, Dr. Wei Sun has been an associate professor in the School of Aerospace Science and Technology at Xidian University, Xi’an, China. His research interests include visual information perception, pattern recognition and embedded video systems. He has published over 20 papers in significant journals and conferences and has been granted 3 patents.
Meng Yi (S’12) received the M.S. degree in Electrical Engineering from Northwestern Polytechnical University, Xi’an, China, in March 2008. Since 2009, he has been a Ph.D. student in Circuits and Systems at Xidian University. Currently, he is a visiting doctoral candidate in the Department of Electrical and Computer Engineering and the Center for Automation Research, University of Maryland, College Park, Maryland, USA. His research interests include computer vision, pattern recognition, signal processing and biometrics.
References
[1] Datta R., Joshi D., Li J., Wang J. Z., “Image retrieval: ideas, influences, and trends of the new age,” ACM Computing Surveys, 40(2), Article 5: 1-59, 2008.
[2] Ji R. R., Duan L. Y., Chen J., Xie L. X., Yao H. X., Gao W., “Learning to distribute vocabulary indexing for scalable visual search,” IEEE Transactions on Multimedia, 15(1), pp. 153-166, 2013. DOI: 10.1109/TMM.2012.2225035
[3] Yan C. L., “Accurate image retrieval algorithm based on color and texture feature,” Journal of Multimedia, 8(3), pp. 277-283, 2013. DOI: 10.4304/jmm.8.3.277-283
[4] Fu X., Zeng J. X., “A novel image retrieval method based on interest points matching and distribution,” Chinese Journal of Lasers, 37(3), pp. 774-778, 2010. DOI: 10.3788/CJL20103703.0774
[5] Li Y. X., Geng B., Zha Z. J., Tao D. C., Yang L. J., Xu C., “Difficulty guided image retrieval using linear multiple feature embedding,” IEEE Transactions on Multimedia, 14(6), pp. 1618-1630, 2012. DOI: 10.1109/TMM.2012.2199292
[6] Wang X. Y., Yang H. Y., Li Y. W., Yang F. Y., “Robust color image retrieval using visual interest point feature of significant bit-planes,” Digital Signal Processing, 23(4), pp. 1136-1153, 2013. DOI: 10.1016/j.dsp.2013.01.013
[7] Zhu X. Q., Huang J. C., Shao Z. F., Cheng G. Q., “A new approach for interesting local saliency features definition and its application to remote sensing imagery retrieval,” Geomatics and Information Science of Wuhan University, 38(6), pp. 652-655, 2013.
[8] Pedrosa G. V., Batista M. A., Barcelos C. A. Z., “Image feature descriptor based on shape salience points,” Neurocomputing, 120(23), pp. 156-163, 2013. DOI: 10.1016/j.neucom.2012.07.055
[9] Wang Y. T., Chen Y. Q., Li J., Li B. M., “The Harris corner detection method based on three scale invariance spaces,” International Journal of Computer Science Issues, 9(6), pp. 18-22, 2012.
[10] Dai X. B., Liu T. L., Shu H. Z., Luo L. M., “Pseudo-Zernike moment invariants to blur degradation and their use in image recognition,” Lecture Notes in Computer Science, vol. 7551, pp. 90-97, 2013.
[11] Chen M. S., Yang S. Y., Zhao Z. J., Fu P., Sun Y., Li X. N., Sun Y., Qi X. Y., “Multi-points diverse density learning algorithm and its application in image retrieval,” Journal of Jilin University (Engineering and Technology Edition), 41(5), pp. 1456-1460, 2011.
[12] Yu J., Qin Z. C., Wan T., Zhang X., “Feature integration analysis of bag-of-features model for image retrieval,” Neurocomputing, vol. 120, pp. 355-364, 2013. DOI: 10.1016/j.neucom.2012.08.061
[13] Ji R. R., Duan L. Y., Chen J., Yao H. X., Yuan J. S., Rui Y., Gao W., “Location discriminative vocabulary coding for mobile landmark search,” International Journal of Computer Vision, 96(3), pp. 290-314, 2012. DOI: 10.1007/s11263-011-0472-9
[14] Koenderink J. J., “The structure of images,” Biological Cybernetics, vol. 50, pp. 363-396, 1984. DOI: 10.1007/BF00336961
[15] Lindeberg T., “Detecting salient blob-like image structures and their scales with a scale-space primal sketch: a method for focus-of-attention,” International Journal of Computer Vision, 11(3), pp. 283-318, 1993. DOI: 10.1007/BF01469346