Advanced
WLSD: A Perceptual Stimulus Model Based Shape Descriptor
WLSD: A Perceptual Stimulus Model Based Shape Descriptor
KSII Transactions on Internet and Information Systems (TIIS). 2014. Dec, 8(12): 4513-4532
Copyright © 2014, Korean Society For Internet Information
  • Received : July 22, 2014
  • Accepted : November 20, 2014
  • Published : December 31, 2014
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
TagCloud
About the Authors
Jiatong Li
School of Information and Electronics, Beijing Institute of Technology
Baojun Zhao
School of Information and Electronics, Beijing Institute of Technology
Linbo Tang
School of Information and Electronics, Beijing Institute of Technology
Chenwei Deng
School of Information and Electronics, Beijing Institute of Technology
Lu Han
School of Information and Electronics, Beijing Institute of Technology
Jinghui Wu
School of Information and Electronics, Beijing Institute of Technology

Abstract
Motivated by the Weber’s Law, this paper proposes an efficient and robust shape descriptor based on the perceptual stimulus model, called Weber’s Law Shape Descriptor (WLSD). It is based on the theory that human perception of a pattern depends not only on the change of stimulus intensity, but also on the original stimulus intensity. Invariant to scale and rotation is the intrinsic properties of WLSD. As a global shape descriptor, WLSD has far lower computation complexity while is as discriminative as state-of-art shape descriptors. Experimental results demonstrate the strong capability of the proposed method in handling shape retrieval.
Keywords
1. Introduction
W ith the development of the Internet and the rapid update of the electronic equipment, the efficient and effective image retrieval methods are required to satisfy the image retrieval needs of people. Unlike the traditional textual annotation based retrieval methods, content-based image retrieval (CBIR) systems, which represent the image as features like color, texture and shape to implement image matching, are expected to realize the function of searching image in high efficiency [1] . Shape is an important perception for humans to understand images. However, to establish effective shape feature extraction and matching method is still a challenging task. A robust shape descriptor is required to be invariant to translation, rotation and scale as well as insensitive to noise, tolerant to distortion and even occlusion [2] . In addition, the massive database requires the CBIR has quick retrieval response, therefore, low computation complexity is also necessary for a shape descriptor.
Many shape representation and analysis methods have been proposed in the past decades. They can be generally classified into two categories: contour-based methods and region-based methods. Among the two categories, contour-based methods are in the majority.
There are many contour-based methods. One of the classic methods is Curvature Scale Space (CSS) [3] . The CSS uses the zero-crossings of the contour curvature to divide the whole shape contour into convex/concave arcs. CSS smoothens the contour by Gaussian kernels of increasing scales, and finally the whole contour will be convex with pairs of the curvature zero-crossing points evolving together. The whole evolving process of the curvature zero-crossings can be illustrated by the CSS Image, whose similarity is used for shape matching. Another well-known shape descriptor is Shape Context (SC) [4] . At each reference point, SC uses the spatial location of remaining contour points to form the 2D histogram, and thus finds correspondence by the histogram sets of two shapes. In order to tackle the shape articulation, Inner Distance Shape Context (IDSC) [5] extends the SC, and proposes to replace the traditional Euclidean distance by inner distance, which is the shortest path between landmark points within the shape silhouette, and dynamic programming is used to preserve the ordering of the contour points. Unlike SC and IDSC which utilize local histogram matching methods, Contour Points Distribution Histogram (CPDH) [9] uses one histogram to describe the distribution of the whole contour points of a shape under polar coordinate, and the shape similarity is obtained by Earth Mover’s Distance (EMD). In [8] , another shape descriptor called Contour Flexibility (CF) is proposed, which describes the shape contour points by their deformable potential and also uses dynamic programming to do shape matching. Besides designing discriminative shape descriptors, some new shape matching methods are also proposed. One of them is Hierarchical Procrustes Matching (HPM) [6] . HPM is proposed to conduct the shape matching hierarchically by using the longer segment matching result to predict the correspondence of the shorter and finer segment. In [7] , Shape tree (ST) describes the shape by hierarchically dividing the shape contour into short sub-curves, and adopts the elastic matching method for shape comparison. The ST representation has good tolerance to shape distortion, and random shape deformations can be obtained by adding noise to the nodes in a shape tree without perceptually identifying the shape category.
The other major category of shape descriptor is region-based methods. For example, the Zernike moments (ZM) [10] is selected as the MPEG7 standard region-based shape descriptor, but it is relatively time consuming and has limited tolerance to shape distortion. Another moment-based shape descriptor is Moment Invariant (MI) [11] . Although MI performs not as well as ZM, it only has 7 feature vectors, with low storage and high speed retrieval response, so it is also a widely used shape analysis method. Recently, Support Vector Shape (SVS) [12] is proposed which uses the decision function trained by Support Vector Machine (SVM) to describe the shape. With the Radial Basis Function (RBF), SVS maps points within the shape to high dimension, and has good performance against large noise.
Recent years, there are some new trends in shape analysis. For instance, some work focus on the post-processing after shape matching [13] [14] . In [13] , it is proposed to construct the graph model using the similarity (or distance) of pairs of shapes, and adopt the graph transduction to learn the graph structure implicitly. The context-based shape retrieval methods like graph transduction can well handle the situation in which intra-class distances are larger than inter-class ones and is capable to improve the retrieval rate based on available shape measures. In addition, there are researches about the heuristic-based auxiliary shape descriptor aims at describing some specific kinds of shapes. One of the representative work is [15] . In [15] , two perceptually motivated strategies are proposed, the first handles shapes with base structure and “strand” structures, the second handles symmetry shapes. The two strategies can be integrated into existing shape matching methods to improve the retrieval or classification performance. In this paper, we focus more on shape representation and matching and do not focus on the above post-processing methods, since they can be used in all available shape measures.
The remainder of paper is organized as follows. In Section 2, we review the previous shape representation work in multi-scale which is in close relation to our approach. Section 3 describes the new shape representation in detail. Section 4 presents the experimental results using the proposed method for shape retrieval in two datasets, including the widely used MPEG-7 core dataset and Tari 1000 dataset respectively. The experimental result in MPEG-7 shows the competitive retrieval performance to the state-of-the-art shape descriptors, and the results in Tari 1000 demonstrate the proposed method is also very discriminative in handling articulated shapes.
2. Related Work
Contour-based methods are more popular than region-based ones and it can further be divided into global description methods and local description methods. Usually, local descriptors are more discriminative than global ones. The massive database requires not only high retrieval rate, but also low storage features and quick retrieval response. However, most local description methods aim to find correspondence of feature sets of contours, consequently sacrificing computation efficiency, which is not suitable for large database retrieval.
Multi-scale methods are an important research branch of shape analysis and have shown high discriminative ability [3] [6] [7] [29] [30] . However, most of the existing multi-scale methods still use the scale normalization to get the scale invariance, and circular shift matching to achieve the rotation invariance, all of which add to high computation load. In addition, the general multi-scale shape representation itself is another step that is time-consuming. For example, as discussed in Section 1, CSS achieves the multi-scale representation by smoothing the shape contour using Gaussian kernels. The same representation method is also adopted by [29] .
This paper proposes a new contour-based shape descriptor, called Weber’s Law Shape Descriptor (WLSD). WLSD is a global shape descriptor with very low computation complexity while is as discriminative as the state-of-the-art local descriptors. In addition, WLSD is intrinsically extended to multi-scale without extra computation load, and it represents a shape by only 24 dimension vector each scale, which leads to efficient shape matching without scale normalization or circular shift.
WLSD aims to describe the salient variation of a shape contour that stimulate human perception. According to Weber’s Law, under the fact that human perception of a pattern depends not only on the stimulus difference but also on the ratio of the relative stimulus change to the original stimulus intensity, we first design Contour Angular Feature (CAF) to model the so-called stimulus intensity, and then establish WLSD according to the Weber’s Law Equation. WLSD is then extended to multi-scale to enhance its description capacity. We also adopt the feature selection method to select the effective scales of WLSD. The experiments demonstrate the discriminative power and low computation property of WLSD in shape retrieval.
3. Weber’s Law Shape Descriptor for Shape Representation
In this section, we first review Weber’s Law and then propose the Contour Angle Feature (CAF). We then adopt the CAF to construct WLSD, and demonstrate the importance of the multi-scale property of WLSD. Finally, we give the WLSD scale selection procedures by SFS.
- 3.1 Weber’s Law
Weber’s Law is a psychological rule. It demonstrates that the change in stimulus intensity varies by the original stimulus intensity, and the ratio of the just noticeable difference (JND) of the perception to the original stimulus is a constant [23] [24] . The relationship can be expressed as:
PPT Slide
Lager Image
where Δ I denotes the JND, I represents the original stimulus intensity, k is the constant corresponding to the certain perception. The fraction Δ I / I is known as the Weber fraction or Weber proportion. For example, an experiment conducted by Weber in 1840 shows that people can feel the increase or decrease of the additional 1 gram if they hold a 52 grams weight object, but can only feel 2 gram if they hold 104 grams, which demonstrates the JND differs according to the original stimulus intensity.
- 3.2 WLSD
Weber’s Law gives us the motivation to a perceptual stimulus model. The first task is to construct a shape descriptor as the original stimulus intensity. Many shape descriptors have been proposed, such as the distance between the shape centroid and the contour, which is used in [25] to do the trademark retrieval, others, like the contour curvature, signature, Fourier Descriptor (FD), are either difficult to model Weber’s Law or are sensitive to scale or rotation variation, so we design a new shape descriptor called Contour Angular Feature (CAF).
- 3.2.1 The Original Stimulus Feature - Contour Angle Feature
We first introduce the single scale CAF, and then give the general definition of the multi-scale CAF.
Given a closed contour point O with its nearest right and left points A and B , then ∠ AOB is called the CAF of point O . Note ∠ AOB has two directions, inwards or outwards to the closed contour, and the sum of two angles are 2 π . In this paper, we suppose the CAF always be the outwards one. CAF can be extended to multi-scale naturally, let CAFs signifies the s scale of CAF, then angle ∠ AOB is CAF 1 , and CAF 2 is the angle formed by contour point O with its second nearest right and left points, etc. Now we give the general definition of CAFs :
Definition I: Given a closed contour C , and sample it by n clockwise points, whose coordinates are orderly ( xi , yi ), where i =1,2,…, n , then the CAFs of O ( xi , yi ) is defined as:
PPT Slide
Lager Image
where s =1,2,…,⎿( n -1)/2⏌ (⎿ ⏌ denotes round toward negative infinity), operator × represents the vector cross product,
PPT Slide
Lager Image
,
PPT Slide
Lager Image
.
In Definition I , the multi-scale CAFs can be obtained with s varying from 1 to ⎿( n -1)/2⏌, As and Bs are the right and left points relative to O respectively, and both are s points away from O . Consequently, every sample point can have the maximum ⎿( n -1)/2⏌ scale angles. Examples of the CAF 1 , CAF 3 , and CAF 5 of point O ( xi , yi ) are shown in Fig. 1 (a). The reason why extending
PPT Slide
Lager Image
and
PPT Slide
Lager Image
to three dimension is to highlight the direction property of the vector cross product (for instance,
PPT Slide
Lager Image
×
PPT Slide
Lager Image
>0 indicates that B O A is clockwise ordered), therefore, the definition guarantees angle
PPT Slide
Lager Image
always directing outwards the interior of a closed contour. Analogously, the CAFs of the opposite direction can be defined, which is equivalent to Definition I. The defined
PPT Slide
Lager Image
varies from 0 to 2 π , indicating the local contour of the shape changing perceptually from inwards strand structure to outwards one (as illustrated by Fig. 1 (b)), so the CAFs is capable to describe all variation of a shape contour. Notice CAF is invariant to scale and rotation.
PPT Slide
Lager Image
(a) Illustration of the CAFs (b) CAFs varies from nearly 0 (the bottom square) to near π(the upper square)
- 3.2.2 Weber’s Law Shape Descriptor
Hinted by the Weber’s Law, we use the CAF difference between the current point and its neighbors to model the difference of the stimulus intensity, and proportion of CAF differences to CAF of the current point to get the saliency variation that stimulate human perception. We then adopt the arctangent function to operate on the proportion [17] , which can reduce the negative effect caused by noise. Let θi be CAF1 of the contour point O ( xi , yi ), then the saliency of point O ( xi , yi ) can be described as:
PPT Slide
Lager Image
where θi denotes the CAF1 of the current point, θ mod(i+1,n) and θ mod(i-1,n) represent the right and left neighbor points of the current point respectively. We call this feature the Weber’s Law Shape Descriptor (WLSD), and f ( θi ) the WLSD value of point O ( xi , yi ).
The idea of the WLSD is partially motivated by Weber’s Law Descriptor (WLD) in [17] . WLD is an image descriptor regards the gray intensity as the original stimulus intensity, and the difference of the stimulus is the intensity differences of a current pixel and its neighbors. While in WLSD, CAF is viewed as the stimulus intensity, and the difference of the stimulus is the CAF difference of a current contour point and its neighbors. However, the description targets of WLSD and WLD are different, leading to the necessity of miming more information of WLSD: the information content of a shape contour is much smaller than that of a gray image, and the CAF of a contour varies relatively more smoothly than that of gray intensity. That leads to the motivation of extending WLSD to multi-scale.
If we only use single neighbor based WLSD shown in Equation (3), the relatively large WLSD value points (salient points) will only occupy a small proportion of the whole contour points. Fig. 2 maps the WLSD value of contour points linearly to the colormap bar illustrated in Fig. 2 (e) so that we can observe the description capacity of WLSD visually (the minimum value maps to leftmost deep blue, maximum to the rightmost deep red). Fig. 2 depicts the single scale WLSD (i.e. the WLSD indicated by Equation (3)) of the contour. Because CAF 1 varies slowly, WLSD value has a sparse distribution, with most of the WLSD value focus on the middle of the colormap bar (nearly 0). In addition, the single neighbor based WLSD value presents an intermittent appearance because of the noise or the continuity variation of the shape contour, which is an unstable and less discriminative description. Our goal is to let WLSD give a relatively continuous and robust description of the contour in accordance to human perception. As a result, we introduce the multi-scale WLSD.
PPT Slide
Lager Image
WLSD1→1(single scale WLSD) value linearly mapped to the colormap bar
The scale of WLSD has close relation with the scale of CAF, let WLSDw represents WLSD of scale w , given a contour sampled with n points and its corresponding CAFs , let w =1, then the WLSD 1 of point O ( xi , yi ) is:
PPT Slide
Lager Image
where
PPT Slide
Lager Image
signifies the CAFs defined as Equation (2). Consequently, each WLSDw has a corresponding scale of CAFs . In this sense, the scale of WLSD is also determined by the scale of CAFs implicitly. We use WLSD sw represents the WLSDw extracted from CAFs , then the notation of the Equation (4) is WLSD s→1 . Combining the scales of s and w , we introduce the general definition of WLSD sw :
Definition II: Given a closed contour C , and sample it by n clockwise points, whose coordinates are orderly ( xi , yi ), where i =1,2,…, n , the WLSDsw of point O ( xi , yi ) is defined as:
PPT Slide
Lager Image
where
PPT Slide
Lager Image
is the CAFs of the contour point O ( xi , yi ), and w =1,…,⎿⎿( n -1)/2⏌/ s ⏌(⎿ ⏌ denotes round toward negative infinity).
Therefore, given the CAFs of a contour, the stimulus intensity difference of the current scale w is the difference between the current point’s CAFs and the CAFs of its right and left w points (the adjacent distance within each sides’ points is s ).
Fig. 3 illustrates some examples of WLSD sw . Red represents the current point, green denotes the right and left w number of points corresponding to different scale WLSD sw .
PPT Slide
Lager Image
The illustration of the WLSDsw
Fig. 4 illustrates that the multi-scale WLSD can describe the contour saliency well and has the desired continuous salient variation description property. (a) ~ (d) depict the WLSD 5→5 , and demonstrate that multi-scale WLSD has relatively uniform distribution within the WLSD range of value and can describe the contour saliency continuously. Fig. 4 also indicate WLSD has the similar description of intra-class shapes.
PPT Slide
Lager Image
WLSD5→5 value linearly mapped to the colormap
- 3.2.3 WLSD Histogram and WLSD scale selection
In order to let the feature be tolerant to the shape distortion to some extent, we uniformly divide the range of WLSD sw value, and use one dimensional histogram as the global shape feature. Given two histograms H 1 and H 2 , we adopt the commonly used χ 2 distance to compare them:
PPT Slide
Lager Image
where K represents the number of histogram bins, h 1k and h 2k signifies number of points falling into the k th bin of H 1 and H 2 respectively. In the paper we take K =24.
Different scale of WLSD present different discriminative information, for instance, WLSD 3→3 can capture local information of the contour, while WLSD 16→5 tends to describe the contour in a global view. As mentioned above, each WLSDw has a corresponding scale of CAFs . Given a n points contour, then the whole number of scales of CAF is ⎿( n -1)/2⏌, and each scale of the CAF corresponds to ⎿⎿( n -1)/2⏌/ s ⏌ scales of WLSD. Consequently, given a n points contour, we can get the whole number of scales of WLSD, which is denoted by SWLSD :
PPT Slide
Lager Image
We want to select discriminative scales of WLSD among the whole number of scales shown in Equation (6), and combine the selected scales to be the final effective features. Motivated by the feature selection theory, we view the scale selection as a feature selection problem and use the idea of Sequential Forward Selection (SFS) [36] . First, we define the feature evaluation criterion according to specific application situation, and then select the scale in the candidate set that can improve the defined criterion iteratively until the criterion stops improving. It should be noted that the scales selected by SFS may be not the optimal, but it is relatively more time-saving than many other complicated algorithms, and is very efficient which can be verified by the experiment.
One of the problem above is that, the feature dimension will expand as the iterative times increase, then the shape matching computation will also increase. To overcome the above computation problem, we operate directly on the distance matrix instead of calculating the combined features. Suppose there are M training shapes, then the distance matrix is D M×M , whose elements are the distance between each pair of shapes. Suppose each contour is uniformly sampled by 200 points, according to Equation (6), the size of the initial candidate set SWLSD is 473, we denote them by:
PPT Slide
Lager Image
The subscript of D is in accordance with that of WLSD sw , it can be simplified denoted by { D 1 , D 2 ,…, D l ,…, D SWLSD }. The WLSD scale selection algorithm steps are shown as Table 1 :
WLSD Scale Selection Algorithm
PPT Slide
Lager Image
WLSD Scale Selection Algorithm
After the above training procedures, the selected scales are recorded in L . In specific application, for example, shape retrieval, we first train the samples to acquire the effective WLSD scales, and then compare shapes according to their combination distances of the selected WLSD scales.
4. Experimental Results and Analysis
In this section, we demonstrate the performance of WLSD in shape retrieval, and compare it with other state-of-art methods. We first do experiment in MPEG7 shape database. Others, like Kimia’s dataset, is also very popular in shape matching, but it is not suitable for training since its size is relatively small (Kimia’-99 dataset has 99 shapes with 11 shapes per category). Therefore, we choose the Tari dataset [21] . In the MPEG-7 dataset, we further conduct some specific analysis experiments, including the computation complexity analysis, the stability of the scale selection algorithm and the effectiveness of WLSD in comparison with CAF.
In the following experiments, the retrieval rate is measured by the so-called bull’s eye score. Let t be the number of shapes of the same class. Every shape in the database is compared to all other shapes, and the number of shapes from the same class among the 2 t most similar shapes is reported. The bull’s eye retrieval rate is the ratio of the total number of shapes from the same class to the highest possible number (in MPEG7 is 20×1400). We apply the five-fold cross validation to demonstrate the performance of WLSD. The dataset is divided into five folds, one split for validation and the others for training. Note that each validation fold is tested on the whole dataset using the scales obtained from the corresponding four training splits, consequently, the combination of the five validation folds’ performance will include retrieval result of all the dataset samples.
Each contour is uniformly sampled by 200 points, according to Equation (6), the number of the initial candidate set SWLSD is 473. The experimental environment is Matlab 7.11.0, and the experimental platform is Intel(R) Core(TM) 2 Duo CPU, 2.53 GHz.
- 4.1 MPEG-7 dataset
- 4.1.1 Comparison with existing methods
MPEG-7 CE-Shape-1 Part B dataset is widely used in shape retrieval, it consists of total 1400 shapes, having 70 shape classes, and each class owns 20 pictures. The database is challenging due to the presence of much distortion and examples that are visually dissimilar from other members of their class and highly similar to members of other classes. The first example of each class is shown in Fig. 5 :
PPT Slide
Lager Image
Examples of MPEG-7 shape dataset
The bull’s eye score is compared by the recently popular 13 shape analysis algorithms. From Table 2 , it can be seen that the bull’s eye score of WLSD performs better than all the other methods, including multi-scale methods- [6] [29] [30] . Moreover, WLSD is much more efficient than all the methods listed on the table. In real application, after the scale selection and the feature extraction of the database having been done offline, the time complexity of shape retrieval using WLSD is only the time costs by feature extraction of the query shape, and the number of selected scales χ 2 distance computation of the 24-dimension vectors. Under the above experimental environment, the pairwise shape matching time of WLSD is only 2.2ms with n =200, where n is the number of contour points.
Comparison of bull’s eye score on MPEG-7 dataset
PPT Slide
Lager Image
Comparison of bull’s eye score on MPEG-7 dataset
Considering the methods whose retrieval rate are above 85% listed on the table. Hierarchical Procrustes reports 300ms taken by matching two shapes [6] . The computation of pair-wise similarity between two shapes in Symbolic takes 76.5ms on average, with the contours represented by 100 points [33] . Shape L’ Â ne Rouge estimates the density takes on average 2 to 3 minutes per shape, which is also much time consuming than WLSD [7] . With 100 points, IDSC reports 0.31s on a 2.8G PC implemented by optimized Matlab code [5] . The main reasons of time saving of WLSD are two points, one is that WLSD is a global shape descriptor which is much faster than local descriptors that depend on local matching; the other is that WLSD is intrinsically extended to multi-scale without extra extension load, such as filtering the contour used in [18] [29] .
- 4.1.2 Experiments of the single scale WLSD
We first test the effectiveness of the single scale WLSD by showing its bull’s eye retrieval rate of each scale in Fig. 6 (a). Because of the relation of WLSDs and WLSDw , the scales mainly spread near the coordinates. The highest score of single scale is 58.30%, with the corresponding scale is WLSD 10→5 , while lowest score is only 33.16%. The results show that the performance of single scale WLSD is not satisfactory. Consequently, the multi-scale fusion is necessary so that different scales that contain different information about the contour variation can make good combination.
PPT Slide
Lager Image
(a) The retrieval rate of the single scale WLSD (b) The front and left elevation of Fig.6 (a)
From Fig. 6 (a), it can be seen that the outstanding scales of WLSD are mainly focused on the low scales relatively, because the score peaks are grouped around the original point. The trends begin to decrease as the scales become larger.
We also want to observe the performance of WLSDs and WLSDw respectively. So in Fig. 6 (b), the front and the left elevation of Fig.6 (a), the illustration of WLSDs and WLSDw , are drawn respectively. It illustrates that the outstanding scales of WLSDs are mainly range from 10 to 40, and the counterpart of WLSDw , which are relatively lower, range from 1 to 20. Moreover, the scores of WLSDs are obviously higher than those of WLSDw on average. It is precisely able to explain the characteristic of distribution of the selected scales, which we will show in the following multi-scale experiment.
Although it seems WLSDs performs better than WLSDw , however, it does not mean that WLSDw is not important, as it can be seen from Fig. 6 (a) that only after WLSDw starts to grow can the WLSD sw reaches a jump increasing performance. It proves that WLSDs and WLSDw have good combination, and WLSDw makes necessary supplement to WLSDs .
- 4.1.3 Experiments of the multi-scale WLSD
In the first experiment in this section, we will demonstrate the effectiveness of the multi-scale integration. First, the detail of the iteration performance is discussed. In Fig. 7 (a), the curve shows that the retrieval rate increases by iteration using WLSD in MPEG-7. It demonstrates that the power of a single scale WLSD is not very satisfactory but the score increases shapely by 73.41% just after the second iteration. The retrieval rate arrives more than 80% with less than 4 scales and reaches above 85% with less than 8 scales, which shows the efficiency of the SFS. Finally, after ten iterations, the performance is relatively stable.
PPT Slide
Lager Image
(a) The improvement trend of the retrieval rate by the increase of the iteration times (b) The first five selected scales of each training set in MPEG-7
Then, we demonstrate the effectiveness of Weber’s Law applied to WLSD, the experiment using only the CAF is conducted and the result is also shown in Fig. 7 (a). The highest score obtained by CAF is 80.59%, which is much lower than WLSD. But the trend of the increasing retrieval rate is similar to that of WLSD. Therefore, WLSD can dramatically improve the discriminative performance compared with only CAF. The main reason is that CAF only has the multi-scale information of a shape contour, whereas WLSD can describe more than that with the saliency variation which implicitly indicates the relation between the contour points. The same increasing trend of the curve shown in Fig. 7 (a) also indicates that the multi-scale selection motivated by SFS is efficient and stable.
The distribution of the first several selected scales is further observed. Fig. 7 (b) illustrates the first five selected scales of the five-fold subsets, where the indication color is orderly green, blue, red, white and magenta. It can be seen that the variance of the selected scale becomes larger as the number of iteration increases, and the variance along WLSDs varies more dramatically than that of WLSDw . So the lower scales of WLSD sw are relatively more stable.
In the second experiment, we analysis the WLSD scale distribution. The selected scales of five folds are shown in Fig. 8 (a) ~ (e) respectively, where the yellow area is all the possible WLSD scales and the green dots represent the selected scales. It is illustrated that the effective scales of WLSDw are mainly the lower scales between 1~16. Even though the scales of WLSDs spread relatively scattered, they are more concentrated among 1~20, despite of some sparse high scales. We conclude that the discriminative scales of WLSD sw mainly concentrate in lower scales while higher scales make effective complement.
PPT Slide
Lager Image
(a) ~ (e): The distribution of the selected scales of each training set in MPEG-7 (f): The scales of retrieval rate more than 50% in MPEG-7
The scales of retrieval rate more than 50% are shown in Fig. 8 (f) in blue dots. Compared with the selected scales in Fig. 8 (a) ~ (e), the outstanding individual scales (we call them OIS) are more concentrated. We also find that many selected scales are not in the OIS set although the first selected scales are mainly in the OIS domain. Consequently, less outstanding scales are also very important since they make necessary supplement to the OIS.
In the third experiment, we demonstrate the robustness of the proposed scale selection algorithm. In Fig. 9 (a), the accumulation of the selected scales of the five training sets are illustrated. Compared with Fig. 7 (b), as more scales are included, the scales of WLSDs spread more uniformly than WLSDw . It can be explained that once the high WLSDs scale is selected, the high WLSDw scale is not necessary to some extent, or in another way, high WLSDw scales are not as discriminative as the lower ones. We also want to conduct the quantitative analysis of the variation of the selected scales. Consequently, in Fig. 9 (b), the variance of the first ten selected scales of the training sets are illustrated. From the first two figures, it can be seen that the variance of the first several selected scales of both WLSDs and WLSDw are very stable. Although there are some fluctuations in higher scales, the whole stability is satisfactory. We conclude that the proposed scale selection algorithm is stable in the dominant discriminative scales and have acceptable variation in the complementary scales.
PPT Slide
Lager Image
(a) The accumulation of the first ten selected scale of the training sets (b) The variance of the first ten selected scales
As shown in Fig. 9 (b), WLSDw is relatively more stable than WLSDs . Although WLSDs performs better than WLSDw , WLSDw has made more contribution in the stability of the whole WLSD sw . We further draw the conclusion that WLSDs dominates the discriminability and WLSDw not only provides necessary supplement but also makes WLSD sw more stable. Therefore, it demonstrates again the close relation between WLSDs and WLSDw .
- 4.2 Tari dataset
To demonstrate the WLSD can also handle articulated shapes, we test it on relatively new Tari dataset, which consists of 1000 binary images from 50 shape categories, each category has 20 images. It is designed to have large intra-class shape deformation and many shapes are articulated (see Fig. 10 .). It has been extended from the original 180 images [21] to do the shape skeleton research [22] .
PPT Slide
Lager Image
Examples of Tari dataset
We also adopt the bull’s eye score here, since the scores reported in this dataset are not as much as that in MPEG-7, the number of the compared algorithms are relatively small (as shown in Table 3 ).
Comparison of bull’s eye score on Tari dataset
PPT Slide
Lager Image
Comparison of bull’s eye score on Tari dataset
From Table 3 , IDSC scores higher than WLSD since it is designed to handle articulated shapes and is insensitive to articulation. Overall, WLSD is comparable to the listed methods, which prove the proposed method can handle articulated shapes well. Note the compared methods are also much more time-consuming than WLSD since their local property or local matching scheme.
The iteration plot and the distribution of scales are illustrated in Fig. 11 . The curve shows that after the second iteration, the retrieval rate increases significantly and begins approaching to the final rate. The selected scales cluster in lower scales and disperse in the higher scales. Both of the figures are in accordance with the test in MPEG-7 dataset, which demonstrates the discussion and conclusion in MPEG-7 experiment section, and shows the stability of the proposed method.
PPT Slide
Lager Image
(a) The curve shows the retrieval rate increases by iteration in Tari (b) The distribution of the selected scales of WLSD in Tari
In Fig. 12 , it shows that the variance of WLSDw is much more stable than WLSDs . Compared with Fig. 9 (b), the average variance of WLSD sw here is less variable, which can be explained that the Tari dataset has less distortion than MPEG7. In summary, it has the same variation feature as that of MPEG7. It demonstrates again the robustness of the scale selection method.
PPT Slide
Lager Image
The variance of the first ten selected scales
As described in the Abstract and Section 3.2.1, both the CAF and WLSD are insensitive to rotation and scale variation, we now demonstrate this property in the last experiment. We conduct the experiment by rotating the Tari dataset to 30˚ , 60˚ ,90˚ clockwise, and scale the dataset to 0.5, 1.5 and 2 as well. The test result is shown in Table 4 . From the table, it can be seen that the rotated and the scaled scores are nearly the same to the original ones (the rotation 0˚ and scale 1), which demonstrates CAF and WLSD are robust to rotation and scale variation. We also find that some scores are even higher than the original, which may because of the dataset itself or the de-noising operation in the image transform. The scores of 1.5 and 2 scale transform are lower than that of 0.5 can validate this discussion.
Bull’s eye score on rotated and scaled Tari dataset
PPT Slide
Lager Image
Bull’s eye score on rotated and scaled Tari dataset
5. Conclusion
In this paper we propose a new shape descriptor based on Weber’s Law, named Weber’s Law Shape Descriptor (WLSD). The key idea of WLSD is to capture the salient variation of a shape contour to stimulate human perception. We first design Contour Angular Feature (CAF) to construct the original stimulus intensity and propose WLSD according to Weber’s Law Equation, and then develop the WLSD to multi-scale as well as demonstrate the necessity of the multi-scale extension, finally, the feature selection framework is applied to the multi-scale WLSD to extract discriminative scales. The experiments show the effectiveness and the outstanding efficiency of WLSD.
The future work is notable from the theory foundation of WLSD, since Weber’s Law is instinctively associated with saliency. Therefore, we plan to enhance the selection scheme to select the salient WLSD that stimulate human views to further improve the discriminative ability.
BIO
Jiatong Li is a successive Master and Ph.D. candidate in Information and Communication Engineering at Beijing Institute of Technology (BIT) under the supervision of Professor Baojun Zhao. He is now visiting University of Technology, Sydney (UTS) as a dual-doctoral degree student under the supervision of Richard Xu. His research interests include pattern recognition and machine learning.
Baojun Zhao received the Ph.D. degree in electromagnetic measurement technology and instrument from Harbin Institute of Technology (HIT) in 1996. From 1996 to 1998, he worked as a Postdoctoral Research Fellow at Beijing Institute of Technology (BIT). Baojun Zhao is currently a professor and the director of National Professional Laboratory of Signal Acquisition and Processing. He has been the project leader of more than 30 national projects. His main research interests include video coding, pattern recognition, infrared/laser signal processing, and parallel signal processing. He has authored or co-authored over 100 publications and received 5 national ministerial-level scientific & technological progress awards in these fields.
Linbo Tang received the Ph.D. degree in signal and information processing from Beijing Institute of Technology in 2005. Currently, he is a tutor of master students at the School of Information and Electronics, Beijing Institute of Technology (BIT). His current research interests include video coding, target detection, pattern recognition and object tracking. His speciality is real-time image processing.
Chenwei Deng received the Ph.D. degree in signal and information processing from Beijing Institute of Technology in 2009. Currently, he is an Associate Professor at the School of Information and Electronics, Beijing Institute of Technology. Prior to this, he was a Post-doctoral Research Fellow with the School of Computer Engineering, Nanyang Technological University. He has authored or co-authored over 50 technical papers in refereed international journals and conferences, and co-edited one book. His current research interests include video coding, quality assessment, perceptual modelling, feature representation and object recognition.
Lu Han received her BS degree in Communication Engineering at North Eastern University (NEU) in 2008. She is now pursuing for her MS degree in Information and Communication Engineering at Beijing Institute of Technology (BIT). Her research interests include real-time image processing and pattern recognition.
Jinghui Wu received his BS degree in electronic and information from School of Information and Computer engineering, Northeast Forestry University (NFU), Harbin, China, in June 2007. He received his MS degree in signal and information processing from School of Electrical and Electronic Engineering, Harbin Science and Technology University (HSTU), Harbin, China, in June 2010. He is currently pursuing for his Ph.D. degree in Signal and Information Processing at the School of Information and electronic, Beijing Institute Technology (BIT). His research interests include image enhancement, target tracking, and target recognition.
References
Mussarat Yasmin 2013 “Content based image retrieval using combined features of shape, color and relevance feedback,” KSII Trans. on Internet and Information Systems 7 (12) 3149 - 3165    DOI : 10.3837/tiis.2013.12.011
Tak Yoon-Sik , Hwang Eenjun 2008 “Pruning and matching scheme for rotation invariant leaf image retrieval,” KSII Trans. on Internet and Information Systems 2 (6) 280 - 298    DOI : 10.3837/tiis.2008.06.001
Mokhtarian F. , Abbasi S. 2002 “Shape similarity retrieval under affine transforms,” Pattern Recognition 35 (1) 31 - 41    DOI : 10.1016/S0031-3203(01)00040-1
Serge Belongie , Malik Jitendra , Puzicha Jan 2002 “Shape matching and object recognition using shape contexts,” IEEE Trans. on Pattern Analysis and Machine Intelligence 24 (4) 509 - 522    DOI : 10.1109/34.993558
Haibin Ling , Jacobs David W. 2007 “Shape classification using the inner-distance,” IEEE Trans. on Pattern Analysis and Machine Intelligence 29 (2) 286 - 299    DOI : 10.1109/TPAMI.2007.41
McNeill G. , Vijayakumar S. “Hierarchical procrustes matching for shape retrieval,” IEEE Computer Society Conf. on Computer Vision and Pattern Recognition 2006 vol. 1 885 - 894
Felzenszwalb P. F. , Schwartz J. D. “Hierarchical matching of deformable shapes,” IEEE Conf. on Computer Vision and Pattern Recognition 2007 1 - 8
Xu C. , Liu J. , Tang X. 2009 "2D shape matching by contour flexibility," IEEE Trans. on Pattern Analysis and Machine Intelligence 31 (1) 180 - 186    DOI : 10.1109/TPAMI.2008.199
Shu X. , Wu X.J. 2011 “A novel contour descriptor for 2D shape matching and its application to image retrieval,” Image and vision Computing 29 (4) 286 - 294    DOI : 10.1016/j.imavis.2010.11.001
Kim W. Y. , Kim Y. S. 2000 “A region-based shape descriptor using Zernike moments,” Signal Processing: Image Communication 16 (1) 95 - 102    DOI : 10.1016/S0923-5965(00)00019-9
Joviša Žunić , Hirota Kaoru , Rosin P. L. 2010 “A Hu moment invariant as a shape circularity measure,” Pattern Recognition 43 (1) 47 - 57    DOI : 10.1016/j.patcog.2009.06.017
Van Nguyen Hien , Porikli Fatih 2013 “Support vector shape: a classifier-based shape representation,” IEEE Trans. on Pattern Analysis and Machine Intelligence 35 (4) 970 - 982    DOI : 10.1109/TPAMI.2012.186
Xiang Bai 2010 “Learning context-sensitive shape similarity by graph transduction,” IEEE Trans. on Pattern Analysis and Machine Intelligence 32 (5) 861 - 874    DOI : 10.1109/TPAMI.2009.85
Xingwei Yang , Koknar-Tezel Suzan , Latecki Longin Jan “Locally constrained diffusion process on locally densified distance spaces with applications to shape retrieval,” IEEE Conf. on Computer Vision and Pattern Recognition 2009 357 - 364
Temlyakov Andrew “Two perceptually motivated strategies for shape classification,” IEEE Conf. on Computer Vision and Pattern Recognition 2010 2289 - 2296
Heng Qi 2010 “An effective solution for trademark image retrieval by combining shape description and feature matching,” Pattern Recognition 43 (6) 2017 - 2027    DOI : 10.1016/j.patcog.2010.01.007
Jie Chen 2010 “WLD: A robust local image descriptor,” IEEE Trans. on Pattern Analysis and Machine Intelligence 32 (9) 1705 - 1720    DOI : 10.1109/TPAMI.2009.155
Mokhtarian F. , Abbasi S. , Kittler J. 1997 “Efficient and robust retrieval by shape content through curvature scale space,”Series on Software Engineering and Knowledge Engineering vol. 8 51 - 58
Tu Z. , Yuille A. L. “Shape matching and recognition-using generative models and informative features,” Computer Vision-ECCV, Springer Berlin Heidelberg 2004 195 - 209
Loris Nanni , Brahnam Sheryl , Lumini Alessandra 2012 “Local phase quantization descriptor for improving shape retrieval/classification,” Pattern Recognition Letters 33 (16) 2254 - 2260    DOI : 10.1016/j.patrec.2012.07.007
Cagri Aslan 2008 “Disconnected skeleton: Shape at its absolute scale,” IEEE Trans. on Pattern Analysis and Machine Intelligence 30 (12) 2188 - 2203    DOI : 10.1109/TPAMI.2007.70842
Emre Baseski , Erdem Aykut , Tari Sibel 2009 “Dissimilarity between two skeletal trees in a context,” Pattern Recognition 42 (3) 370 - 385    DOI : 10.1016/j.patcog.2008.05.022
Gescheider A. George 2013 “Psychophysics: the fundamentals,” Psychology Press
Jain K. Anil 1989 “Fundamentals of digital image processing,” Prentice-Hall, Inc.
Heng Qi 2010 “An effective solution for trademark image retrieval by combining shape description and feature matching,” Pattern Recognition 43 (6) 2017 - 2027    DOI : 10.1016/j.patcog.2010.01.007
Yu Zhou 2011 “Shape matching using points co-occurrence pattern,” in Proc. of 6th International Conf. on Image and Graphics 344 - 349
Ling H. , Yang X. , Latecki L. J. “Balancing deformability and discriminability for shape matching,” Springer Berlin Heidelberg Computer Vision–ECCV 2010 411 - 424
Adrian Peter , Rangarajan Anand , Ho Jeffrey “Shape L’Ane rouge: sliding wavelets for indexing and retrieval,” IEEE Conf. on Computer Vision and Pattern Recognition 2008 1 - 8
Adamek T. , O'Connor N. E. 2004 “A multiscale representation method for nonrigid shapes with a single closed contour,” IEEE Trans. on Circuits and Systems for Video Technology 14 (5) 742 - 753    DOI : 10.1109/TCSVT.2004.826776
Emad Attalla , Siy Pepe 2005 “Robust shape similarity retrieval based on contour segmentation polygonal multiresolution and elastic matching,” Pattern Recognition 38 (12) 2229 - 2241    DOI : 10.1016/j.patcog.2005.02.009
Paula I. C. , Medeiros F. N. S , Bezerra F. N. “Shape retrieval by corners and dynamic space warping,” in Proc. of 18th International Conference on Digital Signal Processing 2013 1 - 6
Fotopoulou F. , Economou G. “Multivariate angle scale descriptor of shape retrieval,” in Proc. of Signal Process. Appl. Math. Electron. Commun (SPAMEC) 2011 105 - 108
Daliri M. R. , Torre V. 2008 “Robust symbolic representation for shape recognition and retrieval,” Pattern Recognition 41 (5) 1782 - 1798    DOI : 10.1016/j.patcog.2007.10.020
Wang B. 2011 “Shape retrieval using combined Fourier features,” Optics Communications 284 (14) 3504 - 3508    DOI : 10.1016/j.optcom.2011.03.063
Wang Z , Ouyang J 2013 “Shape classes registration and retrieval based on shape parts matching,” Journal of Computational Information Systems 9 (4) 1493 - 1499    DOI : 10.2298/CSIS120219060W
Molina L. C. , Belanche L. , Nebot À. “Feature selection algorithms: a survey and experimental evaluation,” IEEE International Conf. on Data Mining 2002 306 - 313