Advanced
Spectral Clustering with Sparse Graph Construction Based on Markov Random Walk
Spectral Clustering with Sparse Graph Construction Based on Markov Random Walk
KSII Transactions on Internet and Information Systems (TIIS). 2015. Jul, 9(7): 2568-2584
Copyright © 2015, Korean Society For Internet Information
  • Received : January 04, 2015
  • Accepted : June 09, 2015
  • Published : July 31, 2015
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
TagCloud
About the Authors
Jiangzhong Cao
Faculty of Information Engineering, Guangdong University of Technology, Guangzhou, 510006, China.
Pei Chen
School of Information Science and Technology, Sun Yat-sen University, Guangzhou, 510006, China.
Bingo Wing-Kuen Ling
Faculty of Information Engineering, Guangdong University of Technology, Guangzhou, 510006, China.
Zhijing Yang
Faculty of Information Engineering, Guangdong University of Technology, Guangzhou, 510006, China.
Qingyun Dai
Guangdong Polytechnic Normal University, Guangzhou, 510006, China.

Abstract
Spectral clustering has become one of the most popular clustering approaches in recent years. Similarity graph constructed on the data is one of the key factors that influence the performance of spectral clustering. However, the similarity graphs constructed by existing methods usually contain some unreliable edges. To construct reliable similarity graph for spectral clustering, an efficient method based on Markov random walk (MRW) is proposed in this paper. In the proposed method, theMRW model is defined on the raw k -NN graph and the neighbors of each sample are determined by the probability of the MRW. Since the high order transition probabilities carry complex relationships among data, the neighbors in the graph determined by our proposed method are more reliable than those of the existing methods. Experiments are performed on the synthetic and real-world datasets for performance evaluation and comparison. The results show that the graph obtained by our proposed method reflects the structure of the data better than those of the state-of-the-art methods and can effectively improve the performance of spectral clustering.
Keywords
1. Introduction
C lustering, also called cluster analysis, is widely used in many research fields, including multimedia retrieval, pattern recognition and data mining [1 - 3] . Many clustering algorithms have been developed over the past decades. Among these algorithms, spectral clustering, as a promising method, has recently attracted considerable attention [4 - 7] . Spectral clustering reveals the cluster structures of data using eigenvectors of the Laplacian graph and can stably detect nonconvex patterns and nonlinearly separable clusters [5 , 6] . It is considered as one of the most promising clustering techniques because of its superior performance compared to traditional clustering algorithms when utilized on certain challenging datasets [8] .
Spectral clustering is a clustering method based on graph theory. Given a data set V = { v 1 , v 2 ⋯, vn }⊂ l , spectral clustering first constructs a similarity graph with the data as vertices and the similarity among the data as the weight of the edges. Then, the eigenvectors of a Laplacianmatrix corresponding to the similarity graph is computed. Finally, clustering result is obtained by running a traditional clustering method (e.g., k-means) on the new data mapped by the eigenvectors. The objective of spectral clustering is to obtain new data that can be clustered more easily in the eigenspace constructed by the Laplacian matrix. Since the Laplacian matrix is computed in accordance with the similarity graph, the construction of the similarity graph plays a key role in spectral clustering. The most common similarity graph is a full graph in which each vertex is connected to all others. However, since the pairwise similarity between two faraway data points is less reliable than that between two neighbors, the full graph usually contains some unreliable edges that degrade the performance of spectral clustering. It has been shown that the sparse similarity graph can help to improve the performance of spectral clustering [5 , 9] . Determining the neighbors is the core for constructing the sparse graph. Several methods for constructing sparse graphs currently have been proposed, with two most classical methods being k -nearest neighbor ( k -NN) and ε -neighborhood [4] . In a k -NN graph a vertex is only connected to its k nearest neighbors, whereas points whose pairwise distances (similarity) are smaller (larger) than ε are connected in ε -neighborhood graph. Classical sparse graphs can effectively reduce the effect caused by unreliable pairwise similarities from faraway vertices. However, in many cases, the pairwise similarities are not always accurate, especially for data with an elongated structure. Hence, classical sparse graphs often contain a number of unreliable relationships (an example is shown in Fig. 1 (b)). A method based on the dominant set was subsequently proposed to overcome this difficulty [10] . In its graph, each vertex connects the vertices in the dominant set of its k -NN. For considering the affinities among the neighbors, this method is more robust to errors and outliers based on the pairwise similarities. However, the dominant neighborhood is inclined to make the local density samples condensing to an isolated clique from the non-uniform samples, as shown in Fig. 2 (c). In addition, several methods for constructing graphs have been proposed for other applications. Wang et al. [11] proposed a graph structure method named k-regular nearest neighbor graph and applied it to the manifold-ranking based retrieval framework. Li proposed the k-edge connected neighborhood graph, and presented two constructing methods named k-MST [12] and Min-k-ST [13 , 14] . In [15] , a method named b-matching was proposed to guarantee the number of edges of each vertex to be equal. Cheng et al. [16] proposed a method in which graphs are constructed using datum-wise sparse reconstructions of samples via l 1 -norm minimization. The methods outlined above are designed for different tasks, but, to the best of our knowledge there is no evidence to show which graph is the best for spectral clustering. Consequently, a good graph construction method is still needed.
PPT Slide
Lager Image
The 30 nearest neighbors of the point s determined by the transition probability at different values of t. (a) Original data. (b) t=1. (c) t=2. (d) t=4. (e) t=6. (f) t=100.
PPT Slide
Lager Image
The sparse graphs on the two-half-moon data constructed by different methods. (a) The C-k-NN method [4]. (b) The R-k-NN method [11]. (c) The DN-k-NN method [10]. (d) Our proposed method.
Markov random walk (MRW) can be useful for this purpose because its model has a close relation with graphs. Given a graph and a starting point, we can randomly select one of its neighbors and move to this neighbor; then, we can select this neighbor as the next starting point and repeat the above process. In this way, the sequence of points selected is a random walk on the graph. In fact, Markov chains can be viewed as random walks on a weighted graph [26] . Recently, the MRW model has been explored as a similarity measure and applied in many fields [17 , 18 , 21 , 32 , 33] .White and Smyth [27] explored average first-passage time as a similarity measure between vertices in the graph. Kondor and Lafferty [28] defined a graph regularization model that can be used to compute similarities between nodes. Newman [29] suggested a random-walk model to compute a “betweenness centrality” of a given node in a graph. Faloutsos et al. [30] extracted the “most relevant” connected subgraph relating two nodes of interest, based on electrical flows. Also, Palmer and Faloutsos [31] defined a similarity function based on random walks and electrical networks, and showed that this quantity provides a reasonably good measure for clustering and classifying categorical attributes. In [32] , several other quantities, such as the average commute time and the pseudoinverse of the Laplacian matrix of the graph, were investigated to provide similarities between any pair of nodes. In addition, the transition probabilities of the MRW were used to define a metric among the points [17 , 18] . In particular, the points are said to be more close if they have more similar probability distributions over all the points. The transition matrix was used as a smoothing kernel to improve the similarity metric [19] . Random-walk models on a graph have been proved effective in exploring the global relationships among the vertices.
However, these paths with many steps are not helpful to determine neighbors. As shown in Fig. 1 (f), the neighbors determined by paths with 100 steps are not reliable. In view of this fact, this paper proposes to determine the neighbors of each vertex using only the paths in its neighborhood. In our proposed method, the probability of MRW after performing suitable steps is utilized to determine the neighbors in the sparse graph. We assume that vi is closer to vj if the t -step probability of the MRW from v i to vj is higher. Also, the t -step probabilities of the MRW are adopted to determine the k -NN in this work. Due to the robustness of the transition probabilities of the MRW, our proposed method can not only effectively remove the unreliable neighbors, but also avoid, to an extent, the condensing degradation. It should be noted that our proposed method is different from the method presented in [20] that uses the MRW to improve spectral clustering. In our method, the probability of the MRW after performing suitable steps is utilized to determine the neighbors in sparse graphs, whereas in [20] the stable state of the MRW is used as an affinity matrix. As shown in Fig. 1 , the probability of the MRW carries different information about neighbors after different steps. The probability after performing suitable steps can reflect the good relations of the neighborhood and the stable state carries a small amount of information about neighbors. There is also an inherent difference between our proposed method and the method presented in [21] . In [21] , the MRW is utilized to intuitively interpret spectral clustering. By contrast, in our method, the MRW is used to measure the relationships among data.
The rest of this paper is organized as follows. The backgrounds of MRW and spectral clustering are reviewed in Section 2. In Section 3, we propose the construction of k -NN graph based on MRW. Experiment results on some datasets are presented in Section 4. Finally, some concluding remarks are given in Section 5.
2. Spectral Clustering and Markov Random Walk
- 2.1 Spectral clustering
Spectral clustering is a class of methods that is based on eigen-decomposition of graph affinity matrices. Given a set of data points S = { x 1 ,⋯, xn }, a weighted graph G = ( V , E ) is first constructed in which every vertex corresponds to a point in S and each edge is weighted by the similarity between the connected points. The Laplacian graph L [22] is then derived from the adjacency matrix of G , and the eigenvectors of L are computed. Finally, the traditional K -means method is applied to the low-dimensional representations of the original data.Many spectral clustering algorithms based on the above procedures have been proposed [3 - 7] .
In this paper, the Ng-Jordan-Weiss (NJW) algorithm [6] is briefly reviewed and summarized in Algorithm 1 for completeness. It is noted that the proposed graph can also be used in any other spectral clustering algorithm.
NJW spectral clustering algorithm
PPT Slide
Lager Image
NJW spectral clustering algorithm
- 2.2 Markov Random Walk
Markov chain is a discrete random process with memorylessness, in which the next state depends only on the current state and not on the sequence of events that preceded it. It has many applications, such as statistical models of real-world processes. Markov random walk can be seen as a Markov chain.More details about Markov random walk can be found in [23] .
Given a sequence of random variables X = { X ( n ), n = 0,1,2,⋯} and its state space E = {1,⋯, N } for n 1 , n 2 ,⋯ nm (0 ≤ n 1 , < n 2 ⋯< nm ), i 1 , i 2 ,⋯ im , j E and t , if the probabilities follow:
PPT Slide
Lager Image
then, the sequence X is called a Markov chain. The above equation formulates the Markov property: memorylessness. In (2), P { X ( n + t ) = j | X ( n )} = i }, t ≥ 1 is called the transition probabilities of the Markov chain going from the state i at time n to the state j at time n + t , which is usually denoted by pij ( n , n + t ). A Markov chain is said to be time-homogeneous if the value of pij ( n , n + t ) is independent of the start time n . So, in a time-homogeneous Markov chain, the transition probability between two states depends only on the time interval, t , which is also called the transition step. Markov chain is assumed to be time-homogeneous in this paper. The probability of Markov chain from state i to state j after t steps, pij ( t ), is denoted by
PPT Slide
Lager Image
. If the number of steps is one, then
PPT Slide
Lager Image
is abbreviated to pij . The transition matrix of a Markov chain can be constructed as follows:
PPT Slide
Lager Image
When the number of steps is larger than one, the transition matrix after t steps is called a high-order transition matrix, which is represented as follows:
PPT Slide
Lager Image
The transition matrix after t steps can be computed using the t -th power of the one-step transition matrix as follows [23] :
PPT Slide
Lager Image
From the equation (6), the t -step transition matrix P (t) can be easily computed once the one-step transition matrix is obtained. The elements of P (t) , denoted as
PPT Slide
Lager Image
, represent the transition probability of the MRW going from state i to j after t steps. The probability represents different meanings in different applications. In this paper, it is interpreted as the extent of connection among the data, as discussed in Section 3.
3. MRW-K-NN GRAPH
The transition probabilities of the MRW reflect the complex relations among vertices. If there are more paths with high weight between two vertices, the transition probability between them will be larger. Thus, the relationships among states carried by high-order transition probabilities are more reliable than those in one-step transition matrix. This observation motivates us to employ it to determine the neighbors of the sparse graph in spectral clustering. It is assumed that vi is closer to vj if the t -step probability of the MRW from v i to vj is higher. The proposed method comprises two main steps. First, MRW is defined on the classical k -NN graph, which usually contains many unreliable neighbors. Then, the new sparse graph is constructed based on the t -step transition probabilities of MRW.
The robustness of the transition probabilities of MRW enables the proposed method to not only effectively remove unreliable neighbors, but also, to an extent, avoid the condensing degradation. The constructed sparse graph is called the MRW- k -NN graph. Details of the proposed method are given as follows.
- 3.1 Definition of MRW on the commonk-NN graph
Given a data set V = { v 1 , v 2 ⋯, vn } and a similarity measure f : V × V → ℜ, the common k -NN graph, denoted as the C- k -NNG, is constructed as follows. Each data point corresponds to a vertex in the graph. For each data vi , it only connects to the k nearest points according to the similarity measure f . The edge between two connected vertices is weighted by their similarity. The affinity matrix corresponding to the C- k -NNG is defined as:
PPT Slide
Lager Image
where sij is the similarity between vi and vj (i.e., f ( vi , vj )). Here, Nk ( vi ) is the set of k nearest neighbors of vi .
Next, define a MRW on the C- k -NNG [4] , assign the states of the random walk to its vertices, and define the transition probabilities between the states as functions of the weighted edges. The one-step transition probability matrix P of the MRW can be obtained directly from the affinity matrix W as follows:
PPT Slide
Lager Image
where, P ij represents the one-step transition probability of the MRW going from vi to vj , and
PPT Slide
Lager Image
Given the one-step transition matrix, the t -step probability matrix P (t) can be easily obtained by computing the power of P using equation (6). The element of P (t) , denoted as
PPT Slide
Lager Image
, represents the probability of the MRW going from vi to vj in t steps.
The one-step transition matrix of the MRW only corresponds to the first order neighborhood structure of the graph. On the other hand, the t -step transition probability of the MRW corresponds to the high-order neighborhood structure of the graph. Hence,
PPT Slide
Lager Image
can effectively reflect the connection between two vertices, vi and vj .
PPT Slide
Lager Image
is larger if there exist more paths with larger weighted edges between vi and vj , and vice versa.
- 3.2 Construction of the MRW-k-NN graph
The t -step transition probabilities are employed in this paper because they can reflect more reliable relationships among data. Vertex vi is only connected to the k neighbors with the largest t -step transition probabilities going from vi . The edges are weighted by W ij . The corresponding affinity matrix is denoted as W (t) and defined as follows:
PPT Slide
Lager Image
where
PPT Slide
Lager Image
denotes the set containing
PPT Slide
Lager Image
for j = 1,2,⋯, n and
PPT Slide
Lager Image
denotes the k -th largest element in
PPT Slide
Lager Image
. It is worth noting that the edges in the MRW- k -NN graph are still weighted by W , and P (t) is only used to determine the connected vertices. This is because the value of
PPT Slide
Lager Image
varies with respect to the value of t and the degree of vertex vi . Fig. 1 illustrates the advantages of using the MRW- k -NN in the construction of the graph. In the figure, the solid circle denotes the point s and the red circles denote its neighbors. The 30 nearest neighbors of point s determined by the transition probability with different values of t are shown in Figs. 1 (b)–(f). In Fig. 1 (b), the neighbors of the point s are determined by P , which is the same as the result obtained by the common k -NN method. It can be seen that its neighbors contain some points from other manifolds. This shows that some of the neighbors determined by the common k -NN method are unreliable. However, the neighbors become increasingly more reliable as the value of t increases. It is worth noting that the neighbors of the point s are unreliable when t becomes very large, as illustrated in Fig. 1 (f). This is because when t becomes very large, the probabilities converge to a unique stationary distribution. Hence, they have little distinguishable information. As a result, the value of t plays an important role in the MRW based methods. In [17] , it is suggested that the value of t should be chosen according to the task. In this paper, a heuristic method is proposed for spectral clustering, and the value of t is searched from 1 to 20. The value of t minimizing the normalized cut [5] of the resultant clusters is chosen.
- 3.3 Spectral clustering with the MRW-k-NN graph
The MRW- k -NN graph can be constructed by following the above steps. It is easy to apply the proposed graph to spectral clustering by substituting the affinity matrix in Algorithm 1 with the best W (t) that minimizes the normalized cut. Spectral clustering with the MRW- k -NN graph is described in Algorithm 2 .
Spectral clustering with the MRW-k-NN graph
PPT Slide
Lager Image
Spectral clustering with the MRW-k-NN graph
- 3.4 Complexity analysis of spectral clustering with the MRW-k-NN graph
The difference between the new spectral clustering and the NJW algorithm ( Algorithm 1 ) lies in their sparse graphs. In the proposed method, there exists a process to select the suitable order for the MRW. As described in Algorithm 2 , we search for the best order by choosing the minimum cut among the N partitions. Suppose that we divide n data points into k clusters, the computing complexity of the NJW algorithm is O ( n 3 ) and the computing complexity of Algorithm 2 is O ( Nn 3 ). In Section 4, we will show the time-consuming evaluation for the proposed method. In addition, the space complexity of Algorithm 2 is O ( n 2 ), which is the same as that of the NJW algorithm.
4. Experimental Results and Analysis
To illustrate the effectiveness of our proposed method, we conduct the experiments on several synthetic and real datasets. We compared three existing methods for constructing the k -NN graphs: the classical method [4] , the method based on dominant set [10] and k -regular nearest neighbor graph [11] . They are denoted as the C- k -NN, DN- k -NN and R- k -NN, respectively. Here, the similarity measure function is defined as:
PPT Slide
Lager Image
where ║•║ denotes the Euclidean distance and δ is a scale parameter that controls the rate of change of the similarity with the distance among the data points. There are two parameters to be set: the value of k in (7) and in (10), as well as the value of δ in (11), which will be discussed in the following subsections.
- 4.1 Synthetic dataset
The synthetic dataset contains the two-half-moon data and the circle-Gaussian data, as shown in Figs. 1 (a) and 3 (a), respectively. The two-half-moon data consist of two clusters that are distributed with nonlinear elongated structures. There are some noises between the two clusters. The circle-Gaussian data comprise three clusters: two clusters in the middle with Gaussian distribution, and one outer cluster with a circular shape. It is worth noting that the circle-Gaussian data are not well separated. The points around the Gaussian clusters tend to connect all these three clusters.
In order to have a fair comparison, the value of k is chosen as 30 in all methods. The value of δ is chosen as the mean distance between the vertices to their tenth neighbor defined, according to the Euclidean distance. The sparse graphs of these two synthetic data constructed by four different methods are shown in Fig. 2 and Fig. 3 , respectively. From Fig. 2 and Fig. 3 , it is clear that the graphs of these two datasets have common characteristics. There are some erroneous neighbors on the C- k -NN graph. The k nearest neighbors of some points are in other clusters. This violates the principle requiring that the similarities among the points in the same manifold should be larger than those in different manifolds. Compared with the C- k -NN graph, there are fewer unreliable neighbors in R- k -NN graphs. However, the performances of the R- k -NN graphs are worse than those of the DN- k -NN and MRW- k -NN graphs. The DN- k -NN graph determines the nearest neighbors well. However, the vertices in the same manifold are separated into several parts. This causes the DN- k -NN graphs to be unsuitable for applications such as spectral clustering. This deficiency can be illustrated as follows. For the MRW- k -NN graph, the edges connecting various vertices distributed in different manifolds have been significantly reduced. Also, the vertices in the same manifold are not disconnected. Hence, the graphs constructed by our proposed method outperform those constructed by the other methods.
PPT Slide
Lager Image
The sparse graphs on the circle-Gaussian data constructed by different methods. (a) Circle-gauss data. (b) The C-k-NN method [4]. (c) The R-k-NN method [11]. (d) The DN-k-NN method [10]. (e) Our proposed method.
- 4.2 MNIST dataset
We also conducted experiments on the MNIST dataset ( http://yann.lecun.com/exdb/mnist/ ). This dataset consists of 10 digits with a total of 70,000 examples. Every example is a 28×28 gray level image. These 784 pixels are used as a feature vector in the simulations, and principal component analysis is applied for pre-processing before computing the Euclidean distances. In the experiments, each subset is constructed by selecting the first 200 examples fromeach digit. To illustrate the significance of these sparse graphs, the full graph (FULL) is also included for the comparisons.
The value of δ is chosen as the local scale parameter defined in [24] and k is chosen as 10. The normalized mutual information (NMI) is employed as a performance index of the clustering, which is widely employed to evaluate clustering results [9 , 25] . Suppose X and Y denote two random variables, then the NMI is defined as:
PPT Slide
Lager Image
where I ( X , Y ) is the mutual information between X and Y , and H ( X ) is the entropy of X . When evaluating the performance of clustering algorithms, the NMI is computed by regarding the clustering result and true class label as two random variables. Specifically, given a dataset with n points, the true classes and clustered classes are represented by
PPT Slide
Lager Image
, respectively, where k and k ’ denote the number of the classes and the number of the true classes, respectively. The NMI is computed as:
PPT Slide
Lager Image
where
PPT Slide
Lager Image
denote the number of points in
PPT Slide
Lager Image
, and nij denotes the number of points belonging to
PPT Slide
Lager Image
. The range of NMI is from0 to 1 and larger NMI implies better clustering results.
First, we constructed several subsets containing the digital groups {1, 2, 3}, {0, 5, 6}, {1, 4, 7, 9} and {0, …, 9}. The results obtained for these subsets are shown in Fig. 4 . It can be seen that the sparse graphs obtained by our proposed method achieved better NMIs for all the subsets than those obtained by other methods. Moreover, these sparse graphs of the existing methods achieved better results compared to the full graph (FULL), which confirms that sparse graphs are important.
PPT Slide
Lager Image
The results on several subsets of the MNIST dataset.
We also conducted simulations on the subsets comprising one pair of digits—a total of 45 subsets (pairs). Fig. 5 shows these results. In most cases, our proposed method outperformed other methods. In particular, the spectral clustering with the graphs constructed via the C- k -NN method, the DN- k -NN method, the R- k -NN method and our proposed method achieved the best results in 1, 11, 6, 6, 21 cases, respectively. The mean and the standard deviation of the values of NMIs of different methods on these 45 pairs are shown in Table 1 . It can be seen from Table 1 that our proposed method obtained the highest mean and the lowest standard deviation of the values of NMIs. The larger mean value refers to the better spectral clustering performance, while the smaller standard deviation refers to the more robustness of the method. Thus, from Table 1 it is clear that our proposed method comprehensively outperforms the other three methods.
PPT Slide
Lager Image
Results for 45 subsets of the MNIST dataset
Means and standard deviations of the NMIS of the spectral clustering for the different graphs performed on the 45 tests.
PPT Slide
Lager Image
Means and standard deviations of the NMIS of the spectral clustering for the different graphs performed on the 45 tests.
To further illustrate the effectiveness of the proposed method, we also show the sparse graphs on the real-world dataset. It is difficult to draw the actual graph because the samples are in high-dimension space; therefore, we applied principal component analysis and obtained graphs in the two principal dimensions, as shown in Fig. 6 . In the figure, the vertices with the same color denote the same kind of digits, of which there are two kinds in the graphs. From the red circles indicating the notable regions, it can be seen that: 1) there exist many edges connected vertices from different digits in C- k -NN graph; 2) the other three methods, namely R- k -NN, DN- k -NN and the proposed method, can reduce the wrong edges to an extent, which can be seen in the notable regions; 3) compared with the graphs by R- k -NN and DN- k -NN, especially in the notable regions, more right edges and less error are found in the graph based on our proposed method, In general, the graph based on our proposed method is more suitable for spectral clustering.
PPT Slide
Lager Image
Sparse graphs constructed on the pairwise digit subsets by various methods. Graphs (a)-(d) are constructed on the subset {1, 2} by C-k-NN method [4], R-k-NN method [11], DN-k-NN method [10] and our proposed method respectively. Graphs (e)-(h) are constructed on the subset {1, 9} by C-k-NN method [4], R-k-NN method [11], DN-k-NN method [10] and our proposed method respectively. The red circles in (a) and (e) are the notable regions.
Additionally, we also examined the time consumption of these methods. Fig. 7 shows the computational time required for the simulations of these five methods. As expected, the time taken for the spectral clustering with MRW-k-NN is always approximately N (maximum order) times that of the spectral clustering with C-k-NN, the same as those with FULL and DN-k-NN. However, our proposed method takes less time than the R-k-NN method. The main reason more time was required for our proposed method is that the spectral clustering algorithm has to run repeatedly to look for the best order of the MRW.
PPT Slide
Lager Image
Comparison of running-time for five algorithms on the MNIST dataset.
5. CONCLUSIONS
To improve the performance of spectral clustering, the MRW based method for constructing reliable k -NN graphs is proposed in this paper. The high-order transition probability of MRW is adopted to determine the neighbors in the proposed method. The proposed method consists of two steps. In the first step, anMRW model on the common k -NN graph constructed by the data is introduced. Then, in the second step, the probability of the MRW is computed and utilized to determine the neighbors. Because of the advantage that the high-order transition probabilities characterize the complex relationships among data, the neighbors selected in the proposed graph are more reliable than those of existing methods. The results of experiments show that the graph constructed by our proposed method performs better and can efficiently improve the performance of spectral clustering.
In future work, we plan to explore a more efficient algorithm to determine the order of the MRW and further verify the effectiveness of the proposed graph in other applications.
BIO
Jiangzhong Cao received his Ph.D. degree in communication and information system from School of Information Science and Technology, Sun Yat-sen University, China, in 2013. He is currently a Lecturer in the School of Information Engineering, Guangdong University of Technology, Guangzhou, China. His research interests include computer vision and pattern recognition.
Pei Chen received two Ph.D. degrees in wavelets and computer vision from Shanghai Jiaotong University, Shanghai, China, andMonash University,Melbourne, Australia, in 2001 and 2004, respectively. He is currently a Professor in the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China. He has been a Post-Doctoral Researcher with Monash University, a Senior Research Engineer with Motorola Labs, Shanghai, and a Research Professor with SIAT/CAS, Shenzhen, China. His current research interests include topics in computer vision and machine learning.
Wing-KuenLing received the B.Eng.(Hons) and M.Phil. degrees from the Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, in1997 and 2000, respectively, and the Ph.D. degree in the Department of Electronic and Information Engineering of the Hong degree in the Department of Electronic and Information Engineering of the HongKong Polytechnic University in 2003. In 2004, he joined the King’s College London as a Lecturer. In 2010, he joined the University of Lincon as a Principal Lecturer and promoted to a Reader in 2011. In 2012, he joined Guangdong University of Technology as a Full Professor. He is a Fellow of IET, a senior member of IEEE, and a China National Young Thousand-People-Plan Distinguished Professor. He currently serves in the technical committees of the Non-linear Circuits and Systems Group, the digital signal processing group and the power electronics and systems group of the IEEE Circuits and Systems Community. He was awarded the best reviewer prizes from the IEEE Instrumentation and Measurement Society in 2008 and 2012. He has also served as the guest editor-in-chief of several special issues of highly rated international journals, such as the Circuits, Systems and Signal Processing, the American Journal of Engineering and Applied Sciences, the International Journal of Digital Content Technology and Its Applications, and the Mediterranean Journal of Electronics and Communications. He is currently an associate editor of several international journals, including International Journal of Bifurcation and Chaos, the Circuits, Systems and Signal Processing, Journal of the Franklin Institute, American Journal of Engineering and Applied Sciences, International Journal of Engineering and Emerging Technologies, International Journal of Applied Mathematics and Modeling, and Open Journal Advanced Engineering Techniques. He has published an undergraduate textbook, are search monograph, four book chapters, a book review published in an IEEE journal, more than 75 papers in international leading journals and more than 55 papers in highly rated international conference. He has held 7 visiting positions and delivered more than 20 seminars in the fields of image processing, time frequency analysis, optimization theory, symbolic dynamics, as well as fuzzy and impulsive control theory.
Zhijing Yang received the B.S. and Ph.D. degrees from the Faculty of Mathematics and Computing Science, Sun Yat-Sen University, Guangzhou, China, in 2003 and 2008, respectively. Then, he became a Lecturer in the same university. He was a Visiting Research Scholar in the School of Computing, Informatics and Media, University of Bradford, U.K., between June 2009 and December 2009, and a Research Fellow in the School of Engineering, University of Lincoln, between January 2011 and January 2013. He is currently an Associate Professor in the Faculty of Information Engineering, Guangdong University of Technology, China. He has been a project leader for three projects on signal processing from National Natural Science Foundation of China. His current research interests include time-frequency analysis, signal processing, machine learning, and pattern recognition.
Qingyun Dai received the Ph.D. degree in communication and information systems from South China University of Technology, Guangzhou, China. She is currently a Professor in the School of Information Engineering, Guangdong University of Technology, Guangzhou, China. She has also been a Post-Doctoral Researcher at Sun Yat-sen University, Guangzhou, China. Her current research interests include Image Processing, RFID and Internet of Things.
References
Jain K. 2010 “Data clustering: 50 years beyond K-means,” Pattern Recognition Letters 31 651 - 666    DOI : 10.1016/j.patrec.2009.09.011
Elijorde F. , Kim S. , Lee J. 2014 “A Wind Turbine Fault Detection Approach Based on Cluster Analysis and Frequent Pattern Mining,” KSII Transactions on Internet and Information Systems (TIIS) 8 (2) 664 - 677    DOI : 10.3837/tiis.2014.02.020
Wang L. , Dong M. 2012 “Multi-level Low-rank Approximation-based Spectral Clustering for image segmentation,” Pattern Recognition Lett 33 2206 - 2215    DOI : 10.1016/j.patrec.2012.07.024
Luxburg U 2007 “A tutorial on spectral clustering. Statistics and Computing,” 17 395 - 416
Shi J. , Malik J. 2000 “Normalized cuts and image segmentation,” IEEE Trans. Pattern Anal. Machine Intell. 22 888 - 905    DOI : 10.1109/34.868688
Ng A. , Jordan M. , Weiss Y. “On spectral clustering: analysis and an algorithm,” MIT Press in Proc of. Advances in Neural Information Processing Systems (NIPS) 2002 http://citeseerx.ist. psu.edu/viewdoc/summary?doi=10.1.1.19.8100 vol. 15
Zhou J. , Gao S. , Jin Z. 2012 “A New Connected Coherence Tree Algorithm For Image Segmentation,” KSII Transactions On Internet And Information Systems 6 (4) 1118 - 1202
Verma D. , Meila M. 2003 “A comparison of spectral clustering algorithms,” University of Washington Department of Computer Science Technical Report 03-05-01 http://citeseerx. ist.psu.edu/viewdoc/summary?doi=10.1.1.57.6424
Cao J. Z. , Chen P. , Dai Q. Y. , Ling Wing-Kuen 2014 “Local information-based fast approximate spectral clustering,” Pattern Recognition Letters 38 63 - 69    DOI : 10.1016/j.patrec.2013.11.005
Yang X. , Latecki L. J. “Affinity learning on a tensor product graph with applications to shape and image retrieval,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR) 2011 2369 - 2376
Wang B , Pan F , Hu K M , Paul J C 2012 “Manifold-ranking based retrieval using k-regular nearest neighbor graph,” Pattern Recognition 45 1569 - 1577    DOI : 10.1016/j.patcog.2011.09.006
Li Y “K-edge connected neighborhood graph for geodesic distance estimation and nonlinear data projection,” in Proc. of The International Conference on Pattern Recognition 2004
Li Y 2005 “Building k-edge-connected neighborhood graph for distance-based data projection,” Pattern Recognition Letters 26 (13) 2015 - 2021    DOI : 10.1016/j.patrec.2005.02.010
Li Y 2005 “Building k edge-disjoint spanning trees of minimum total length for isometric data embedding,” IEEE Trans. Pattern Anal. Mach. Intell. 27 (10) 1680 - 1683    DOI : 10.1109/TPAMI.2005.192
Jebara T , Wang J , Chang S F “Graph construction and b-matching for semi-supervised learning,” in Proc. of International Conferenceon Machine Learning 2009
Cheng B , Yang J , Yan S , Fu Y. , Huang T S. 2010 Learning with l1-graph for image analysis. IEEE Transactionson Image Processing 19 858 - 866    DOI : 10.1109/TIP.2009.2038764
Szummer M. , Jaakkola T. “Partially labeled classification with Markov random walks,” Advances in Neural Information Processing Systems (NIPS) 2001 http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.25.1955 945 - 952
Lafon S. , Lee A. B. 2006 “Diffusion maps and coarse-graining: a unified framework for dimensionality reduction,”graph partitioning, and data set parameterization, IEEE Trans. Pattern Anal. Mach. Intell. 28 (9) 1393 - 1403    DOI : 10.1109/TPAMI.2006.184
Jiang J. , Wang B. , Tu Z. “Unsupervised metric learning by self-smoothing operator,” in Proc. IEEE Conf. on Computer Vision (ICCV) 2011 794 - 801
ZHANG Xianchao , YOU Quanzeng 2012 “An improved spectral clustering algorithm based on random walk [J],” Frontiers of Computer Science in China 3 268 - 278
Meila M , Shi J “A Random Walks View of spectral segmentation,” in Proc. of 8th Inernational Workshop on Artificial intelligence and Statistics 2001 http://citeseer.ist.psu.edu/viewdoc/ summary?doi=10.1.1.33.1501
Chung F. R. K “Spectral Graph Theory, Regional Conference Series in Mathematics,” AMS. 92 1997
Bertsekas P. , Tsitsiklis N. “Introduction to Probability,” AMS 1997 http://vfu.bg/en/e-Learning/ Math--Bertsekas_Tsitsiklis_Introduction_to_probability.pdf http://vfu.bg/en/e-Learning/ Math--Bertsekas_Tsitsiklis_Introduction_to_probability.pdf
Zelnik-Manor L. , Perona P. “Self-tuning spectral clustering,” Advances in Neural Information Processing Systems (NIPS) 2004 http://www.vision.caltech.edu/lihi/Demos/ SelfTuningClustering.html 1601 - 1608
Shang F. , Jiao L. C. , Shi J. , Gong M. , Shang R. H 2011 “Fast density-weighted low-rank approximation spectral clustering,” Data Min Knowl Disc http://link.springer.com/article/10.1007/s10618-010-0207-5 23 345 - 378    DOI : 10.1007/s10618-010-0207-5
Lovasz L 1993 “Random Walks on Graphs: A Survey,” Combinatorics www.cs.elte.hu/~lovasz/erdos.pdf 80 (2) 1 - 46
White S. , Smyth P. “Algorithms for Estimating Relative Importance in Networks,” in Proc. of Ninth ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining 2003 266 - 275
Kondor R.I. , Lafferty J. “Diffusion Kernels on Graphs and Other Discrete Structures,” in Proc. of 19th Int’l Conf. Machine Learning 2002 315 - 322
Newman M. 2005 “AMeasure of Betweenness Centrality Based on Random Walks,” Social Networks 27 (1) 39 - 54    DOI : 10.1016/j.socnet.2004.11.009
Faloutsos C. , McCurley K. , Tomkins A. “Fast Discovery of Connection Subgraphs,” in Proc. of 10th ACM SIGKDD Int’l Conf. Knowledge Discovery and Data Mining 2004 118 - 127
Palmer C. , Faloutsos C. “Electricity Based External Similarity of Categorical Attributes,” in Proc. of Seventh Pacific-Asia Conf. Knowledge Discovery and Data Mining (PAKDD ’03) 2003 486 - 500
Fouss F. , Pirotte A. , Renders J. , Saerens M. 2007 “Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation,” IEEE Transactions on Knowledge and Data Engineering 19 (3)    DOI : 10.1109/TKDE.2007.46
Lambiotte R. , Delvenne J. , Barahona M. 2014 “Random Walks, Markov Processes and the Multiscale Modular Organization of Complex Networks,” IEEE Transactions on Network Science and Engineering 1 (2) 76 - 90    DOI : 10.1109/TNSE.2015.2391998