Most hyperellipsoidal clustering (HEC) approaches use the Mahalanobis distance as a distance metric. It has been proven that HEC, under this condition, cannot be realized, since the cost function of partitional clustering is a constant. We demonstrate that HEC with a modified Gaussian kernel metric can be interpreted as a problem of finding condensed ellipsoidal clusters (with respect to the volumes and densities of the clusters) and propose a practical HEC algorithm that can efficiently handle clusters that are ellipsoidal in shape and of different size and density. We then refine the HEC algorithm by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters. The proposed methods lead to a significant improvement in clustering results over the K-means algorithm, fuzzy C-means algorithm, GMM-EM algorithm, and HEC algorithm based on minimum-volume ellipsoids using the Mahalanobis distance.
I. Introduction
With the growing popularity of clustering techniques in machine learning, pattern recognition, computer vision, and data mining, a number of clustering algorithms have been extensively studied over several decades. An optimal clustering algorithm that can deal with all cases does not currently exist. Despite numerous research efforts, data clustering remains a difficult and challenging problem [1]–[2].
The ideal objective function for clustering aims to maximize the similarity of data points within clusters and to minimize the similarity of data points between clusters. It is difficult to find a feasible objective function of clustering that meets the ideal criteria [1]–[2]. The most commonly used criterion for clustering is to minimize the sum of the Euclidean distances of data points within a cluster, for example, as in the case of K-means clustering. Clustering algorithms based on the Euclidean metric have a tendency to divide the data points into clusters of equal size, equal density, and spherical shape. This tendency has the drawback of splitting large and elongated clusters under certain circumstances, and the cluster allocation may change significantly under linear transformations of the data space. Real-world data often exhibit a mixture of Gaussian distributions (that is, ellipsoidal or complex shapes).
To resolve the aforementioned problems, numerous hyperellipsoidal clustering (HEC) algorithms have been proposed [3]–[11]. These approaches generally employ the Mahalanobis distance as a distance metric to create ellipsoidal clusters; however, there are several problems with its use. Firstly, directly computing the covariance matrix in the Mahalanobis distance is computationally expensive, and secondly, the matrix may be singular when clusters include a small number of data points. To deal with these difficulties, HEC algorithms based on a modified Mahalanobis distance with a pseudo-covariance matrix have been proposed [3]–[5]. The time complexity of these methods is still expensive. On the other hand, HEC approaches that approximate the volume of clusters, that is, that find minimum-volume ellipsoids (MVE) instead of the covariance matrix, have been presented to alleviate the problems in computing the covariance matrix [8]–[11]. Most of the related methods use the Mahalanobis distance as a distance metric. However, it has been proven that an ellipsoidal clustering cannot be obtained, and that the cost function of partitional clustering can limit neither the size of the clusters nor the relations among the clusters, if the Mahalanobis distance is directly applied to the clustering [7]. Furthermore, the HEC algorithms based on MVE work well only on clusters of equal density, because these methods do not consider the density of each cluster.
The primary goal of this paper is to realize an ellipsoidal clustering and to provide practical implementations of an HEC algorithm. First, we recast a partitional clustering algorithm with the modified Gaussian kernel metric as a problem of finding condensed clusters with respect to the volumes and densities of the clusters. We then propose practical HEC algorithms based on the approximation of MVE with the modified Gaussian kernel metric, which are able to efficiently deal with clusters that are ellipsoidal in shape and of different size and density. We next enhance the capability of the HEC algorithms by mapping ellipsoids onto the kernel feature space to handle nonlinear and elongated clusters. Experimental results show that the proposed methods lead to a significant improvement in clustering results over the K-means algorithm, fuzzy C-means algorithm, GMM-EM algorithm, and MVE-based clustering algorithms with the Mahalanobis distance.
The rest of this paper is organized as follows. In Section II, we provide the problem definition and theoretical explanation. In Section III, we propose the HEC algorithm based on the approximation of MVE with the modified Gaussian kernel and its extended algorithm on the kernel feature map, which is able to find more complex cluster shapes. Experimental results and discussions are provided in Section IV. Finally, some concluding remarks are given in Section V.
II. Problem Definition
Given a set of $n$ data points in a $d$-dimensional space, $\bm{x}=\{\bm{x}_i\in\mathbb{R}^d\}_{i=1}^{n}$, the objective of partitional clustering is to determine an assignment of the partition matrix $P=\{P_{ik}\mid P_{ik}\in\{0,1\};\ i=1,2,\dots,n;\ k=1,2,\dots,C\}$ such that the cost function $E_C(P)$ is minimized as
$$P=\underset{P}{\arg\min}\ E_C(P), \tag{1}$$
where $C$ is the number of clusters, generally determined by the user. If data point $\bm{x}_i$ is assigned to the $k$th cluster, $P_{ik}=1$; otherwise, $P_{ik}=0$. For a unique assignment, $\sum_{k=1}^{C}P_{ik}=1$ for $i=1,2,\dots,n$ should be satisfied. The cost function for partitional clustering can be defined as
$$E_C(P)=\sum_{i=1}^{n}\sum_{k=1}^{C}P_{ik}\,D(\bm{x}_i,\bm{m}_k), \tag{2}$$
where $D(\bm{x}_i,\bm{m}_k)$ is a distance metric between the input pattern $\bm{x}_i$ and the mean vector $\bm{m}_k$ of the $k$th cluster.
To create ellipsoidal clusters, HEC algorithms generally adopt the Mahalanobis distance. However, the cost function of partitional clustering is constant under this condition [7].
To realize an ellipsoidal clustering, we employ the modified Gaussian kernel as the distance metric:
$$D(\bm{x}_i,\bm{m}_k)=\alpha\,(\bm{x}_i-\bm{m}_k)^{T}Q_k^{-1}(\bm{x}_i-\bm{m}_k)+(1-\alpha)\log\det Q_k, \tag{3}$$
where $\bm{m}_k$ and $Q_k$ are the mean vector and covariance matrix of the $k$th cluster, respectively. The variable $\alpha\in[0,1]$ controls the weight of the first and second terms of the modified Gaussian kernel. Note that the first term of (3) is the Mahalanobis distance and that the second term is proportional to the volume of the $k$th ellipsoidal cluster, which is represented by the covariance matrix $Q_k$. The clustering cost function with the modified Gaussian kernel can be written as
$$E_C(P)=\sum_{i}\sum_{k}P_{ik}\bigl[\alpha\,(\bm{x}_i-\bm{m}_k)^{T}Q_k^{-1}(\bm{x}_i-\bm{m}_k)+(1-\alpha)\log\det Q_k\bigr]. \tag{4}$$
From the necessary condition for the optimality of $E_C(P)$, $\partial E_C(P)/\partial\bm{m}_k^{T}=0$, the centers of the clusters allocated by minimizing the clustering cost function (4) can be represented as
$$\bm{m}_k=\frac{\sum_{i}P_{ik}\bm{x}_i}{\sum_{i}P_{ik}}. \tag{5}$$
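As a concrete sketch, the distance metric (3) and the center update (5) can be written in a few lines of NumPy. The function names are illustrative, and this sketch assumes the matrix $Q_k$ is already available; it says nothing about how $Q_k$ is obtained, which Section III addresses via MVE approximation.

```python
import numpy as np

def modified_gaussian_kernel(x, m, Q, alpha):
    """Distance (3): alpha * Mahalanobis term + (1 - alpha) * log det Q."""
    diff = x - m
    maha = diff @ np.linalg.inv(Q) @ diff
    logdet = np.linalg.slogdet(Q)[1]        # numerically stable log det
    return alpha * maha + (1.0 - alpha) * logdet

def cluster_centers(X, P):
    """Center update (5): m_k = sum_i P_ik x_i / sum_i P_ik.

    X is an (n, d) data matrix; P is an (n, C) 0/1 partition matrix.
    """
    return (P.T @ X) / P.sum(axis=0)[:, None]
```

For example, a 0/1 partition of four points into two pairs yields the two pair means as cluster centers.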
Lemma 1: Given a set of $n$ data points in a $d$-dimensional space $\bm{x}=\{\bm{x}_i\in\mathbb{R}^d\}_{i=1}^{n}$, where $\bm{m}_k$ denotes one of the $C$ means (for $k=1,2,\dots,C$, with $C$ being the number of clusters) and its covariance matrix
$$Q_k=\frac{1}{n-1}\sum_{i=1}^{n}(\bm{x}_i-\bm{m}_k)(\bm{x}_i-\bm{m}_k)^{T}$$
is invertible, then
$$\sum_{i=1}^{n}\bigl[\alpha\,(\bm{x}_i-\bm{m}_k)^{T}Q_k^{-1}(\bm{x}_i-\bm{m}_k)+(1-\alpha)\log\det Q_k\bigr]=\alpha\,d\,(n-1)+n\,(1-\alpha)\log\det Q_k. \tag{6}$$
Proof: By Theorem 1 in [7] and Theorem 1 in [4], we have
$$\sum_{i=1}^{n}\alpha\,(\bm{x}_i-\bm{m}_k)^{T}Q_k^{-1}(\bm{x}_i-\bm{m}_k)=\alpha\,d\,(n-1)$$
and
$$\sum_{i=1}^{n}(1-\alpha)\log\det Q_k=n\,(1-\alpha)\log\det Q_k.$$
Adding the two identities gives
$$\therefore\ \sum_{i=1}^{n}\bigl[\alpha\,(\bm{x}_i-\bm{m}_k)^{T}Q_k^{-1}(\bm{x}_i-\bm{m}_k)+(1-\alpha)\log\det Q_k\bigr]=\alpha\,d\,(n-1)+n\,(1-\alpha)\log\det Q_k.\qquad\blacksquare$$
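Lemma 1 can also be verified numerically. The sketch below uses random data; the values of $n$, $d$, and $\alpha$ are illustrative assumptions, not taken from the paper. The key fact is that the Mahalanobis sum equals $d(n-1)$ whenever $Q_k$ is the sample covariance about $\bm{m}_k$.

```python
import numpy as np

# Illustrative numerical check of Lemma 1 (random data; arbitrary n, d, alpha).
rng = np.random.default_rng(0)
n, d, alpha = 200, 3, 0.4
X = rng.normal(size=(n, d))
m = X.mean(axis=0)                        # cluster mean m_k
diff = X - m
Q = diff.T @ diff / (n - 1)               # covariance matrix as in Lemma 1
# sum_i (x_i - m)^T Q^{-1} (x_i - m)
maha_sum = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Q), diff)
logdet = np.linalg.slogdet(Q)[1]
lhs = alpha * maha_sum + n * (1 - alpha) * logdet        # left side of (6)
rhs = alpha * d * (n - 1) + n * (1 - alpha) * logdet     # right side of (6)
```

The Mahalanobis sum collapses to $d(n-1)$ because $\sum_i \mathrm{diff}_i^{T}Q^{-1}\mathrm{diff}_i=\operatorname{tr}(Q^{-1}\sum_i \mathrm{diff}_i\mathrm{diff}_i^{T})=(n-1)\operatorname{tr}(I_d)$.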
Theorem 1: If the modified Gaussian kernel (3) is used as the distance metric in the clustering cost function (2), then
$$E_C(P)\cong\sum_{k=1}^{C}n_k\log\det Q_k, \tag{7}$$
where $n_k=\sum_{i}P_{ik}$ is the number of data points assigned to the $k$th cluster.
Proof: From Lemma 1, we have
$$\sum_{i=1}^{n}P_{ik}\,D(\bm{x}_i,\bm{m}_k)=\alpha\,d\,(n_k-1)+n_k\,(1-\alpha)\log\det Q_k.$$
The cost function can then be expressed as
$$E_C(P)=\sum_{i=1}^{n}\sum_{k=1}^{C}P_{ik}\,D(\bm{x}_i,\bm{m}_k)=\sum_{k=1}^{C}\alpha\,d\,(n_k-1)+\sum_{k=1}^{C}n_k\,(1-\alpha)\log\det Q_k=\alpha\,d\,(n-C)+(1-\alpha)\sum_{k=1}^{C}n_k\log\det Q_k,$$
where $\alpha$, $d$, $n$, and $C$ are all constants with respect to the partition, and therefore
$$E_C(P)\cong\sum_{k=1}^{C}n_k\log\det Q_k.\qquad\blacksquare$$
Remarks: Note that the problem of minimizing the cost function (7) is equivalent to minimizing the weighted sum of the cluster volumes,
$$E_C(P)\propto\sum_{k=1}^{C}w_k\,\mathrm{Vol}(k).$$
The weight of each cluster is defined as the number of data points that belong to it. This can be intuitively interpreted as allocating data points to clusters such that compactness and density are maximized. In other words, the problem is one of finding condensed clusters with respect to the volumes and densities of the clusters.
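A small numerical illustration of this interpretation can be sketched as follows. The data, helper name, and partition choices are hypothetical, not from the paper; the point is only that the cost (7) prefers the partition whose clusters are condensed.

```python
import numpy as np

def volume_cost(X, labels):
    """Clustering cost (7): sum_k n_k * log det Q_k."""
    total = 0.0
    for k in np.unique(labels):
        pts = X[labels == k]
        Q = np.cov(pts.T)                      # covariance of cluster k
        total += len(pts) * np.linalg.slogdet(Q)[1]
    return total

# Two well-separated blobs (illustrative data).
rng = np.random.default_rng(0)
A = rng.normal([0.0, 0.0], 0.2, size=(50, 2))
B = rng.normal([4.0, 0.0], 0.2, size=(50, 2))
X = np.vstack([A, B])
condensed = np.repeat([0, 1], 50)   # each blob is its own cluster
mixed = np.tile([0, 1], 50)         # each cluster mixes both blobs
```

Here `volume_cost(X, condensed)` is far below `volume_cost(X, mixed)`, since mixing the blobs inflates each cluster's covariance determinant.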
III. Ellipsoidal and Complex-Shaped Clustering
In this section, we propose two HEC algorithms that attempt to minimize the weighted sum of the cluster volumes: i) KHEC, an iterative HEC algorithm using the modified Gaussian kernel and MVE approximation, which allocates the data points into a specified number of clusters of ellipsoidal shape, and ii) EKHEC, an extended iterative HEC algorithm that improves the capability of the KHEC by utilizing ellipsoids defined on the kernel feature space to deal with more complex-shaped clusters.
 1. KHEC Algorithm
The KHEC algorithm starts with an initial clustering into $C$ clusters and iteratively improves the assignment of the partition matrix, such that the weighted sum of the cluster volumes decreases, until no further improvement is possible. It is a modification of the K-means algorithm that combines the modified Gaussian kernel and MVE approximation. Finding the optimal minimum-volume covering ellipsoid is the core step of the MVE-based HEC algorithm. Due to the combinatorial nature of the clustering problem, the computational complexity of the MVE is critical. Given a set of $n$ data points in a $d$-dimensional space $\bm{x}=\{\bm{x}_i\in\mathbb{R}^d\}_{i=1}^{n}$, computing the MVE enclosing the data points is defined as a max-det problem [10]–[12], which is difficult to solve directly:
$$\begin{array}{ll}\text{minimize} & \log\det Q,\\ \text{subject to} & Q=Q^{T}>0,\\ & (\bm{x}_i-\bm{x}_C)^{T}Q^{-1}(\bm{x}_i-\bm{x}_C)\le 1,\quad i=1,2,\dots,n. \end{array} \tag{8}$$
Herein, we use three practical approximation approaches for finding the MVE. First, we derive the convex optimization problem (9) from the Löwner–John ellipsoid [12], which can be interpreted geometrically as minimizing the volume of the ellipsoid:
$$\begin{array}{ll}\text{minimize} & \det A^{-1},\\ \text{subject to} & A>0,\\ & \|A\bm{x}_i-\bm{b}\|\le 1,\quad i=1,2,\dots,n. \end{array} \tag{9}$$
Second, we solve the convex optimization problem (10) by approximating the objective function with the sum of the eigenvalues of matrix $Q$ [9], [12]:
$$\begin{array}{ll}\text{minimize} & \operatorname{Tr}(Q),\\ \text{subject to} & Q=Q^{T}>0,\\ & (\bm{x}_i-\bm{x}_C)^{T}Q^{-1}(\bm{x}_i-\bm{x}_C)\le 1,\quad i=1,2,\dots,n. \end{array} \tag{10}$$
Third, we employ a fast approximation algorithm for finding the MVE using Khachiyan’s algorithm [13]–[15]:
$$\mathrm{MVE}=\{\bm{x}\in\mathbb{R}^{d}\mid(\bm{x}-\bm{x}_C^{*})^{T}Q^{*}(\bm{x}-\bm{x}_C^{*})\le 1\},\quad Q^{*}=\frac{1}{d}\bigl(PU^{*}P^{T}-P\bm{u}^{*}(P\bm{u}^{*})^{T}\bigr)^{-1},\quad \bm{x}_C^{*}=P\bm{u}^{*}, \tag{11}$$
where $P=[\bm{q}_1,\bm{q}_2,\dots,\bm{q}_n]$ with $\bm{q}_i^{T}=[\bm{x}_i^{T}\ \ 1]$ for $i=1,2,\dots,n$, $\bm{u}^{*}$ is the optimal dual variable, and $U^{*}=\operatorname{diag}(\bm{u}^{*})$. The KHEC algorithm is described as follows.
Algorithm 1. KHEC algorithm
Input: a set of $n$ data points in a $d$-dimensional space, $\bm{x}=\{\bm{x}_i\in\mathbb{R}^d\}_{i=1}^{n}$.
Output: the partition matrix $P$.
Step 1: Fix the number of clusters, $C$. Randomly select $C$ data points and set these as the centers of the clusters, $\bm{m}_k$.
Step 2: Initially determine the assignment of the partition matrix $P$, using the Euclidean distance measure with $\bm{m}_k$:
$$P_{ik}=1\ \ \text{if}\ \ D_{\text{Euc}}(\bm{x}_i,\bm{m}_k)<D_{\text{Euc}}(\bm{x}_i,\bm{m}_j),\ \ j=1,2,\dots,C,\ j\ne k;\qquad P_{ik}=0\ \ \text{otherwise}.$$
Step 3: Compute the new cluster centers and the number of data points belonging to each cluster:
$$\bm{m}_k=\frac{\sum_{i}P_{ik}\bm{x}_i}{\sum_{i}P_{ik}},\qquad n_k=\sum_{i}P_{ik}.$$
Step 4: Compute the pseudo-covariance matrix $Q_k$, using the MVE approximation algorithm.
Step 5: Determine the new assignment of the partition matrix $P$, using the modified Gaussian kernel measure (3) with $\bm{m}_k$ and $Q_k$:
$$P_{ik}=1\ \ \text{if}\ \ D_{\text{MGK}}(\bm{x}_i,\bm{m}_k;Q_k)<D_{\text{MGK}}(\bm{x}_i,\bm{m}_j;Q_j),\ \ j=1,2,\dots,C,\ j\ne k;\qquad P_{ik}=0\ \ \text{otherwise}.$$
Step 6: If there are no changes to the partition matrix $P$, STOP. Otherwise, repeat Steps 3 through 5.
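Step 4 of Algorithm 1 computes the pseudo-covariance matrix with an MVE approximation. A compact sketch of Khachiyan’s algorithm for (11) is given below; the tolerance and iteration cap are illustrative assumptions, and this is a sketch of the standard update on the lifted points, not the authors’ Matlab implementation.

```python
import numpy as np

def mve_khachiyan(X, tol=1e-7, max_iter=1000):
    """Approximate the minimum-volume enclosing ellipsoid of the rows of X.

    Returns (Q, c) with (x - c)^T Q (x - c) <= 1 for all points,
    following the dual formulation of (11).
    """
    n, d = X.shape
    P = np.column_stack([X, np.ones(n)]).T     # lifted points q_i = [x_i; 1]
    u = np.full(n, 1.0 / n)                    # dual variable, starts uniform
    for _ in range(max_iter):
        Xmat = P @ np.diag(u) @ P.T            # (d+1) x (d+1)
        M = np.einsum('ij,ji->i', P.T @ np.linalg.inv(Xmat), P)
        j = np.argmax(M)                       # most-violating point
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u
        new_u[j] += step
        if np.linalg.norm(new_u - u) < tol:
            u = new_u
            break
        u = new_u
    c = X.T @ u                                             # center x_C*
    Q = np.linalg.inv(X.T @ np.diag(u) @ X - np.outer(c, c)) / d
    return Q, c
```

For the four corners of the unit square, the returned ellipsoid is the circumscribed circle centered at $(0.5, 0.5)$, with every corner on the unit-level boundary.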
 2. EKHEC Algorithm
Most partitional clustering algorithms do not work well on a dataset with a nonlinear and elongated structure. To overcome this difficulty, spectral clustering algorithms have been proposed [16], and there is ongoing research into kernel methods [17]–[19]. To perform clustering on a dataset with more complex-shaped clusters, that is, a nonlinear and elongated structure, we employ one of the kernel tricks, kernel principal component analysis (KPCA). Using integral operator kernel functions, we can efficiently compute the principal components in high-dimensional feature spaces, which are related to the input space by a nonlinear map [17], [18]. The KPCA can be summarized as follows [17], [18]: Given a set of $n$ data points mapped onto a feature space, $\Phi(\bm{x})=\{\Phi(\bm{x}_i)\in\mathcal{F}\}_{i=1}^{n}$, the covariance matrix in the kernel feature space is defined as
$$C^{\Phi}=\frac{1}{n}\sum_{j=1}^{n}\Phi(\bm{x}_j)\Phi(\bm{x}_j)^{T}. \tag{12}$$
We then have
$$\lambda\,(\Phi(\bm{x}_k)\cdot\bm{V})=(\Phi(\bm{x}_k)\cdot C^{\Phi}\bm{V}),\quad k=1,2,\dots,n, \tag{13}$$
where $\lambda\ge 0$ is the eigenvalue and $\bm{V}=\sum_{i=1}^{n}\alpha_i\Phi(\bm{x}_i)$ is the eigenvector. The eigenvector $\bm{V}$ lies in the span of $\{\Phi(\bm{x}_i)\}_{i=1}^{n}$.
By defining the $n\times n$ kernel matrix $K$ as $K_{ij}=(\Phi(\bm{x}_i)\cdot\Phi(\bm{x}_j))$, we can obtain
$$n\lambda K\bm{\alpha}=K^{2}\bm{\alpha}, \tag{14}$$
where $\bm{\alpha}$ denotes the column vector with entries $\alpha_1,\alpha_2,\dots,\alpha_n$. To find the solutions of (14), we solve the eigenvalue problem
$$n\lambda\bm{\alpha}=K\bm{\alpha} \tag{15}$$
for nonzero eigenvalues. The solutions $\bm{\alpha}^{k}$ belonging to nonzero eigenvalues are normalized by requiring that the corresponding eigenvectors in the feature space be normalized, that is, $(\bm{V}^{k}\cdot\bm{V}^{k})=1$.
For the principal component extraction, we compute the projections of the image of the data points $\Phi(\bm{x})$ onto the eigenvectors $\bm{V}^{k}$ in the feature space:
$$\tilde{x}=(\bm{V}^{k}\cdot\Phi(\bm{x}))=\sum_{i=1}^{n}\alpha_i^{k}(\Phi(\bm{x}_i)\cdot\Phi(\bm{x}))=\sum_{i=1}^{n}\alpha_i^{k}\,k(\bm{x}_i,\bm{x}), \tag{16}$$
where $k(\bm{x},\bm{y})$ is the kernel function. We use the radial basis function $k(\bm{x},\bm{y})=\exp\!\bigl(-\|\bm{x}-\bm{y}\|^{2}/(2\sigma^{2})\bigr)$ as the kernel function.
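The KPCA projection (12)–(16) can be sketched as follows. Kernel-matrix centering, which the full KPCA of [17] applies, is omitted here to mirror the summary above; the value of `sigma` and the function name are illustrative assumptions.

```python
import numpy as np

def kpca_project(X, n_components, sigma=1.0):
    """Project the rows of X onto the leading KPCA components, per (12)-(16).

    Uses the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).
    """
    sq = ((X[:, None] - X[None]) ** 2).sum(-1)       # pairwise squared distances
    K = np.exp(-sq / (2.0 * sigma ** 2))             # kernel matrix K_ij
    eigval, eigvec = np.linalg.eigh(K)               # solves K a = (n*lambda) a
    order = np.argsort(eigval)[::-1][:n_components]  # leading eigenvalues
    alphas = eigvec[:, order] / np.sqrt(eigval[order])  # enforce (V^k . V^k) = 1
    return K @ alphas                                # projections (16)
```

The normalization step follows from $(\bm{V}^{k}\cdot\bm{V}^{k})=\bm{\alpha}^{kT}K\bm{\alpha}^{k}=n\lambda_k\|\bm{\alpha}^{k}\|^{2}$, so each eigenvector of $K$ is divided by the square root of its eigenvalue.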
The EKHEC algorithm performs the KHEC algorithm in the kernel feature space and is described as follows.
Algorithm 2. EKHEC algorithm
Input: a set of $n$ data points in a $d$-dimensional space, $\bm{x}=\{\bm{x}_i\in\mathbb{R}^d\}_{i=1}^{n}$.
Output: the partition matrix $P$.
Step 1: Compute the kernel matrix $K$:
$$K_{ij}=\Phi(\bm{x}_i)\cdot\Phi(\bm{x}_j)=k(\bm{x}_i,\bm{x}_j),\quad i,j=1,2,\dots,n.$$
Step 2: Find a solution $\bm{\alpha}$ of the eigenvalue problem $n\lambda\bm{\alpha}=K\bm{\alpha}$.
Step 3: Using the solutions $\bm{\alpha}^{k}$ belonging to nonzero eigenvalues, compute the projections of the image of the data points, $\Phi(\bm{x})$, onto the eigenvectors $\bm{V}^{k}$ in the feature space:
$$\tilde{\bm{x}}=(\bm{V}^{k}\cdot\Phi(\bm{x}))=\sum_{i=1}^{n}\alpha_i^{k}(\Phi(\bm{x}_i)\cdot\Phi(\bm{x}))=\sum_{i=1}^{n}\alpha_i^{k}\,k(\bm{x}_i,\bm{x}).$$
Step 4: Fix the number of clusters, $C$. Randomly select $C$ data points in the feature space and set these as the centers of the clusters, $\tilde{\bm{m}}_k$.
Step 5: Initially determine the assignment of the partition matrix $P$, using the Euclidean distance measure with $\tilde{\bm{m}}_k$:
$$P_{ik}=1\ \ \text{if}\ \ D_{\text{Euc}}(\tilde{\bm{x}}_i,\tilde{\bm{m}}_k)<D_{\text{Euc}}(\tilde{\bm{x}}_i,\tilde{\bm{m}}_j),\ \ j=1,2,\dots,C,\ j\ne k;\qquad P_{ik}=0\ \ \text{otherwise}.$$
Step 6: Compute the new cluster centers and the number of data points belonging to each cluster:
$$\tilde{\bm{m}}_k=\frac{\sum_{i}P_{ik}\tilde{\bm{x}}_i}{\sum_{i}P_{ik}},\qquad n_k=\sum_{i}P_{ik}.$$
Step 7: Compute the pseudo-covariance matrix $\tilde{Q}_k$, using the MVE approximation algorithm.
Step 8: Determine a new assignment of the partition matrix $P$, using the modified Gaussian kernel measure (3) with $\tilde{\bm{m}}_k$ and $\tilde{Q}_k$:
$$P_{ik}=1\ \ \text{if}\ \ D_{\text{MGK}}(\tilde{\bm{x}}_i,\tilde{\bm{m}}_k;\tilde{Q}_k)<D_{\text{MGK}}(\tilde{\bm{x}}_i,\tilde{\bm{m}}_j;\tilde{Q}_j),\ \ j=1,2,\dots,C,\ j\ne k;\qquad P_{ik}=0\ \ \text{otherwise}.$$
Step 9: If there are no changes to the partition matrix $P$, STOP. Otherwise, repeat Steps 6 through 8.
IV. Experimental Results
To validate the proposed HEC algorithms, we conducted experiments on synthetic and benchmark datasets. The experiments described in this paper were conducted on a PC with an Intel(R) Xeon(R) CPU W5590 @ 3.33 GHz, and all algorithms were implemented in Matlab with CVX [20], [21] and the LMI Toolbox. We employed the misclassification rate (MCR) [6] and the normalized mutual information (NMI) [6], [16], [22] as performance evaluation measures. The MCR is computed by dividing the number of data points that are not in the correct cluster by the total number of data points in the dataset. Given two random variables, $\bm{X}$ and $\bm{Y}$, the NMI can be formulated as follows [6], [16]:
$$\mathrm{NMI}(\bm{X},\bm{Y})=\frac{I(\bm{X},\bm{Y})}{[H(\bm{X})+H(\bm{Y})]}, \tag{17}$$
where $I(\bm{X},\bm{Y})$ is the mutual information and $H(\bm{X})$ is the entropy of $\bm{X}$.
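For reference, both evaluation measures can be computed directly from their definitions. The sketch below assumes the predicted labels have already been matched to the ground-truth clusters for the MCR, and implements the NMI exactly as written in (17); note that other normalizations (e.g., by $\sqrt{H(\bm{X})H(\bm{Y})}$) also appear in the literature.

```python
import math
from collections import Counter

def mcr(true_labels, pred_labels):
    """Misclassification rate: wrongly assigned points / total points."""
    wrong = sum(t != p for t, p in zip(true_labels, pred_labels))
    return wrong / len(true_labels)

def nmi(x, y):
    """NMI(X, Y) = I(X, Y) / [H(X) + H(Y)], as in (17)."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    hx = -sum(c / n * math.log(c / n) for c in px.values())
    hy = -sum(c / n * math.log(c / n) for c in py.values())
    i_xy = sum(c / n * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())
    return i_xy / (hx + hy)
```

With this normalization, two identical partitions (up to relabeling) give $I=H$, so the NMI of (17) peaks at 0.5 rather than 1.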
 1. Synthetic Dataset
To compare the performance of the MVE approximation algorithms, a dataset with a Gaussian distribution was used in the experiments. The dataset consists of 1,000 data points in a 2-dimensional space, generated using the multivariate Gaussian function in Matlab.
We first estimated the execution time for approximating the MVE, as shown in Table 1. Moshtagh’s method of (11), based on Khachiyan’s algorithm, was the fastest. The boundaries of the approximated MVEs are illustrated in Fig. 1. The ellipses with red lines are depicted with the centers obtained using the approximation algorithms. The ellipses with blue lines are depicted with the center $\bm{m}=[1.0085\ \ 1.0113]^{T}$, computed using (5). As shown in Fig. 1, the pseudo-covariance matrices $Q$ are well approximated by all methods. Although the computation of (9) is much faster than that of (8), they produce the same approximation results, $Q=\begin{bmatrix}0.0225 & -0.0077\\ -0.0077 & 0.0189\end{bmatrix}$ and $\bm{m}=[0.9485\ \ 0.7902]^{T}$. The decision boundaries of (9) and (11) are quite similar compared with the results of (10). The MVE approximated by (10) has the pseudo-covariance matrix $Q=\begin{bmatrix}0.0228 & -0.0060\\ -0.0060 & 0.0173\end{bmatrix}$ and the center $\bm{m}=[0.9014\ \ 0.5369]^{T}$, and the MVE approximated by (11) has $Q=\begin{bmatrix}0.0230 & -0.0081\\ -0.0081 & 0.0194\end{bmatrix}$ and $\bm{m}=[0.9532\ \ 0.7472]^{T}$.
Table 1. Execution time of MVE approximation algorithms.
Algorithm:       (8)        (9)       (10)      (11)
Execution time:  46.50 sec  6.77 sec  1.89 sec  0.71 sec
Fig. 1. Decision boundaries of the approximated MVEs. Red lines are drawn with the centers obtained by the approximation algorithms, and blue lines are drawn with the center computed by (5): (a) MVE approximated by (8); (b) MVE approximated by (9); (c) MVE approximated by (10); and (d) MVE approximated by (11).
To illustrate the efficacy of the proposed method, three synthetic datasets were used in the experiments. The detailed characteristics of the synthetic datasets are given in Table 2.
Table 2. Three synthetic datasets for the experiments.
Dataset:                Dataset 1           Dataset 2              Dataset 3
Description:            Ball-ellipse shape  Gaussian-banana shape  Gaussian-circle shape
Dimension:              2                   2                      2
Number of clusters:     2                   2                      2
Number of instances:    14                  128                    156
Instances per cluster:  4 / 10              100 / 28               56 / 100
The first dataset shows the capability of the KHEC algorithm, which is able to deal with clusters of different size, different density, and ellipsoidal shape. This dataset contains a small ball-shaped cluster and an elongated cluster in the shape of an ellipse. The clustering results of the KHEC with the three different MVE approximation methods (9)–(11) are the same on dataset 1; thus, the results for (9) and (10) are omitted. The clustering results for dataset 1 are depicted in Fig. 2.
As shown in Fig. 2, the proposed KHEC algorithm separates the two clusters well, but the other methods fail to find the correct clusters in dataset 1. As mentioned in the introduction, the K-means algorithm performs well for clusters of equal size, equal density, and spherical shape. Although the HEC algorithm with the Mahalanobis distance divides data points into clusters of different size and ellipsoidal shape, it fails to find the correct clusters when two clusters are close together. When the two clusters are far enough apart, the algorithm is able to find the correct clusters. The proposed KHEC algorithm can control the clustering results by adjusting the variable $\alpha\in[0,1]$, which controls the weight of the first and second terms of the modified Gaussian kernel. This experiment shows that the KHEC algorithm, which minimizes the weighted sum of the cluster volumes, allocates data points into clusters such that compactness and density are maximized and consequently finds condensed clusters with respect to the volumes and densities of the clusters.
Fig. 2. Synthetic dataset 1 and some clustering results for this data: (a) dataset consisting of two clusters of different size, different density, and different shape, (b) clustering results of the K-means algorithm, (c) clustering results of the HEC algorithm with the Mahalanobis distance, and (d) clustering results of the proposed KHEC algorithm, with α = 0.2.
The second and third datasets were generated to validate the capability of the EKHEC algorithm, which is designed to handle geometrically complex-shaped clusters, as shown in Figs. 3 and 5. The clustering results of the KHEC with MVE approximations (9) and (10) are omitted, because they are the same as those of the KHEC based on Moshtagh’s method, which are depicted in Figs. 3(e) and 5(e).
Figure 3 illustrates the clustering results on dataset 2, which consists of a cluster with a Gaussian distribution and a cluster with a banana shape. The first four methods show similar results, except for one data point. Note that the decision boundaries of the clusters determined by each algorithm are quite different, although the clustering results are similar. The KHEC algorithm with Moshtagh’s MVE approximation creates the most separable decision boundaries. This implies that, in this case, Moshtagh’s MVE approximation finds a more compact ellipsoid than the linear matrix inequality (LMI)-based MVE approximation. The HEC algorithm with the Mahalanobis distance and the KHEC algorithm with LMI have overlapping decision boundaries. The EKHEC algorithm finds the correct clusters as expected, and only one data point is clustered incorrectly. The clustering results of the EKHEC algorithm on dataset 2 are illustrated in Fig. 4. The clustering results in the input and feature spaces are depicted in Figs. 4(a) and 4(b), respectively. The data points mapped onto the feature space are well separated and can be clustered based on the decision boundaries of the ellipsoids.
Fig. 3. Synthetic dataset 2 and some clustering results for this data: (a) Gaussian-banana shaped dataset, (b) clustering results of the K-means algorithm, (c) clustering results of the HEC algorithm with the Mahalanobis distance, (d) clustering results of the KHEC algorithm with LMI, α = 0.2, (e) clustering results of the KHEC algorithm using Moshtagh’s method, α = 0.2, and (f) clustering results of the EKHEC algorithm, with α = 0.02.
Fig. 4. Clustering results of the EKHEC algorithm on dataset 2: (a) clustering results and dataset depicted in the input space and (b) clustering results and dataset depicted in the feature space.
The clustering results on dataset 3 are presented in Fig. 5. Unlike the experiments on dataset 2, the HEC algorithm with the Mahalanobis distance fails to allocate the data points into two clusters. A drawback of the HEC with the Mahalanobis distance is that it may produce empty clusters if all data points in a cluster are reassigned to others, thereby reducing the number of clusters, as in the K-means algorithm [10]. The KHEC algorithm based on Moshtagh’s method creates more compact decision boundaries than the KHEC with LMI, as in the experiments on dataset 2. The EKHEC algorithm finds the correct clusters, and only a few data points are allocated incorrectly. The clustering results of EKHEC on dataset 3 are given in Fig. 6. The clustering results in the input and feature spaces are shown in Figs. 6(a) and 6(b), respectively. In this case, the cluster boundaries in the feature space are close to each other and overlap.
Fig. 5. Synthetic dataset 3 and some clustering results for this data: (a) Gaussian-circle shaped dataset, (b) clustering results of the K-means algorithm, (c) clustering results of the HEC algorithm with the Mahalanobis distance, (d) clustering results of the KHEC algorithm using LMI, α = 0.2, (e) clustering results of the KHEC algorithm using Moshtagh’s method, α = 0.2, and (f) clustering results of the EKHEC algorithm, with α = 0.09.
Fig. 6. Clustering results of the EKHEC algorithm on dataset 3: (a) clustering results and dataset depicted in the input space and (b) clustering results and dataset depicted in the feature space.
As illustrated in Figs. 4 and 6, the EKHEC algorithm is able to handle complex-shaped data by applying the kernel trick, that is, KPCA. The EKHEC algorithm is therefore well suited to applications that deal with complex-shaped datasets, such as geospatial analysis and image segmentation. The experimental results on the three synthetic datasets show that the KHEC algorithm performs well for datasets with clusters of different size, different density, and ellipsoidal shape, and that the EKHEC algorithm has the capability to deal with geometrically complex-shaped clusters.
 2. Benchmark Dataset
We conducted experiments on five benchmark datasets from the UCI data repository [23] to show the general clustering performance of the proposed methods. The descriptions of the benchmark datasets and the values of the weight parameter α are presented in Table 3.
Table 3. Benchmark datasets from UCI.
Dataset:      Iris  Wine  Glass  Yeast  Breast tissue
Attributes:   4     13    11     8      10
Clusters:     3     3     7      10     6
Instances:    150   178   214    1,484  106
α (KHEC):     0.3   0.6   0.95   0.6    0.5
α (EKHEC):    0.5   0.45  0.4    0.4    0.4
We compared the proposed methods against K-means clustering, fuzzy C-means clustering, the GMM-EM algorithm, and the MVE-based HEC algorithm with the Mahalanobis distance. The clustering results on the benchmark datasets are shown in Fig. 7. The KHEC algorithm outperforms K-means, fuzzy C-means, GMM-EM, and the HEC algorithm on the Iris, Wine, Glass, and Breast Tissue datasets in terms of the MCR and NMI criteria. On the Yeast dataset, it shows comparable performance. The EKHEC algorithm shows better or comparable clustering performance compared with the others.
Fig. 7. Clustering results on the benchmark datasets: (a) MCR and (b) NMI.
 3. Sensitivity Analysis of Weight Parameter
In the proposed algorithms, the parameter α controls the weight of the first and second terms of the modified Gaussian kernel (3) and influences the clustering result. We empirically analyzed the sensitivity of the weight parameter α on the Iris dataset of the UCI repository. Figure 8 shows the changes in the MCR and NMI values for the two proposed methods as the weight parameter α varies from 0.0 to 1.0. The experiments indicate that the KHEC and EKHEC algorithms achieve stable and desirable clustering results for values of α ranging from 0.3 to 0.6, in terms of the MCR and NMI criteria. It should be noted that the selection of the weight parameter α depends on the dataset. In the experiments of this paper, the value of α was set empirically.
Fig. 8. Sensitivity analysis of the weight parameter α: (a) KHEC algorithm and (b) EKHEC algorithm.
V. Conclusion
We provided a theoretical explanation and proposed practical implementations of the HEC algorithm based on MVE with a modified Gaussian kernel. First, we showed that HEC with a modified Gaussian kernel metric can be formulated as the problem of finding condensed clusters with respect to their volumes and densities. Second, we proposed a practical HEC algorithm, namely KHEC, which is able to handle clusters of different size, different density, and ellipsoidal shape. Third, we enhanced the capability of the KHEC algorithm with EKHEC, which utilizes ellipsoids defined on the kernel feature space and is able to deal with geometrically complex-shaped clusters, that is, nonlinear and elongated structures. The experimental results showed that the KHEC algorithm can effectively separate clusters by creating compact cluster boundaries. Unlike the other methods, EKHEC works on nonlinear and elongated structures, as seen from the experiments. In our experiments, the proposed approaches outperformed the K-means algorithm, fuzzy C-means algorithm, GMM-EM algorithm, and MVE-based HEC algorithm with the Mahalanobis distance.
This work was supported by the IT R&D program of MOTIE/KEIT, Korea (10039149, Development of Basic Technology of Human Identification and Retrieval at a Distance for Active Video Surveillance Service with Real-time Awareness of Safety Threats).
BIO
mohan@etri.re.kr
Hansung Lee received his BS, MS, and PhD degrees in computer science from Korea University, Sejong City, Rep. of Korea, in 1996, 2002, and 2008, respectively. From July 1996 to July 1999, he worked for Daewoo Engineering Company, Seoul, Rep. of Korea. He was a lecturer with Korea University from March 2002 to February 2010. Since November 2009, he has been with the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea, as a senior member of the research staff. His current research interests include pattern recognition, machine learning, optimization, data mining, and big-data analytics.
jhy@etri.re.kr
JangHee Yoo received his BSc in physics from Hankuk University of Foreign Studies (HUFS), Seoul, Rep. of Korea, in 1988 and his MSc in computer science from HUFS in 1990. He received his PhD in electronics and computer science from the University of Southampton, Hampshire, UK, in 2004. Since November 1989, he has been with the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea, as a principal researcher and is the director of the Human Identification Research Section. He has also been an adjunct professor with the Department of Information Security Engineering at the University of Science and Technology, Daejeon, Rep. of Korea. His current research interests include embedded computer vision, biometric systems, human-motion analysis, intelligent video surveillance, HCI, and intelligent robots. He is a member of the IEEE and the IEEK.
Corresponding Author dhpark@korea.ac.kr
Daihee Park received his BS degree in mathematics from Korea University, Seoul, Rep. of Korea, in 1982 and his PhD degree in computer science from Florida State University, USA, in 1992. He joined Korea University, Sejong City, Rep. of Korea, in 1993, where he is currently a professor in the Department of Computer and Information Science. His research interests include data mining and intelligent databases.
References

[1] Han J. and Kamber M., Data Mining: Concepts and Techniques, 2nd ed., Morgan Kaufmann Publishers, 2007.

[2] Mao J. and Jain A.K., "A Self-Organizing Network for Hyperellipsoidal Clustering (HEC)," IEEE Trans. Neural Netw., vol. 7, no. 1, 1996, pp. 16–29. DOI: 10.1109/72.478389

[3] Song W., "The Hyperellipsoidal Clustering Using Genetic Algorithm," IEEE Int. Conf. Intell. Process. Syst., Beijing, China, Oct. 28–31, 1997, pp. 592–596. DOI: 10.1109/ICIPS.1997.672853

[4] Ichihashi H., Ohue M., and Miyoshi T., "Fuzzy C-Means Clustering Algorithm with Pseudo Mahalanobis Distances," Proc. Third Asian Fuzzy Syst. Symp., June 18–21, 1998, pp. 148–152.

[5] Moshtaghi M., "An Efficient Hyperellipsoidal Clustering Algorithm for Resource-Constrained Environments," Pattern Recogn., vol. 44, no. 9, 2011, pp. 2197–2209. DOI: 10.1016/j.patcog.2011.03.007

[6] Song W., "Comments on a Self-Organizing Network for Hyperellipsoidal Clustering (HEC)," IEEE Trans. Neural Netw., vol. 8, no. 6, 1997, pp. 1561–1563. DOI: 10.1109/72.641479

[7] Krishnapuram R. and Kim J., "A Clustering Algorithm Based on Minimum Volume," IEEE Int. Conf. Fuzzy Syst., New Orleans, LA, USA, Sept. 8–11, 1996, vol. 2, pp. 1387–1392. DOI: 10.1109/FUZZY.1996.552379

[8] Lee H., Park J., and Park D., "Hyperellipsoidal Clustering Algorithm Using Linear Matrix Inequality," J. Korea Institute Intell. Syst., vol. 12, no. 4, 2002, pp. 300–305.

[9] Kumar M. and Orlin J.B., "Scale-Invariant Clustering with Minimum Volume Ellipsoids," Comput. Operations Res., vol. 35, no. 4, 2008, pp. 1017–1029. DOI: 10.1016/j.cor.2006.07.001

[10] Todd M.J. and Yildirim E.A., "On Khachiyan's Algorithm for the Computation of Minimum-Volume Enclosing Ellipsoids," Discr. Appl. Math., vol. 155, no. 13, pp. 1731–1744. DOI: 10.1016/j.dam.2007.02.013

[11] Moshtagh N., Minimum Volume Enclosing Ellipsoids, Tech. report, School of Engineering and Applied Science, Univ. Pennsylvania, PA, 2005.

[12] Schölkopf B., Smola A., and Müller K.R., "Kernel Principal Component Analysis," Proc. ICANN, LNCS, vol. 1327, Lausanne, Switzerland, Oct. 8–10, 1997, pp. 583–588. DOI: 10.1007/BFb0020217

[13] Tax D. and Juszczak P., "Kernel Whitening for One-Class Classification," Proc. Pattern Recogn. Support Vector Mach., LNCS, vol. 2388, Niagara Falls, Canada, Aug. 10, 2002, pp. 40–52. DOI: 10.1142/S021800140300240X

[14] CVX Research, Inc., CVX: Matlab Software for Disciplined Convex Programming, version 2.0. http://cvxr.com/cvx

[15] Cai D., He X., and Han J., "Document Clustering Using Locality Preserving Indexing," IEEE Trans. Knowl. Data Eng., vol. 17, no. 12, 2005, pp. 1624–1637. DOI: 10.1109/TKDE.2005.198

[16] The UCI machine-learning datasets are available at http://archive.ics.uci.edu/ml/