The Support Vector Data Description (SVDD) has achieved great success in anomaly detection by directly finding the optimal ball of minimal radius and center that contains most of the target data. However, the SVDD has limited classification capability, because the hypersphere, even in feature space, can express only a limited region of the target class. This paper presents an anomaly detection algorithm that mitigates the limitations of the conventional SVDD by finding the minimum volume enclosing ellipsoid in the feature space. To evaluate the performance of the proposed approach, we tested it on intrusion detection applications. Experimental results show the superiority of the proposed approach for anomaly detection over the standard SVDD.
1. Introduction
Anomaly detection has received significant attention over the past decades, and it spans numerous disciplines and application domains, such as intrusion detection, fraud detection, medical and public health anomaly detection, industrial damage detection, image processing, and sensor networks. In general, different application domains have different definitions of an anomaly, and it is difficult to distinguish anomalies from noise, insofar as noise tends to be similar to anomalies. With intrusion and malware detection in particular, adversaries often try to make malicious behavior appear normal. Hence, it is difficult to detect anomalies [1]. Despite considerable research efforts, anomaly detection remains a difficult and challenging problem.
Intrusion detection is the art of finding suspicious and malicious (i.e., unauthorized, inappropriate, and anomalous) activity on computer and network systems. In terms of general detection strategy, intrusion detection can be divided into two major paradigms [2,3]: misuse intrusion detection (MID) and anomaly intrusion detection (AID). The MID model identifies malicious behavior by pattern matching against known patterns of malicious behavior. This is referred to as rule- or signature-based detection, and involves extracting patterns from known and/or observed attacks. The MID model may perform poorly against new types of attack that are unknown to the intrusion detection system, and the signatures must be manually updated whenever novel attacks are reported. The AID model, on the other hand, constructs a normal behavior profile and examines deviations from that profile to detect suspicious behavior. The system identifies attack (i.e., abnormal) behavior by thresholding the value of the deviation. This can be considered a type of novelty detection, and it can be used to detect unobserved (i.e., newly emerging) attacks. The basic assumption of AID is that abnormal behavior from intruders and malware is evidently distinguishable from the normal behavior of legitimate users and software [3]. As mentioned before, however, malicious behavior has become more similar to normal behavior and is thus more difficult to detect.
With the significant success of support vector learning methods in intelligent systems, there is ongoing research into applying support vector machines (SVMs) to anomaly detection [4,5]. In anomaly detection, one class of data is regarded as the target class, and the remaining data are classified as outliers (or anomalous data). Because data from other classes may be difficult to obtain, whereas the normal class of data is generally easy to collect, one-class classification methods have recently been adopted for anomaly detection [6-11]. One of the best-known support vector learning methods for anomaly detection (i.e., one-class SVM) is the support vector data description (SVDD) [12]. The SVDD employs a ball to express the region of the target class. It is formulated as an optimization problem for finding a minimum volume enclosing hypersphere with a minimal radius R and center a, containing most of the target data. Because a hypersphere in the input space can express only a limited region of the target class, the SVDD enhances its descriptive capability by using balls defined in the kernel feature space via the kernel trick. However, even with balls in the feature space, the descriptive capability of the SVDD can be limited [10,11,13-16].
To address this difficulty with the standard SVDD, modified SVDDs that find an ellipsoidal decision boundary for normal data have been proposed. GhasemiGol et al. [14] proposed a modified SVDD, viz., the ellipse SVDD (ESVDD), and implemented it in AID applications [10,11]. The ESVDD finds a tighter decision boundary of a target class by using a hyperellipse around the target class defined in the input space. However, it has limitations in employing various kernel functions for feature space mapping. Wang et al. [15] presented a one-class SVM based on the hyperellipsoid model, but it requires the solution of computationally expensive second-order cone programming problems [16]. Rajasegarar et al. [16] presented the centered hyperellipsoidal support vector machine (CESVM) for anomaly detection in sensor networks, addressing the computational challenge of [15]. The CESVM is a nonparametric anomaly detection model; that is, the training phase and testing phase are not explicitly distinct. In many real intrusion detection applications, however, AID systems are trained on a normal dataset in advance; only then can they be used to detect an intrusion or attack. Therefore, it is difficult to implement the CESVM directly in conventional AID applications.
In this paper, we propose a new anomaly detection algorithm based on a minimum volume enclosing ellipsoid (MVEE) with kernel principal component analysis (KPCA), for mitigating the aforementioned limitations of the conventional SVDD by using the ellipsoid defined in the feature space. To evaluate the proposed approach, we conducted experiments with an intrusion detection benchmark dataset, containing abnormal data significantly similar to normal data. Experimental results show that the proposed approach leads to a significant improvement in anomaly detection with the intrusion detection datasets over the standard SVDD.
The remaining parts of this paper are organized as follows. We summarize the previous work related to our study in Section 2. In Section 3, the proposed anomaly detection approach based on MVEE with KPCA is provided. Experimental results and discussion are provided in Section 4. Finally, some concluding remarks are given in Section 5.
2. Related Work
The main concern of this study is to provide an anomaly detection approach for intrusion detection, one that alleviates the aforementioned limitations of the standard SVDD. This section presents previous research related to AID and summarizes the SVDD.
 2.1 Anomaly Intrusion Detection
AID refers to the detection of malicious behavior or intrusion activity in computer hosts and network systems based on a predefined profile of normal behavior. Anomaly detection techniques are applicable in the field of intrusion detection under the basic assumption that intrusion activity is noticeably different from normal system behavior [1,3]. There are three main categories of AID techniques [17]: statistics-based, knowledge-based, and machine-learning-based methods. Statistics-based models capture the activity of the system to create a stochastic behavior profile. However, it is difficult to model all behavior using stochastic methods. The knowledge-based model is an expert system and has many virtues, e.g., robustness, flexibility, and scalability. Generally, a human expert manually constructs the knowledge (rule) base. Hence, building a high-quality knowledge base is a difficult and time-consuming process [17,18]. Machine-learning approaches such as Bayesian networks, Markov models, neural networks, fuzzy logic, genetic algorithms, clustering, and outlier detection have been extensively studied over the past decade to overcome some of the drawbacks of statistics-based and knowledge-based AID [17-25]. Recently, there has been much ongoing research into applying data mining and machine-learning techniques to AID in order to design more intelligent intrusion detection systems [2,17]. Because support vector learning shows superior performance in pattern classification and function approximation, it has developed into a viable tool for intrusion detection. There are two types of AID based on the support vector learning approach [2]: standard SVM-based and one-class SVM-based methods. The standard SVM-based method divides training data into normal and abnormal datasets during the training phase and classifies observed activity into normal and abnormal behavior during the testing phase. With this model, one class affects the training result of the other class, owing to the unbalanced volume of normal and abnormal training datasets. This model may also misclassify newly emerging attacks by creating a decision boundary that includes the unobserved area [2]. To overcome this problem, the one-class SVM (e.g., SVDD) and its variations have been implemented in AID [7-11]. They can find a decision boundary for the normal class because the training result is neither affected by data instances from the abnormal class nor inclusive of the unobserved area.
 2.2 Support Vector Data Description
The SVDD method approximates an optimal hypersphere in the feature space with a minimal radius R and center a, containing most of the target data. It can be derived as follows [2,12]: Given a target dataset D consisting of n d-dimensional data points x_{1}, x_{2}, ⋯, x_{n}, the SVDD is defined as the problem of finding a sphere of minimum volume that includes the target dataset. It is formulated as the following optimization problem:

min_{R, a, ξ} R^{2} + C Σ_{i} ξ_{i}  subject to  ‖x_{i} − a‖^{2} ≤ R^{2} + ξ_{i}, ξ_{i} ≥ 0, i = 1, ⋯, n, (1)

where R^{2} is the squared radius of the sphere, a is the center of the sphere, ξ_{i} is the penalty term that indicates how far the i-th data point x_{i} deviates from the sphere's boundary, and C is the trade-off constant.

By introducing a Lagrange function and a saddle-point condition, we obtain the following dual problem, known as a quadratic programming (QP) optimization problem:

max_{α} Σ_{i} α_{i}(x_{i}・x_{i}) − Σ_{i,j} α_{i}α_{j}(x_{i}・x_{j})  subject to  Σ_{i} α_{i} = 1, 0 ≤ α_{i} ≤ C. (2)

Once the dual solution α_{i} is obtained by solving the QP problem (2), we can define the decision function as follows:

f(x): ‖x − a‖^{2} ≤ R^{2}, with a = Σ_{i} α_{i}x_{i}. (3)

To express a more complex region of the target class, we can define the hypersphere in the kernel feature space with the Mercer kernel trick, replacing each inner product (x_{i}・x_{j}) with a kernel function k(x_{i}, x_{j}):

max_{α} Σ_{i} α_{i}k(x_{i}, x_{i}) − Σ_{i,j} α_{i}α_{j}k(x_{i}, x_{j})  subject to  Σ_{i} α_{i} = 1, 0 ≤ α_{i} ≤ C. (4)

In this case, the decision function can be summarized as follows:

f(x): k(x, x) − 2Σ_{i} α_{i}k(x, x_{i}) + Σ_{i,j} α_{i}α_{j}k(x_{i}, x_{j}) ≤ R^{2}. (5)
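The dual problem (2) and the kernelized decision rule (5) can be sketched numerically. The following is a minimal NumPy sketch, not the solver used in the paper: it maximizes the dual with an RBF kernel using a simple Frank-Wolfe iteration over the simplex (the box constraint α_{i} ≤ C is ignored for brevity, which is valid when C ≥ 1), and the radius is taken as an empirical quantile of the training distances rather than read off a boundary support vector.

```python
import numpy as np

def rbf_kernel(X, Y, sigma):
    # k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def svdd_dual(K, n_iter=2000):
    """Maximize sum_i a_i K_ii - a^T K a  s.t.  sum(a) = 1, a >= 0
    (box constraint a_i <= C omitted; valid when C >= 1)."""
    n = K.shape[0]
    a = np.full(n, 1.0 / n)
    for t in range(n_iter):
        grad = np.diag(K) - 2.0 * K @ a   # gradient of the dual objective
        j = int(np.argmax(grad))          # best simplex vertex (Frank-Wolfe)
        gamma = 2.0 / (t + 2.0)
        a = (1.0 - gamma) * a
        a[j] += gamma
    return a

def svdd_decision(K_train, K_cross, k_self, a, R2):
    """Kernel form of ||phi(x) - center||^2 <= R^2."""
    dist2 = k_self - 2.0 * K_cross @ a + a @ K_train @ a
    return dist2 <= R2

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))              # toy target class
K = rbf_kernel(X, X, sigma=2.0)
a = svdd_dual(K)
# squared feature-space distances of training points to the center
d2 = np.diag(K) - 2.0 * K @ a + a @ K @ a
R2 = np.quantile(d2, 0.95)                # radius covering most target data
inside = svdd_decision(K, K, np.diag(K), a, R2)
```

The Frank-Wolfe update keeps the iterate on the simplex by construction, so the equality constraint of (2) never needs explicit projection.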
3. Anomaly Detection Based on MVEE and KPCA
This section presents an anomaly detection algorithm based on MVEE and KPCA, alleviating the limitations of standard SVDD by using an ellipsoid defined in the feature space. The proposed approach consists of two phases: the training phase for finding the decision boundary of the normal class in the feature space, and the testing phase for detecting anomalies.
 3.1 Training Phase
As mentioned in the introduction, the SVDD generally enhances its descriptive power by using a hypersphere defined in the feature space. However, some limitations cannot be overcome by simply adopting the kernel feature space. To mitigate this drawback of the conventional SVDD, we define the MVEE in the empirical kernel feature space. To define the hyperellipsoid in the feature space, we employ KPCA, a variation of principal component analysis (PCA) in a kernel feature space. Using integral operator kernel functions, we can efficiently compute principal components in high-dimensional feature spaces that are related to the input space by some nonlinear map.
KPCA can be summarized as follows [26-28]: Given a set of n training data points mapped onto a feature space, Փ(x_{1}), Փ(x_{2}), ⋯, Փ(x_{n}), the covariance matrix in the kernel feature space is defined as follows:

C̄ = (1/n) Σ_{j} Փ(x_{j})Փ(x_{j})^{T}. (6)

Then, we have

λV = C̄V, (7)

where λ ≥ 0 are the eigenvalues and V are the eigenvectors of C̄. By defining the n×n kernel matrix K as K_{ij} = (Փ(x_{i})・Փ(x_{j})), we obtain the following equation:

nλα = Kα, (8)

where α denotes the column vector with the entries α_{1}, α_{2}, ⋯, α_{n}. We solve the eigenvalue problem (8) for nonzero eigenvalues. The solutions α^{k} belonging to nonzero eigenvalues are normalized by requiring that the corresponding eigenvectors in the feature space be normalized, such that (V^{k}・V^{k}) = 1.

For principal component extraction, we compute projections of the images of the training data points Փ(x) onto the eigenvectors V^{k} in the feature space:

(V^{k}・Փ(x)) = Σ_{i} α_{i}^{k}k(x_{i}, x), (9)

where k(x, y) is the kernel function. In this paper, we use the radial basis function:

k(x, y) = exp(−‖x − y‖^{2}/2σ^{2}). (10)
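The KPCA steps above can be sketched in NumPy. This is a minimal version that, like the summary above, omits centering of the mapped data (production KPCA usually centers the kernel matrix); the 80% variance-ratio cutoff mirrors the eigenvalue fraction used later in the experiments.

```python
import numpy as np

def rbf(X, Y, sigma):
    # radial basis kernel of (10): k(x, y) = exp(-||x - y||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kpca_fit(X, sigma, var_ratio=0.8):
    """Solve the eigenvalue problem (8), n*lambda*alpha = K*alpha, and
    normalize each alpha^k so that (V^k . V^k) = 1.  Centering of the
    mapped data is omitted here, as in the summary above."""
    n = len(X)
    K = rbf(X, X, sigma)
    lam, A = np.linalg.eigh(K / n)        # eigenvalues in ascending order
    lam, A = lam[::-1], A[:, ::-1]        # reorder to descending
    keep = lam > 1e-12
    lam, A = lam[keep], A[:, keep]
    # retain the leading components covering var_ratio of the spectrum
    m = int(np.searchsorted(np.cumsum(lam) / lam.sum(), var_ratio)) + 1
    lam, A = lam[:m], A[:, :m]
    A = A / np.sqrt(n * lam)              # (V^k . V^k) = n*lam_k*|alpha^k|^2 = 1
    return K, lam, A

def kpca_project(K_cross, A):
    # projection (9): (V^k . phi(x)) = sum_i alpha_i^k * k(x_i, x)
    return K_cross @ A

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
K, lam, A = kpca_fit(X, sigma=1.5)
Z = kpca_project(K, A)                    # training-data projections
```

Because the eigenvectors V^{k} are orthonormal in the feature space, the Gram matrix of the projections Z is diagonal.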
Next, we approximate the decision boundary with a hyperellipsoid in the feature space. The ellipsoid can be expressed as follows:

E = {x ∈ ℝ^{d} | (x − x_{c})^{T}Q^{−1}(x − x_{c}) ≤ 1}, (11)

where Q is symmetric and positive definite, i.e., Q = Q^{T} ≻ 0, and x_{c} ∈ ℝ^{d} is the center of the ellipsoid. The matrix Q determines how far the ellipsoid extends in every direction from x_{c}. The lengths of the semi-axes of the ellipsoid E are given by √λ_{i}, where λ_{i} are the eigenvalues of the matrix Q, and the directions of the semi-axes of E are the eigenvectors of the matrix Q. The volume of the ellipsoid is proportional to det(Q)^{1/2} [29-31]. An example of an ellipsoid in ℝ^{2} is given in Fig. 1.

Fig. 1. Example of an ellipsoid in ℝ^{2}.
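The geometric facts just stated (semi-axis lengths √λ_{i}, volume proportional to det(Q)^{1/2}) can be checked numerically; the matrix Q and center below are illustrative, not values from the paper.

```python
import numpy as np

# illustrative positive-definite shape matrix Q and center x_c
Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])
x_c = np.array([1.0, -0.5])

lam, V = np.linalg.eigh(Q)                # eigen-decomposition of Q
semi_axes = np.sqrt(lam)                  # semi-axis lengths of E
volume_factor = np.sqrt(np.linalg.det(Q)) # volume is proportional to this

# points x with (x - x_c)^T Q^{-1} (x - x_c) <= 1 lie inside E;
# the eigenvectors V give the semi-axis directions, so the tip of a
# semi-axis lies exactly on the boundary of E
x = x_c + semi_axes[0] * V[:, 0]
val = (x - x_c) @ np.linalg.inv(Q) @ (x - x_c)  # evaluates to 1 at the tip
```

The product of the semi-axis lengths equals det(Q)^{1/2}, which is why the volume scales with that determinant factor.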
Let us consider the ellipsoid approximation problem of computing an MVEE around the points {x_{1}, x_{2}, ⋯, x_{n}} ∈ F in the feature space. This problem is equivalent to finding the minimum volume ellipsoid around the polytope defined by the convex hull of those points. Starting from the ellipsoid expressed by (11), the problem can be reduced to

min_{Q, x_{c}} det(Q)  subject to  (x_{i} − x_{c})^{T}Q^{−1}(x_{i} − x_{c}) ≤ 1, i = 1, ⋯, n, Q = Q^{T} ≻ 0. (12)

Note that each training data point must be inside the ellipsoidal boundary, and the objective function is proportional to the volume of the ellipsoid, represented by the covariance matrix Q. By introducing the Schur complement, the containment constraint in (12) can be written as a linear matrix inequality (LMI) constraint, so that (12) becomes a max-det problem, one of the LMI problems. However, this problem is difficult to solve directly [29-31].
In this paper, we employ the MVEE algorithm of [31], an efficient and fast approximation algorithm based on Khachiyan's algorithm [32-34]. The algorithm can be summarized as follows [31]: Consider a set of n data points in a d-dimensional space, S = {x_{1}, x_{2}, ⋯, x_{n}} ⊂ ℝ^{d}. The minimum volume enclosing ellipsoid containing S is denoted by MVEE(S). Let a "lifting" of S to ℝ^{d+1} be defined by S′ = {±q_{1}, ±q_{2}, ⋯, ±q_{n}}, where q_{i}^{T} = [x_{i}^{T}, 1], i = 1, 2, ⋯, n. Under this definition, each data point x_{i} is lifted to the hyperplane H = {(x, x_{d+1}) ∈ ℝ^{d+1} | x_{d+1} = 1}. MVEE(S′) is centered at the origin, because S′ is centrally symmetric. The minimum volume enclosing ellipsoid for the original problem is recovered as the intersection of H with the MVEE containing the lifted points q_{i}, as follows: MVEE(S) = MVEE(S′) ∩ H. The primal problem with the lifted points q_{i} can be written as

min_{M} −log det(M)  subject to  q_{i}^{T}Mq_{i} ≤ 1, i = 1, ⋯, n, M ≻ 0. (14)

The Lagrangian dual problem is given by

max_{u} log det(V(z))  subject to  Σ_{i} u_{i} = 1, u ≥ 0, (15)

where V(z) = Q diag(z)Q^{T}, Q = [q_{1}, q_{2}, ⋯, q_{n}], and z = (d+1)u.
The Lagrangian of (14) can be written as

L(M, z) = −log det(M) + Σ_{i} z_{i}(q_{i}^{T}Mq_{i} − 1).

By introducing the Karush-Kuhn-Tucker (KKT) conditions for optimality, we obtain the following equality conditions:

M^{−1} = QZQ^{T},  z_{i}(q_{i}^{T}Mq_{i} − 1) = 0, i = 1, ⋯, n,

where Z = diag(z) and Q = [q_{1}, q_{2}, ⋯, q_{n}]. Assume that the matrix M^{*} ≻ 0, M^{*} ∈ ℝ^{(d+1)×(d+1)}, is the optimal solution to the primal problem (14) with the Lagrangian multipliers z^{*} ∈ ℝ^{n}. We then derive

M^{*} = V(z^{*})^{−1},

where V(z) = Q diag(z)Q^{T} and z = (d+1)u.
Given q^{T} = [x^{T}, 1], the equation of the recovered ellipsoid MVEE(S′) ∩ H can be written as

q^{T}V(z^{*})^{−1}q ≤ 1, i.e., q^{T}V(u^{*})^{−1}q ≤ d + 1.

Let us denote

S(u) = PUP^{T} − (Pu)(Pu)^{T},

where P = [x_{1}, x_{2}, ⋯, x_{n}] is the matrix whose columns are the (feature-space) training points, u is the dual variable, and U = diag(u). The inverse V(u)^{−1} is given by the Schur complement in block form:

V(u)^{−1} = [ S(u)^{−1}, −S(u)^{−1}Pu ; −(Pu)^{T}S(u)^{−1}, 1 + (Pu)^{T}S(u)^{−1}Pu ].

We then have

q^{T}V(u)^{−1}q = (x − Pu)^{T}S(u)^{−1}(x − Pu) + 1 ≤ d + 1,

which matches the form of (11). Therefore, for the dual optimal solution u^{*}, we can obtain

x_{c}^{*} = Pu^{*},  Q^{*} = d(PU^{*}P^{T} − (Pu^{*})(Pu^{*})^{T}),

where U^{*} = diag(u^{*}). We obtain the approximated optimal covariance matrix Q^{*} and the center x_{c}^{*} of the MVEE as the results of the training phase.
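The training-phase MVEE computation can be sketched in a few lines; the following is a compact NumPy rendering of the Khachiyan-style iteration summarized above (the tolerance and iteration cap are illustrative choices, not values from the paper).

```python
import numpy as np

def mvee(X, tol=1e-4, max_iter=5000):
    """Approximate MVEE (x - c)^T Q^{-1} (x - c) <= 1 around the rows of X,
    via Khachiyan-style coordinate ascent on the dual (15)."""
    n, d = X.shape
    P = X.T                                   # d x n data matrix
    QL = np.vstack([P, np.ones((1, n))])      # lifted points q_i = [x_i; 1]
    u = np.full(n, 1.0 / n)                   # dual variables, sum(u) = 1
    for _ in range(max_iter):
        V = QL @ np.diag(u) @ QL.T            # V(u) = Q diag(u) Q^T
        # M_i = q_i^T V(u)^{-1} q_i for every lifted point
        M = np.einsum('ij,ji->i', QL.T, np.linalg.solve(V, QL))
        j = int(np.argmax(M))
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        if step < tol:
            break
        u = (1.0 - step) * u                  # shift dual mass toward the
        u[j] += step                          # worst-covered point
    c = P @ u                                 # center x_c* = P u*
    S = P @ np.diag(u) @ P.T - np.outer(c, c) # P U P^T - (P u)(P u)^T
    return d * S, c                           # Q* = d * S(u*), x_c*

rng = np.random.default_rng(2)
X = rng.normal(size=(80, 2))
Q, c = mvee(X)
# every training point should satisfy (x - c)^T Q^{-1} (x - c) <= 1 (+tol)
vals = np.einsum('ij,ji->i', X - c, np.linalg.solve(Q, (X - c).T))
```

At convergence the largest Mahalanobis value sits at 1 (up to the tolerance), because the tightest enclosing ellipsoid touches its extreme points.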
 3.2 Testing Phase
When the training phase is complete, the test data should be mapped in the same kernel feature space as the training phase in (9), because the decision boundary of the training phase is defined in the kernel feature space. Given a set of
d
dimensional
m
testing data points in the input space,
we can compute projections of the image of the testing data points Փ(
x^{tst}
) onto eigenvectors
V
^{k}
in the kernel feature space. The projection is defined as follows:
where
x^{trn}
is a set of ntraining data points, and
V
^{k}
,
α^{k}
are obtained in the training phase. An obvious choice for the decision function of the testing data
x^{tst}
as a target instance is
where
Q
^{*}
is the approximated optimal covariance matrix, and
x_{c}
^{*}
is the center of the MVEE, which are obtained from the training phase.
For more flexibility, we modified the decision function (24) ever so slightly as follows
[13]
:
where
ε
is a constant chosen by the user as a parameter for controlling the boundary of a target class.
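Given Q^{*} and x_{c}^{*} from the training phase, the test-time decision is a Mahalanobis-type check against the trained ellipsoid; a minimal sketch with an illustrative shape matrix, center, ε, and test points (none taken from the paper's experiments):

```python
import numpy as np

def classify(X_tst, Q_star, x_c, eps=1.0):
    """Label test points: True = normal (inside the ellipsoid), i.e.,
    (x - x_c)^T (Q*)^{-1} (x - x_c) <= eps."""
    D = X_tst - x_c
    vals = np.einsum('ij,ji->i', D, np.linalg.solve(Q_star, D.T))
    return vals <= eps

# illustrative ellipsoid and test points
Q_star = np.diag([4.0, 1.0])
x_c = np.zeros(2)
X_tst = np.array([[1.0, 0.5],     # 1/4 + 1/4 = 0.5  -> normal
                  [3.0, 0.0],     # 9/4       = 2.25 -> anomalous
                  [0.0, 2.0]])    # 4/1       = 4.0  -> anomalous
labels = classify(X_tst, Q_star, x_c, eps=1.0)
```

Raising ε loosens the boundary, which is why the experiments later use large ε values on projected (unnormalized) feature-space coordinates.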
Fig. 2 shows illustrations of the training and testing phases for the proposed anomaly detection method based on MVEE and KPCA, using a two-dimensional toy example. The training dataset is depicted in Fig. 2(a). The decision boundary obtained for the training dataset during the training phase is depicted in Fig. 2(b). During the testing phase, each data point is assigned to one of two classes: the anomalous class or the normal class. If a data point is inside the decision boundary, i.e., inside the ellipsoid, the data point is classified as normal. If a data point is outside the decision boundary, the data point is regarded as anomalous.
Fig. 2. Illustrations of the training and testing phases for the proposed approach with a two-dimensional toy example: (a) training dataset; (b) decision boundary obtained after the training phase; (c) detection of anomalous data during the testing phase. If a test data point is inside the ellipsoid, it is regarded as normal; if it is outside the ellipsoid, it is regarded as anomalous.
4. Experimental Results and Analysis
This section presents the criteria for evaluating the performance of the proposed approach, along with the experimental results that validate it. In this study, the evaluation measures were based on a two-class confusion matrix. The evaluation first illustrates the decision boundaries of the standard SVDD and the proposed method on two synthetic datasets, in an effort to show the efficacy of the proposed approach. To evaluate the proposed anomaly detection approach, we then conducted experiments with intrusion detection datasets, specifically the KDD CUP 1999 dataset [35] and the SNMP MIB traffic-flooding dataset [3]. The experiments described in this paper were conducted on a PC with an Intel Xeon CPU W5590 @ 3.33 GHz, and all algorithms were implemented using the data description toolbox for MATLAB [36].
 4.1 Performance Evaluation Criteria for AID Algorithm
We employed the confusion matrix [37] for the performance evaluation of the proposed AID algorithm. A confusion matrix compares the actual and predicted results obtained by an algorithm. Fig. 3 shows the confusion matrix for anomaly detection, i.e., a two-class classifier. Here, "actual" refers to the real label of the data points, and "predicted" refers to the label predicted by an algorithm.

Fig. 3. Example of a confusion matrix for the anomaly detection (two-class classification) algorithm. a: the number of positive instances correctly classified as positive; b: the number of positive instances misclassified as negative; c: the number of negative instances misclassified as positive; d: the number of negative instances correctly classified as negative.
The performance of an algorithm can be evaluated using the values in the confusion matrix. In this study, we used the following confusion-matrix-based evaluation measures [11,37] to evaluate the proposed method. Accuracy (AC) is defined as the proportion of the total number of correct predictions:

AC = (a + d)/(a + b + c + d).

The true positive rate (TPR), or recall, is defined as the proportion of positive instances that are correctly classified as positive:

TPR = a/(a + b).

The false positive rate (FPR) is defined as the proportion of negative instances that are incorrectly classified as positive:

FPR = c/(c + d).

The true negative rate (TNR) is defined as the proportion of negative instances correctly classified as negative:

TNR = d/(c + d).

The false negative rate (FNR) is defined as the proportion of positive instances incorrectly classified as negative:

FNR = b/(a + b).

Precision (PRC) is defined as the proportion of instances predicted as positive (or negative) that truly belong to the positive (or negative) class; for the positive class,

PRC = a/(a + c).

The detection rate (DR) is defined as the number of intrusion (attack activity) instances detected by the algorithm divided by the total number of intrusion instances [3]:

DR_{i} = T_{i}/I_{i},

where I_{i} is the total number of intrusion instances belonging to an attack of type i, and T_{i} is the number of intrusion instances classified as type-i attacks by the algorithm.

As a single figure of merit for comparing different algorithms, we used the F-measure [11], a trade-off between precision and recall:

F = 2・PRC・TPR/(PRC + TPR).
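The measures above follow directly from the counts a, b, c, and d of Fig. 3; a small sketch (the counts themselves are illustrative, not results from the paper):

```python
def metrics(a, b, c, d):
    """Confusion-matrix measures with a=TP, b=FN, c=FP, d=TN (as in Fig. 3)."""
    ac  = (a + d) / (a + b + c + d)    # accuracy
    tpr = a / (a + b)                  # true positive rate (recall)
    fpr = c / (c + d)                  # false positive rate
    tnr = d / (c + d)                  # true negative rate
    fnr = b / (a + b)                  # false negative rate
    prc = a / (a + c)                  # precision (positive class)
    f   = 2 * prc * tpr / (prc + tpr)  # F-measure
    return ac, tpr, fpr, tnr, fnr, prc, f

# e.g., 90 attacks caught, 10 missed, 20 false alarms, 80 correct normals
ac, tpr, fpr, tnr, fnr, prc, f = metrics(a=90, b=10, c=20, d=80)
print(round(ac, 3), round(tpr, 3), round(prc, 3))  # -> 0.85 0.9 0.818
```

Note that TPR + FNR = 1 and TNR + FPR = 1, so reporting one member of each pair suffices.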
 4.2 Illustrations of Decision Boundaries
To show the efficacy of the proposed approach, we illustrate the decision boundaries of the standard SVDD with a Gaussian kernel function alongside those of the proposed method, using two synthetic datasets. Dataset 1 is banana shaped, while Dataset 2 is sine-curve shaped. Datasets 1 and 2 are shown in Fig. 4(a) and (d), respectively. The decision boundaries of the SVDD (with σ = 4.0 in the Gaussian kernel function and C = 0.01) and the proposed method (with 2σ^{2} = 0.5 in the KPCA kernel (10)) for Dataset 1 are depicted in Fig. 4(b) and (c), respectively. Fig. 4(e) and (f) show the decision boundaries for Dataset 2 created by the SVDD (σ = 1.05, C = 0.01) and the proposed method (2σ^{2} = 0.01), respectively. As shown in Fig. 4(b), the SVDD finds a compact decision boundary for Dataset 1, containing most of the data points of the training dataset. The descriptive region for the normal class seems to represent the training dataset well. However, most data instances lie to one side (namely, the bottom-right side) of the decision boundary, and some data points are outside the decision boundary altogether. The description area for Dataset 1 generated by the SVDD is thus defectively unbalanced between its inner and outer surfaces. This result suggests a possible defect in SVDD-based anomaly detection approaches: because the description area for the normal class includes a significantly large area where no training data points reside, it can accept abnormal data points as normal and/or misclassify normal data points as abnormal. Conversely, as shown in Fig. 4(c), the proposed method creates a more compact decision boundary for Dataset 1 than the standard SVDD. The decision boundary contains all the data points of the training dataset, and the data instances lie inside the decision boundary, almost in the center of the descriptive region. The descriptive area for the normal class still includes an area where no training data points reside, but this area is smaller than with the SVDD. As noted in the introduction, the SVDD can express only a limited region of the target class, even in the feature space.
Fig. 4. Decision boundaries of the standard SVDD and the proposed method: (a) synthetic Dataset 1, banana shaped; (b) decision boundary of the SVDD for Dataset 1, σ = 4.0, C = 0.01; (c) decision boundary of the proposed method for Dataset 1, 2σ^{2} = 0.5; (d) synthetic Dataset 2, sine-curve shaped; (e) decision boundary of the SVDD for Dataset 2, σ = 1.05, C = 0.01; (f) decision boundary of the proposed method for Dataset 2, 2σ^{2} = 0.01.
The experimental results for Dataset 2 are presented in Fig. 4(e) and (f). Unlike the experiment with Dataset 1, there is no defective imbalance for Dataset 2 between the areas near the inner and outer surfaces of the description region created by the SVDD. However, the region still includes a considerably wide area where no training data points reside. The proposed method creates a more compact decision boundary than the SVDD.

As illustrated in Fig. 4, the proposed algorithm is able to handle datasets of complex shape by using the ellipsoid defined in the feature space. The experimental results on synthetic data show that the proposed algorithm mitigates the limitations of the SVDD and consequently creates a more compact and balanced decision boundary (i.e., description area) than the SVDD.
 4.3 KDD CUP 1999 Dataset
To determine the performance of the proposed anomaly detection approach, we conducted experiments on one of the best-known benchmark datasets in the field of intrusion detection, namely, the KDD CUP 1999 dataset. This dataset was collected by the 1998 DARPA Intrusion Detection Evaluation Program during a simulation on U.S. military networks [35]. For accurate analysis of the results, we used only the Corrected labeled dataset among the KDD CUP 1999 datasets. The dataset divides into five classes: normal; denial of service (DoS); unauthorized access from a remote machine (R2L); unauthorized access to local superuser privileges (U2R); and surveillance and other probing (Probe). The dataset consists of 311,029 instances (60,593 normal instances, 229,853 DoS instances, 16,329 R2L instances, 88 U2R instances, and 4,166 Probe instances). The dataset has 9 symbolic attributes and 32 numeric attributes. For non-numeric attributes, we borrowed the idea for kernel mapping of discrete attribute values used in [38]. Let Σ_{i} be the set of possible values of the i-th non-numeric attribute and |Σ_{i}| be the cardinality of the set Σ_{i}, i.e., the number of elements in the set. We encode the values of the non-numeric attributes into |Σ_{i}| coordinates, one coordinate per possible value of the i-th non-numeric attribute. The coordinate corresponding to the observed value of the attribute takes the positive value 1/|Σ_{i}|, and the remaining coordinates take the value 0. By concatenating the numeric attributes with the encoded coordinates of the non-numeric attributes, we obtain new feature vectors for the SVDD and the proposed algorithm, consisting of only numeric attributes.
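The attribute-encoding scheme above can be sketched as follows (attribute names, values, and domains are illustrative, not taken from the KDD CUP 1999 dataset):

```python
def encode_symbolic(value, domain):
    """Map one non-numeric attribute to |domain| coordinates: the coordinate
    of the observed value gets 1/|domain|, all other coordinates get 0."""
    return [1.0 / len(domain) if v == value else 0.0 for v in domain]

def encode_record(numeric, symbolic, domains):
    """Concatenate numeric attributes with encoded symbolic attributes."""
    vec = list(numeric)
    for val, dom in zip(symbolic, domains):
        vec += encode_symbolic(val, dom)
    return vec

# illustrative record: two numeric attributes and one symbolic attribute
# whose domain is {tcp, udp, icmp}, so |domain| = 3 and the weight is 1/3
x = encode_record([0.2, 5.0], ['udp'], [['tcp', 'udp', 'icmp']])
print(x)  # -> [0.2, 5.0, 0.0, 0.3333333333333333, 0.0]
```

Each symbolic attribute thus contributes a fixed L1 mass of 1/|Σ_{i}| regardless of its domain size, keeping large-domain attributes from dominating the kernel.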
For comparison, we set the proposed method against the standard SVDD on the KDD CUP 1999 dataset. We used 300 randomly selected normal instances for training, and used all of the data instances for testing. For the SVDD, the parameters σ and C were set to 0.4 and 0.05, respectively. For the proposed method, we used 80% of the eigenvalues and set σ to 100,000 in the KPCA step of the training phase. The control parameter ε in (25) was set to 30 during the testing phase. The performance evaluation results are presented in Table 1.
Table 1. Experimental results with the KDD CUP 1999 dataset. The results were obtained by the SVDD under the conditions σ = 0.4, C = 0.05, and by the proposed method under the conditions σ = 100,000, ε = 30. PRC_{Normal} is the precision for the normal dataset, and PRC_{Attack} is the precision for the attack dataset. DR_{DoS}, DR_{R2L}, DR_{U2R}, and DR_{Probe} represent the attack detection rates for DoS, R2L, U2R, and Probe, respectively. F_{Normal} is the F-measure from the perspective of normal behavior, and F_{Attack} is the F-measure from the perspective of attack activity.
The proposed method performed better on the performance evaluation measures, the sole exception being the detection rate for R2L relative to the standard SVDD. Note that both methods reveal poor detection performance for R2L and U2R attacks compared with their detection rates for DoS and Probe attacks. The R2L and U2R attacks are host-based attacks that exploit the vulnerabilities of host computers rather than network protocols. In R2L attacks, attackers connect to a remote host computer and attempt to exploit a vulnerability of the host to illicitly gain local access as a user. In U2R attacks, attackers access the system legally with a normal user account and attempt to obtain unauthorized local superuser privileges by exploiting vulnerabilities in the system. Hence, R2L and U2R instances are closely similar to the normal data in the KDD CUP 1999 dataset collected from network simulations [2,25]. Once an attacker gains superuser privileges, every aspect of the system is under the attacker's control, and he or she can more easily gain access to related systems. Therefore, U2R attacks are especially serious and dangerous. The proposed method led to a significant improvement in the detection rate of U2R attacks over the standard SVDD. The proposed approach also shows better or comparable performance relative to the other methods [10,11].
We also empirically analyzed the sensitivity of the parameter σ on the KDD CUP 1999 dataset. Fig. 5 shows the changes in the false positive and false negative values of the proposed method as the parameter σ varies from 50,000 to 200,000. It should be noted that the selection of the parameter σ depends on the dataset. In the experiments of this paper, the value of the parameter σ was set empirically.

Fig. 5. Sensitivity analysis of the parameter σ on the KDD CUP 1999 dataset.
 4.4 SNMP MIB Traffic Flooding Dataset
To validate the proposed anomaly detection approach, we conducted experiments on a recently collected network traffic-flooding attack dataset. The SNMP MIB traffic-flooding dataset is a statistical dataset gathered from SNMP agents [3]. The dataset was collected from a testbed connected to a campus network. The normal dataset was collected from victim host computers and other host computers outside the testbed. The attack dataset was generated using a distributed denial-of-service attack tool [39]. The 13 MIB variables were selected with a correlation-based feature selection (CFS) method and gathered with an MIB update time prediction mechanism. The 13 MIB variables can be divided into 4 MIB groups: IP, TCP, UDP, and ICMP. The dataset was collected over 10 days of experimentation; the MIB variables of the target system were updated and gathered every 15 seconds on average. The dataset consists of 5,000 MIB data instances, including 2,000 normal traffic instances, 1,000 TCP-SYN flooding attack instances, 1,000 UDP flooding attack instances, and 1,000 ICMP flooding attack instances. We used 300 randomly selected normal instances for training, and used all of the data instances for testing. For the SVDD, the parameters σ and C were set to 0.02 and 0.1, respectively. For the proposed method, we used 80% of the eigenvalues, the same as in our experiment with the KDD CUP 1999 dataset, and set σ to 30 in the KPCA step of the training phase. The control parameter ε was set to 300 during the testing phase.
Table 2 provides the results of the performance evaluation on the traffic-flooding attacks. The proposed method outperformed the standard SVDD on the SNMP MIB traffic-flooding dataset, achieving an overall accuracy of 99.88%, a true positive rate of 99.70%, and a true negative rate of 100%. The experimental results show that the proposed method is able not only to reduce the false-alarm rate, but also to improve the detection rate on the SNMP MIB traffic-flooding dataset.
Table 2. Experimental results with the SNMP MIB traffic-flooding dataset. The SVDD results were obtained under the conditions σ = 0.02 and C = 0.1, and the proposed method under the conditions σ = 30 and ε = 300. PRC_{Normal} is the precision on the normal data, and PRC_{Attack} is the precision on the attack data. DR_{TCP-SYN}, DR_{UDP}, and DR_{ICMP} represent the attack-detection rates for the TCP-SYN, UDP, and ICMP flooding attacks, respectively. F_{Normal} is the F-measure from the perspective of normal behavior, and F_{Attack} is the F-measure from the perspective of attack activity.
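The quantities in Table 2 (accuracy, true positive/negative rates, precisions, and F-measures) follow the standard confusion-matrix definitions. A minimal sketch, assuming the convention that attacks are the positive class (the function name is illustrative):

```python
import numpy as np

def evaluate(y_true, y_pred):
    """Confusion-matrix metrics; convention: 1 = attack, 0 = normal."""
    tp = np.sum((y_true == 1) & (y_pred == 1))   # attacks detected
    tn = np.sum((y_true == 0) & (y_pred == 0))   # normals accepted
    fp = np.sum((y_true == 0) & (y_pred == 1))   # false alarms
    fn = np.sum((y_true == 1) & (y_pred == 0))   # missed attacks
    acc = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn)                          # attack-detection rate
    tnr = tn / (tn + fp)
    prc_attack = tp / (tp + fp)
    prc_normal = tn / (tn + fn)
    # F-measure: harmonic mean of precision and recall per class
    f_attack = 2 * prc_attack * tpr / (prc_attack + tpr)
    f_normal = 2 * prc_normal * tnr / (prc_normal + tnr)
    return dict(ACC=acc, TPR=tpr, TNR=tnr,
                PRC_Attack=prc_attack, PRC_Normal=prc_normal,
                F_Attack=f_attack, F_Normal=f_normal)
```

The per-attack detection rates DR_{TCP-SYN}, DR_{UDP}, and DR_{ICMP} are obtained by applying the same TPR computation restricted to the instances of each attack type.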
5. Conclusion
In this paper, we presented a new approach to anomaly intrusion detection based on a minimum volume enclosing ellipsoid and kernel principal component analysis, which generates a better decision boundary around the normal behavior class of data. The proposed method mitigates the limitation of the standard SVDD, i.e., the limited descriptive power of the hypersphere, by using an ellipsoid in feature space. The efficacy of the proposed approach was demonstrated by depicting the decision boundaries of the standard SVDD with a Gaussian kernel function and of the proposed method on a toy synthetic dataset; the proposed method creates a decision boundary that is more compact than that produced by the standard SVDD. To validate the proposed method as an anomaly intrusion detection algorithm, we conducted experiments on the KDD CUP 1999 dataset and the SNMP MIB traffic-flooding dataset. The proposed method led to a significant improvement in anomaly intrusion detection over the conventional SVDD approach.
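As a concrete illustration of the ellipsoid construction summarized above, the following sketch computes a minimum volume enclosing ellipsoid with Khachiyan's iterative algorithm (cited in the references). It operates directly in input space and is a minimal sketch, not the authors' implementation, which applies the ellipsoid in the KPCA feature space:

```python
import numpy as np

def mvee(P, tol=1e-4, max_iter=1000):
    """Khachiyan's algorithm for the minimum volume enclosing ellipsoid
    {x : (x - c)^T A (x - c) <= 1} of the rows of P (n points, d dims)."""
    n, d = P.shape
    Q = np.column_stack([P, np.ones(n)]).T       # lifted points: (d+1) x n
    u = np.full(n, 1.0 / n)                      # uniform initial weights
    for _ in range(max_iter):
        X = Q @ np.diag(u) @ Q.T                 # (d+1) x (d+1) moment matrix
        M = np.einsum('ij,ji->i', Q.T @ np.linalg.inv(X), Q)
        j = np.argmax(M)                         # worst-covered point
        step = (M[j] - d - 1.0) / ((d + 1.0) * (M[j] - 1.0))
        new_u = (1.0 - step) * u                 # shift weight toward point j
        new_u[j] += step
        if np.linalg.norm(new_u - u) < tol:
            u = new_u
            break
        u = new_u
    c = P.T @ u                                  # ellipsoid center
    A = np.linalg.inv(P.T @ np.diag(u) @ P - np.outer(c, c)) / d
    return A, c
```

For anomaly detection, a test point x is flagged as anomalous when its ellipsoidal distance (x - c)^T A (x - c) exceeds a threshold controlled by a parameter such as ε.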
BIO
Hansung Lee received his BS, MS, and PhD degrees in computer science from Korea University, Sejong City, Rep. of Korea, in 1996, 2002, and 2008, respectively. From July 1996 to July 1999, he worked for DAEWOO Engineering Company, Seoul, Rep. of Korea. He was with Korea University, Sejong City, Rep. of Korea as a lecturer from March 2002 to February 2010. From November 2009 to November 2014, he worked for the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea as a senior member of the research staff. He joined Samsung Electronics Co., Ltd., Suwon, Rep. of Korea, in December 2014. His current research interests include pattern recognition, machine learning, optimization, data mining, and big-data analytics.
Daesung Moon received his MS degree in computer engineering from Busan National University, Busan, Rep. of Korea, in 2001. He received his PhD degree in computer science from Korea University, Seoul, Rep. of Korea, in 2007. He joined the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea, in 2000, where he is currently a senior researcher. His research interests include network security, data mining, biometrics, and image processing.
Ikkyun Kim received the B.C.E., M.S.C.E., and Ph.D. degrees in computer engineering from Kyungpook National University, Daegu, Rep. of Korea, in 1994, 1996, and 2009, respectively. Dr. Kim has worked at the Electronics and Telecommunications Research Institute (ETRI), Daejeon, Rep. of Korea since 1996. He has developed several network-based intrusion detection systems and is currently working on new anomaly detection methods against zero-day attacks for network security and on big-data analysis for security intelligence. His research interests include DDoS protection mechanisms, high-speed network protection systems, and cloud computing security.
Hoseok Jung received the PhD degree in computer science from Korea University, Sejong City, Rep. of Korea, in 2011. He is a chief in the Department of Knowledge & Information at the Korea Research Institute of Bioscience and Biotechnology (KRIBB), where he works on information systems, information security, and knowledge management. From 1992 to 1994, he was an application programmer in the Department of Medical Business, Korea Computer Cooperation, Seoul, Rep. of Korea. Since summer 1994, he has worked at the Korea Research Institute of Bioscience and Biotechnology, where he first served as an application programmer and then as a team manager. His research interests include data mining, machine learning, pattern recognition, and information retrieval.
Daihee Park received his BS degree in mathematics from Korea University, Seoul, Rep. of Korea, in 1982, and his PhD degree in computer science from Florida State University, USA, in 1992. He joined Korea University, Sejong City, Rep. of Korea in 1993, where he is currently a Professor in the Department of Computer and Information Science. His research interests include data mining and intelligent database.
References
Lee H., Song J., and Park D., "Intrusion detection system based on multiclass SVM," in Proc. of RSFDGrC, LNAI 3642, 2005, pp. 511–519.
Yu J., Lee H., Kim M., and Park D., "Traffic flooding attack detection with SNMP MIB using SVM," Computer Communications, vol. 31, no. 17, 2008, pp. 4212–4219. DOI: 10.1016/j.comcom.2008.09.018
Parsa S. and Naree S.A., "A new semantic kernel function for online anomaly detection of software," ETRI Journal, vol. 34, no. 2, 2012, pp. 288–291. DOI: 10.4218/etrij.12.0211.0293
Kloft M. and Laskov P., "Security analysis of online centroid anomaly detection," JMLR, vol. 13, 2012, pp. 3681–3724.
Banerjee A., Burlina P., and Meth R., "Fast hyperspectral anomaly detection via SVDD," in Proc. of ICIP, Sept. 16–19, 2007, vol. 4, pp. 101–104.
Jiaomin L., Zhenzhou W., Xinchun F., and Jing W., "Intrusion detection technology based on SVDD," in Proc. of ICINIS, 2009, pp. 15–18.
Guo S.M., Chen L.C., and Tsai J.S.H., "A boundary method for outlier detection based on support vector domain description," Pattern Recognition, vol. 42, no. 1, 2009, pp. 77–83. DOI: 10.1016/j.patcog.2008.07.003
Kang I., Jeong M.K., and Kong D., "A differentiated one-class classification method with applications to intrusion detection," Expert Systems with Applications, vol. 39, no. 4, 2012, pp. 3899–3905. DOI: 10.1016/j.eswa.2011.06.033
GhasemiGol M., Monsefi R., and Yazdi H.S., "Intrusion detection by new data description method," in Proc. of ISMS, 2010, pp. 1–5.
GhasemiGol M., Monsefi R., and Yazdi H.S., "Intrusion detection by ellipsoid boundary," J. Netw. Syst. Manage., vol. 18, no. 3, 2010, pp. 265–282. DOI: 10.1007/s10922-010-9165-x
Park J., Kim J., Lee H., and Park D., "One-class support vector learning and linear matrix inequalities," IJFIS, vol. 3, no. 1, 2003.
GhasemiGol M., Monsefi R., and Yazdi H.S., "Ellipse support vector data description," in Proc. of EANN, CCIS 43, 2009, pp. 257–268.
Wang D., Yeung D.S., and Tsang E.C.C., "Structured one-class classification," IEEE Trans. Syst., Man, Cybern. B, vol. 36, no. 6, 2006, pp. 1283–1295. DOI: 10.1109/TSMCB.2006.876189
Rajasegarar S., Leckie C., Bezdek J.C., and Palaniswami M., "Centered hyperspherical and hyperellipsoidal one-class support vector machines for anomaly detection in sensor networks," IEEE Trans. Inf. Forensics Security, vol. 5, no. 3, 2010, pp. 518–533. DOI: 10.1109/TIFS.2010.2051543
García-Teodoro P., Díaz-Verdejo J., Maciá-Fernández G., and Vázquez E., "Anomaly-based network intrusion detection: Techniques, systems and challenges," Computers & Security, vol. 28, no. 1–2, 2009, pp. 18–28. DOI: 10.1016/j.cose.2008.08.003
Sekar R., Gupta A., Frullo J., Shanbhag T., Tiwari A., Yang H., and Zhou S., "Specification-based anomaly detection: A new approach for detecting network intrusions," in Proc. of ACM CCS, 2002, pp. 265–274.
Kruegel C., Valeur F., Vigna G., and Kemmerer R., "Stateful intrusion detection for high-speed networks," in Proc. of IEEE Symp. on Security and Privacy, 2002, pp. 285–293.
Estévez-Tapiador J.M., García-Teodoro P., and Díaz-Verdejo J.E., "Detection of web-based attacks through Markovian protocol parsing," in Proc. of ISCC, 2005, pp. 457–462.
Ramadas M., Ostermann S., and Tjaden B., "Detecting anomalous network traffic with self-organizing maps," in Proc. of RAID, LNCS 2820, 2003, pp. 36–54.
Yu Y. and Wu H., "Anomaly intrusion detection based upon data mining techniques and fuzzy logic," in Proc. of IEEE SMC, 2012, pp. 514–517.
Hoque M.S., Mukit M.A., and Bikas M.A.N., "An implementation of intrusion detection system using genetic algorithm," IJNSA, vol. 4, no. 2, 2012, pp. 109–120. DOI: 10.5121/ijnsa.2012.4208
Mingqiang Z., Hui H., and Qian W., "A graph-based clustering algorithm for anomaly intrusion detection," in Proc. of ICCSE, 2012, pp. 1311–1314.
Lee H., Chung Y., and Park D., "An adaptive intrusion detection algorithm based on clustering and kernel-method," in Proc. of PAKDD, LNAI 3918, 2006, pp. 603–610.
Shawe-Taylor J. and Cristianini N., Kernel Methods for Pattern Analysis, Cambridge Univ. Press, 2004, pp. 143–155.
Schölkopf B., Smola A., and Müller K.R., "Kernel principal component analysis," in Proc. of ICANN, LNCS 1327, 1997, pp. 583–588.
Tax D.M.J. and Juszczak P., "Kernel whitening for one-class classification," in Pattern Recognition with Support Vector Machines, LNCS 2388, 2002, pp. 40–52.
Boyd S. and Vandenberghe L., Convex Optimization, Cambridge Univ. Press, 2004.
Hindi H., "A tutorial on convex optimization," in Proc. of American Control Conference, 2004, vol. 4, pp. 3252–3265. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.325.9447
Moshtagh N., "Minimum volume enclosing ellipsoids," GRASP Lab., Univ. of Pennsylvania, 2005.
Sun P. and Freund R.M., "Computation of minimum volume covering ellipsoids," Operations Research, vol. 52, no. 5, 2004, pp. 690–706. DOI: 10.1287/opre.1040.0115
Kumar P. and Yildirim E.A., "Minimum volume enclosing ellipsoids and core sets," Journal of Optimization Theory and Applications, vol. 126, no. 1, 2005, pp. 1–21. DOI: 10.1007/s10957-005-2653-6
Khachiyan L., "Rounding of polytopes in the real number model of computation," Mathematics of Operations Research, vol. 21, no. 2, 1996, pp. 307–320. DOI: 10.1287/moor.21.2.307
UCI KDD Archive, KDD Cup 1999 Data. Available:
Tax D.M.J., Data description toolbox (Dd_tools). Available:
Eskin E., Arnold A., Prerau M., Portnoy L., and Stolfo S., "A geometric framework for unsupervised anomaly detection," in Applications of Data Mining in Computer Security, 2002, pp. 77–101.
Dittrich D., Distributed denial of service (DDoS) attacks/tools. Available: