Improving Data Accuracy Using Proactive Correlated Fuzzy System in Wireless Sensor Networks
KSII Transactions on Internet and Information Systems (TIIS). 2015. Sep, 9(9): 3515-3538
Copyright © 2015, Korean Society For Internet Information
  • Received : April 20, 2015
  • Accepted : July 15, 2015
  • Published : September 30, 2015
About the Authors
U Barakkath Nisha
Department of Computer Science & Engg, PSNA College of Engg & Technology, Dindigul, Tamilnadu - India
N Uma Maheswari
Department of Computer Science & Engg, PSNA College of Engg & Technology, Dindigul, Tamilnadu - India
R Venkatesh
Department of Information Technology, PSNA College of Engg & Technology, Dindigul, Tamilnadu - India
R Yasir Abdullah
Department of Computer Science & Engg, Sri Subramanya College of Engg & Technology, Palani, Tamilnadu - India

Abstract
Data accuracy can be increased by detecting and removing the incorrect data generated in wireless sensor networks (WSNs), and increasing data accuracy also extends network lifetime. Network lifetime, or operational time, is the time during which a WSN is able to fulfill its tasks using microcontrollers with on-chip memory and radio transceivers; distributed sensor nodes send summaries of their data to their cluster heads, which gradually reduces energy consumption. In this paper, a powerful algorithm based on a proactive fuzzy system is proposed. It combines fuzzy logic with comparative correlation techniques to ensure high data accuracy by detecting incorrect data in distributed wireless sensor networks. The proposed system is implemented in two phases: the first phase partitions the input space using robust fuzzy c-means clustering, and the second phase detects incorrect data and removes it completely. Experimental results show that the combined correlated fuzzy system (CCFS) detects faulty readings with greater accuracy (99.21%) than the existing approach (98.33%), along with a low false alarm rate.
Keywords
1. Introduction
Wireless sensor networks (WSNs) face challenges in delivering accurate data to the base station. Increasing data accuracy in WSNs requires identifying and refining anomalies: in a group of data, some data may follow a pattern that deviates from the rest, and these are termed "anomalies". Sensors usually monitor dynamic environments over a period of time, and logs are created for future use. All WSNs consist of autonomous, miniature, low-power sensor nodes. Sensing, storing, computing and communication are the individual tasks of sensor nodes [1]. Each sensor node senses its environment, stores the data in main memory, exchanges data with neighbor nodes and performs computation. These operations require energy, and since a sensor node's battery capacity is very small, energy should be spent only on important processes. According to G. J. Pottie, communication requires more energy than computation [2], so transceivers should send and receive packets while consuming as little energy as possible. Most research focuses on reducing communication overhead during packet transmission, but unwanted data transmissions also drain the energy available in WSNs [3]. In this context, data aggregation, which sensors are designed to perform, helps avoid incorrect data. It also tries to eliminate redundant data transmission, thereby reducing the energy consumption of nodes. An important function of any WSN is analyzing the data that sensor nodes save as logs of readings.
Redundant data entail large volumes of highly correlated readings, and a large amount of energy is spent transmitting them to the base station, which must then receive and process them. Data aggregation increases network lifetime by providing fused information and eliminating redundant transmissions [4]. When anomalies are not detected during the data aggregation process, data inaccuracy results [5]. Data aggregation uses a cluster structure in which data travel hierarchically from source to sink. Each cluster applies aggregation functions to the data collected from one or more of its members, and the sink then receives the aggregated value. In this context, some faulty nodes may be present that produce incorrect readings and cause the aggregated output to deviate from the true value.
In general, anomaly detection techniques can be classified as prior-knowledge-based (e.g., statistical and rule-based) or prior-knowledge-free (e.g., data mining and computational intelligence). An ideal anomaly detection system should increase the level of data accuracy at both the base station and the cluster head. Prior-knowledge-free techniques fall short of accurate anomaly detection because they cannot quickly update the normal profile, they introduce high computational complexity through complex machine learning algorithms, and they detect slowly because training models must be developed with agents. Prior-knowledge-based approaches such as statistical techniques provide a good detection rate with a low false alarm rate, but they require a mathematical model, which takes more computational time [6] [7]. To overcome these problems, a proactive fuzzy system is implemented using a rule-based anomaly detection scheme developed from mathematical assumptions and information provided by experts. A fuzzy rule-based anomaly detection system provides high detection accuracy by generating confident decision-making rules, and low computational complexity by using fewer rules [10].
In the proposed system, the term "proactive" refers to a fuzzy system that is executed in each cluster head to filter out duplicate and abnormal data before the data are sent to the base station. Because this proactive system is implemented in a distributed fashion and removes bad data early, the base station receives accurate information. This paper deals with two levels of anomaly elimination. The first level concentrates on finding faulty nodes among the individual sensor nodes. The second level examines the logs of mediator nodes and, based on those logs, finds faulty mediator nodes and discards them across the distributed network. Fuzzy c-means clustering is used to create the input space for the fuzzy system, and this input space partitioning serves as the input to the proposed proactive fuzzy system. Anomalies are then discovered by applying fuzzy logic with the spatial, temporal and attribute correlation techniques. The remainder of the paper is organized as follows: Section 2 reviews related work and the concepts needed for the proposed algorithm. Section 3 explains the network model and problem statement. Section 4 presents the proposed approach and methodology. Section 5 presents the experimental evaluation on both synthetic and real data sets. Finally, Section 6 concludes the paper and outlines future work.
2. Related Work
Anomaly detection and the removal of noisy data in wireless sensor networks have been examined in a variety of research works [6]. Existing anomaly detection techniques can be divided into two streams. The first stream uses supervised learning, which relies on prior knowledge to build a normal profile. The second stream relies on unsupervised learning, building the normal profile from prior knowledge of the sensed data. In a clustered sensor architecture, nodes perform different roles, such as sensing node and leading node. The leading node, or cluster head, performs aggregation, while sensing nodes take readings at different time slots. Clusters can be generated by various conventional clustering protocols such as LEACH and HEED [8] [9]. These protocols separate nodes based on Euclidean distance without considering the correlation distance among sensor nodes, which leads to poorly formed clusters that cannot guarantee reliable and accurate data at the base station. To overcome this issue, a robust fuzzy c-means clustering is proposed for effective cluster formation with low computational complexity.
Previous works on anomaly detection aim specifically at classifying data as correct or incorrect, without analyzing the logic and the status of events that actually occurred in the environment. Daniel et al. [11] proposed a classification-based voting method for anomaly detection, using five different classifiers to detect anomalies and reliable estimates to replace measurements affected by anomalies. This method fails when large datasets are considered. Suat et al. [12] focused on a data aggregation and authentication protocol for security, confidentiality and false data detection; it reduces communication complexity by up to 60% at the cost of increased computational complexity. The authors of [13] proposed a statistical data analysis for outlier detection on high-dimensional data; it has a high false negative rate and considers neither spatial nor spatio-temporal correlation. Janakiram et al. [14] present a technique based on Bayesian belief networks that identifies local anomalies by exploiting spatial, temporal and attribute correlation through conditional probabilities. This method is not suitable for dynamic network topologies. A fuzzy system, in contrast, draws its conclusions from the decisions made by a fuzzy inference system.
Chitradevi et al. [15] proposed anomaly detection based on a distributed agglomerative clustering approach in which anomalies are removed at both the local and global level. Cluster distance and density measures are used to form optimal clusters, removing anomalies with affordable computational and communication complexity. Yang Zhang et al. [16] proposed an ellipsoidal support vector machine approach that classifies sensor node data as anomalous using ellipsoidal-SVM-based online anomaly detection and adaptive anomaly detection for multivariate data. They used a time window to identify changes in the normal behavior of the system. This technique incurs some computational overhead because the normal profile is updated periodically. Y. Zhang et al. [17] proposed statistical outlier detection based on time series and geostatistical analysis with spatial and temporal correlation: temporal correlation is modeled by fitting an autoregressive moving average (ARMA) model, and spatial correlation is modeled using a variogram. Krasimira Kapitanova et al. [18] proposed a general fuzzy logic system for event detection using spatial and temporal semantics. They reduce the number of rules by combining simple rules and trimming unneeded rules in the rule base, and using fuzzy logic instead of fixed thresholds and crisp values improves the accuracy of fire event detection. Liang et al. [19] proposed a double sliding window detection method to increase the event detection rate. However, they elaborate on the effect of fuzzy logic and the power of the spatial and temporal properties of the data on the detection rate. Heshan Kumarage et al. [20] proposed fuzzy data modeling for distributed anomaly detection on different real data sets; the scalability and sensitivity of this approach are low when a large number of nodes is considered.
From the literature survey, it is evident that an ideal anomaly detection system should produce high accuracy with minimal energy consumption. The proposed method makes three main contributions. First, the input space is partitioned using robust fuzzy c-means clustering, which forms more accurate clusters. Second, the sensors' spatial, temporal and attribute correlation acts are incorporated into the fuzzy rule base to further improve anomaly detection accuracy. Third, the rules generated by the rule-based system are reduced by a rule trimming function without affecting the detection rate of the system.
3. Network Model and Problem Statement
A distributed heterogeneous WSN is considered, in which a large number of sensor nodes with limited power resources sense physical phenomena and a small number of aggregator nodes perform anomaly detection along with data aggregation. The network topology is modeled as an undirected graph G(S, E), where S is the set of sensor nodes and E is the set of edges, each connecting two nodes within a cluster. Sensor nodes and the base station are assumed to be fixed after deployment, and each sensor has a unique identifier. Fig. 1 presents the sensor network topology. Initially, all nodes perform the clustering operation on their own local data. An appropriate clustering protocol is proposed for implementing a distributed clustered wireless sensor network, in which the deployment area is grouped into several clusters. Each cluster head performs distributed data aggregation over its cluster members; aggregation reduces unwanted data transmission and increases data accuracy by eliminating duplicate and unwanted data.
Distributed clustered network representation
The aim is to perform anomaly detection and separate anomalous data in each cluster of the network. Let $C_k$ denote the clusters, where each cluster contains n sensor nodes, i.e., $S_1, S_2, S_3, \ldots, S_n \in C_k$, and each cluster head is interconnected with all other cluster heads in the network. Let $A_n$ denote the attributes sensed by multi-sensor nodes, i.e., $A_1, A_2, A_3, \ldots, A_n \in S_n$, where each attribute has partial or full dependency on the other attributes. In the first phase of the proposed system, robust fuzzy c-means clustering reduces energy consumption by keeping a single-hop distance between each node and its cluster head. In the second phase, a fuzzy comparative correlation technique removes unwanted data transmission by eliminating anomalous data, i.e., observations from a sensor that are corrupted due to sensor malfunction.
4. Proposed Methodology
The proactive anomaly detection system starts with input space partitioning using robust fuzzy c-means clustering and ends by accurately detecting incorrect data and removing it entirely using a correlative fuzzy logic algorithm. An analysis of conventional techniques shows that they only determine whether data are incorrect and do not analyze the events behind the data. To solve this issue, fuzzy logic is combined with correlation techniques to extract different regions for analyzing erroneous, suspicious and acceptable sample data generated by sensor nodes at different points in time.
- 4.1 Fuzzy Logic System
The proposed fuzzy classification system includes two basic steps. The first step is a structure confirmation process consisting of a robust fuzzy c-means algorithm based on the robust Mahalanobis distance and a new dissimilarity function. In the second step, a fuzzy inference engine classifies the input sample according to a fuzzy rule set and a reasoning method generated from the fuzzy-spatial, fuzzy-temporal and fuzzy-attribute correlation acts. The proposed method is summarized in Fig. 2. A fuzzy system is characterized by a set of linguistic statements based on expert knowledge, usually in the form of "if-then" rules [21] [22]. Fuzzification converts each crisp input into fuzzy membership values, the fuzzy inference system performs rule-based decision making, and defuzzification converts the fuzzy output into a crisp output.
Framework model of Proactive Fuzzy system with anomaly detection
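The fuzzification, inference and defuzzification chain can be illustrated with a small Mamdani-style sketch. It is an illustration under stated assumptions, not the paper's system. The single input (a hypothetical deviation of a reading from its neighbours), the output "anomaly score", the trapezoidal shapes and all breakpoints are invented for the example. Rule firing uses min, aggregation uses max, and defuzzification takes the centroid of a discretized output universe.

```python
import math


def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def infer(deviation, steps=101):
    """Fuzzify a deviation, fire two rules, and defuzzify to a crisp anomaly score in [0, 1]."""
    # Fuzzification of the crisp input (hypothetical breakpoints).
    small = trap(deviation, 0, 0, 2, 6)
    large = trap(deviation, 2, 6, math.inf, math.inf)
    num = den = 0.0
    for i in range(steps):
        y = i / (steps - 1)
        # Rule 1: IF deviation is SMALL THEN score is LOW (output set clipped by min).
        low = min(small, trap(y, 0, 0, 0.2, 0.6))
        # Rule 2: IF deviation is LARGE THEN score is HIGH.
        high = min(large, trap(y, 0.4, 0.8, 1, 1))
        mu = max(low, high)  # aggregate the rule outputs
        num += y * mu
        den += mu
    return num / den if den else 0.0  # centroid defuzzification
```

With these assumed breakpoints, a deviation of 0 yields a score near 0.2 and a deviation of 10 a score near 0.8. At a deviation of 4, both rules fire at 0.5, and the symmetric output sets give a score of 0.5.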
- 4.2 Robust Fuzzy C-means Clustering
Conventional fuzzy c-means (FCM) clustering groups similar data into the same cluster, so that similar data lie close together and dissimilar data lie far apart. It retains more information about the data than the hard k-means clustering algorithm [23] [24]. FCM uses the Euclidean distance and a simple cost function for cluster formation. The cluster heads are responsible for collecting data from their group and forwarding it to the next-level cluster head or, finally, to the base station for detection. A variety of fuzzy clustering methods have been proposed, and most of them are based on distance criteria [25] [26].
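For reference, the conventional FCM that RFCM modifies can be sketched in plain Python. This is a minimal sketch of standard FCM, not the authors' RFCM. It uses Euclidean distance rather than RMD and has no typicality or density terms. The function name `fcm`, the random initialization, the tolerance and the iteration cap are assumptions made for illustration.

```python
import math
import random


def fcm(points, c, p=2.0, iters=100, tol=1e-6, seed=0):
    """Conventional fuzzy c-means with Euclidean distance.

    points: list of equal-length numeric tuples; c: number of clusters;
    p > 1: degree of fuzzification. Returns (centers, u), where u[j][i]
    is the membership of point j in cluster i.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial membership matrix; each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centers: means of the points weighted by membership ** p.
        centers = []
        for i in range(c):
            w = [u[j][i] ** p for j in range(n)]
            wsum = sum(w)
            centers.append(tuple(sum(w[j] * points[j][k] for j in range(n)) / wsum
                                 for k in range(dim)))
        # Memberships: inverse relative distances to every center.
        new_u = []
        for j in range(n):
            d = [max(math.dist(points[j], centers[i]), 1e-12) for i in range(c)]
            new_u.append([1.0 / sum((d[i] / d[k]) ** (2.0 / (p - 1.0)) for k in range(c))
                          for i in range(c)])
        shift = max(abs(new_u[j][i] - u[j][i]) for j in range(n) for i in range(c))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Usage: two well-separated groups of 2-D readings.
pts = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
centers, u = fcm(pts, 2)
labels = [max(range(2), key=lambda i: row[i]) for row in u]
```

Hard labels are taken here as the cluster with the highest membership. The fuzzy memberships themselves are what a later stage would consume.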
Clustering partitions a dataset into several groups such that the similarity within a group is greater than the similarity between groups. The proposed robust fuzzy c-means clustering (RFCM) uses the robust Mahalanobis distance and a new cost function based on typicality and density measures. The Robust Mahalanobis Distance (RMD) is the distance of a vector from the centroid in a multidimensional space, computed over correlated variables [27]. The cost function is calculated using the distance and density measures. RFCM partitions the sensor nodes into different fuzzy groups. Let $\{S_i : i = 1, 2, \ldots, n\}$ be a set of sensor nodes. Each node $S_i$ senses m physical phenomena such as light, voltage and humidity, so that $S_i = \{x_1, x_2, \ldots, x_m\}$. Clustering assigns the sensor nodes to clusters $\{C_k : k = 1, 2, \ldots, n\}$ using distance metrics and a dissimilarity function [28].
Step 1: Initialize the membership matrix G, whose elements take values between 0 and 1.
RFCM allows each feature vector to belong to every cluster with a fuzzy truth value ranging from low (0) to high (1); $CC_i$ denotes the i-th cluster center.
Step 2: Calculate the fuzzy cluster centers $CC_i$, where $i = 1, 2, \ldots, n$, weighting each node's membership by the power p, the degree of fuzzification.
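In standard FCM, the Step 2 center update takes the form below. Here $G_{ij}$ is the membership of node $S_j$ in cluster i. Treating RFCM's Step 2 as this standard form is an assumption.

$$CC_i = \frac{\sum_{j=1}^{n} G_{ij}^{\,p}\, S_j}{\sum_{j=1}^{n} G_{ij}^{\,p}}, \qquad p > 1, \qquad \sum_{i} G_{ij} = 1 \ \text{for every node } j$$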
Step 3: Compute the dissimilarity function for RFCM based on the density and typicality measures. This replaces the Euclidean distance of conventional fuzzy c-means clustering with the RMD.
Step 3.1: Compute typicality measure
Here n is the number of training data points, $C_k$ is the number of clusters, $G_k(S_i)$ is the membership of sensor node i in cluster k, and $RMD_{ki}^2$ is the squared robust Mahalanobis distance between sensor node i and the center of cluster k.
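As an illustration only, the sketch below assumes a possibilistic c-means (PCM) style typicality. It is built from the same ingredients named above: memberships $G_k(S_i)$, squared distances $RMD_{ki}^2$ and the fuzzifier p. Treating this as the RFCM typicality is an assumption, and the function name `typicality` is hypothetical.

```python
def typicality(d2, u, p=2.0):
    """PCM-style typicality of each node for one cluster (assumed form).

    d2: squared distances (e.g. RMD^2) of each node to the cluster center.
    u:  fuzzy memberships of the same nodes in that cluster.
    Returns values in (0, 1]; a node at the center gets exactly 1.
    """
    weights = [ui ** p for ui in u]
    # eta: membership-weighted mean squared distance, i.e. the cluster's spread.
    eta = max(sum(w * d for w, d in zip(weights, d2)) / sum(weights), 1e-12)
    return [1.0 / (1.0 + (d / eta) ** (1.0 / (p - 1.0))) for d in d2]


t = typicality([0.0, 1.0, 4.0], [0.9, 0.8, 0.1])
```

Nodes that lie far from the center relative to the cluster's own spread receive low typicality. This is what makes such a term useful for down-weighting anomalous readings during clustering.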
In computing the RMD, $\delta_m$ is the sample standard deviation of $D(X_i)$.
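The Mahalanobis part of RMD can be sketched as follows. This minimal sketch assumes the robust location and scatter are computed elsewhere and passed in, for example as Minimum Covariance Determinant estimates. It does not model the $\delta_m$ scaling, and the function names are hypothetical. The example shows why a Mahalanobis-type distance suits correlated attributes: at equal Euclidean distance, a point along the correlation direction is much closer than one lying across it.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular covariance matrix")
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance (x - center)^T cov^{-1} (x - center)."""
    diff = [xi - ci for xi, ci in zip(x, center)]
    z = solve(cov, diff)  # z = cov^{-1} (x - center), without forming the inverse
    return sum(di * zi for di, zi in zip(diff, z))


# Two strongly correlated attributes (e.g. temperature and voltage, standardized).
cov = [[1.0, 0.9], [0.9, 1.0]]
along = mahalanobis_sq((1, 1), (0, 0), cov)    # 2 / 1.9, about 1.05
across = mahalanobis_sq((1, -1), (0, 0), cov)  # 2 / 0.1, about 20
```

With an identity covariance, the function reduces to the squared Euclidean distance used by conventional FCM.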
Step 3.2: Compute the density measure $F_D$, where $d_{ij}$ is the distance between sensor nodes i and j, determined by their positions.
The final dissimilarity function F, obtained by adding the density measure $F_D$ to the typicality term from Step 3.1, is optimized iteratively using equations 3 and 4.
Step 4: Compute the new membership function $G_{ij}^{(new)}$, where p is the degree of fuzzification and $d_{ij}$, $d_{kj}$ are calculated using equation 4. The cluster centers $CC_n$ and the memberships $G_i(S_i)$ are optimized by RFCM.
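Assume Step 4 follows the standard FCM membership update, with the Euclidean distance replaced by the RFCM dissimilarity and $d_{ij}$ read as the dissimilarity between node j and cluster center i. The update then reads as follows, where c is the number of clusters:

$$G_{ij}^{(new)} = \left[ \sum_{k=1}^{c} \left( \frac{d_{ij}}{d_{kj}} \right)^{\frac{2}{p-1}} \right]^{-1}$$

If the dissimilarity is already a squared distance, the exponent becomes $\frac{1}{p-1}$.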
- 4.3 Anomaly Detection using Fuzzy Reasoning based on Correlation Acts
Sensors in areas where anomalies occur should be monitored continuously, and readings are taken from multiple sensors over a period of time because these readings are highly interdependent and volatile. A review of existing methods shows that none applies fuzzy logic together with attribute, spatial and temporal correlation to anomaly detection. Hence, fuzzy logic is applied with comparative correlation techniques to distinguish errors from events that actually occurred. Anomaly detection combines the readings of all physical phenomena in time and spatial location with the rate of change of all attributes in the sensor deployment area. Fuzzy reasoning for anomaly detection uses various linguistic variables for the spatial, temporal and attribute acts. The outputs of these three techniques are given as input to the proactive anomaly detection system, which classifies the level of anomalies present in the environment and labels nodes as superior, doubtful or inferior.
- 4.3.1 Spatial Act
The spatial act is the relationship among the readings of nearest neighbor nodes. To make the system accurate and reduce false alarms, an anomaly detection system must be designed with care [29]. For this, readings from multiple sensors at various time intervals are included. The probability that a report is true is negatively correlated with the distance between the reporting sensors. Hence, spatial location concepts are added to the anomaly detection logic [30]. The spatial protector, or linguistic variable, is added to the rules in the rule base. This spatial act rule is applied at each cluster generated by the robust fuzzy c-means algorithm. Fig. 3 shows the confidence decision making of the spatial act. In Table 1, three linguistic variables are declared: Hold (H), Remote (R) and Farthest (F). These are used to analyze the distance between sensors. The format of the rules and the membership function μSA(d) of the spatial act are as follows:
Decision making of spatial act
Spatial Act Rule Structure
The assurance levels of the spatial act are classified as Low Spatial Act (LSA), Medium Spatial Act (MSA) and High Spatial Act (HSA), where LSA denotes the farthest nodes, MSA comprises nodes that lie between near and far nodes, and HSA comprises the nearest nodes.
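The spatial act can be sketched with trapezoidal membership functions for Hold, Remote and Farthest. The breakpoints, in metres, are hypothetical, since Table 1's parameters are not given in the text. The sketch uses the one-to-one rules Hold → HSA, Remote → MSA and Farthest → LSA implied by the level definitions above. The names `trap` and `spatial_act` are illustrative.

```python
import math


def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def spatial_act(distance_m):
    """Map the distance between two reporting nodes to spatial-act assurance levels."""
    hold = trap(distance_m, 0, 0, 5, 10)       # near neighbours (hypothetical breakpoints)
    remote = trap(distance_m, 5, 10, 15, 20)   # intermediate distance
    farthest = trap(distance_m, 15, 20, math.inf, math.inf)
    # Rules: IF distance is H THEN HSA; IF R THEN MSA; IF F THEN LSA.
    return {"HSA": hold, "MSA": remote, "LSA": farthest}
```

For example, a neighbour 7.5 m away belongs to Hold and Remote with degree 0.5 each, so it supports HSA and MSA equally.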
- 4.3.2 Temporal Act
The temporal act is the relationship between the data at the current moment t and the data at the previous moment t-1. The temporal properties of sensor readings are considered in order to reduce false alarms. Readings generated at short time intervals yield high anomaly detection confidence: confidence increases as the temporal distance between readings decreases, and vice versa. The inherent nature of sensor communication makes the temporal act particularly important [29]. The temporal defender is applied at each cluster generated by the robust fuzzy c-means algorithm. Fig. 4 shows the confidence decision making of the temporal act. In Table 2, three variables are declared: Diminutive (D), Average (A) and Extensive (E). These are used to analyze the time difference between sensor readings. The assurance levels of the temporal act are classified as Low Temporal Act (LTA), Medium Temporal Act (MTA) and High Temporal Act (HTA), where LTA denotes readings separated by long time durations, MTA covers readings whose time difference falls between short and long, and HTA consists of readings generated within short durations. The format of the rules and the membership function μTA(t) of the temporal act are as follows:
Decision making of temporal act
Temporal Act Rule Structure
The temporal act can be computed either on the same sensor's readings at different time intervals, or by analyzing a node's reading together with its neighbor nodes' readings over the same time duration.
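A sketch of the temporal act for the first case, the gaps between successive timestamps of one sensor, follows. The breakpoints for Diminutive, Average and Extensive, in seconds, are hypothetical, as are the function names. Each gap is labelled with the strongest of HTA, MTA and LTA.

```python
import math


def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def temporal_act(gap_s):
    """Map the time difference between two readings to temporal-act assurance levels."""
    diminutive = trap(gap_s, 0, 0, 30, 90)    # hypothetical breakpoints in seconds
    average = trap(gap_s, 30, 90, 150, 300)
    extensive = trap(gap_s, 150, 300, math.inf, math.inf)
    return {"HTA": diminutive, "MTA": average, "LTA": extensive}


def label_gaps(timestamps):
    """Label each consecutive pair of timestamps (seconds) with its strongest temporal level."""
    labels = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        acts = temporal_act(cur - prev)
        labels.append(max(acts, key=acts.get))
    return labels
```

The neighbour-comparison variant would apply the same membership functions to the time offsets between a node's reading and its neighbours' readings.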
- 4.3.3 Attribute Act
The attribute act expresses the relationship among the physical phenomena sensed by a sensor node. Generally, a relationship exists among physical phenomena such as temperature and pressure in a multi-sensor node [14]. The accuracy of the proposed anomaly detection system is increased by incorporating the attribute act with the spatial and temporal correlation acts. A multi-sensor node produces readings for each of the attributes ($A_n$) it senses. Fully dependent attribute values should be coherent with each other; for this reason, attribute cohesion concepts are added to the anomaly detection logic. The attribute shield is added to the rules in fuzzy reasoning. This attribute act rule is applied at each node in the clusters generated by the RFCM algorithm. Fig. 5 shows the confidence decision making of the attribute act. In Table 3, three variables are declared: Low Coupling (LC), Medium Coupling (MC) and High Coupling (HC). These are used to analyze the dependency among attribute readings. The format of the rules and the membership function μAA(c) of the attribute act are as follows:
Decision making of attribute act
Attribute Act Rule Structure
The assurance levels of the attribute act are classified as Low Attribute Act (LAA), Medium Attribute Act (MAA) and High Attribute Act (HAA), where LAA denotes low dependency, MAA denotes semi-dependency among the attributes, and HAA expresses full dependency among the attribute values of a sensor node.
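For the attribute act, one plausible coupling measure is the absolute Pearson correlation between two attribute series of the same node, such as temperature and voltage in IBRL. The choice of Pearson correlation, the breakpoints for LC, MC and HC, and the function names are assumptions for illustration.

```python
import math


def trap(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series (0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def attribute_act(xs, ys):
    """Map the coupling of two attribute series to attribute-act assurance levels."""
    c = min(1.0, abs(pearson(xs, ys)))  # clamp rounding noise above 1
    return {
        "LAA": trap(c, 0, 0, 0.3, 0.5),      # low coupling (hypothetical breakpoints)
        "MAA": trap(c, 0.3, 0.5, 0.6, 0.8),  # medium coupling
        "HAA": trap(c, 0.6, 0.8, 1, 1),      # high coupling
    }
```

A node whose normally coupled attributes suddenly decorrelate drifts from HAA toward LAA. In the combined rule base, this pushes it toward the inferior class.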
- 4.3.4 Anomaly Detection with Comparative Correlation Act
The anomaly detection system with comparative correlation comprises the spatial, temporal and attribute acts. Test data are evaluated using the proactive fuzzy system. In Table 4, the assurance levels of the three correlation acts are declared as linguistic variables for the final proactive anomaly detection system. Fig. 6 shows the final decision making process with the combined correlation acts. The anomaly assurance levels of the proposed system are classified as Inferior Nodes (IN), Doubtful Nodes (DN) and Superior Nodes (SN). The anomaly detection assurance level is judged from the assurance levels of the Spatial Act (SA), Temporal Act (TA) and Attribute Act (AA). The rules of the anomaly detection engine have the following format:
  • IF SA is LSA and TA is LTA and AA is LAA;
  • THEN Anomaly classification is IN
Anomaly detection rule structure
Framework model of Anomaly with correlation acts
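Evaluating the combined rule base can be sketched as follows, using min for AND and max to combine rules that share a conclusion. Only the first rule (LSA, LTA, LAA ⇒ IN) comes from the text. The other rules, the default class used when no rule fires, and the function names are hypothetical.

```python
RULES = [
    (("LSA", "LTA", "LAA"), "IN"),  # rule stated in the text
    (("MSA", "MTA", "MAA"), "DN"),  # hypothetical
    (("HSA", "HTA", "LAA"), "DN"),  # hypothetical: neighbours agree but attributes do not
    (("HSA", "HTA", "HAA"), "SN"),  # hypothetical
]
DEFAULT_CLASS = "DN"  # hypothetical default rule, used when no rule fires


def classify(sa, ta, aa, rules=RULES):
    """Combine spatial, temporal and attribute assurance levels into IN / DN / SN.

    sa, ta, aa: dicts of memberships, e.g. {"LSA": 0.7, "MSA": 0.3, "HSA": 0.0}.
    Returns (class_label, strengths).
    """
    strengths = {"IN": 0.0, "DN": 0.0, "SN": 0.0}
    for (s, t, a), label in rules:
        firing = min(sa.get(s, 0.0), ta.get(t, 0.0), aa.get(a, 0.0))  # fuzzy AND
        strengths[label] = max(strengths[label], firing)               # fuzzy OR
    best = max(strengths, key=strengths.get)
    if strengths[best] == 0.0:
        return DEFAULT_CLASS, strengths
    return best, strengths
```

A node that is distant from its neighbours, out of step in time and weakly coupled across attributes fires the first rule most strongly and is classified as an inferior node.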
- 4.3.5 Mitigating Correlation Rules
The number of correlation rules generated by the fuzzy rule mining technique can be large. A large rule set may contain redundant rules that mislead the classification process and increase the computation time needed to classify anomalies [31] [32]. Trimming repetitive and unsuitable rules can increase the accuracy of the proposed system. The following three rule mining conditions are examined to remove uninteresting rules.
1. First, eliminate specific rules and retain only the general rules with high confidence, i.e., merge rules with similar conclusions.
Consider two rules:
  • R1: X ⇒ Z and R2: Y ⇒ Z with X ⊆ Y, so that R1 is the general rule. R1 is accepted, and R2 is eliminated, if
  • a. the confidence of R1 is greater than the confidence of R2, or
  • b. the confidence of R1 equals the confidence of R2 but the support of R1 is greater than the support of R2, or
  • c. both the confidence and the support of R1 equal those of R2, and R1 ⊂ R2.
2. Second, controversial rules such as R1: X ⇒ Y and R2: X ⇒ Z, which share an antecedent but reach different conclusions, are also eliminated.
3. Third, remove imperfect rules that do not satisfy the spatial, temporal and attribute restrictions, i.e., every possible combination of input variables should be analyzed by the rules of the fuzzy inference system.
In this work, these three rule mining conditions are applied to eliminate duplicate, uninteresting and controversial rules and produce a trimmed rule set. The final rules are stored in a transactional database, which is used to build the proposed anomaly detection system. If none of the rules in the rule base fires, a default rule is applied.
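Conditions 1 and 2 can be sketched as a filter over mined rules. Each rule is assumed to carry an antecedent set, a consequent, a confidence and a support. The `Rule` type and `prune` function are hypothetical, and condition 3 (coverage of input combinations) is not modelled. Each specific rule is compared against every mined rule, including rules that are themselves pruned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # e.g. frozenset({"LSA", "LTA"})
    consequent: str        # e.g. "IN"
    confidence: float
    support: float


def prune(rules):
    """Apply trimming conditions 1 and 2 and return the surviving rules as a set."""
    kept = set(rules)
    # Condition 1: a general rule X => Z removes a more specific Y => Z (X a proper
    # subset of Y) when it has higher confidence, or equal confidence and at least
    # equal support.
    for g in rules:
        for s in rules:
            if (g.consequent == s.consequent and g.antecedent < s.antecedent
                    and (g.confidence > s.confidence
                         or (g.confidence == s.confidence and g.support >= s.support))):
                kept.discard(s)
    # Condition 2: rules sharing an antecedent but with different conclusions are dropped.
    consequents = {}
    for r in kept:
        consequents.setdefault(r.antecedent, set()).add(r.consequent)
    return {r for r in kept if len(consequents[r.antecedent]) == 1}


r1 = Rule(frozenset({"LSA"}), "IN", 0.9, 0.3)
r2 = Rule(frozenset({"LSA", "LTA"}), "IN", 0.8, 0.2)   # dominated by r1
r3 = Rule(frozenset({"HSA"}), "SN", 0.7, 0.2)          # controversial with r4
r4 = Rule(frozenset({"HSA"}), "DN", 0.6, 0.1)
r5 = Rule(frozenset({"MSA", "MTA"}), "DN", 0.95, 0.1)
trimmed = prune([r1, r2, r3, r4, r5])
```

Applying condition 1 before condition 2 follows the order in which the conditions are listed above.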
5. Experimental Classification Results and Analysis
The proposed proactive anomaly detection system is tested on both synthetic and real data sets, and this section describes these experiments and their results. Performance is measured in terms of anomaly detection rate (sensitivity), specificity and false alarm rate for both clean and unclean data sets. The algorithm is evaluated on two real-life data sets. The first dataset is obtained from the Intel Berkeley Research Lab (IBRL) [33], and the second from the SensorScope project, located at the Grand-St-Bernard (GSB) pass at 2400 m between Switzerland and Italy [34]. Fig. 7 shows the deployment locations of sensor nodes in the IBRL, and Fig. 8 shows the deployment locations of sensor nodes in the Grand St. Bernard deployment. In addition, a comparative analysis is performed between the proposed system and the work in [20] to assess the effectiveness of the proposed anomaly detection system.
Sensor nodes in IBRL deployment
Sensor nodes in GSB
- 5.1. Performance Evaluation
The performance of the proposed anomaly detection system is evaluated in terms of overall accuracy, sensitivity, specificity, false alarm rate, positive predictive value and negative predictive value. Overall accuracy is the ability of the proposed system to classify data correctly. Sensitivity, or detection rate, is the ability of the system to detect positive (abnormal) cases. Specificity is the ability of the system to detect negative (normal) cases. False Alarm Rate (FAR) is the proportion of negative (normal) cases that the system wrongly flags as positive. Positive Projection Rate (PPR) is the proportion of positive test results that are true positives, and Negative Projection Rate (NPR) is the proportion of negative test results that are true negatives. The measures are defined as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{FAR} = \frac{FP}{FP + TN}, \qquad \text{PPR} = \frac{TP}{TP + FP}, \qquad \text{NPR} = \frac{TN}{TN + FN}$$

where TP, TN, FP and FN denote the numbers of true positives (abnormal data correctly classified), true negatives (normal data correctly classified), false positives (normal data classified as abnormal) and false negatives (abnormal data classified as normal), respectively.
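The measures can be computed directly from the four counts. The function name and the counts in the usage line are hypothetical and are not results from this paper.

```python
def detection_metrics(tp, tn, fp, fn):
    """Return the Section 5.1 evaluation measures from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # detection rate
        "specificity": tn / (tn + fp),
        "far": fp / (fp + tn),          # false alarm rate = 1 - specificity
        "ppr": tp / (tp + fp),          # positive projection (predictive) rate
        "npr": tn / (tn + fn),          # negative projection (predictive) rate
    }


m = detection_metrics(tp=90, tn=880, fp=20, fn=10)  # hypothetical counts
```

FAR and specificity always sum to 1, so reporting both is mainly a matter of presentation.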
- 5.2. IBRL Dataset
The IBRL data set is analysed; it was collected by 54 Mica2Dot sensors with 4 attributes during the 720-hour period between 28 February 2004 and 5 April 2004. During the 30-day period, the 54 sensors collected about 2.3 million readings [33]. The data were kept intact at the time of export, namely March 2004, in the time interval 00:00 to 03:59. The skeleton structure of the data set is illustrated in Table 5. Only three features are considered, namely temperature, humidity and voltage. In particular, variations in voltage are highly correlated with temperature.
Skeleton Structure of Intel Lab Data Set
Synthetic data are also generated for these features using a multivariate random generation function at different corruption levels to check the scalability of the proposed anomaly detection system.
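The sketch below shows what generating such synthetic data might look like. It draws correlated temperature, humidity and voltage readings and then corrupts a chosen fraction of them. The distributions, coefficients, offset range and function name are all assumptions for illustration, not the generator used in the paper. The corruption levels in the usage loop mirror the 10% to 70% range used in the evaluation.

```python
import random


def synthetic_readings(n, corruption, seed=0):
    """Hypothetical generator: n correlated (temperature, humidity, voltage) readings,
    with round(corruption * n) of them corrupted. Returns (data, corrupted_indices)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        temp = rng.gauss(22.0, 1.5)
        # Humidity and voltage are linear in temperature plus noise (assumed coefficients).
        humidity = 40.0 - 1.2 * (temp - 22.0) + rng.gauss(0.0, 0.5)
        voltage = 2.65 + 0.01 * (temp - 22.0) + rng.gauss(0.0, 0.005)
        data.append([temp, humidity, voltage])
    bad = rng.sample(range(n), round(corruption * n))
    for i in bad:
        # Implausible temperature jump that breaks the attribute correlation.
        data[i][0] += rng.choice([-1, 1]) * rng.uniform(10.0, 20.0)
    return data, sorted(bad)


for level in (0.1, 0.3, 0.5, 0.7):
    readings, corrupted = synthetic_readings(1000, level)
```

Corrupting only the temperature keeps the humidity and voltage consistent with the original value. This gives the attribute act a detectable inconsistency.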
- 5.3. SensorScope Dataset
The SensorScope GSB data set is analysed; it has 23 sensors with several meteorological attributes such as temperature, humidity, solar radiation and soil moisture [34]. Over the two months from September 2007 to October 2007, the 23 sensor nodes took readings every 2 minutes. The nodes are grouped into two clusters, with five nodes in one cluster and the remaining eighteen in the other.
The data were kept intact at the time of export, namely 29-30 September 2007, in the time interval 06:00 to 14:00. The skeleton structure of the data set is illustrated in Table 6. Only three features are considered, namely ambient temperature, surface temperature and relative humidity.
Skeleton Structure of GSB Data Set
- 5.4. Evaluation on Datasets
To assess the proposed method, the data sets are first normalized by identifying extreme values and spurious effects and removing them. The cleaned data, checked with scatter plots and a chi-square test, are regarded as normal data. Anomalies are randomly inserted into one or more nodes in each cluster, with the data corruption level varied from 10% to 70%. The proposed system is implemented in the MATLAB 2013 environment. The accuracy of data classification is investigated with respect to identifying anomalous and normal data points by calculating sensitivity, specificity and positive projection rate. For the IBRL dataset, the proposed system is applied to numbers of clusters ranging from 5 to 10, and robust fuzzy c-means is used to find the optimal number of clusters in the data set. Fig. 9 shows the optimal clusters used for evaluating the anomaly detection technique on IBRL; six cases were selected, with the number of clusters ranging from 5 to 10. For the GSB dataset, three cases were selected, with the number of clusters ranging from 3 to 5. Fig. 10 shows the optimal clusters used for evaluating the anomaly detection technique on GSB.
Fig. 9. Robust Fuzzy C-means Clusters in IBRL
Fig. 10. Robust Fuzzy C-means Clusters in GSB
The readings of each cluster were tested by applying the spatial, temporal and attribute correlation acts. The assurance level of each act can be evaluated to separate inferior, superior and doubtful nodes. Fig. 11 depicts the relationship among the attributes involved in the evaluation for the IBRL and GSB data sets. Fig. 12.1 and Fig. 12.2 show the membership functions of the sensor readings in cluster 5, reporting low, medium and high assurance levels based on the correlation act.
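The low, medium and high assurance levels can be sketched with triangular membership functions. Everything in this sketch is an assumption for illustration: the breakpoints 0, 0.25, 0.5, 0.75 and 1, the use of an absolute Pearson correlation as an act's score in [0, 1], averaging the three act scores, and the mapping of high to superior, medium to doubtful and low to inferior. The paper's actual rule base (Fig. 12.1 and Fig. 12.2) may differ, and the names `tri`, `pearson`, `assurance_level` and `classify_node` are hypothetical.

```python
import math

# Hypothetical triangular fuzzy sets (feet a, c and peak b) over an assurance score in [0, 1].
FUZZY_SETS = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.25, 0.5, 0.75),
    "high": (0.5, 1.0, 1.0),
}
# Hypothetical mapping from the winning assurance level to node status.
NODE_STATUS = {"high": "superior", "medium": "doubtful", "low": "inferior"}


def tri(x, a, b, c):
    """Triangular membership degree of x; a == b or b == c gives a shoulder."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def assurance_level(score):
    """Return the fuzzy set with the highest membership, plus all membership degrees."""
    degrees = {name: tri(score, *abc) for name, abc in FUZZY_SETS.items()}
    return max(degrees, key=degrees.get), degrees


def classify_node(spatial, temporal, attribute):
    """Average three act scores in [0, 1] (an assumption) and label the node."""
    level, _ = assurance_level((spatial + temporal + attribute) / 3.0)
    return NODE_STATUS[level]


# Usage: spatial score from a neighbour, temporal score from lag-1 autocorrelation.
node = [21.0, 21.4, 22.1, 22.9, 23.5]
neighbour = [20.8, 21.3, 22.0, 22.7, 23.6]
spatial = abs(pearson(node, neighbour))
temporal = abs(pearson(node[:-1], node[1:]))
print(classify_node(spatial, temporal, 0.9))
```

A node whose readings track its neighbour and its own history closely receives high scores on both acts, so it lands in the "high" set; doubtful nodes would then be retested rather than labelled immediately.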
Fig. 11. Data distribution with attribute correlation in IBRL (left) and GSB (right)
Fig. 12.1. Fuzzy membership function with attributes in IBRL
Fig. 12.2. Fuzzy membership function with attributes in GSB
In general, the detection accuracy of the proposed system is its ability to diagnose anomalous data correctly. Sensitivity measures the proportion of actual positive cases (anomaly) that are correctly identified, i.e., the percentage of incorrect data that is correctly detected as incorrect. Specificity measures the proportion of actual negative cases (conformity) that are correctly identified, i.e., the percentage of correct data that is correctly recognized as correct. The false alarm rate (FAR) measures the proportion of negative cases (conformity) that are incorrectly identified as positive, i.e., the percentage of correct data that is wrongly flagged as anomalous. The positive prediction rate (PPR) is the probability that an anomaly is actually present given a positive test result, and the negative prediction rate (NPR) is the probability that an anomaly is actually absent given a negative test result. Table 7 shows the overall performance of the proposed anomaly detection system compared with the approach in [20]; the proposed system achieves a significant gain in detection accuracy over this existing work [20].
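These definitions can be made concrete with the standard confusion-matrix formulas, treating an anomaly as the positive class. The sketch below is illustrative only: the function names are hypothetical, and the accuracy entry uses the conventional (TP + TN) / total definition, which the text does not spell out.

```python
def _ratio(num, den):
    """Safe division: NaN when the denominator is zero."""
    return num / den if den else float("nan")


def confusion_counts(actual, predicted):
    """Count TP, FN, TN, FP from parallel boolean lists (True = anomaly)."""
    pairs = list(zip(actual, predicted))
    tp = sum(a and p for a, p in pairs)
    fn = sum(a and not p for a, p in pairs)
    tn = sum(not a and not p for a, p in pairs)
    fp = sum(not a and p for a, p in pairs)
    return tp, fn, tn, fp


def detection_metrics(tp, fn, tn, fp):
    """Confusion-matrix rates with anomaly as the positive class."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # anomalies correctly flagged
        "specificity": _ratio(tn, tn + fp),  # normal data correctly kept
        "far": _ratio(fp, fp + tn),          # normal data wrongly flagged
        "ppr": _ratio(tp, tp + fp),          # P(anomaly | flagged)
        "npr": _ratio(tn, tn + fn),          # P(normal | not flagged)
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
    }


print(detection_metrics(*confusion_counts([True, True, False, False],
                                          [True, False, False, True])))
```

Note that, under these definitions, FAR equals 1 minus specificity, so a 0% false alarm rate corresponds to 100% specificity.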
Table 7. Performances of Fuzzy-Spatial, Fuzzy-Temporal, Fuzzy-Attribute and existing work compared with the proposed correlated fuzzy system
Fig. 13 and Fig. 14 illustrate the performance of the proposed system on the IBRL and GSB datasets. The approach in [20] has lower sensitivity and a higher false alarm rate than CCFS. Specifically, by considering fuzzy-based correlation of the data, the proposed method offers a 0% false alarm rate and a 100% detection rate until 40% of the nodes in the network are anomalous. Even for corruption levels above 40% and up to 70%, the average false alarm rate is only 2.57% in the IBRL data set and 4.28% in the GSB data set.
Fig. 13. Sensitivity for varying corruption level
Fig. 14. Sensitivity for varying corruption level
Fig. 15 shows the performance of the proposed system with varying cluster sizes. For this evaluation, different sets of test data are randomly generated, and the detection rate and false alarm rate are evaluated for both the IBRL and GSB datasets. In each case, five clusters in IBRL and three clusters in GSB are considered, and anomalies are randomly inserted into the clusters. As this figure shows, the proposed system improves the detection rate by taking into account, and efficiently combining, several correlation acts over the optimal clusters obtained with RFCM. As Fig. 16 shows, the anomaly detection and misdetection fractions remain stable as the number of nodes increases from 50 to 500. This result implies that our fuzzy-based anomaly detection scales well: it works under different network sizes without losing performance.
Fig. 15. Performance assessment for varying cluster size
Fig. 16. Scalability Comparison
Finally, the performance at anomalous percentages ranging from 10% to 70% is evaluated for 500 nodes. The anomalous percentage is defined as the ratio of the number of malicious nodes in the network to the total number of nodes currently present in the network.
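The anomalous percentage and the random insertion of anomalies can be sketched as follows. The paper does not describe its fault model, so the constant additive offset, the uniform random choice of corrupted nodes and the names `inject_anomalies` and `anomalous_percentage` are hypothetical.

```python
import random


def anomalous_percentage(num_malicious, num_total):
    """Malicious nodes divided by all nodes in the current network, as a percentage."""
    return 100.0 * num_malicious / num_total


def inject_anomalies(readings, pct, offset=10.0, seed=0):
    """Corrupt pct% of the nodes by adding a constant offset to all of their readings.

    readings maps node id -> list of floats. Returns (corrupted copy, set of corrupted ids).
    The constant-offset fault model is an illustrative assumption.
    """
    rng = random.Random(seed)
    nodes = sorted(readings)
    k = round(len(nodes) * pct / 100.0)
    bad = set(rng.sample(nodes, k))
    corrupted = {n: [x + offset for x in r] if n in bad else list(r)
                 for n, r in readings.items()}
    return corrupted, bad


# Usage: 500 nodes, anomalous percentages from 10% to 70%.
readings = {node: [20.0, 21.0] for node in range(500)}
for pct in range(10, 80, 10):
    _, bad = inject_anomalies(readings, pct)
    print(pct, len(bad), anomalous_percentage(len(bad), len(readings)))
```

Returning the set of corrupted node ids alongside the data gives the ground truth needed to count true and false positives when the detector's output is scored.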
To further understand the behaviour of the proposed CCFS approach, it is compared with well-established state-of-the-art anomaly detection algorithms. To evaluate the efficiency of the CCFS model, the computational complexity, communication overhead and memory complexity are considered. The computational complexity of our model is O(nd+α+p+q), arising from the calculation of the spatial, temporal and attribute correlations. The communication overhead is O(nd); the correlation acts themselves add no communication overhead because the analysis is performed locally at each node. The memory complexity is O(ndr), where r is the number of rules, so fewer rules require less memory. Table 8 compares the complexity of different state-of-the-art anomaly detection approaches. Our method achieves a higher detection rate than the other methods, while its computational, communication and memory complexities are slightly lower than those of the other techniques.
Table 8. Complexity Analysis of Anomaly Detection
Legends: n - number of new records at a time, d - dimension of the observations, α - attribute correlation, p - spatial correlation, q - temporal correlation, c - number of clusters, v - intermediate values, r - number of rules
6. Conclusion
In this paper, a fuzzy-based anomaly detection system is developed that uses fuzzy logic to classify data as anomaly or conformity based on the spatial, temporal and attribute correlation acts. Each act is evaluated for various numbers of clusters generated by robust fuzzy c-means clustering. After the data are classified, superior nodes are labelled as customary (normal), inferior nodes as anomalous, and doubtful nodes are retested until a final decision is reached. The experimental results show that the proposed CCFS outperforms existing work in anomaly detection accuracy, false alarm rate, sensitivity and specificity, supporting decision making.
BIO
U. Barakkath Nisha received her Bachelor of Engineering in Computer Science & Engineering with distinction from Anna University, Chennai, in 2008. She received her Master of Engineering in Computer Science & Engineering from Anna University, Coimbatore, in 2010 as a full-time student, with distinction. She is currently an assistant professor in the Department of Computer Science & Engineering at PSNA College of Engineering and Technology, Dindigul. Her research interests are Ad hoc Networks, Wireless Networks, Information Security, Computer Networks and Sensor Networks. She has published papers in various national and international conferences related to Wireless Sensor Networks over the past 2 years.
N. Uma Maheswari received her M.E. in Computer Science and Engineering from the University of Madras, Chennai, India, in 2002 and her Ph.D. in Information and Communication Engineering from Anna University, Chennai, in 2011. She is currently a Professor in the Department of Computer Science and Engineering at P.S.N.A. College of Engineering and Technology, Dindigul, India. She has a total of 15 years of teaching experience, including 11 years of research experience. Her research interests include Biometrics, Image Processing, Compiler Design, Artificial Intelligence, Speech Processing and Wireless Sensor Networks. She has published 20 papers in international journals and 2 papers in national journals, and presented 22 papers at international conferences and 10 papers at national conferences. She has co-authored a book entitled "Compiler Design" published by Yes Dee Publishing. She is a recognized Ph.D. supervisor at Anna University of Technology in the areas of Image Processing, Cloud Computing, Network Security and Networks.
R. Venkatesh received his M.E. in Computer Science and Engineering from Anna University, Chennai, India, in 2007 and his Ph.D. in Computer Science and Engineering from Alagappa University, Karaikudi, in 2010. He is currently a Professor in the Department of Information Technology at PSNA College of Engineering and Technology, Dindigul, India. He has a total of twenty years of teaching experience, including 11 years of research experience. He has published 20 papers in international journals and 2 papers in national journals, and presented 22 papers at international conferences and 10 papers at national conferences. His research interests include Biometrics, Artificial Intelligence, Compiler Design, Neural Networks, Soft Computing, Network Security and Networks. He has co-authored a book entitled "Compiler Design" published by Yes Dee Publishing.
R. Yasir Abdullah received his Bachelor of Engineering in Electronics & Communication Engineering from Anna University, Chennai, in 2005. He received his Master's in Computer Science & Engineering as a full-time student, with distinction, from Anna University, Coimbatore, Tamilnadu, India, in 2009. He is currently an assistant professor in the Computer Science Department at SSCET, Palani, India. His research interests lie in Wireless Networks, Information Security, Computer Networks and Sensor Networks. He has been publishing papers in various national conferences related to Wireless Networking for the past 3 years.
References
Rawat P. , Singh K.D. , Chaouchi H. , Bonnin J.M. 2014 "Wireless sensor networks: a survey on recent developments and potential synergies," The Journal of Supercomputing 68 1 - 48    DOI : 10.1007/s11227-013-1021-9
Pottie G.J. , Kaiser W.J. 2000 "Wireless Integrated network sensors," Communications of the ACM 43 (5) 51 - 58    DOI : 10.1145/332833.332838
Xu H. , Huang L. , Zhang Y. , Huang H. , Jiang S. , Liu G. 2010 "Energy Efficient cooperative data aggregation for wireless sensor networks," Journal of Parallel and Distributed Computing 70 (9) 953 - 961    DOI : 10.1016/j.jpdc.2010.05.009
Sun Bo , Shan Xuemei , Wu Kui , Xiao Yang 2013 "Anomaly Detection Based Secure In-Network Aggregation for Wireless Sensor Networks," IEEE Systems Journal 7 (1) 13 - 25    DOI : 10.1109/JSYST.2012.2223531
Roy S , Conti M , Setia S , Jajodia S 2012 "Secure data aggregation in wireless sensor networks," IEEE Transactions on Information Forensics and Security 7 (3) 1040 - 1052    DOI : 10.1109/TIFS.2012.2189568
Xie Miao , Han Song , Tian Biming , Parvin Sazia 2011 "Anomaly Detection in Wireless Sensor Networks: A survey," Journal of Network and Computer Applications 34 1302 - 1325    DOI : 10.1016/j.jnca.2011.03.004
O’Reilly C , Gluhak A , Imran M.A , Rajasegarar S 2013 "Anomaly Detection in Wireless Sensor Networks in a Non-Stationary Environment," IEEE Communications Surveys & Tutorials 16 (3) 1 - 20
Forero P. , Cano A. , Giannakis G. 2011 "Distributed clustering using wireless sensor networks," IEEE Journal of Selected Topics in Signal Processing 5 (4) 702 - 724    DOI : 10.1109/JSTSP.2011.2114324
Neamatollahi P , Taheri H , Naghibzadeh M , Yaghmaee M "A hybrid clustering approach for prolonging lifetime in wireless sensor networks," IEEE International Symposium on Computer Networks and Distributed Systems February 2011 170 - 174
Baig Z.A. , Khan S.A. "Fuzzy Logic-Based Decision Making for Detecting Distributed Node Exhaustion Attacks in Wireless Sensor Networks," in Proc. of Second International Conference on Future Networks, IEEE January 2010 185 - 189
Curiac Daniel-Ioan , Volosencu Constantin 2012 "Ensemble based sensing anomaly detection in wireless sensor networks," Journal of Expert Systems with Applications 39 9087 - 9096    DOI : 10.1016/j.eswa.2012.02.036
Ozdemir Suat , Çam Hasan 2010 "Integration of False Data Detection With Data Aggregation and Confidential Transmission in Wireless Sensor Networks," IEEE/ACM Transactions on Networking 18 (3) 736 - 749    DOI : 10.1109/TNET.2009.2032910
Chitra Devi N , Palanisamy V , Baskaran K , Barakkath Nisha U "Outlier aware Data Aggregation in Distributed Wireless Sensor Network using Robust Principal Component Analysis," in Proc. of Second International Conference on Computing, Communication and Networking Technologies, IEEE July 2010 1 - 9
Janakiram D , Mallikarjuna A , Reddy V , Kumar P "Outlier Detection in wireless sensor networks using Bayesian belief Networks," in Proc. of International IEEE Workshop on Software for Sensor Networks August 2006 1 - 6
Chitra Devi N , Palanisamy V , Baskaran K , Prabeela S 2011 "Efficient distributed clustering based anomaly detection algorithm for sensor stream in clustered Wireless Sensor Network," European Journal of Scientific Research 54 (4) 484 - 498
Zhang Yang , Meratnia Nirvana , Havinga Paul J.M 2012 "Distributed online outlier detection in wireless sensor networks using ellipsoidal support vector machine," Journal of Ad Hoc Networks 11 1062 - 1074    DOI : 10.1016/j.adhoc.2012.11.001
Zhang Y. , Hamm N.A.S. , Meratnia N. , Stein A. , Van de Voort M. , Havinga P.J.M. 2011 "Statistics based outlier detection for wireless sensor networks," International Journal of Geographical Information Science 1 - 20
Kapitanova K. , Son S. H. , Kang K.-D. 2011 "Using fuzzy logic for robust event detection in wireless sensor networks," Journal of Ad Hoc Networks 10 709 - 722    DOI : 10.1016/j.adhoc.2011.06.008
Liang Q. , Wang L. "Event detection in wireless sensor networks using fuzzy logic system," in Proc. of International Conference on Computational Intelligence for Homeland Security and Personal Safety, IEEE April 2005 52 - 55
Kumarage Heshan , Khalil Ibrahim , Tari Zahir , Zomaya Albert 2013 "Distributed anomaly detection for industrial wireless sensor networks based on fuzzy data modeling," Journal of Parallel and Distributed Computing 73 790 - 806    DOI : 10.1016/j.jpdc.2013.02.004
Zadeh L.A. 1994 "Soft Computing and Fuzzy Logic," IEEE Software 11 (6) 48 - 56    DOI : 10.1109/52.329401
Zimmermann H.J. 1996 "Fuzzy Set Theory and Its Applications," 3rd edition, Kluwer Academic Publishers, Norwell
Bezdek J.C. , Ehrlich R. , Full W. 1984 "FCM: the fuzzy c-means clustering algorithm," Computers & Geosciences 10 (3) 191 - 203    DOI : 10.1016/0098-3004(84)90020-7
Izakian H. , Pedrycz W. "Anomaly detection in time series data using a fuzzy c means clustering," IFSA World Congress and NAFIPS Annual Meeting, IEEE June 2013 1513 - 1518
Shamshirband S. , Amini A. , Anuar N. , Kiah M. , Teh Y. , Furnell S. 2014 "D-FICCA: A density based fuzzy imperialist competitive clustering algorithm for intrusion detection in wireless sensor networks," Measurement, Elsevier 55 212 - 226    DOI : 10.1016/j.measurement.2014.04.034
Tang C. , Wang S.G. 2010 "Adaptive fuzzy clustering model based on internal connectivity of all data points," Acta Automatica Sinica (11) 1544 - 1556    DOI : 10.3724/SP.J.1004.2010.01544
Barnett V. , Lewis T 1994 "Outliers in Statistical Data," 3rd edition John Wiley & Sons
Cai Weiling , Chen Song , Zhang Daoqiang 2007 "Fast and robust fuzzy c-means clustering algorithms incorporating local information for image segmentation," Pattern Recognition 40 (3) 825 - 838    DOI : 10.1016/j.patcog.2006.07.011
Vuran M. C. , Akan Ö.B. , Akyildiz I.F. 2004 "Spatio-temporal correlation: Theory and applications for wireless sensor networks," Computer Networks: International Journal of Computer and Telecommunication Networking 45 (3) 245 - 259
Liu Zhidan , Xing Wei , Zeng Bo , Wang Yongchao , Lu Dongming "Distributed Spatial Correlation-based Clustering for Approximate Data Collection in WSNs," in Proc. of IEEE International Conference on Advanced Information Networking and Applications March 2013 56 - 63
Ishibuchi H. , Nakashima T. , Kuroda T. "A hybrid fuzzy GBML algorithm for designing compact fuzzy rule-based classification systems," in Proc. of IEEE International Conference on Fuzzy Systems May 2000 706 - 711
Mitra Sushmita , Hayashi Yoichi 2000 "Neuro–Fuzzy Rule Generation: Survey in Soft Computing Framework," IEEE Transactions on Neural Networks 11 (3) 1 - 20    DOI : 10.1109/72.846746
IBRL Dataset http://db.csail.mit.edu/labdata/labdata.html
SensorScope Dataset http://lcav.epfl.ch/page-86035-en.html