A New Estimation Model for Wireless Sensor Networks Based on the Spatial-Temporal Correlation Analysis
Journal of Information and Communication Convergence Engineering. 2015. Jun, 13(2): 105-112
Copyright © 2015, The Korean Institute of Information and Communication Engineering
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
  • Received : February 04, 2015
  • Accepted : May 01, 2015
  • Published : June 30, 2015
About the Authors
Xiaojun Ren
HyonTai Sug
sht@dongseo.ac.kr
HoonJae Lee

Abstract
The estimation of missing sensor values is an important problem in sensor network applications, but existing approaches are limited in both application scope and estimation accuracy. Therefore, in this paper, we propose a new estimation model based on spatial-temporal correlation analysis (STCAM). STCAM makes full use of spatial and temporal correlations: it recognizes whether the sensor parameters have a spatial correlation or a temporal correlation, and whether the missing sensor data are continuous. According to the recognition results, STCAM chooses the most suitable algorithm from among the linear interpolation algorithm of temporal correlation analysis (TCA-LI), the multiple regression algorithm of temporal correlation analysis (TCA-MR), spatial correlation analysis (SCA), and spatial-temporal correlation analysis (STCA) to estimate the missing sensor data. STCAM was evaluated on the Intel lab dataset and a traffic dataset, and the simulation results show that STCAM has good estimation accuracy.
I. INTRODUCTION
With the rapid development of wireless communication, microelectronics, and embedded computing technologies, sensor networks are widely used in certain fields, such as the military, environment, and medicine. Therefore, nowadays, these networks have become a popular topic of research. In a wireless sensor network, sensors always communicate with the server and other sensors (e.g., for sending data or accepting data). However, in the process of communication, we can expect the transmitted sensor data to get lost or corrupted for many reasons, such as bad weather conditions, the sensor node’s communication ability, wireless signal strength, power outage at the sensor node, or a relatively high bit error rate of the wireless radio transmissions as compared with wired communications. In general, we can re-query data or discard data, but re-querying data is a naive alternative as it may induce a long wait or quicken the power exhaustion of the node, and most importantly, it does not guarantee having the original reading available. Discarding data is also a bad choice as it may lead to the loss of some interesting data. Therefore, it is essential to develop a technique for estimating missing data.
Data mining can produce knowledge from existing data, and this knowledge can be used for estimating missing sensor data. However, the existing approaches to estimating missing sensor data do not achieve good results (as discussed in the following section). Therefore, in this paper, we propose a new estimation model based on spatial-temporal correlation analysis (STCAM). This model can discover intrinsic relationships among sensors and then incorporate these relationships and the spatial-temporal relationship into data estimation. Finally, STCAM is tested on the Intel lab dataset and on data from a traffic monitoring sensor application.
II. RELATED WORKS
In fact, the topic of missing data estimation belongs to the field of statistics, and many researchers have conducted a considerable amount of research on this topic by using methods such as mean substitution, linear regression, Bayesian estimation, expectation maximization, k-nearest neighbor, and neural networks [1, 2]. However, because of the characteristics of a wireless sensor network, these techniques cannot provide a good estimation of the missing sensor data. To solve the problem of missing sensor data, many techniques have been proposed.
To avoid the problem of missing sensor data, many researchers have redesigned the sensor network architecture. NASA/JPL [3] is one of the most famous architectures. In NASA/JPL, if one sensor fails, its neighboring sensors compensate for the lost data by increasing their sampling rates. This implies that there must be a tight collaboration among sensors for a sensor to know that its neighboring sensor has failed. This increases the power consumption of every sensor even during its normal operation. Further, this approach does not address how sampling rates should be adjusted in order to guarantee good QoS. It is also possible that when some neighboring sensors fail, no sampling adjustment can potentially compensate for the missing values.
Some researchers have used association rule mining to estimate the missing sensor data. Halatchev and Gruenwald [4] proposed the WARM algorithm. In this algorithm, if sensor node a fails, WARM will find its neighbor sensor node b and use b’s data to estimate a’s missing data. WARM makes use of the sliding window concept, where only the latest w rounds of data reports are stored and used for estimation. However, the algorithm has one limitation, which is its disregard of the temporal aspect since it views all data as equally important. Gruenwald et al. [5] proposed an improved algorithm called FARM, which uses association rule mining to discover intrinsic relationships among sensors and incorporates them into the data estimation while taking data freshness into consideration. However, WARM and FARM can only be used in the case of discrete data; because most of the sensor data are numeric, WARM and FARM cannot be used widely.
III. ESTIMATION MODEL BASED ON SPATIAL-TEMPORAL CORRELATION ANALYSIS
There are two types of missing sensor data, namely single missing data elements and continuous missing data; therefore, STCAM must be able to provide different solutions for different types of missing data according to the sensor’s spatial-temporal correlation. Before presenting STCAM, we discuss the problem description, the temporal correlation algorithm (TCA), the spatial correlation algorithm (SCA), and the spatial-temporal correlation algorithm (STCA).
- A. Problem Description
STCAM uses a time series to represent the data collected by a sensor node $a_k$. The time series form is as follows:
$$a_k = \{(T_1, V_{k1}), (T_2, V_{k2}), \ldots, (T_n, V_{kn})\}$$
where $T_1, \ldots, T_n \in \mathbb{R}$ denote the sampling times and $V_{k1}, \ldots, V_{kn} \in \mathbb{R}$ represent the sampling values of sensor node $a_k$ at times $T_1, \ldots, T_n$. Assuming that $V_{ki}$ denotes the missing sensor data and $\hat{V}_{ki}$ represents the estimated sensor data at time $T_i$, we can reduce the problem of estimating the missing sensor data to finding the smallest value of $|V_{ki} - \hat{V}_{ki}|$.
- B. Temporal Correlation Algorithm
In some applications, monitored parameters such as temperature, humidity, and light intensity have a tight temporal correlation. Therefore, we can use temporal correlations to build the TCA model. In the following subsections, we introduce two algorithms, namely the linear interpolation algorithm (TCA-LI) and the multiple regression algorithm (TCA-MR).
- 1) TCA-LI Algorithm
Linear interpolation is a curve-fitting method that uses linear polynomials and is computationally efficient. TCA-LI can be expressed by the following formula [6]:
$$\hat{V}_{ki} = V_{ku} + \frac{V_{kv} - V_{ku}}{T_v - T_u}\,(T_i - T_u) \tag{1}$$
where $T_u$ and $T_v$ denote the two nearest time points to $T_i$ that have data, with $T_u < T_i < T_v$; $\hat{V}_{ki}$ denotes the estimated sensor data at time $T_i$; and $V_{ku}$ and $V_{kv}$ represent the sampling data at times $T_u$ and $T_v$, respectively.
For a single missing data element, the TCA-LI algorithm gives a good estimate, but if the missing sensor data are continuous, the accuracy of the TCA-LI algorithm decreases, as shown in Fig. 1. Sensor V measures the temperature every 24 minutes. Assuming that the reading at $T_{1176}$ ($V_{1176} = 32.90$) is missing and that $T_{1152}$ ($V_{1152} = 33.10$) and $T_{1200}$ ($V_{1200} = 32.50$) are the two nearest time points to $T_{1176}$, we find that $\hat{V}_{1176} = 32.80$ is close to $V_{1176}$. However, assuming that all the data between $T_{1008}$ and $T_{1272}$ are missing, $T_{984}$ ($V_{984} = 29.80$) and $T_{1296}$ ($V_{1296} = 30.30$) become the two nearest time points to $T_{1176}$; therefore, $\hat{V}_{1176} \approx 30.11$, and the value of $|V_{1176} - \hat{V}_{1176}| \approx 2.79$ is very large. Hence, the TCA-LI algorithm is only used for estimating single missing data elements.
Fig. 1. Temperature data collected by sensor V for one day.
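The worked example above can be checked with a few lines of code. The following is a minimal sketch of linear interpolation between two known readings; the function name `tca_li` and its argument names are illustrative choices, not taken from the paper.

```python
def tca_li(t_u, v_u, t_v, v_v, t_i):
    """Linearly interpolate the reading at time t_i from two known readings."""
    if not t_u < t_i < t_v:
        raise ValueError("t_i must lie strictly between t_u and t_v")
    slope = (v_v - v_u) / (t_v - t_u)
    return v_u + slope * (t_i - t_u)


true_value = 32.90  # reading at T_1176 that is assumed to be missing

# Single missing reading: the nearest known readings are 24 minutes away.
near = tca_li(1152, 33.10, 1200, 32.50, 1176)

# Long gap: the nearest known readings are at T_984 and T_1296.
far = tca_li(984, 29.80, 1296, 30.30, 1176)

print(round(near, 2), round(abs(true_value - near), 2))
print(round(far, 2), round(abs(true_value - far), 2))
```

The error grows from about 0.1 to about 2.8 once the interpolation endpoints move away from the missing reading, which is why TCA-LI is reserved for single missing elements.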
- 2) TCA-MR Algorithm
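The previous section shows that TCA-LI is accurate for single missing sensor data elements but cannot provide good estimates for continuous missing sensor data. Therefore, in this section, we introduce the multiple regression algorithm (TCA-MR) to estimate the continuous missing sensor data of the TCA model. Assuming that $V_{ki}$ denotes the missing data of sensor node $a_k$ at time $T_i$, the problem of estimating $\hat{V}_{ki}$ can be solved by using the following multiple regression formula:
$$\hat{V}_{ki} = \beta_0 + \beta_1 V_{k(i-1)} + \beta_2 V_{k(i-2)} + \cdots + \beta_m V_{k(i-m)} \tag{2}$$
where $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$ denote regression coefficients, which represent the contribution level of each term to $\hat{V}_{ki}$.
To estimate $\hat{V}_{ki}$, we should use the training dataset to estimate the values of $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$. Assume that the training dataset is $\{V_{ki}, V_{k(i+1)}, V_{k(i+2)}, \ldots, V_{kj}\}$, with $j > i + 2m + 1$. To estimate $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$, we should build $h$ linear equations ($h > m + 1$) that can be expressed as follows:
$$\begin{cases} V_{kj} = \beta_0 + \beta_1 V_{k(j-1)} + \cdots + \beta_m V_{k(j-m)} \\ V_{k(j-1)} = \beta_0 + \beta_1 V_{k(j-2)} + \cdots + \beta_m V_{k(j-m-1)} \\ \quad\vdots \\ V_{k(j-h+1)} = \beta_0 + \beta_1 V_{k(j-h)} + \cdots + \beta_m V_{k(j-h-m+1)} \end{cases} \tag{3}$$
Let
$$Y = \begin{bmatrix} V_{kj} \\ V_{k(j-1)} \\ \vdots \\ V_{k(j-h+1)} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & V_{k(j-1)} & \cdots & V_{k(j-m)} \\ 1 & V_{k(j-2)} & \cdots & V_{k(j-m-1)} \\ \vdots & \vdots & & \vdots \\ 1 & V_{k(j-h)} & \cdots & V_{k(j-h-m+1)} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$$
Therefore, Eq. (3) can be rewritten as follows:
$$Y = X\beta \tag{4}$$
The coefficient $\beta$ can be estimated by using the least-squares approach [7], which can be expressed as follows:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{5}$$
After we calculate the value of coefficient $\beta$, we can use Eq. (2) to estimate the continuous missing sensor data of TCA.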
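The fitting step can be sketched in code. This is a hedged illustration rather than the authors' implementation: it assumes the regressors are the m readings that precede the value being estimated (an autoregressive reading of the multiple regression formula), it fills a continuous gap by feeding each estimate back in as a regressor, and the training series is synthetic. The helper names `least_squares`, `fit_tca_mr`, and `fill_gap` are hypothetical.

```python
def least_squares(rows, targets):
    """Return beta solving (X^T X) beta = X^T Y; X is rows plus a leading 1."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    # Augmented normal-equation matrix [X^T X | X^T Y].
    aug = [
        [sum(r[a] * r[b] for r in x) for b in range(p)]
        + [sum(r[a] * y for r, y in zip(x, targets))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("regressors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p] for row in aug]


def fit_tca_mr(series, m):
    """Fit V[t] = b0 + b1*V[t-1] + ... + bm*V[t-m] on a gap-free series."""
    rows = [[series[t - lag] for lag in range(1, m + 1)]
            for t in range(m, len(series))]
    return least_squares(rows, series[m:])


def fill_gap(history, beta, gap_length):
    """Estimate gap_length consecutive readings, reusing earlier estimates."""
    m = len(beta) - 1
    values = list(history)
    for _ in range(gap_length):
        lags = [values[-lag] for lag in range(1, m + 1)]
        values.append(beta[0] + sum(b * v for b, v in zip(beta[1:], lags)))
    return values[len(history):]


# Synthetic, gap-free training series that follows V[t] = 5 + 0.8 * V[t-1].
training = [10.0]
for _ in range(9):
    training.append(5.0 + 0.8 * training[-1])

beta = fit_tca_mr(training, m=1)
gap = fill_gap(training, beta, gap_length=3)
print([round(b, 3) for b in beta])
print([round(v, 3) for v in gap])
```

Because each estimate is reused as an input for the next one, estimation errors can accumulate over a long gap, which is consistent with the accuracy loss reported for long runs of missing data in Section IV.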
- C. SCA Algorithm
For continuous missing data and loose temporal correlation parameters, the TCA algorithm cannot provide a good estimation value for the missing data. However, the SCA algorithm can discover the spatial relationship between the sensor nodes and use the discovered spatial knowledge to estimate the missing data.
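Assuming that $V_{ki}$ denotes the missing data of sensor node $a_k$ at time $T_i$ and $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ represent the neighbors of $a_k$, let $\{V_{1i}, V_{2i}, \ldots, V_{mi}\}$ represent the data values of $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ at time $T_i$. The problem of estimating $V_{ki}$ can be solved from $\{V_{1i}, V_{2i}, \ldots, V_{mi}\}$ by using multiple regression as follows:
$$\hat{V}_{ki} = \beta_0 + \beta_1 V_{1i} + \beta_2 V_{2i} + \cdots + \beta_m V_{mi}$$
where $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$ denote regression coefficients, which represent the contribution level of each neighbor to $\hat{V}_{ki}$.
To calculate $\hat{V}_{ki}$, SCA needs a dataset to estimate the values of $\{\beta_0, \beta_1, \beta_2, \ldots, \beta_m\}$. According to the solution rules of linear equations, the dataset must contain at least $(m + 1)$ groups of $\{V_1, V_2, \ldots, V_m\}$. When $h > m + 1$ groups, sampled at times $T_1, \ldots, T_h$, are available, the linear equations can be expressed as follows:
$$\begin{cases} V_{k1} = \beta_0 + \beta_1 V_{11} + \cdots + \beta_m V_{m1} \\ V_{k2} = \beta_0 + \beta_1 V_{12} + \cdots + \beta_m V_{m2} \\ \quad\vdots \\ V_{kh} = \beta_0 + \beta_1 V_{1h} + \cdots + \beta_m V_{mh} \end{cases}$$
Let
$$Y = \begin{bmatrix} V_{k1} \\ V_{k2} \\ \vdots \\ V_{kh} \end{bmatrix}, \quad X = \begin{bmatrix} 1 & V_{11} & \cdots & V_{m1} \\ 1 & V_{12} & \cdots & V_{m2} \\ \vdots & \vdots & & \vdots \\ 1 & V_{1h} & \cdots & V_{mh} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}$$
Therefore, these linear equations can be represented by using matrix algebra as follows:
$$Y = X\beta$$
Hence, we can calculate the value of coefficient $\beta$ as follows:
$$\hat{\beta} = (X^{T}X)^{-1}X^{T}Y$$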
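The spatial regression can be sketched in the same way. The example below is an illustration under stated assumptions, not the paper's code: the neighbor readings are synthetic, the target readings are generated from a known linear rule so the fit can be checked, and the names `least_squares`, `fit_sca`, and `estimate_sca` are hypothetical.

```python
def least_squares(rows, targets):
    """Return beta solving (X^T X) beta = X^T Y; X is rows plus a leading 1."""
    x = [[1.0] + list(r) for r in rows]
    p = len(x[0])
    # Augmented normal-equation matrix [X^T X | X^T Y].
    aug = [
        [sum(r[a] * r[b] for r in x) for b in range(p)]
        + [sum(r[a] * y for r, y in zip(x, targets))]
        for a in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("neighbour readings are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p] for row in aug]


def fit_sca(neighbour_history, target_history):
    """Fit the target sensor's readings on its neighbours' simultaneous readings."""
    return least_squares(neighbour_history, target_history)


def estimate_sca(beta, neighbour_values):
    """Estimate the missing reading from the neighbours' readings at that time."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], neighbour_values))


# Synthetic history: h = 5 time points, m = 2 neighbours (h > m + 1).
neighbour_history = [(10, 20), (12, 19), (15, 25), (11, 30), (14, 22)]
# Synthetic target readings generated as 1 + 0.5 * n1 + 0.25 * n2.
target_history = [1.0 + 0.5 * n1 + 0.25 * n2 for n1, n2 in neighbour_history]

beta = fit_sca(neighbour_history, target_history)
estimate = estimate_sca(beta, (13, 21))
print([round(b, 3) for b in beta], round(estimate, 3))
```

Unlike the temporal regression, this estimate depends only on what the neighbors report at the missing time point, so it does not degrade as the gap at the target sensor grows longer.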
- D. STCA Algorithm
The TCA algorithms are suited to parameters with a tight temporal correlation, and the SCA algorithm is suited to parameters with a tight spatial correlation. However, when neither the temporal nor the spatial correlation is tight, TCA or SCA alone may not give a good estimation value. To solve this problem, the STCA algorithm is proposed. This algorithm takes into account the weight of the temporal and spatial correlations; therefore, the STCA algorithm can be represented as follows:
$$\hat{V}_{ki} = W_S \hat{V}^{S}_{ki} + W_T \hat{V}^{T}_{ki}$$
where $W_S$ and $W_T$ denote the weights of $\hat{V}^{S}_{ki}$ and $\hat{V}^{T}_{ki}$, respectively; $\hat{V}^{S}_{ki}$ represents the result of the SCA algorithm, and $\hat{V}^{T}_{ki}$ denotes the result of the TCA algorithm.
To obtain the optimum values of $W_S$ and $W_T$, STCA calculates the residual sum of squares (RSS) as follows:
$$RSS = \sum_{i=1}^{h} \left( W_S e^{S}_{i} + W_T e^{T}_{i} \right)^2$$
where $e^{S}_{i}$ and $e^{T}_{i}$ denote the estimation errors of $\hat{V}^{S}_{ki}$ and $\hat{V}^{T}_{ki}$, respectively, and $h$ represents the number of selected datasets.
Let
[equation image missing in source]
Therefore, the problem of obtaining the optimum values of $W_S$ and $W_T$ becomes a quadratic programming problem:
[equation image missing in source]
Therefore, we can use the least-squares approach [7] to obtain an optimal solution as follows:
[equation image missing in source]
where
[equation image missing in source]
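One simple way to obtain such weights can be sketched as follows. This is not necessarily the paper's closed-form solution: the sketch assumes the two weights sum to 1 and are non-negative, and it minimizes the residual sum of squares of the weighted errors over a set of readings whose true values are known. The error values are synthetic, and the names `stca_weights`, `rss`, and `stca_estimate` are hypothetical.

```python
def rss(errors_s, errors_t, w_s, w_t):
    """Residual sum of squares of the weighted SCA and TCA errors."""
    return sum((w_s * es + w_t * et) ** 2 for es, et in zip(errors_s, errors_t))


def stca_weights(errors_s, errors_t):
    """Return (W_S, W_T) minimising the RSS, assuming W_S + W_T = 1."""
    diffs = [es - et for es, et in zip(errors_s, errors_t)]
    denom = sum(d * d for d in diffs)
    if denom == 0:
        return 0.5, 0.5  # identical errors: every split gives the same RSS
    # Setting d(RSS)/d(W_S) = 0 with W_T = 1 - W_S gives this ratio.
    w_s = -sum(et * d for et, d in zip(errors_t, diffs)) / denom
    w_s = min(1.0, max(0.0, w_s))  # assumption: keep both weights non-negative
    return w_s, 1.0 - w_s


def stca_estimate(v_sca, v_tca, w_s, w_t):
    """Combine the SCA and TCA estimates with the given weights."""
    return w_s * v_sca + w_t * v_tca


# Synthetic estimation errors of SCA and TCA on h = 4 known readings.
errors_s = [0.4, -0.2, 0.3, -0.1]
errors_t = [-0.6, 0.5, -0.4, 0.3]

w_s, w_t = stca_weights(errors_s, errors_t)
print(round(w_s, 4), round(w_t, 4))
print(round(rss(errors_s, errors_t, w_s, w_t), 4))
```

With the sum-to-one assumption the problem has a single free variable, so clamping the unconstrained minimizer to [0, 1] gives the constrained optimum of the convex RSS.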
- E. Correlation Analysis Algorithm
We use Pearson’s product-moment correlation coefficient $\rho$ to measure the correlation between the output variable and the input variable. The value of $\rho$ lies between −1 and +1 (inclusive), where +1 denotes a total positive correlation, 0 represents no correlation, and −1 indicates a total negative correlation. The formula for $\rho$ is as follows:
$$\rho_{x,y} = \frac{E[(x - \mu_x)(y - \mu_y)]}{\sigma_x \sigma_y} \tag{12}$$
where $y$ denotes the output variable; $x$ represents the input variable; $\mu_x$ and $\mu_y$ denote the means of $x$ and $y$, respectively; and $\sigma_x$ and $\sigma_y$ indicate the standard deviations of $x$ and $y$, respectively.
Further, 0.5 < | ρ | ≤ 1 is regarded as a high correlation, 0.3 < | ρ | ≤ 0.5 as a medium correlation, and 0.0 < | ρ | ≤ 0.3 as a low correlation.
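As a small worked example of the coefficient and the thresholds above, consider two illustrative five-point series (the numbers are made up for the arithmetic and do not come from the datasets used in this paper):

```latex
x = (1, 2, 3, 4, 5), \qquad y = (2, 1, 4, 3, 5), \qquad \mu_x = 3, \qquad \mu_y = 3

\sum_i (x_i - \mu_x)(y_i - \mu_y) = (-2)(-1) + (-1)(-2) + (0)(1) + (1)(0) + (2)(2) = 8

\sum_i (x_i - \mu_x)^2 = 4 + 1 + 0 + 1 + 4 = 10, \qquad
\sum_i (y_i - \mu_y)^2 = 1 + 4 + 1 + 0 + 4 = 10

\rho = \frac{8}{\sqrt{10 \cdot 10}} = 0.8
```

Since 0.5 < |ρ| ≤ 1, the two series would be classed as highly correlated.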
- 1) Temporal Correlation Analysis
If we want to find whether the sampling data of a sensor have a temporal relationship, we should choose a training dataset for the analysis. Assuming that $\{V_{ki}, V_{k(i+1)}, V_{k(i+2)}, \ldots, V_{k(j-1)}, V_{kj}\}$ is the training dataset of sensor node $a_k$, we use $V_{kj}, V_{k(j-1)}, \ldots, V_{k(j-h)}$ to denote the sampling values of $a_k$ at time $T_h$. In the same way, we obtain the sub-datasets at $T_{(h-1)}, T_{(h-2)}, T_{(h-3)}, \ldots, T_1$, as shown in Table 1.
Table 1. Dataset at time T_h, T_(h-1), T_(h-2), ..., T_1
Therefore, we can use Eq. (12) to analyze the relationship of the dataset of T h and the dataset of another time (T (h-1) ,T (h-2) ,...,T 1 ). Now, we can define the temporal correlation as follows:
Definition 1: In a training dataset, if the sub-dataset of T h is highly relevant to one or more sub-datasets of another time (0.5 < |ρ| ≤ 1), the dataset of the sensor node has a high temporal correlation. If the sub-dataset of T h is only moderately relevant to one or more sub-datasets of another time (0.3 < |ρ| ≤ 0.5), the dataset of the sensor node has a medium temporal correlation.
- 2) Spatial Correlation Analysis
If we want to determine whether the sampling data of a sensor have a spatial relationship, we should also choose a training dataset for the analysis. Assuming that a ( k +1) , a ( k +2) ,..., a ( k+i ) are the nearest nodes from ak , we obtain the values listed in Table 2 .
Table 2. Dataset of a_k, a_(k+1), a_(k+2), ..., a_(k+i) at different times
Definition 2: In a training dataset, if the sub-dataset of $a_k$ is highly relevant to one or more sub-datasets of other sensor nodes (0.5 < |ρ| ≤ 1), the dataset of the sensor node has a high spatial correlation. If the sub-dataset of $a_k$ is only moderately relevant to one or more sub-datasets of other sensor nodes (0.3 < |ρ| ≤ 0.5), the dataset of the sensor node has a medium spatial correlation.
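Definitions 1 and 2 can be sketched in code as follows. This is one possible reading, not the authors' implementation: the temporal level is taken from the strongest correlation between a sensor's series and its own lagged copies, and the spatial level from the strongest correlation with any neighbor's series. The function names, the `max_lag` parameter, and the sample series are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_level(rho):
    """Map |rho| to the levels used in the text (rho = 0 is treated as low)."""
    strength = abs(rho)
    if strength > 0.5:
        return "high"
    if strength > 0.3:
        return "medium"
    return "low"


def temporal_level(series, max_lag):
    """Strongest correlation between the series and its lagged copies."""
    best = max(abs(pearson(series[lag:], series[:-lag]))
               for lag in range(1, max_lag + 1))
    return correlation_level(best)


def spatial_level(target, neighbours):
    """Strongest correlation between the target series and any neighbour."""
    best = max(abs(pearson(target, other)) for other in neighbours)
    return correlation_level(best)


# Illustrative series only; these are not readings from the paper's datasets.
steady_rise = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(temporal_level(steady_rise, max_lag=2))
print(spatial_level([1, 2, 3, 4], [[2, 4, 6, 8], [5, 1, 4, 2]]))
print(spatial_level([1, 2, 3, 4, 5], [[2, 1, 3, 1, 2]]))
```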
- F. Process of STCAM Decision
STCAM uses the following four algorithms: TCA-LI, TCA-MR, SCA, and STCA. If a sensor node has a tight temporal correlation, STCAM uses the TCA-LI algorithm for a single missing data element and the TCA-MR algorithm for continuous missing data. If a sensor node has a high spatial correlation but not a high temporal correlation, STCAM uses SCA to estimate the missing data. When both correlations are medium, STCAM chooses the STCA algorithm. The process of STCAM decision making is shown in Fig. 2. From this figure, we can conclude the applicable conditions of the four algorithms.
Fig. 2. Process of STCAM decision. STCAM: a model based on spatial-temporal correlation analysis, TCA: temporal correlation analysis, TCA-MR: multiple regression algorithm of TCA, TCA-LI: linear interpolation algorithm of TCA, SCA: spatial correlation analysis, STCA: spatial-temporal correlation analysis.
TCA-LI: The missing sensor data are a single element, and the training dataset has either a high temporal correlation, or a medium temporal correlation together with a low spatial correlation.
TCA-MR: The missing sensor data are continuous, and the training dataset has either a high temporal correlation, or a medium temporal correlation together with a low spatial correlation.
SCA: The training dataset has a medium temporal correlation and a high spatial correlation, or a low temporal correlation and a high or medium spatial correlation.
STCA: The training dataset has a medium temporal correlation and a medium spatial correlation.
If the training dataset has both a low spatial correlation and a low temporal correlation, there is no matching algorithm for the estimation.
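The selection rules can be sketched as a small function. This is an illustrative reading of the conditions rather than the authors' code: it maps a low temporal correlation with a high or medium spatial correlation to SCA (the low-temporal, high-spatial case matches the traffic experiment in Section IV) and returns `None` when both correlations are low. The function and parameter names are hypothetical.

```python
LEVELS = ("high", "medium", "low")


def choose_algorithm(temporal, spatial, continuous_gap):
    """Pick an estimator from the correlation levels and the gap type.

    temporal, spatial: one of "high", "medium", "low".
    continuous_gap: True if several consecutive readings are missing.
    Returns the algorithm name, or None when no algorithm matches.
    """
    if temporal not in LEVELS or spatial not in LEVELS:
        raise ValueError("correlation levels must be high, medium, or low")
    tca = "TCA-MR" if continuous_gap else "TCA-LI"
    if temporal == "high":
        return tca
    if temporal == "medium":
        if spatial == "high":
            return "SCA"
        if spatial == "medium":
            return "STCA"
        return tca  # medium temporal, low spatial
    # Low temporal correlation: only a spatial method can help.
    if spatial in ("high", "medium"):
        return "SCA"
    return None  # low temporal and low spatial: no matching algorithm


print(choose_algorithm("high", "low", continuous_gap=False))
print(choose_algorithm("medium", "medium", continuous_gap=True))
print(choose_algorithm("low", "high", continuous_gap=True))
print(choose_algorithm("low", "low", continuous_gap=False))
```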
IV. SIMULATION EXPERIMENTS
The estimation model proposed in this paper is simulated using Java and evaluated over the Intel lab dataset [8] and a traffic dataset of a city in China. The Intel lab dataset is a trace of readings from 54 sensor nodes deployed in the Intel Research Berkeley lab. These sensor nodes collected the light, humidity, temperature, and voltage readings once every 30 seconds. The traffic dataset is a trace of readings from 596 sensor nodes that are deployed on different roads.
To evaluate the accuracy and performance of STCAM, we choose DESM [9] for a comparison. The DESM algorithm is also an estimation approach based on the spatial-temporal correlation, and the result formula is as follows:
[equation image missing in source]
where Vk ( i -1) denotes the value of sensor node ak at (i -1) time, Vzi represents the value of az at time i, β denotes the weight of Vzi , and DESM chooses az as the nearest node from ak (for a detailed description of DESM, refer to [9] ).
To evaluate the four abovementioned algorithms, we need to choose different datasets for testing.
- 1) Comparison between TCA-LI/TCA-MR and DESM
By analyzing the temporal correlation, we know that the temperature dataset has a high temporal correlation; therefore, we use the temperature dataset of sensor 23 to test the accuracy and performance of TCA-LI/TCA-MR. First, we assume that the 121st, 131st, 141st, ..., 311th data elements are missing; under this condition, STCAM chooses the TCA-LI algorithm to estimate the missing sensor data. The experimental results are presented in Fig. 3 and Table 3.
Fig. 3. Comparison of experimental results of TCA-LI and DESM. TCA-LI: linear interpolation algorithm of temporal correlation analysis, DESM: data estimation using statistical model.
Table 3. Performance comparison of TCA-LI and DESM
TCA-LI: linear interpolation algorithm of temporal correlation analysis, DESM: data estimation using statistical model.
We assume that data 121–140 are missing; therefore, under this condition, STCAM chooses the TCA-MR algorithm to estimate the missing sensor data. The experiment results are presented in Fig. 4 and Table 4 .
Fig. 4. Comparison of experimental results of TCA-MR and DESM. TCA-MR: multiple regression algorithm of temporal correlation analysis, DESM: data estimation using statistical model.
Table 4. Performance comparison of TCA-MR and DESM
TCA-MR: multiple regression algorithm of temporal correlation analysis, DESM: data estimation using statistical model.
According to Fig. 4 , the accuracy of TCA-MR decreases with an increase in the amount of continuous missing sensor data. Therefore, there is a threshold. If the amount of missing sensor data is less than the threshold, TCA-MR exhibits good accuracy, and if the amount of missing sensor data is more than the threshold, TCA-MR is not suitable for estimating the missing sensor data with a high temporal correlation. According to Table 4 , the performance of TCA-MR is significantly higher than that of DESM.
- 2) Comparison between STCA and DESM
By analyzing the temporal and spatial correlation, we know that the humidity dataset has a medium temporal and spatial correlation. Under this condition, STCAM chooses the STCA algorithm to estimate the missing sensor data. We choose the humidity dataset of sensor 11 to test the accuracy and performance of STCA. Assuming that data 81–105 are missing, we obtain the experimental results shown in Fig. 5 and Table 5 .
Fig. 5. Comparison of experimental results of STCA and DESM. STCA: spatial-temporal correlation analysis, DESM: data estimation using statistical model.
Table 5. Performance comparison of STCA and DESM
STCA: spatial-temporal correlation analysis, DESM: data estimation using statistical model.
According to Fig. 5 and Table 5 , STCA exhibits better accuracy, but the performance is lower than that of DESM.
- 3) Comparison between SCA and DESM
By analyzing the temporal and spatial correlation, we find that the traffic dataset has a low temporal correlation and a high spatial correlation. Therefore, under this condition, STCAM chooses the SCA algorithm to estimate the missing sensor data. We suppose that data 121–144 of sensor a 6 are missing. The experimental results are shown in Fig. 6 and Table 6 .
Fig. 6. Comparison of experimental results of SCA and DESM. SCA: spatial correlation analysis, DESM: data estimation using statistical model.
Table 6. Performance comparison of SCA and DESM
SCA: spatial correlation analysis, DESM: data estimation using statistical model.
According to Fig. 6 and Table 6 , SCA exhibits better accuracy, but the performance is lower than that of DESM.
V. CONCLUSION
In this paper, we propose a data estimation technique called STCAM, which discovers the correlation of the training dataset and, depending on this correlation and the type of missing sensor data, chooses the most suitable algorithm from TCA-LI, TCA-MR, SCA, and STCA to estimate the missing sensor data. From the simulation results, we conclude that STCAM exhibits good accuracy in estimating the missing sensor data, but it has a relatively low computational efficiency. Therefore, STCAM can only be deployed at the sink node or in the central server. Moreover, the simulation shows that the accuracy of TCA-MR decreases with an increase in the amount of continuous missing sensor data, which may influence the total accuracy of STCAM; this paper does not provide an effective solution for this issue. Therefore, in the future, we will conduct further research to fill this gap.
Acknowledgements
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. NRF-2011-0023076). Further, it was supported by the BB21 project of Busan Metropolitan City.
BIO
Ren Xiaojun
received his B.S. and M.S. degrees from Shandong University of Science and Technology in 2007 and 2010, respectively. Since 2013, he has been pursuing his Ph.D. in Data Mining at Dongseo University, Korea. His research interests include artificial intelligence, machine learning, database technology, statistics, and sensor networks.
Hyontai Sug
received his B.S. in Computer Science and Statistics from Busan National University, Korea, in 1983; his M.S. in Applied Computer Science from Hankuk University of Foreign Studies, Korea, in 1986; and his Ph.D. in Computer and Information Science and Engineering from University of Florida, USA, in 1998. He was a researcher at the Agency for Defense Development, Korea, from 1986 to 1992 and a full-time lecturer at Pusan University of Foreign Studies, Korea, from 1999 to 2001. He has been a professor at Dongseo University, Korea, since 2001. His research interests include data mining and database applications.
HoonJae Lee
received his B.S., M.S., and Ph.D. in Electrical Engineering from Kyungpook National University in 1985, 1987, and 1998, respectively. He was engaged in research on cryptography and network security at the Agency for Defense Development from 1987 to 1998. In 2002, he joined the Department of Computer Engineering of Dongseo University as an associate professor, and he is now a full professor there. His current research interests include security communication systems, side-channel attacks, USN, and RFID security. He is a member of the Korea Institute of Information Security and Cryptology, IEEE Computer Society, IEEE Information Theory Society, etc.
References
[1] L. Pan, H. Gao, H. Gao, and Y. Liu, “A spatial correlation based adaptive missing data estimation algorithm in wireless sensor networks,” International Journal of Wireless Information Networks, vol. 21, no. 4, pp. 280-289, 2014. DOI: 10.1007/s10776-014-0253-9
[2] K. Niu, F. Zhao, and X. Qiao, “A missing data imputation algorithm in wireless sensor network based on minimized similarity distortion,” in Proceedings of the 6th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 2013, pp. 235-238.
[3] S. Ramakrishnan, “Sensing the world,” Jasubhai Digital Media, vol. 10, no. 1, pp. 26-28, 2003.
[4] M. Halatchev and L. Gruenwald, “Estimating missing values in related sensor data streams,” in Proceedings of the 11th International Conference on Management of Data (COMAD), Goa, India, 2005, pp. 83-94.
[5] L. Gruenwald, H. Chok, and M. Aboukhamis, “Using data mining to estimate missing sensor data,” in Proceedings of 7th IEEE International Conference on Data Mining Workshops, Omaha, NE, 2007, pp. 207-212.
[6] B. S. Yarman, A. Kilinc, and A. Aksen, “Immitance data modelling via linear interpolation techniques: a classical circuit theory approach,” International Journal of Circuit Theory and Applications, vol. 32, no. 6, pp. 537-563, 2004. DOI: 10.1002/cta.295
[7] T. Kanamori, S. Hido, and M. Sugiyama, “A least-squares approach to direct importance estimation,” Journal of Machine Learning Research, vol. 10, pp. 1391-1445, 2009.
[8] S. Madden, Intel lab data [Internet]. Available: .
[9] Y. Li, C. Ai, W. P. Deshmukh, and Y. Wu, “Data estimation in sensor networks using physical and statistical methodologies,” in Proceedings of 28th International Conference on Distributed Computing Systems (ICDCS'08), Beijing, China, 2008, pp. 538-545.