Physiological signals provide important clues for the diagnosis and prediction of disease, so analyzing them is important in health and medicine. Data preprocessing is a vital step in physiological signal analysis because missing values, noise, and outliers can degrade analysis performance. In this paper, we propose PhysioCover, a system that recovers missing values in physiological signals monitored in real time. PhysioCover integrates a gradual method with EM-based Principal Component Analysis (PCA). This approach can (1) recover long-term and short-term missing data more readily than existing methods, such as traditional EM-based PCA, linear interpolation, 5-average, and Missing Value Singular Value Decomposition (MSVD), (2) detect hidden variables more effectively than PCA and Independent Component Analysis (ICA), and (3) offer fast computation through real-time processing. Experimental results on physiological data from an intensive care unit show that the proposed method assigns more accurate values to missing entries than previous methods.
1. INTRODUCTION
Physiological data are observations of physiological activity in neurons, cardiac rhythms, tissues, and organs. They are measured by noninvasive methods, such as surface sensors on the skin, or by invasive methods, such as the measurement of Arterial Blood Pressure (ABP). They may provide a robust and accurate means of detecting and predicting diseases, because the signals correspond to internal physiology. Many physiological signals, such as Electroencephalography (EEG), Electrocardiography (ECG), ABP, and heart rate, are recorded in digital format. Analyzing these digital signals to extract useful health information is an emerging research area in biomedical engineering.
Physiological signals are mostly collected over time by multiple sensors per patient, and they are often arranged as a matrix for analysis. Recently, physiological signals have been measured in real time by small, wearable, wireless sensors while the patient is moving. Many studies using physiological signals focus on the prediction or classification of events, such as epileptic seizures, acute hypotensive episodes, and emotion recognition [1]-[3]. For a correct analysis, the collected data must be precise and reliable.
However, most physiological signal datasets contain many short-term or long-term missing values, as well as noise (outliers). Most studies handle missing values by removing the affected signal, or by using a simple method such as averaging the observed data, normalization, or linear interpolation [4]-[7]. These methods may produce a biased model because information is lost. Furthermore, they underestimate the standard deviation, since they do not account for the uncertainty in the missing values. Because physiological data inform clinical decisions, a faulty analysis caused by these problems may lead to irreversible health damage or death.
In this paper, we propose PhysioCover, which recovers missing values of physiological signals by gradually updating the weight values of Principal Component Analysis (PCA) based on an Expectation Maximization approach. It also summarizes large data by detecting hidden variables in real time. The proposed method addresses the biased-model problem caused by missing data, and reduces the loss of information by recovering missing values. Experimental results with physiological data from an intensive care unit show that the proposed method fills in missing values more accurately than previous methods such as traditional EM-based PCA (EMPCA), linear interpolation, 5-average, and Missing Value Singular Value Decomposition (MSVD) with respect to classification accuracy. Our contributions are as follows:

–Robust missing value recovery: The proposed method, which combines EM-based PCA with a gradual approach, provides good recovery results for long-term missing values of physiological signals.

–Hidden variable detection: Our method detects a few hidden variables that summarize all of the signals.

–Scalability: PhysioCover provides a scalable approach that needs computation time O(r ⋅ k), where r is the number of iterations and k is the number of hidden variables in the model, to recover missing values and summarize physiological signals. Therefore, we expect it to scale well to various real multidimensional time series data.
The remainder of this paper is organized as follows: Section 2 presents the proposed method for recovering missing values. Section 3 describes the experimental results of the proposed method on physiological data from an intensive care unit. In Section 4, we compare our method with existing methods for recovering missing values. Finally, conclusions are drawn in Section 5.
2. MATERIALS AND METHODS
The major goal of analyzing physiological time series data is to forecast or detect disease. Many mathematical tools, such as Linear Regression and AutoRegression [8], assume completely observed data. However, missing observations often occur in real applications, and modeling physiological time series in the presence of missing data is therefore a major challenge.
 2.1 Background
Given a data matrix X_{m×n} that contains missing values, an improved PCA has been proposed that is used alongside the Expectation and Maximization (EM) algorithm [9], [10]. It recovers the values of missing data through the Expectation and Maximization steps. As a first step, the missing values are initialized with the mean of the corresponding column vector, and the filled data are projected by PCA [11]. The Expectation step can then be easily derived from the projected data. The Maximization step recomputes the principal components from the values obtained in the Expectation step, and the missing values are replaced with the updated values. The optimal values for the missing entries are estimated through the iterative EM process.
where Y is the m × k projected matrix and W is a k × n matrix of the unknown states. The columns of Y span the space of the first k principal components. The data matrix X can be projected into this k-dimensional subspace by explicitly computing the corresponding eigenvectors and eigenvalues. EM-based PCA instead projects the data matrix X using W_{new}, which is updated through the iterative Expectation and Maximization process until convergence. The data matrix X can then be reconstructed from Y and W_{new} as the product Y ⋅ W_{new}, and the missing values of X are replaced by the corresponding entries of the reconstructed matrix.
EM-based PCA recovers missing values well. However, because it is a batch process, it requires long execution times for large multidimensional datasets [10]. Therefore, we propose a novel gradual approach that allows real-time processing for recovering missing values.
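The batch procedure described in this subsection can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common least-squares form of the two steps (Expectation: Y = X W^T (W W^T)^{-1}; Maximization: W_new = (Y^T Y)^{-1} Y^T X), restricts the model to a single hidden variable (k = 1) so that no matrix inversion is needed, and skips mean-centering. The function name `em_pca_impute_rank1` and the `None` marker for missing entries are hypothetical choices made for this example.

```python
def em_pca_impute_rank1(rows, max_iter=500, tol=1e-10):
    """Fill None entries of a row-major matrix with a rank-one EM-PCA loop.

    Assumptions of this sketch: every column has at least one observed
    value, and the data are not orthogonal to the all-ones start vector.
    """
    m, n = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]

    # Initial fill: mean of the observed values in each column.
    col_mean = []
    for j in range(n):
        observed = [rows[i][j] for i in range(m) if not missing[i][j]]
        col_mean.append(sum(observed) / len(observed))
    x = [[col_mean[j] if missing[i][j] else float(rows[i][j]) for j in range(n)]
         for i in range(m)]

    w = [1.0] * n  # single weight vector (k = 1), arbitrary non-zero start
    for _ in range(max_iter):
        # Expectation step: y_i = (x_i . w) / (w . w)
        ww = sum(v * v for v in w)
        y = [sum(x[i][j] * w[j] for j in range(n)) / ww for i in range(m)]
        # Maximization step: w_j = sum_i(y_i * x_ij) / sum_i(y_i^2)
        yy = sum(v * v for v in y)
        w = [sum(y[i] * x[i][j] for i in range(m)) / yy for j in range(n)]
        # Replace only the missing entries with the reconstruction y_i * w_j;
        # observed entries are never overwritten.
        change = 0.0
        for i in range(m):
            for j in range(n):
                if missing[i][j]:
                    estimate = y[i] * w[j]
                    change = max(change, abs(estimate - x[i][j]))
                    x[i][j] = estimate
        if change < tol:
            break
    return x


# Toy data: every row is a multiple of (1, 2, 3); one entry is missing.
data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, None, 9.0], [4.0, 8.0, 12.0]]
filled = em_pca_impute_rank1(data)
```

On this toy matrix the estimate for the missing entry should move from the column mean toward the value implied by the other rows. Every pass touches the whole matrix, which is the batch cost that motivates the gradual approach.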
 2.2 PhysioCover: A Gradual Method with EM-based PCA
The proposed method is based on the gradual method and EM-based PCA. EM-based PCA has been shown to recover missing values well [9]. Therefore, to recover missing values in physiological time series data, we integrate a gradual model with the concept of EM-based PCA to update the weight vector in real time.
In the physiological time series data, x_{t} ∈ ℜ^{n} is the column vector of n signal measurements at time tick t. That is, the physiological data are represented as a t × n matrix. For real-time processing, we apply the gradual method to update the weight vectors, w_{i}, at each time tick in the newly projected space. The input vector x_{t} is projected onto each of the weight vectors w_{i}, a linear transformation of the data stream that yields the hidden variables, or components, y_{t} over time [8].
For real-time processing of physiological data that include missing values, the number of hidden variables is first initialized to an arbitrary number, k. Given input data x_{t} = [x_{t,1}, x_{t,2}, ..., x_{t,n}] with n dimensions at time t, the ith component, y_{t,i}, is obtained as follows:
where the input data x_{t} is projected using the previous weight vector, w_{t−1,i} (1 ≤ i ≤ k). Second, we estimate the energy, p_{t,i}, and the reconstruction error, e_{t,i}, by Eqs. (4) and (5), respectively, in order to adjust the number of hidden variables or components. The energy is initialized with a small positive value.
This gradual approach uses an exponential forgetting factor, λ, to reflect more recent trends in the data stream. Values of λ between 0.96 and 0.98 are commonly used [8], [12]. The forgetting factor also avoids heavy memory usage, because no buffer space is required for the whole data. The magnitude of each update should also account for the past data captured by the participation weight vector w_{t,i}; accordingly, the update is inversely proportional to the current energy E_{t,i} of the ith hidden variable as follows:
The participation weight vector is updated based on the following equation:
Third, we maximize the updated weight vector using Eqs. (1) and (2) until a stopping criterion is met. In this study, we define the stopping criterion as follows: iteration stops if the absolute value of the difference between the new and old weight vectors w is smaller than δ (for example, δ = 0.001), or if the absolute value of their sum is smaller than δ. To recover missing values, we compute the reconstruction data from the updated weight matrix w_{new} and the new projected vector y_{t}. The missing values of the input data x_{t} are recovered from the reconstruction data, but the observed values in x_{t} are not replaced by it.
Finally, we obtain the actual hidden variables, y_{t} = x_{t,replace} ⋅ w^{T}_{t,i}, at time t by Eq. (8), computed from the input data x_{t} with its newly recovered missing values and the weight matrix w_{t,i}, which is whitened to maximize the weight vector.
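The per-tick procedure above can be sketched as follows. The update equations themselves are not reproduced in this text, so the forms used here are assumptions borrowed from common streaming PCA trackers with exponential forgetting: the energy is accumulated as λ ⋅ energy + y², and each weight vector moves along its reconstruction error scaled by y / energy. The names `gradual_step` and `has_converged`, the `None` marker for missing values, and the max-norm used in the stopping test are hypothetical. Unlike the method described above, the sketch applies a single update per tick instead of iterating until the stopping criterion holds; `has_converged` only shows how that criterion could be tested.

```python
def has_converged(w_new, w_old, delta=1e-3):
    """Stopping test: new and old weights agree up to sign within delta."""
    diff = max(abs(a - b) for a, b in zip(w_new, w_old))
    total = max(abs(a + b) for a, b in zip(w_new, w_old))
    return diff < delta or total < delta


def gradual_step(x_t, weights, energy, lam=0.96):
    """Process one time tick; weights and energy are updated in place.

    x_t     : list of n measurements, None where a value is missing
    weights : k weight vectors (lists of length n)
    energy  : k positive energies, initialized to a small value
    Returns (x_filled, hidden_variables).
    """
    n = len(x_t)
    observed = [v is not None for v in x_t]

    # 1) Estimate the hidden variables from the observed coordinates only
    #    and build a reconstruction of the full vector.
    residual = [x_t[j] if observed[j] else 0.0 for j in range(n)]
    recon = [0.0] * n
    for w in weights:
        num = sum(w[j] * residual[j] for j in range(n) if observed[j])
        den = sum(w[j] * w[j] for j in range(n) if observed[j])
        y = num / den if den > 0.0 else 0.0
        for j in range(n):
            recon[j] += y * w[j]
            if observed[j]:
                residual[j] -= y * w[j]

    # 2) Recover only the missing values; observed values are kept.
    x_filled = [x_t[j] if observed[j] else recon[j] for j in range(n)]

    # 3) Gradual update of each weight vector (assumed update rule).
    rest = list(x_filled)
    hidden = []
    for i, w in enumerate(weights):
        y = sum(w[j] * rest[j] for j in range(n))      # projection
        energy[i] = lam * energy[i] + y * y            # forgetting factor
        err = [rest[j] - y * w[j] for j in range(n)]   # reconstruction error
        for j in range(n):
            w[j] += (y / energy[i]) * err[j]           # step ~ 1 / energy
        for j in range(n):
            rest[j] -= y * w[j]                        # deflate for next w
        hidden.append(y)
    return x_filled, hidden


# Usage: stream a few complete ticks, then one tick with a missing value.
weights, energy = [[1.0, 0.0, 0.0]], [1e-3]
for t in range(60):
    s = 1.0 + (t % 5)
    gradual_step([s / 3, 2 * s / 3, 2 * s / 3], weights, energy)
filled, hidden = gradual_step([1.0, None, 2.0], weights, energy)
```

Only the k weight vectors and k energies are carried between ticks, so no buffer of past measurements is needed.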
To automatically determine the number of hidden variables, we compute the energy E_{hv} based on the values of the hidden variables. In practice, we do not know the number of hidden variables k. Therefore, we use an energy threshold to determine it. The energy threshold is a band defined by an upper bound FE_{x} and a lower bound fE_{x} of the energy [8], [13]. The energy of the hidden variables, E_{hv}, is compared with the predefined upper and lower bounds. If E_{hv} < fE_{x}, the number of hidden variables, k, increases. On the other hand, if E_{hv} > FE_{x}, k decreases. In this way, we keep the energy of the hidden variables within the range fE_{x} to FE_{x}. If the lower bound of energy is too low, useful information in the data may be lost. In PhysioCover, we use upper and lower energy thresholds of 0.98 and 0.95, respectively. This means that between 95% and 98% of the energy of the input data x_{t} is retained. When new data x_{t+1} that include missing values arrive, the missing values are recovered with the updated weight vector through the iterative Expectation and Maximization steps. The number of hidden variables is adjusted automatically while maintaining the predefined bounds. The PhysioCover algorithm is shown in Table 1.
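The threshold rule for the number of hidden variables can be written as a small helper. This is a sketch under the assumption that fE_{x} and FE_{x} denote the input energy E_{x} scaled by the lower factor f = 0.95 and the upper factor F = 0.98. The function name `adjust_hidden_count` and the optional cap `k_max` are hypothetical.

```python
def adjust_hidden_count(k, hidden_energy, input_energy,
                        lower=0.95, upper=0.98, k_max=None):
    """Return the number of hidden variables to use at the next tick.

    k grows by one when the hidden variables retain too little of the
    input energy, and shrinks by one when they retain more than needed.
    """
    if hidden_energy < lower * input_energy:
        if k_max is None or k < k_max:
            return k + 1
        return k
    if hidden_energy > upper * input_energy and k > 1:
        return k - 1
    return k


# With an input energy of 100, the band is 95 to 98.
k_low = adjust_hidden_count(2, 90.0, 100.0)    # below the band: k increases
k_high = adjust_hidden_count(2, 99.0, 100.0)   # above the band: k decreases
k_same = adjust_hidden_count(2, 96.0, 100.0)   # inside the band: unchanged
```

Keeping at least one hidden variable (the `k > 1` guard) is a design choice of this sketch, not a rule stated in the text.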
PhysioCover algorithm
3. EXPERIMENT RESULTS
 3.1 Data descriptions
The dataset we used to verify the efficiency of the proposed method was obtained from a public-access Intensive Care Unit (ICU) database [7], [14]. We used the data of 923 patients over 45 hours from the ICU database. The physiological signals of each patient were monitored using 18 sensors, including heart rate (HR), arterial blood pressure (ABP), noninvasive indirect blood pressure (NBP), respiration, and oxygen saturation measured by a pulse oximeter (SpO2). Each patient belongs to either the Acute Hypotensive Episode (AHE) class or the non-AHE class: patients in the AHE group have an AHE in the forecast window, and patients in the other group do not. The AHE group has 314 patients and the non-AHE group has 609 patients. In this dataset, several sensors (e.g., sensors 8 to 21) recorded no signal during the monitoring period for most of the patients. Therefore, we excluded the signals that were entirely missing and used the remaining seven signals: HR, ABPSys, ABPDias, ABPMean, Pulse, RESP, and SpO2. The normal HR is 50 to 100 beats per minute. ABPSys and ABPDias are the systolic and diastolic pressures, respectively. ABPMean is the mean arterial pressure, and Pulse is the heartbeat that is strong enough to be felt at the wrist, knee, ankle, etc. RESP is the respiration rate (13 to 16 per minute), and SpO2 is the oxygen saturation measured by a pulse oximeter. These signals include many missing values. We evaluated the performance of the proposed method using both simulated sample data and real time data. The simulated sample data were generated from portions of the real data and do not include missing values. To measure the accuracy of recovering short-term and long-term missing values, we artificially removed values over periods of 3 or 4 hours in the simulated sample data.
 3.2 Recovery of Missing Values
For the detection or prediction of AHEs, the signals monitored during at least the hour before an AHE occurs are important [15]. If the signal in this important period is missing, detecting or predicting an AHE may be difficult. The purpose of the proposed method is to impute the missing values, to prevent failures of detection and prediction caused by incomplete data. Given multiple physiological time series with missing values, PhysioCover recovers the missing values, finds the latent variables, and summarizes the data.
Fig. 1 shows the original input signals of patient a40921 and the signals recovered by the proposed method. In Fig. 1, a signal value of zero indicates a missing value. The x-axis is the time point and the y-axis is the value recorded by the sensor. In Fig. 1(a), the ABPSys signal of patient a40921 has many short-term missing values; that is, each point where the signal drops to zero is a missing value. The missing signal is recovered without failure, as shown by the red signal at the bottom of Fig. 1(a). The RESP signal of a40921 has long-term missing values (Fig. 1(b)).
Comparison between Original signal and Recovered signal by our method
In this paper, we define long-term missing values as values that have been missing for more than an hour. Red ellipses indicate the missing areas, in which long-term missing periods lasted for 5 hours and 7 hours. In the observed signal, one time point is 1 minute. In the recovered RESP signal, blue ellipses indicate the signal recovered by the proposed method.
Fig. 1(c) and (d) show the original and recovered signals of the ABPDias and Pulse of patient a40928, respectively. The ABPDias signal of a40928 has a long missing period from time point 14500 to 17500, that is, more than 40 hours of missing data. In a normal patient, ABPDias is near 80 mmHg. However, the ABPDias of a40928 is between 40 and 60 mmHg, and the recovered signal also lies in the range of the original signal. The Pulse is mostly equal to the heart rate, and the normal HR is between 50 and 100 beats per minute. Blue ellipses in Fig. 1(d) indicate the signals recovered by the proposed method, and they lie in a similar range to the original Pulse signal. This similarity between the original and recovered signals suggests that the method is suitable for recovering missing values in physiological signals.
To verify the effectiveness of PhysioCover in recovering short-term and long-term missing values, we generated simulated sample data by extracting a part of the original data without missing values, and we artificially created missing values in a part of the sample data. This experiment considered two cases: short-term missing and long-term missing. Short-term missing values arise periodically for several minutes at a time, spread over 3 or 4 hours. Long-term missing values are completely missing for 3 or 4 hours. To compare the imputation power, we applied five methods to the simulated sample data: traditional EMPCA [10], [16], MSVD [17], Linear Interpolation, 5-average, and our method.
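The two simulated cases can be produced with simple masking helpers. This is only a sketch of one way to do it: the function names, the use of `None` as the missing marker, and the gap length and period in the usage lines are hypothetical, since the text does not give the exact masking schedule. One time point is taken to be one minute, as in the observed signals.

```python
def mask_long_term(series, start, length):
    """Remove every value in one contiguous block (long-term missing)."""
    out = list(series)
    for i in range(start, min(start + length, len(out))):
        out[i] = None
    return out


def mask_short_term(series, start, span, gap_len, period):
    """Remove gap_len consecutive values every `period` points
    within [start, start + span) (short-term missing)."""
    out = list(series)
    end = min(start + span, len(out))
    for block_start in range(start, end, period):
        for i in range(block_start, min(block_start + gap_len, end)):
            out[i] = None
    return out


# A complete 10-hour sample at one value per minute.
sample = [float(v) for v in range(600)]
# Long-term case: 4 hours (240 minutes) completely missing.
long_case = mask_long_term(sample, start=100, length=240)
# Short-term case: 5-minute gaps once per hour, spread over 4 hours.
short_case = mask_short_term(sample, start=100, span=240, gap_len=5, period=60)
```

Because the original values of the masked positions are kept in `sample`, the recovery error of any method can be measured on exactly those positions.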
Fig. 2 shows the recovery results for short-term and long-term missing values alongside a comparison with the existing methods. Fig. 2(a) is the result of short-term missing recovery on the simulated data extracted from the ABPDias of patient a41770. Almost all recovered signals appear similar to the original signal, but MSVD has the highest RMSE, as shown in Table 2. For long-term missing values over 4 hours, the results in Fig. 2(c) differ markedly. The existing methods produce nearly constant values over the recovered section, whereas our proposed method recovers long-term missing values flexibly. With traditional EMPCA, MSVD, and 5-average, the recovered signal appears as a nearly straight line in the long-term missing section. Linear Interpolation is a traditional approach to missing value recovery in research such as the detection and prediction of AHE [5], [7]. Its recovered signal is linear, because it draws a straight line between the starting point and the end point of the missing section.
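The behaviour of the Linear Interpolation baseline, and the RMSE used to score a recovery, can be illustrated with a short sketch. The function names and the `None` marker are hypothetical, and holding the nearest observed value at the edges of the series is an assumption of this example, not something specified in the text.

```python
import math


def linear_interpolate(series):
    """Fill None gaps with a straight line between the neighbouring
    observed values; gaps at either edge take the nearest observed value."""
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = out[known[-1]]
    for a, b in zip(known, known[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out


def rmse(truth, estimate):
    """Root mean squared error between two equally long sequences."""
    if len(truth) != len(estimate) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    return math.sqrt(total / len(truth))


# A signal that rises and falls inside the gap is flattened to a line.
truth = [1.0, 3.0, 5.0, 3.0, 1.0]
masked = [1.0, None, None, None, 1.0]
recovered = linear_interpolate(masked)          # [1.0, 1.0, 1.0, 1.0, 1.0]
gap = [i for i, v in enumerate(masked) if v is None]
error = rmse([truth[i] for i in gap], [recovered[i] for i in gap])
```

The longer the gap, the more structure a straight line discards, which is consistent with the long-term results described above.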
Comparison between original signals and signals recovered by existing methods (EMPCA, MSVD, Linear Interpolation, 5-average) and by the proposed method, using sample data extracted from original data that do not include missing values. The missing values in the extracted sample data were created artificially.
RMSE of recovery methods for the sample data set
PhysioCover achieved the lowest RMSE (see Table 2). Fig. 2(b) and (d) show the results on simulated data from patient a40384 (ABPDias signal). The results indicate that our proposed method is robust to the long-term missing problem.
 3.3 Detection of Hidden Variables
Our proposed method summarizes a large time series dataset by detecting a few hidden variables. To check the performance of hidden variable detection, we applied our proposed method, iPCA, and PCA to a real dataset. iPCA is a notable method for real-time processing [18]. It can summarize a large multidimensional dataset by detecting a few hidden variables, and is therefore useful for time series health datasets. However, if the features are independent, or the signals have long-term missing values, iPCA cannot recover the missing values. The experimental dataset was arbitrarily extracted from real patient data that do not include long-term missing values. Patients were randomly selected from all patients, and signals in which no missing period exceeds 15 minutes were collected for 25 hours from the selected patients. Fig. 3(a) shows the signals of one patient. It is composed of 7 signals, a few of which include short-term missing values. Fig. 3(b) shows the first hidden variables detected by the proposed method, iPCA, and PCA. iPCA and PCA were applied to the dataset recovered by the proposed method, to compare their first hidden variables with that of the proposed method. In Fig. 3(b), the blue line is the first hidden variable obtained by the proposed method with the recovered missing values, the green line is the first hidden variable of iPCA, and the red line is the first hidden variable of PCA. The iPCA and PCA results have the same pattern as the first hidden variable of the proposed method.
Detected first hidden variables by our proposed method, iPCA and PCA: (a) all signals of Patient a40012, and (b) the first hidden variable detected by the proposed method, iPCA, and PCA from the recovered signal by the proposed method.
We compared the hidden variables obtained from the signals recovered by the proposed method and by the existing methods, to verify whether the hidden variables are similar across recovery methods. Traditional EMPCA, MSVD, Linear Interpolation, and 5-average were applied to the real patients' dataset, which includes both short-term and long-term missing values. We then projected each recovered dataset by PCA to compare the first hidden variable. Fig. 4 shows the original signals of two patients. Fig. 4(a) is the original signal of a40012, which includes short-term and long-term missing values. Patient a40802 in Fig. 4(b) has mostly short-term missing values. Table 3 shows the first hidden variable of the two patients (a40012 and a40802). The proposed method recovers missing values and detects a hidden variable at the same time, so it needs no extra dimension reduction or hidden variable detection method. Table 3(a) and (b) are the first hidden variables detected by the proposed method. The original dataset of patient a40012 has long-term missing values (over 4 hours) from time point 2462 to the end of the ABPSys, ABPDias, ABPMean, and Pulse signals, as in Fig. 4(a). Table 3(c) and (d) show the first hidden variable of each patient detected by traditional EMPCA. This recovery method also recovers and projects the data at the same time, so no separate detection process is needed for the hidden variable; (c) and (d) of Table 3 were obtained directly during the recovery process. Table 3(e) and (f) are the first hidden variables obtained by PCA from the data recovered by MSVD. Table 3(g) and (h) show the first hidden variables of the data recovered by Linear Interpolation, and (i) and (j) are the results for 5-average. We find a singularity in the first hidden variable: it drops in the long-term missing section when MSVD or Linear Interpolation is used to recover missing values (see (e) and (g) of Table 3).
Original signals of a40012 and a40802 that include shortterm and longterm missing values
Detected first hidden variables by the proposed method and PCA
That is, the first hidden variable in the long-term missing section is lower than its values in the range without long-term missing values. Table 3(f), (h), and (j) are the first principal components of patient a40802. These data have many short-term missing values, as in Fig. 4(b). For comparison, the short-term missing values of patient a40802 were recovered using traditional EMPCA, MSVD, Linear Interpolation, and 5-average, and PCA was applied to detect hidden variables for every method except traditional EMPCA. In Table 3(d), (f), (h), and (j), many downward spikes appear in the hidden variable before time point 1000 because of the short-term missing values, but the patterns of the first hidden variable are similar among them. Consequently, the proposed method detects high-quality hidden variables from the recovered large dataset.
 3.4 Real Time Processing
In this section, we compare the execution time of the proposed method with those of other approaches. The execution time was measured for both missing value recovery and feature extraction, because the existing recovery methods require a separate feature extraction method such as PCA or ICA.
Comparison of execution time
In our experiment, we applied PCA to extract features from the recovered dataset when measuring execution time. Fig. 5 plots execution time against stream size. The execution times of the existing methods grow exponentially, because PCA processes the data in batches. Although traditional EMPCA does not need an extra feature selection method, it also runs in batches. In contrast, the execution time of our method does not increase, since it is based on the gradual method. When a new signal arrives at each time point, our method updates only a few variables, such as the weights and hidden variables, without recomputing over the whole data matrix. The computation time of our method therefore does not increase even as the dataset grows.
 3.5 Classification Accuracy
We additionally evaluated the classification accuracy of the proposed method on the physiological dataset. To measure classification accuracy, the existing methods, except traditional EMPCA, need an additional feature selection or feature reduction step to detect hidden variables. Therefore, we used PCA and ICA to detect hidden variables for the existing methods. For comparison, traditional EMPCA, MSVD, Linear Interpolation, and 5-average were used to recover missing values, and hidden variables were detected from the recovered signals using PCA and ICA. Our method and traditional EMPCA can recover missing values and detect a few hidden variables at the same time.
We used the real datasets recorded from 923 patients to measure classification accuracy. The experiments distinguish two classes: patients with an AHE (314 patients) and patients without an AHE (609 patients). Our method offers real-time processing by updating the weight vector at every time point, and this weight vector is used to derive the hidden variables of a new input. The existing methods, however, can only be run at the last time point, because they are processed in batches [19].
In PCA and ICA, the number of components can be determined by the energy rate. In our experiment, we used a 95% energy rate to detect the principal components and independent components, because our method uses energy rates from 0.95 to 0.98. We measured the classification accuracy by 10-fold cross-validation with 5-NN and SVM classifiers.
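Choosing the number of components from an energy rate can be sketched as below. The example assumes that "energy" means the share of the total eigenvalue sum (explained variance) captured by the leading components, which is the usual reading for PCA; the function name `components_for_energy` and the eigenvalues in the usage lines are hypothetical.

```python
def components_for_energy(eigenvalues, rate=0.95):
    """Smallest number of leading components whose eigenvalues
    account for at least `rate` of the total energy."""
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    for k, value in enumerate(values, start=1):
        cumulative += value
        if cumulative >= rate * total:
            return k
    return len(values)


# Hypothetical spectrum: the first two components hold 90% of the energy,
# so a 95% rate needs a third component.
spectrum = [6.0, 3.0, 0.6, 0.4]
k_95 = components_for_energy(spectrum, rate=0.95)
k_90 = components_for_energy(spectrum, rate=0.90)
```

A lower rate keeps fewer components at the cost of discarding more of the signal energy.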
Table 4 shows the classification results for the hidden variables detected by PCA, ICA, and our method. Our method achieves the best result, 76%, with the SVM classifier when the recovered dataset is projected by PCA. With the 5-NN classifier, Linear Interpolation shows the highest classification rate when the hidden variables are detected by ICA.
Classification accuracy
The proposed method by itself, without feature selection, shows classification accuracies of 54% and 63% with 5-NN and SVM, respectively. Our method can recover missing values and detect hidden variables in real time. In addition, PhysioCover extracts fewer hidden variables than the other approaches. On average, our method extracts 3 hidden variables, traditional EMPCA extracts 5 hidden variables, MSVD extracts 5 and 4 hidden variables, and Linear Interpolation extracts 6 hidden variables for both PCA and ICA. Finally, 5-average yields 5 and 6 hidden variables with PCA and ICA, respectively. A smaller number of hidden variables reduces the memory usage and computation time of the classifiers.
4. DISCUSSION
PhysioCover combines the gradual method and EM-based PCA, and automatically recovers the missing values in the data. The dataset we used includes acute hypotensive episodes (AHE). If blood flow is too low to deliver enough oxygen and nutrients to vital organs, it can cause dangerous conditions, such as fainting, visual impairment, and multiple organ damage. Thus, if not promptly treated, AHE may result in irreversible vital organ damage and death. For this reason, the prediction or detection of AHE is a significant challenge.
Chen et al. [7] developed a method based on the weighted average of ABP to predict which patients would experience an AHE before it occurs. Henriques et al. [5] proposed generalized regression neural network multimodels, which effectively predicted AHE in intensive care units (ICU). Lehman et al. [20] carried out classification and forecasting tasks using a similarity-based searching and pattern matching algorithm. More recently, Rocha et al. [3] proposed neural network multimodels to predict adverse AHE occurring in the ICU.
Because these studies focused on predicting AHE, they recovered missing values during preprocessing with linear interpolation, one of the traditional methods. The signal in at least the hour before the occurrence of an AHE is regarded as an important element for predicting the event [15]. If the signal in this period is missing, prediction will be less accurate. In studies that predict future values in domains such as air pollution, gene expression, and traffic data, missing values are recovered, besides the methods used in our experiment, through observed data, interpolation [21], support vector regression (SVR) [22], Bayesian-based PCA [23], neural networks, autoregressive integrated moving average (ARIMA) models, and regression models [24].
However, these methods require many input parameters, long execution times due to batch processing, or complete datasets. Our method recovers missing values quickly and accurately through real-time processing. In our experimental comparison of missing value recovery methods, PhysioCover provided more robust recovery results than the existing methods. In addition, PhysioCover can summarize multidimensional data by detecting a few hidden variables, and it detects these hidden variables simultaneously with missing value recovery at each time tick.
Our proposed method has the advantage of real-time processing, which suits the characteristics of time series data. Although PCA [25], [26] and ICA [2], [27], [28] are robust dimension reduction or hidden variable detection methods, they require long execution times because of batch processing, as shown in Fig. 5. Because our method recovers missing values and detects hidden variables in real time, it can be applied to various types of time series data, such as econometric, mathematical finance, weather forecasting, and earthquake prediction data, as well as physiological data.
5. CONCLUSIONS
We propose PhysioCover, which automatically recovers missing values and summarizes multidimensional physiological time series data. It computes optimal values for the missing entries, and identifies specific patterns using the detected hidden variables. The data reduced to hidden variables can be used as learning data. Moreover, the proposed method can reduce processing complexity and memory requirements. It provides better results than other recovery methods such as interpolation and MSVD. The proposed approach is a valuable method for the accurate analysis of physiological signals; its effectiveness is demonstrated by the accurate recovery of missing values and the automatic detection of hidden variables from physiological signals.
In this paper, we evaluated missing value recovery only on physiological time series data. In future work, we will assess various types of multiple time series data, and will use a robust method that can treat multi-way or multimodal physiological data.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MEST) (2013056480). This research was also supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the ITRC (Information Technology Research Center) support program (NIPA2014H0301141014) supervised by the NIPA (National IT Industry Promotion Agency).
BIO
SunHee Kim
She received the B.S. in Multimedia from the Korean Educational Development Institute in 2004 and the M.S. degree in Computer Science from Dongguk University, Korea, in 2006. She received the Ph.D. degree in Computer Science from Chonnam National University in 2011. She currently works at Chonnam National University as a researcher. Her research interests include Data Mining, Machine Learning, and Bioinformatics.
HyungJeong Yang
She received her B.S., M.S., and Ph.D. degrees from Chonbuk National University, Korea. She is currently an associate professor in the Dept. of Electronics and Computer Engineering, Chonnam National University, Gwangju, Korea. Her main research interests include multimedia data mining, pattern recognition, artificial intelligence, eLearning, and eDesign.
SooHyung Kim
He received his B.S. from the Dept. of Computer Engineering, Seoul National University, and his M.S. and Ph.D. from the Dept. of Computer Science, Korea Advanced Institute of Science and Technology, Korea. He is currently a professor in the Dept. of Electronics and Computer Engineering and a vice dean of the Engineering College, Chonnam National University, Gwangju, Korea.
GueeSang Lee
He received his BS in electrical engineering and his MS in computer engineering from Seoul National University, Seoul, Rep. of Korea, in 1980 and 1982, respectively. He received his PhD in computer science from Pennsylvania State University, University Park, PA, USA, in 1991. He is currently a professor in the Department of Electronics and Computer Engineering at Chonnam National University, Gwangju, Rep. of Korea. His main research interests are image processing, computer vision, and video technology.