Advanced
Discriminative Power Feature Selection Method for Motor Imagery EEG Classification in Brain Computer Interface Systems
Discriminative Power Feature Selection Method for Motor Imagery EEG Classification in Brain Computer Interface Systems
International Journal of Fuzzy Logic and Intelligent Systems. 2013. Mar, 13(1): 12-18
Copyright ©2013, Korean Institute of Intelligent Systems
This is an Open Access article distributedunder the terms of the CreativeCommons Attribution Non-Commercial License(http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercialuse, distribution, and reproductionin any medium, provided the originalwork is properly cited.
  • Received : January 31, 2013
  • Accepted : March 15, 2013
  • Published : March 25, 2013
Download
PDF
e-PUB
PubReader
PPT
Export by style
Share
Article
Author
Metrics
Cited by
TagCloud
About the Authors
XinYang Yu
Seung-Min Park
Kwang-Eun Ko
Kwee-Bo Sim
kbsim@cau.ac.kr

Abstract
Motor imagery classification in electroencephalography (EEG)-based brain–computer interface (BCI) systems is an important research area. To simplify the complexity of the classification, selected power bands and electrode channels have been widely used to extract and select features from raw EEG signals, but there is still a loss in classification accuracy in the stateof- the-art approaches. To solve this problem, we propose a discriminative feature extraction algorithm based on power bands with principle component analysis (PCA). First, the raw EEG signals from the motor cortex area were filtered using a bandpass filter with μ and β bands. This research considered the power bands within a 0.4 second epoch to select the optimal feature space region. Next, the total feature dimensions were reduced by PCA and transformed into a final feature vector set. The selected features were classified by applying a support vector machine (SVM). The proposed method was compared with a state-of-art power band feature and shown to improve classification accuracy.
Keywords
1. Introduction
A brain–computer interface (BCI) is a non-muscular communication system that people can use to directly communicate their intentions from their brains to the environment [1 , 2] . The BCI system attaches a function to brain signals, thereby creating a new communication channel between the brain and external devices. This communication method is partially focused on the brain signal features extracted by the BCI system for device control and provides mutual interaction between the user and the system. Using different sensors and brain signals, many studies over the past two decades have evaluated the possibility that BCI systems could provide new augmentative technology without muscle control [3 - 8] . BCI systems have measured specific features of brain activity and translated them into device control commands. For example, an arbitrary limb movement changes the brain activity, such as electroencephalography (EEG), in the related cortex. In fact, even preparing to move and imaging a movement changes the so-called sensory rhythms. We can record α rhythm activity from sensorimotor areas, also called μ rhythm activity.
The decrease of oscillatory activity in a specific frequency band is called event-related desynchronization (ERD). Correspondingly, the increase of oscillatory activity in a specific frequency band is called event-related synchronization (ERS). The ERD/ERS patterns can be volitionally produced by motor imagery, which is the process of imagining the movement of a limb without actual movement [9] . In general, EEGs are recorded over primary sensorimotor cortical areas that often display 8–12 Hz ( μ rhythm) and 18–26 Hz ( β rhythm) activity. Several researchers have shown that people can learn to control the amplitude of μ / β rhythms in the absence of actual movement or sensation. Because μ / β rhythm changes are associated with normal motor/sensory function, they could be good signal features for BCI-based communication. Movement or preparation for movement, but typically not specific aspects of a movement such as its direction [10] , are typically accompanied by a decrease in μ and β activity over the sensorimotor cortex, particularly contralateral to the movement. Furthermore, the changes in μ / β rhythms also occur with motor imagery. Because people can change these rhythms without engaging in actual movements, these rhythms could serve as the basis for a BCI system. To improve classification accuracy, in this study, a stochastic analysis-based method is employed for optimal feature selection and a linear regression classifier is applied.
This paper is organized as follows. In Section 2, we briefly describe related works for discriminant power feature selection, such as Laplacian spatial filter and principal component analysis (PCA). We explain the discriminant power feature selection method and motor imagery pattern classification method in Section 3. A motor imagery EEG classification experiment is introduced and the results are discussed in Section 4. Finally, in Section 5, we conclude this paper and suggest future works for improving our work.
2. Related Works
- 2.1 Laplacian Spatial Filter
General EEG signal analysis in BCI systems consists of three major parts: preprocessing, feature extraction, and classification of the EEG mental tasks. In this study, the proposed method focuses on the feature extraction step. The initial procedure in feature extraction employs a spatial filter. The purpose of using a spatial filter is to reduce the effect of spatial blurring from the raw signals. Spatial blurring is an effect of the distance between the sensor and the signal sources in the brain and is caused by the inhomogeneities of the tissues between the brain areas. Several spatial filtering approaches have attempted to increase system fidelity. The most typical realization is a Laplacian filter, which consists of discretized approximations of the second-order spatial derivative of the two-dimensional Gaussian distribution on the scalp surface. A Laplacian filter attempts to invert the process that blurs the brain activities detected on the scalp. The approximations can be further simplified. For example, at each time point t , the weighted sum of the potential Si of the four nearest or next-nearest electrodes are subtracted from the potential Sh at a center electrode for the small and large Laplacian, respectively.
PPT Slide
Lager Image
Eq. (1) shows a description of the Laplacian filter. Sh and h are the input and output of the EEG signal, respectively, corresponding to electrode h . Si is the set of neighbor electrodes surrounding electrode h . Weight whi is a function of the distance between the electrode of interest h and its neighbor i , dhi .
PPT Slide
Lager Image
whi is the constant weight of the signal from electrode i of Sh , which could be calculated from Eq. (2), where dhi is the Euclidean distance from electrode i to h . In practice, this filter is often implemented by simply subtracting the average of the four next-nearest neighbors (i.e., the weight for each neighbor is 0.25).
The main purpose of the spatial filter is to derive a more faithful representation of the sources within the brain and/or to remove the influence of the reference electrode from the signal. The influence is sensitive to the Laplacian mapping size as shown in Figure 1 . In Figure 1 (a) and (b), the red and black dots indicate the location of the reference electrode and its spatial filter, respectively.
- 2.2 FIR Filter and Hamming Window
As described earlier, motor imagery has been shown to produce changes in the μ and β frequency band. EEG signals are filtered using a band pass filter, i.e., an finite impulse response (FIR) filter with a Hamming window for extracting the motor imageryrelated frequency bands (7–15 Hz and 18–22 Hz).
Figure 2 (a) shows raw EEG signals without the FIR filtering and Hamming window processing, while Figure 2 (b) shows the result of a filtered EEG signal by FIR filter.
The power bands are calculated from all electrodes of the extracted EEG with an epoch time of 1 second and a 0.5 second
PPT Slide
Lager Image
Spatial filters with different sized Laplacian sketch maps. (a)Asmall size of Laplacianmapping and (b) a large size of Laplacian mapping size.
PPT Slide
Lager Image
Finite impulse response (FIR) filtering with a Hamming window of a motor imagery electroencephalography (EEG) signal: (a) a raw EEG signal and (b) an EEG signal filtered by FIR with a Hamming window.
delay from each epoch out of 4 second, during which human subjects start imagining a motor task (such as left/right hand or foot movement) and then finish it. We used 20% of the dataset as the testing set. The remaining 64% and 16% of the dataset were alternately used for training and validation.
- 2.3 Principal Component Analysis (PCA)
Real-world data is often noisy because various signals from different perspectives are recorded and information is hidden in a few dimensions. PCA is a classical statistical method used to re-express the noisy data in a different framework. This linear transform has been widely used in data analysis and compression. PCA is based on the statistical representation of a random variable. Suppose that we have a random vector population,
PPT Slide
Lager Image
where
PPT Slide
Lager Image
and the population mean is denoted by
PPT Slide
Lager Image
Mean subtraction from each data dimension is necessary for performing PCA to ensure that the first principal component describes the direction of maximum variance. If mean subtraction is not performed, the first principal component might instead correspond more or less to the mean of the data. A mean of zero is needed to find a basis that minimizes the mean square error (MSE) of the approximation of the data. The singular value decomposition (SVD) of
PPT Slide
Lager Image
where the m × m matrix
PPT Slide
Lager Image
is a matrix of the eigenvectors of the covariance matrix
PPT Slide
Lager Image
matrix ∑ is an m × n rectangular diagonal matrix with nonnegative real numbers on the diagonal, and the n × n matrix
PPT Slide
Lager Image
is a matrix of the eigenvectors of the covariance matrix
PPT Slide
Lager Image
Under this condition, the principal component
PPT Slide
Lager Image
of a dataset
PPT Slide
Lager Image
can be defined as
PPT Slide
Lager Image
With the first N – 1 components, the N th component can be defined by subtracting the first N – 1 principal components from
PPT Slide
Lager Image
as shown in Eq. (4):
PPT Slide
Lager Image
We can achieve our goal of decorrelating the original dataset and reducing its dimension. Many methods exist for solving eigenvalues and corresponding eigenvectors, which is a nontrivial task. By ordering the eigenvectors in a descending order, we can create an ordered orthogonal basis with the first eigenvector (corresponding to the largest eigenvalue) having the direction of the largest variance of the data. In this way, we can find directions in which the dataset has the most significant amount among the multiple dimensions.
Our goal is to find a new matrix
PPT Slide
Lager Image
which is a dimensionalityreduced random variable dataset, such that the covariance matrix
PPT Slide
Lager Image
is a diagonal matrix, and each successive dimension in
PPT Slide
Lager Image
is rank-ordered according to variance from the
PPT Slide
Lager Image
Now, PCA allows us to find
PPT Slide
Lager Image
It assumes an orthogonal matrix that acts as a transition matrix or function, as in
PPT Slide
Lager Image
3. Discriminant Power Feature Selection Using PCA and Classification Using Support Vector Machine
- 3.1 Discriminant Power Feature Selection using PCA
In this study, the motor imagery EEG signals are represented as
PPT Slide
Lager Image
which were segmented from 1 second before the cue onset and 4 seconds after the cue onset of raw EEG signals for a total 5 seconds long time interval from 59 channels. The raw signals were sampled at 100 Hz, and they were spatially bandpassfiltered at 7–15 Hz and 18–22 Hz. We calculated a set of data that consists of a sample mean of
PPT Slide
Lager Image
and a covariance matrix of
PPT Slide
Lager Image
for each motor imagery state: {left/right hand, foot movement}. For PCA to work properly, we subtracted
PPT Slide
Lager Image
from each data dimension in
PPT Slide
Lager Image
This produced a dataset with a mean that is equal to zero.
Let
PPT Slide
Lager Image
be a transition matrix that consists of eigenvectors of the covariance matrix as the row vectors. It is applied to reduce the dimensionality of
PPT Slide
Lager Image
as shown in
PPT Slide
Lager Image
Our goal is to find a matrix
PPT Slide
Lager Image
The rows of
PPT Slide
Lager Image
are the principal components of
PPT Slide
Lager Image
Now, combining all the above information, we have
PPT Slide
Lager Image
If
PPT Slide
Lager Image
is orthogonally diagonalizable, it can be applied to the SVD such that
PPT Slide
Lager Image
is a diagonal matrix and
PPT Slide
Lager Image
is an orthogonal eigenvector that is diagonal to the symmetric matrix
PPT Slide
Lager Image
Then, the i th column of
PPT Slide
Lager Image
is the i th eigenvector of
PPT Slide
Lager Image
Therefore, we can rewrite Eq. (5) as
PPT Slide
Lager Image
In this case,
PPT Slide
Lager Image
so that
PPT Slide
Lager Image
By setting the principal components
PPT Slide
Lager Image
equal to the eigenvectors
PPT Slide
Lager Image
we can achieve dimensionality reduction.
The original vector
PPT Slide
Lager Image
was projected on the coordinate axes defined by the orthogonal basis. It was then reconstructed by a linear combination of the orthogonal basis vectors. Instead of using all the eigenvectors of the covariance matrix, we may represent the data in terms of only a few basis vectors of the orthogonal basis. If we denote the matrix having K first eigenvectors as rows by
PPT Slide
Lager Image
we can create a similar transformation. This means that we project the original data vector on the coordinate axes having dimension K and transform the vector by a linear combination of the basis vectors. This minimizes the MSE between the data and this representation with the given number of eigenvectors.
In this study, PCA was applied to reduce the dimensionality of the transformation matrix of the training set to
PPT Slide
Lager Image
which is used to calculate the final feature using Eq. (8), where the transposed matrix of
PPT Slide
Lager Image
has a dimension of k × N :
PPT Slide
Lager Image
The output feature
PPT Slide
Lager Image
is reduced to the k dimension ( k N ), which is determined by varying the number of k from 1 to N according to the arranged variance of the PCA in descending order of the training set to find the best model. The model that generates the best classification accuracy using a support vector machine (SVM) on the validation set is selected as the final model for the feature reduction method and applied to the testing set to test the proposed method.
- 3.2 Classification of Discriminant Power Feature Using SVM
The classification of the discriminant power feature is the most important step in analyzing motor imagery EEG signals. After we selected the optimal features as described in the previous section, we calculated the Euclidean distances between the motor imagery classes {left/right hand, foot movement} to apply them using a SVM.
In this study, the SVM algorithm is used by receiving input data during a training phase and shown good performance in classification phase. Thus, we built a classifier that could have been used to predict future data [11 - 13] . By using the SVM with dimensionality reduction, the system shows improved classification accuracy compared with previous approaches. To employ an SVM to classify the discriminant power feature of motor imagery EEGs, we must consider the following.
Given two-class training samples,
PPT Slide
Lager Image
( xi is the feature vector and m denotes the dimensionality of input space) and yi ∈ {1, -1} denotes the class label of xi . In the SVM training procedure, the optimal hyperplane has to be found, which can maximize the margin that separates the two-class samples. To minimize the problem, a convex quadratic program (QP) is commonly used.
We determined the Lagrange multipliers
PPT Slide
Lager Image
that maximize the objective function, f ( x ):
PPT Slide
Lager Image
PPT Slide
Lager Image
where C is a positive constant specified by the user and K is a kernel function. In this study, we applied a Gaussian kernel function:
PPT Slide
Lager Image
where σ is a constant specified by the user.
From Eq. (9), we can see that the size of the QP problem is equal to the number of training samples. SVMs are usually slow, especially for a large problem. Solving the above Lagrange multiplier, we can obtain the below decision function:
PPT Slide
Lager Image
where b is a bias.
From Eq. (10), we know that 0 ≤ αi * C holds for i = 1, 2, . . . , l . All the training samples are support vectors (SVs), and they correspond to αi * ˃ 0. Let αi * ˃ 0 for i = lsv + 1, lsv + 2, . . . , l . Thus, Eq. (12) could be rewritten as
PPT Slide
Lager Image
4. BCI Experiment for Motor Imagery EEG Classification
In this study, we used the BCI competition III-dataset IVa in which five subjects (aa, al, av, aw, and ay) imagined a right hand and foot movement. We also used the BCI competition IV dataset I, which involved four subjects (a, b, f, and g); two of them imagined a right and left hand movement, and two others imagined a left hand movement and foot movement. BCI III contained 280 trials for each subject, and BCI IV contained 200 trials for each subject. All the datasets were normalized to 59 electrodes, and the epoch time was selected from 0–4 seconds, for which the stimuli was given to subjects at 0 second. The raw signals were segmented from 1 second before the cue onset (0 second) and 4 second after the cue onset, so in total, they
PPT Slide
Lager Image
Flowchart of discriminant power feature selection and motor imagery EEG classification. EEG, electroencephalography; PCA, principle component analysis; SVM, support vector machine.
Experimental results of model selection for each subject with its best classification accuracy
PPT Slide
Lager Image
Experimental results of model selection for each subject with its best classification accuracy
included 5 second long time intervals from 59 channels. As the result, the power feature of the original signal
PPT Slide
Lager Image
can be regarded to have a dimension of [59 × 2 × 7].
Figure 3 illustrates the generalized process for discriminant feature selection with K dimensions and discrimination of the states of motor imagery EEGs. The original power features of
PPT Slide
Lager Image
have a dimension of [59 × 2 × 7], so we have to find the dimensionality-reduced discriminant feature vector from them. K is the dimension of the discriminant feature vector for a new power orthogonal basis, so K is the dimension of the principal component of
PPT Slide
Lager Image
Table 1 shows the results of model selection of each subject with its best classification accuracy on the validation set. Because of individual differences, every subject has a different value of K . Thus, K is the number of components. In this case, we determine the accuracy rate of classification, pacc , as follows:
PPT Slide
Lager Image
For example, when subject aa has K = 32 based on the proposed discriminative feature extraction method, pacc is 75%.
PPT Slide
Lager Image
Experimental results of accuracy rate compared with previous approaches (type 1, type 2).
This means the 32nd dimension contains 75% of the total features. We can see that the EEGs are subject-dependent, and the value of K needed for each subject to obtain its best performance is naturally different for each subject. To compare the performance of our method with that of other approaches, we simulated state-of-art methods using the EEG feature extraction algorithm. First, we call type 1 the band power where all channels are the same, meaning all electrodes with average power have a 4s epoch. Type 2 is the method of extracting the band power feature based on the selected channels ( C 3 , C 4 , C z ), which are related to motor imagery neuroscience studies.
As shown in Figure 4 , the proposed method outperformed the other methods. In case of type 1, it outperformed for subjects av, aw, ay, a, and f, and in the case of type 2, it outperformed for subjects al, ay, b, and g. The proposed method outperformed both the type 1 and type 2 methods.
5. Conclusion
To simplify the complexity of motor imagery EEG analysis in BCI systems, we proposed a discriminant feature selection method for motor imagery using an EEG-based BCI system. The proposed method is based on PCA and SVM. By applying PCA, we can successfully achieve our goal for discriminant feature selection, which is to decorrelate the original dataset of motor imagery EEGs and reduce its dimensionality while maintaining the discriminants. The selected features are in the form of a final feature vector set that we apply using a SVM as a classifier. By comparing the proposed method to previous methods, we found that the proposed method enhanced the availability of features up to 8% for each subject.
In the future, we will investigate other approaches for optimal feature selection without loss of performance. Although the proposed method improved the classification progress, the accuracy of the classification did not reach our goal. To improve the overall classification performance, we will study non-linear dynamical analysis approaches instead of stochastic analysis for brain signals.
- Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This work was supported by the Mid-career Researcher Programthrough an NRF grant funded by the MEST (No. 2012-0008726).
References
Santhanam G. , Ryu S. I. , Yu B. M. , Afshar A. , Shenoy K. V. 2006 “A high-performance brain-computer interface” Nature 442 (7099) 195 - 198    DOI : 10.1038/nature04968
Wolpaw J. R. , Boulay C. B. 2010 “Brain signals for braincomputerinterfaces” in Brain-Computer Interface: TheFrontiers Collection, B. Graimann, G. Pfurtscheller, and B.Allison, Eds. Springer-Verlag Heidelberg 29 - 46
Wolpaw J. R. , Birbaumer N. , McFarland D. J. , Pfurtscheller G. , Vaughan T. M. 2002 “Brain-computer interfacefor communication and control” Clinical Neurophysiology 113 (6) 767 - 791    DOI : 10.1016/S1388-2457(02)00057-3
Gao X. , XU D. , Cheng M. , Gao S. K. 2003 “A BCIbasedenvironmental controller for the motion-disabled” IEEE Transactions on Neural Systems and Rehabilitation Engineering 11 (2) 137 - 140    DOI : 10.1109/TNSRE.2003.814449
Donoghue J. P. , Nurmikko A. , Black M. , Hochberg L. R. 2007 “Assistive technology and robotic control usingmotor cortex ensemble-based neural interface systemsin humans with tetraplegia” The Journal of Physiology 579 (3) 603 - 611    DOI : 10.1113/jphysiol.2006.127209
Farwell L. A. , Donchin E. 1988 “Talking off the top of yourhead: toward a mental prosthesis utilizing event-relatedbrain potentials” Electroencephalography and Clinical Neurophysiology 70 (6) 510 - 523    DOI : 10.1016/0013-4694(88)90149-6
Hochberg L. R. , Serruya M. D. , Friehs G. M. , Mukand J. A. , Saleh M. , Caplan A. H. , Branner A. , Chen D. , Penn R. D. , Donoghue J. P. 2006 ”Neuronal ensemblecontrol of prosthetic devices by a human with tetraplegia” Nature 442 (7099) 164 - 171    DOI : 10.1038/nature04970
Le J. , Gevins A. S. 1993 “Method to reduce blur distortionfrom EEG’s using a realistic head mode” IEEE Transactions on Biomedical Engineering 40 (6) 517 - 528    DOI : 10.1109/10.237671
Hjorth B. 1991 “Principles for transformation of scalp EEG frompotential field into source distribution” Journal of Clinical Neurophysiology 8 (4) 391 - 396
McFarland D. J. , Neat G. W. , Read R. F. , Wolpaw J. R. 1993 “An EEG-based method for graded cursor control” Psychobiology 21 (1) 77 - 81
Sangeetha R. , Kalpana B. 2011 ”Performance evaluationof kernels in multiclass support vector machines” International Journal of Soft Computing and Engineering 1 (5) 138 - 145
Hassan A. , Damper R. I. 2012 ”Classification of emotionalspeech using 3DEC hierarchical classification” Speech Communication 54 (7) 903 - 916    DOI : 10.1016/j.specom.2012.03.003
Guler I. , E. D. Ubeyli 2007 “Multiclass support vector machinesfor EEG-signals classification” IEEE Transactions on Information Technology in Biomedicine 11 (2) 117 - 126    DOI : 10.1109/TITB.2006.879600