Chaotic Features for Traffic Video Classification
KSII Transactions on Internet and Information Systems (TIIS). 2014. Aug, 8(8): 2833-2850
Copyright © 2014, Korean Society For Internet Information
  • Received : December 20, 2013
  • Accepted : May 14, 2014
  • Published : August 28, 2014
About the Authors
Yong Wang
School of Aeronautics and Astronautics, Shanghai Jiao Tong University Shanghai, 200240 - China
Shiqiang Hu
School of Aeronautics and Astronautics, Shanghai Jiao Tong University Shanghai, 200240 - China

Abstract
This paper proposes a novel framework for traffic video classification based on chaotic features. First, each pixel intensity series in the video is modeled as a time series. Second, the chaos theory is employed to generate chaotic features. Each video is then represented by a feature vector matrix. Third, the mean shift clustering algorithm is used to cluster the feature vectors. Finally, the earth mover’s distance (EMD) is employed to obtain a distance matrix by comparing the similarity based on the segmentation results. The distance matrix is transformed into a matching matrix, which is evaluated in the classification task. Experimental results show good traffic video classification performance, with robustness to environmental conditions, such as occlusions and variable lighting.
1. Introduction
Traffic monitoring is a fundamental issue confronting many urban centers. A key step in addressing this issue is to gather real-time information on traffic flows. Traditional solutions have mainly involved burying inductive-loop detectors underneath roads to count vehicles traveling over them. However, such methods are becoming less feasible because of installation costs and the disruption of roadways.
Video technology serves an increasingly important function in traffic monitoring systems [16, 18, 19]. The capability to monitor traffic flow automatically helps in reducing the workload of human operators, identifying illegal vehicles, and providing forensic clues, such as vehicle speed and traffic congestion. Building and using large camera networks to monitor traffic can reduce the number of accidents and traffic jams on urban highways.
Two approaches can be employed for traffic video classification. The first method is based on vehicle detection and tracking and involves three steps [2, 3, 20-22, 26]. First, vehicles are detected by motion segmentation [2] or background subtraction [3, 20]. Second, vehicles are tracked by various tracking algorithms [20, 22, 24, 25], such as rule-based reasoning [2] and the Kalman filter [3]. Third, trajectories are represented as curves or vehicle attributes (area, pattern, and direction). The second method models traffic flow holistically to avoid the need to track vehicles [1, 23]. In [4], features that describe traffic speed and density from MPEG video data are extracted as training sets. The training data are then learned by the Gaussian mixture and hidden Markov models to detect traffic conditions. The maximum likelihood criterion is used to calculate the confidence score, which determines the classification.
The aforementioned methods have the following disadvantages: (i) motion detection is difficult to implement under varying environmental conditions, especially in crowded scenes, under lighting changes, and under occlusions; (ii) low resolution poses a great challenge for tracking. The drawback of the second approach is that extracting reliable motion cues from traffic scenarios, especially congestion, is difficult. To overcome these drawbacks, new modeling techniques have been established. Linear dynamic systems (LDS) have been employed to model traffic flows. In [5], an autoregressive (AR) stochastic process with spatial and temporal components is employed to model the traffic flow, and classification performance shows promising results. However, linear dynamic systems usually assume the first-order Markov property or linearity, which restricts the modeling of adverse traffic video scenes.
Notably, the LDS-based model covers traffic flow in a holistic manner to some degree, but loses the motion information of the video. Meanwhile, the traditional pixel model preserves all spatial information, but typically fails to capture temporal information. To cover the integral video without losing the spatial information, the pixel intensity series is proposed as the video descriptor. When a car goes through the surveillance area, the pixel intensity series changes. The changes become more frequent and intense as more cars go through. Changes in the pixel intensity series also indicate traffic conditions, such as light traffic, medium traffic, or heavy traffic.
However, the raw pixel intensity series has several limitations that prevent its direct use in this work. The first is the alignment problem: an alignment algorithm must be applied before pixel intensity series can be compared. The second is that the raw pixel intensity series carries more information than needed, similar to pixels in image analysis. Pixels in an image represent all image information, whereas local descriptors are developed to depict the image precisely [32, 33]. The relationships among pixels are used to formulate a hypergraph for image classification [30]. Pixels are combined with contextual cues to detect salient regions [31]. Both approaches use pixels together with other information to achieve better performance. Unlike the case in image analysis, motion information is important in video analysis. Numerous methods have been proposed for analyzing time series; these methods include autoregressive models, moving average models, and autoregressive moving average models. Nonlinearity poses a great challenge for these methods. Chaos theory is chosen to characterize the time series, both to overcome the difficulty of identifying a model for the time series and to represent the time series accurately. Chaos theory has been studied for several decades and is widely used in econometrics and weather forecasting for its capability to characterize nonlinear systems effectively. It has recently been introduced to the computer vision community for action recognition [27], anomaly detection [28], and dynamic scene recognition [29].
This paper proposes a framework to model the pixel intensity series for a holistic comparison of traffic conditions. The function of chaotic features in the representation of traffic videos is studied. The problem of finding generalizable methods for characterizing unconstrained traffic videos through the proposed feature vector is addressed. Finally, we demonstrate how the feature vector facilitates meaningful traffic video organization and accurate traffic video classification.
The remainder of the paper is organized as follows: Section 2 provides the workflow of the framework. Section 3 introduces the concept of the chaos theory and the chaotic features used in this work. Section 4 presents the feature vector clustering algorithm. Section 5 describes the feature matching algorithm. Section 6 presents the experimental results and discussions. Finally, Section 7 concludes the paper.
2. Overview of the Framework
Fig. 1 shows the change in the pixel intensity series over time. The central part of the figure shows one frame from a traffic video and the change in the intensity of four pixels over time. The x-axis denotes time, whereas the y-axis denotes the gray value. Accordingly, the video can be segmented into static and traffic parts based on the pixel intensity series. The different parts and the cluster center are conjectured to be two important clues for traffic condition categorizations. Therefore, the video is composed of a W * L matrix of time series, where W and L are the dimensions of the frames in the video.
Fig. 1. Pixel intensity change in a traffic video over time
Fig. 2 shows a summary of our algorithm. Details on the generation of feature vectors and feature matching are presented. Each pixel intensity series is modeled as a chaotic time series in the traffic video, and chaotic features are extracted. These features are used to form a feature vector. A traffic video is represented by a feature vector matrix. The mean shift algorithm is applied to the feature vector matrix to summarize the distribution of the feature vectors in the form of a signature that consists of cluster centers and relative weights. The signature is a descriptive representation of the distribution of feature vectors in each video. Earth mover’s distance (EMD) is employed to compare signatures in different videos. The entries of an EMD matrix record the similarity between each pair of signatures in the video dataset for use in classification tasks.
Fig. 2. Flow chart of the proposed algorithm
3. Chaotic Features
The chaotic features are introduced based on chaos theory [7]. Embedding refers to the mapping from a one-dimensional space to an m-dimensional space. Dynamical systems are characterized as mapping functions that describe how variables change over time, i.e., x(t)=f(x(t−1)). The state variable x(t)=[x1(t), x2(t), ⋯, xn(t)] ∈ Rn defines the status of the system at time t. Takens' theorem [8] states that an embedding exists from the original state space to a reconstructed state space. The underlying idea is that, for a sufficiently large embedding dimension m and embedding delay τ, the vector x′(t)=[x(t), x(t+τ), ⋯, x(t+mτ)] performs the same functions as the original variables of the system. The embedding delay τ is computed by using mutual information [9]. The embedding dimension m is computed by using the false nearest neighbors method [10]. Once the two variables are determined, the state x(t) can be written into a matrix
$$ X = \begin{bmatrix} x(1) & x(1+\tau) & \cdots & x(1+m\tau) \\ x(2) & x(2+\tau) & \cdots & x(2+m\tau) \\ \vdots & \vdots & \ddots & \vdots \\ x(N) & x(N+\tau) & \cdots & x(N+m\tau) \end{bmatrix} \qquad (1) $$

with N = T − mτ rows for a series of length T.
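As an illustration, the delay-embedding step can be sketched in Python. This is a minimal sketch, not code from the paper; `delay_embed` is a hypothetical helper, and it uses the common convention of m columns spaced τ apart.

```python
import numpy as np

def delay_embed(series, m, tau):
    """Build the delay-embedding matrix: row t is
    [x(t), x(t+tau), ..., x(t+(m-1)*tau)]."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - (m - 1) * tau
    if n_rows <= 0:
        raise ValueError("series too short for this m and tau")
    return np.stack([series[i:i + n_rows]
                     for i in tau * np.arange(m)], axis=1)

# A pixel intensity series of 50 frames, embedded with m=3, tau=2:
x = np.sin(0.3 * np.arange(50))
X = delay_embed(x, m=3, tau=2)
print(X.shape)  # (46, 3)
```

In practice τ would come from the first minimum of the mutual information and m from the false-nearest-neighbors test, as described above.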
- 3.1 Box Counting Dimension
The box counting dimension [7] presents an upper bound on the Hausdorff dimension, which characterizes the self-similarity of a set. If a point set is covered by a regular grid of boxes of side length r, and N(r) is the number of boxes that contain at least one point, then for a self-similar set,
$$ N(r) \propto r^{-D_b}, \qquad D_b = \lim_{r \to 0} \frac{\ln N(r)}{\ln(1/r)} \qquad (2) $$
Db is called the box counting dimension.
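The box-counting estimate can be sketched numerically: count occupied boxes at several scales and fit the slope of ln N(r) against ln(1/r). This is an illustrative sketch, not the paper's implementation; `box_counting_dimension` is a hypothetical helper.

```python
import numpy as np

def box_counting_dimension(points, radii):
    """Estimate D_b as the slope of ln N(r) vs ln(1/r),
    where N(r) counts occupied boxes of side r."""
    points = np.asarray(points, dtype=float)
    counts = []
    for r in radii:
        # Quantize each coordinate to a box index; count distinct boxes.
        boxes = np.floor(points / r).astype(int)
        counts.append(len(np.unique(boxes, axis=0)))
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(radii)),
                          np.log(counts), 1)
    return slope

# Points on a line segment should give a dimension near 1:
t = np.linspace(0.0, 1.0, 2000)
pts = np.stack([t, 2.0 * t], axis=1)
d = box_counting_dimension(pts, radii=[0.2, 0.1, 0.05, 0.025])
print(round(d, 2))  # close to 1.0
```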
- 3.2 Information Dimension [7]
The information dimension specifies how the amount of information needed to locate a point on the attractor scales with the radius ε. It is defined as
$$ D_I = \lim_{\varepsilon \to 0} \frac{\langle \ln p_\varepsilon \rangle_\mu}{\ln \varepsilon} \qquad (3) $$
where μ is a fractal measure defined on the state space, p_ε(x) denotes the probability of finding a typical trajectory in a ball of radius ε around x, and ⟨ln p_ε⟩_μ is the average Shannon information needed to specify a point x with accuracy ε.
- 3.3 Correlation Dimension (CD) [7]
The CD measures the change in the density of the phase space with respect to neighboring points within a radius ε, and can be calculated as the slope of the graph of ln C(ε) against ln ε:
$$ C(\varepsilon) = \lim_{N \to \infty} \frac{2}{N(N-1)} \sum_{i<j} \Theta\!\left(\varepsilon - \lVert x_i - x_j \rVert\right), \qquad D_c = \lim_{\varepsilon \to 0} \frac{\ln C(\varepsilon)}{\ln \varepsilon} \qquad (4) $$

where Θ is the Heaviside step function.
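The correlation-sum estimate can likewise be sketched: compute the fraction of point pairs closer than ε at several radii and fit the log-log slope. A minimal sketch under the assumption of a small point cloud (the O(N²) pairwise distances fit in memory); `correlation_dimension` is a hypothetical helper.

```python
import numpy as np

def correlation_dimension(points, eps_values):
    """Estimate D_c as the slope of ln C(eps) vs ln eps, where
    C(eps) is the fraction of point pairs closer than eps."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # All pairwise Euclidean distances, keeping each pair (i < j) once.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(-1))
    pair_d = dists[np.triu_indices(n, k=1)]
    c = [np.mean(pair_d < eps) for eps in eps_values]
    slope, _ = np.polyfit(np.log(eps_values), np.log(c), 1)
    return slope

# Uniform points in the unit square should give a dimension near 2:
rng = np.random.default_rng(0)
pts = rng.random((1000, 2))
d = correlation_dimension(pts, [0.02, 0.04, 0.08, 0.16])
print(round(d, 1))  # near 2
```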
- 3.4 Feature Vector
The standard variance of the pixel intensity series encodes the fluctuation information of the time series. Such information is important for classification. The embedding dimension and embedding delay characterize the geometric structure of the pixel intensity series. The standard variance SV is integrated with the chaotic features in the feature vector, F = {τ, m, D_c, D_b, D_I, SV}. Given a W*L*T sequence, where W, L, and T are the width, length, and time dimension of the sequence, respectively, the chaotic features of each pixel intensity series are extracted, and the video is represented by a W*L*6 dimensional feature matrix.
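The per-pixel descriptor assembly can be sketched as follows. The chaotic features here are passed in as placeholders (computing τ, m, and the three dimensions is covered separately); `feature_vector` and `video_feature_matrix` are hypothetical helper names, not the paper's code.

```python
import numpy as np

def feature_vector(series, tau, m, d_c, d_b, d_i):
    """Assemble the six-dimensional descriptor
    F = [tau, m, D_c, D_b, D_I, SV] for one pixel series."""
    sv = float(np.std(series))  # fluctuation (standard variance) term
    return np.array([tau, m, d_c, d_b, d_i, sv])

def video_feature_matrix(video, extract):
    """video: W x L x T array; extract maps one T-length series to a
    6-vector. Returns the W x L x 6 feature matrix."""
    W, L, T = video.shape
    out = np.empty((W, L, 6))
    for i in range(W):
        for j in range(L):
            out[i, j] = extract(video[i, j, :])
    return out

# Toy run with placeholder values for the chaotic features:
video = np.random.default_rng(1).random((4, 5, 50))
fm = video_feature_matrix(
    video, lambda s: feature_vector(s, 2, 3, 0.0, 0.0, 0.0))
print(fm.shape)  # (4, 5, 6)
```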
4. Feature Clustering
Several clustering algorithms can be used for feature clustering. Unlike k-means and the Gaussian mixture model, which require the number of clusters to be pre-defined, the mean shift algorithm clusters the features automatically, with only one parameter to be fixed. That parameter, the bandwidth, is easy to determine because it has a physical meaning. Therefore, the mean shift algorithm [11, 12] is used for feature clustering.
Given n feature vectors f i ,i=1,⋯,n in the d-dimensional space R d , the mean feature vector is given by
$$ m(f) = \frac{\sum_{i=1}^{n} f_i\, w(f_i)\, g\!\left(\lVert H^{-1/2}(f - f_i) \rVert^2\right)}{\sum_{i=1}^{n} w(f_i)\, g\!\left(\lVert H^{-1/2}(f - f_i) \rVert^2\right)} \qquad (5) $$

The mean shift vector is M(f) = m(f) − f.
where the profile of kernel G is a function g: [0,∞)→R such that G(f) = g(‖f‖²), and the profiles k and g satisfy g(f) = −k′(f). H is a symmetric positive definite d×d bandwidth matrix, and w(f_i) ≥ 0 is the weight of sample point f_i. The goal of mean shift clustering is to identify the local maxima (feature centers f_c) and assign a label to each feature.
The mean shift algorithm is shown as follows:
  • (1) Search windows are initialized at random locations in the feature space.
  • (2) An initial feature vector f0 is chosen.
  • (3) The neighbors of f0 are the points within a kernel window centered at f0. The window is shifted to f1 = f0 + M(f0), where M(f0) is the mean shift vector at f0 and is computed by Equation 5.
  • (4) Step 3 is repeated until the magnitude of the mean shift vector falls below a predetermined threshold, at which point it is considered the zero vector. The final point fn is the mode of the component to which f0 belongs.
  • (5) Feature vectors with similar modes are merged into components.
  • (6) Class labels are assigned to the clusters.
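The steps above can be sketched with a flat (uniform) kernel, unit weights, and a scalar bandwidth; this is a simplified illustration, not the paper's implementation, and `mean_shift` is a hypothetical helper.

```python
import numpy as np

def mean_shift(features, bandwidth, n_iter=20, merge_tol=None):
    """Flat-kernel mean shift: move every point toward the mean of its
    neighbours within `bandwidth`, then merge nearby modes."""
    pts = np.asarray(features, dtype=float)
    modes = pts.copy()
    for _ in range(n_iter):
        for i in range(len(modes)):
            near = pts[np.linalg.norm(pts - modes[i], axis=1) <= bandwidth]
            modes[i] = near.mean(axis=0)  # steps (3)-(4): shift to the mean
    # Step (5)-(6): merge modes closer than merge_tol and assign labels.
    tol = merge_tol if merge_tol is not None else bandwidth / 2
    centers, labels = [], []
    for f in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(f - c) <= tol:
                labels.append(k)
                break
        else:
            centers.append(f)
            labels.append(len(centers) - 1)
    return np.array(centers), np.array(labels)

# Two well-separated blobs should yield two clusters:
rng = np.random.default_rng(2)
blobs = np.vstack([rng.normal(0, 0.1, (30, 2)),
                   rng.normal(3, 0.1, (30, 2))])
centers, labels = mean_shift(blobs, bandwidth=1.0)
print(len(centers))  # 2
```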
5. Feature Matching
An appropriate similarity measure has to be defined to compute the similarities between videos that are represented by feature clusters. The EMD algorithm [13], which compares similarities among images, has shown promising results in several applications, such as content-based image retrieval and texture classification [14]. The feature cluster representation is similar to the signature representation, which is defined as a set of k clusters and their relative weights. Therefore, the EMD algorithm is applied to compute the feature cluster similarities, as shown in Fig. 3. A feature cluster can be seen as a signature, e.g., ((p_i, wp_i) | 1 ≤ i ≤ m), where each cluster is represented by its mean feature vector p_i and weight wp_i. Computing the EMD cost is based on a solution to the transportation problem [15]. Matching feature clusters can be naturally cast as a transportation problem by defining one feature cluster as the supplier and the other as the consumer, and by setting the cost for a supplier-consumer pair equal to the ground distance between an element in the first feature cluster and an element in the second.
Fig. 3. Example of EMD-based matching between two feature clusters P and Q; lines indicate the flow between the two clusters
Let
$$ P = \{(p_1, wp_1), \ldots, (p_m, wp_m)\}, \qquad Q = \{(q_1, wq_1), \ldots, (q_n, wq_n)\} $$
be two feature clusters, where p_i and q_j are the cluster centers, wp_i and wq_j are the cluster weights, and m and n are the numbers of clusters. The distance is defined as
$$ EMD(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}\, d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}} \qquad (6) $$
where D = {d_ij} is the ground distance between the two clusters p_i and q_j, and F = [f_ij] is the flow between p_i and q_j. Equation 6 is governed by the following constraints:
$$ f_{ij} \ge 0, \qquad 1 \le i \le m,\ 1 \le j \le n \qquad (7) $$

$$ \sum_{j=1}^{n} f_{ij} \le wp_i, \qquad 1 \le i \le m \qquad (8) $$

$$ \sum_{i=1}^{m} f_{ij} \le wq_j, \qquad 1 \le j \le n \qquad (9) $$

$$ \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min\!\left( \sum_{i=1}^{m} wp_i,\ \sum_{j=1}^{n} wq_j \right) \qquad (10) $$
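In general the optimal flow requires a transportation (linear-programming) solver. For the special case of one-dimensional cluster centers with equal total weight, the EMD reduces to the area between the two weight CDFs, which is easy to sketch; `emd_1d` is a hypothetical helper illustrating that special case, not the paper's solver.

```python
def emd_1d(signature_p, signature_q):
    """EMD between two 1-D signatures [(position, weight), ...] with
    equal total weight: the area between their cumulative weight
    functions (a known closed form of the transportation problem)."""
    p, q = list(signature_p), list(signature_q)
    xs = sorted({x for x, _ in p} | {x for x, _ in q})

    def cdf(sig, x):
        return sum(w for pos, w in sig if pos <= x)

    total = sum(w for _, w in p)
    area = 0.0
    for a, b in zip(xs[:-1], xs[1:]):
        area += abs(cdf(p, a) - cdf(q, a)) * (b - a)
    return area / total

# Moving half the mass from 0 and half from 2 to a single cluster at 1
# costs 0.5*1 + 0.5*1 = 1:
print(emd_1d([(0.0, 0.5), (2.0, 0.5)], [(1.0, 1.0)]))  # 1.0
```

For the six-dimensional feature clusters used in the paper, a general solver for constraints (7)-(10) would replace this closed form.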
The EMD cost is then used in the form of a Gaussian kernel as follows:
$$ K(P, Q) = \exp\!\left( -\frac{EMD(P, Q)}{\rho} \right) \qquad (11) $$
where ρ is the kernel parameter. The matching matrix used for traffic video classification is obtained by Equation 11.
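Turning a matrix of pairwise EMD costs into the matching matrix is a one-line transform; a minimal sketch with a hypothetical helper name, assuming the kernel form exp(−EMD/ρ):

```python
import numpy as np

def matching_matrix(emd_matrix, rho):
    """Map an EMD distance matrix to a Gaussian-kernel matching
    matrix, K_ij = exp(-EMD_ij / rho)."""
    return np.exp(-np.asarray(emd_matrix, dtype=float) / rho)

# Distance 0 maps to similarity 1; larger distances decay toward 0:
D = np.array([[0.0, 2.0], [2.0, 0.0]])
K = matching_matrix(D, rho=2.0)
print(K[0, 0], round(K[0, 1], 3))  # 1.0 0.368
```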
6. Experimental Results
- 6.1 Experiment Setup
A dataset consisting of 225 traffic videos of four different surveillance areas was acquired. The dataset contains a variety of traffic scenes. Fig. 4 shows sample images from each dataset. Each video clip is downsampled to a 50*50 resolution with 50 frames. The surveillance area of Dataset 1 (78 videos) is an intersection during the daytime. The surveillance area of Dataset 2 (50 videos) is an intersection at night. Pedestrians cross the road in both datasets, and vehicles stop at the red light in Dataset 2. The surveillance area of Dataset 3 (30 videos) is characterized by lighting changes, which significantly affect the segmentation results. The surveillance area of Dataset 4 (67 videos) is similar to that of Dataset 1, but at a different intersection. The surveillance camera was mounted at a low position, so vehicle shapes change significantly as the vehicles approach the camera.
Fig. 4. Examples from our dataset
The ground truth classification for each video clip is determined manually. The dataset is classified into four categories according to traffic conditions: red, light, medium, and heavy. Red signifies the occurrence of a red light. Light signifies few vehicles on the road. Medium signifies several vehicles on the road. Heavy signifies the occurrence of traffic jams.
For the classification strategy, the k-nearest neighbor classifier with k=5 is chosen, and the one-vs-all classification strategy is employed. In each round, one video is chosen for testing and the rest are used for training.
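The leave-one-out k-NN evaluation on a precomputed matching matrix can be sketched as follows; `loo_knn_accuracy` is a hypothetical helper and the toy similarity matrix is illustrative, not data from the paper.

```python
import numpy as np

def loo_knn_accuracy(similarity, labels, k=5):
    """Leave-one-out k-NN on a precomputed similarity (matching)
    matrix: each sample votes among its k most similar others."""
    similarity = np.asarray(similarity, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    correct = 0
    for i in range(n):
        sims = similarity[i].copy()
        sims[i] = -np.inf                    # exclude the test sample
        nbrs = np.argsort(sims)[::-1][:k]    # k most similar videos
        vals, counts = np.unique(labels[nbrs], return_counts=True)
        if vals[np.argmax(counts)] == labels[i]:
            correct += 1
    return correct / n

# Toy matching matrix: two classes, within-class similarity higher.
S = np.where(np.add.outer(np.arange(8) // 4, np.arange(8) // 4) % 2 == 0,
             0.9, 0.1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(loo_knn_accuracy(S, y, k=3))  # 1.0
```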
Fig. 5 illustrates part of the computed feature results. The pixel intensity series in different positions shows different chaotic features.
Fig. 5. Computed features
Fig. 6 provides an example of our segmentation result. Neighboring feature vectors with similar values are clustered. Traffic roads are separated from buildings. Segmentation results vary with different traffic conditions. The EMD algorithm is then employed to compare traffic conditions according to the segmentation results.
Fig. 6. Segmentation results
- 6.2 Traffic Classification Results
Fig. 7 shows the confusion matrix of our dataset. The overall classification performance is 73.33%. When the dataset is separated, the classification results for Datasets 1, 2, 3, and 4 are 65.38%, 64%, 83.33%, and 85.07%, respectively. The proposed scheme can approximately classify different traffic conditions.
Fig. 7. Confusion matrix for our datasets
As shown in Fig. 7, the majority of misclassifications occur between neighboring classes (i.e., light vs. medium, and medium vs. heavy), which is reasonable because of the indeterminate nature of the category boundaries and the corresponding ambiguity in generating ground truth. Furthermore, different surveillance areas and lighting conditions aggravate such confusion. In Datasets 1 and 2, the traffic condition is more complex than in Datasets 3 and 4; hence, the boundary effect is more evident.
The reason for the confusion among light, heavy, and red is that light traffic largely depicts the background (i.e., few cars are present), whereas heavy traffic and red depict cars that are virtually at a standstill. From the viewpoint of the change in pixel intensity series, both scenes are similar, thus making mismatches reasonable, especially in Dataset 2 under night-time conditions, in which the color of the cars is similar to the color of the road.
The classification rates in Datasets 1 and 2 are lower than those for the other two datasets. The reason for Dataset 1 is that several people occasionally cross the road before the red light flashes, which results in segmentation failures. The reason for Dataset 2 is that the colors of the cars at night are similar to that of the road, which also results in segmentation failures. In most cases, the proposed method can match different traffic conditions accurately.
The ground truth of the overall dataset and our classification results are shown in Fig. 8 . The misclassified results are highlighted with red squares.
Fig. 8. Classification of traffic videos: (a) Ground truth; (b) Classification results of our method. Errors are highlighted with red squares
Part of the classification results are shown in Figs. 9 , 11 , 13 , and 15 . The classification results are mainly determined by the EMD algorithm, which compares the segmentation between videos. As shown in Fig. 5 , different parts of each video vary significantly. Different cluster centers are important factors affecting the EMD results. Several representative segmentation results with original frames are shown in Figs. 10 , 12 , 14 , and 16 .
Fig. 9. Video classification results for Dataset 1: (a) A correct classification result; (b) A wrong classification result
Fig. 10. Example frames of segmentation results: (a) a light traffic condition; (b) a medium traffic condition; (c) a heavy traffic condition
Fig. 11. Video classification results for Dataset 2: (a) A correct classification result; (b) A wrong classification result
Fig. 12. Example frames of segmentation results: (a) the traffic light is red; (b) a light traffic condition
Fig. 13. Video classification results for Dataset 3: (a) A correct classification result; (b) A wrong classification result
Fig. 14. Example frames of segmentation results: (a) a light traffic condition; (b) a light traffic condition in sunlight
Fig. 15. Video classification results for Dataset 4: (a) A correct classification result; (b) A wrong classification result
Fig. 16. Example frames of segmentation results: (a) two paths; (b) the road
In Fig. 9, (a) is a correct classification result, whereas (b) is a wrong classification result. Fig. 9(b) shows that the medium condition is confused with the light condition. In the third and sixth columns, the videos are under a light traffic condition. However, pedestrians crossing the road, together with traveling vehicles and motorbikes, affect the segmentation result. As a result, the two videos resemble the medium condition.
In Fig. 10 , (a) shows a light traffic condition with a few cars passing through quickly. The cars can be separated from the road. Fig. 10 (b) shows a medium traffic condition, and the segmentation result shows that the cars integrate with part of the road, thus indicating the presence of more cars. Fig. 10 (c) shows a heavy traffic condition. Several cars are passing through the road. Pixels of cars dominate each pixel intensity series. The segmentation result shows numerous connected parts.
In Fig. 11 , (a) is a correct classification result, whereas (b) is a wrong classification result. Fig. 11 (b) shows that the red light condition is confused with the heavy condition. In the first, fourth, and sixth columns, the videos are in red light, and pedestrians are crossing the road. In the remaining columns, a traffic jam occurs while the vehicles are on the road.
In Fig. 12(a), the traffic light is red, and pedestrians are crossing the road; the segmentation result shows the path of the people. In Fig. 12(b), the cars travel at a slow speed, which causes car pixels to occupy a large part of each pixel intensity series. The pixel intensity series of this light traffic condition is therefore similar to that of the heavy traffic condition. The segmentation result suggests several cars passing through, although only a few cars pass through at low speed.
In Fig. 13 , (a) is a correct classification result, whereas (b) is a wrong classification result. In the wrong classification results, sunlight varied significantly in the videos, thus affecting the segmentation results.
In Fig. 14 (a), several cars pass through, and the road is separated. In Fig. 14 (b), sunlight appears and significantly affects the segmentation result.
In Fig. 15, (a) is a correct classification result, whereas (b) is a wrong classification result. Part of the heavy traffic condition is defined as several cars turning right or left. Fig. 15(b) shows a traffic jam in the video of the first column: vehicles stop for a long time, so the motion information is similar to that of the light traffic condition.
In Fig. 16 (a), several cars proceed northward and from east to south. The segmentation result shows the two paths. In Fig. 16 (b), the segmentation result mainly shows the road and other backgrounds.
The experiments show that our proposed framework can effectively classify different traffic conditions and is robust to occlusion, low resolution, and sunlight. The segmentation results and feature vector ensure classification accuracy.
- 6.3 Comparison
LDS [16] is applied to the dataset described above. LDS is a parametric model for spatio-temporal data and can be represented by
$$ x(t+1) = A\,x(t) + w(t) \qquad (12) $$

$$ z(t) = C\,x(t) + v(t) \qquad (13) $$
where x(t) is the hidden state vector; z(t) is the observation vector; and w(t) and v(t) are noise components modeled as normal with zero mean and covariance R and Q, respectively. A is the state-transition matrix, and C is the observation matrix. Let [z(1), z(2), …, z(τ)] = UΣV^T be the singular value decomposition of the data matrix for τ observations. The model parameters are then calculated [16] as
$$ \hat{C} = U, \qquad \hat{A} = \Sigma V^{T} D_1 V \left(V^{T} D_2 V\right)^{-1} \Sigma^{-1} \qquad (14) $$
where D_1 = [0 0; I_{τ−1} 0] and D_2 = [I_{τ−1} 0; 0 0]. The distance metric used is based on subspace angles [17]. The overall classification performance is 30.67%.
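The SVD-based LDS fit can be sketched as follows. This uses the equivalent least-squares form for the state-transition matrix (x(t+1) ≈ A x(t) fitted over the state sequence) rather than the D_1/D_2 matrices; `fit_lds` is a hypothetical helper, not the compared implementation.

```python
import numpy as np

def fit_lds(Z, n_states):
    """Suboptimal LDS fit from a d x T data matrix Z [16]:
    C from the left singular vectors, states X = S V^T, and A by
    least squares between consecutive states."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    C = U[:, :n_states]                          # observation matrix
    X = np.diag(s[:n_states]) @ Vt[:n_states]    # state sequence
    A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])     # x(t+1) ~= A x(t)
    return A, C, X

# A 1-state fit recovers the decay rate of a rank-1 geometric sequence:
t = np.arange(20)
Z = np.outer([1.0, 2.0, -1.0], 0.9 ** t)
A, C, X = fit_lds(Z, n_states=1)
print(round(float(A[0, 0]), 3))  # 0.9
```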
Overall, two results were observed for the traffic video representations evaluated on the datasets. Our proposed feature vector outperforms the LDS approach. The poor performance of the LDS on the dataset can be explained by viewpoint and illumination changes.
A significant limitation of the LDS approach is that the metrics used for comparing traffic video are not designed to be invariant to changes in viewpoint and scale. As a consequence, these methods perform poorly when videos contain traffic scenarios with such variabilities.
Another shortcoming is that the choice of the metric used in these approaches requires that the training and testing data have the same number of pixels. This requirement poses a challenge when one wants to compare local regions of a video sequence, thus adding additional overhead for normalizing all video sequences to the same spatial size.
7. Conclusions
This paper introduced a feature vector matrix representation of traffic videos. Such representation measures traffic conditions under varying environmental conditions and at low resolutions. Compared with most extant approaches, the proposed approach has two advantages: (1) non-reliance on tracking or optical flow estimation and (2) robustness to lighting variation. The experiment was performed on a traffic dataset that we collected, which allowed us to test the descriptive power of our proposed feature vector. Classification results demonstrate that the algorithm is effective.
Acknowledgements
The authors would like to thank the anonymous reviewers for their constructive comments. This work was partly supported by the National Natural Science Foundation of China (Grant Nos. 61374161 and 61074106).
BIO
Yong Wang is a Ph.D. candidate in control science and engineering in the School of Aeronautics and Astronautics at Shanghai Jiao Tong University. His research interests include visual tracking, pattern recognition, and machine learning.
Shiqiang Hu is a Professor and the Chairman of the Department of Aerospace Information and Control at Shanghai Jiao Tong University. He received his M.S. (1998) and Ph.D. (2002) degrees at Beijing Institute of Technology both in electronics and information technology. His research areas include intelligent information processing, image understanding, and nonlinear filtering.
References
[1] X. Yu, L.-Y. Duan and Q. Tian, "Highway traffic information extraction from Skycam MPEG video," in Proc. of the IEEE Conf. on Intelligent Transportation Systems, pp. 37-42, 2002.
[2] R. Cucchiara, M. Piccardi and P. Mello, "Image analysis and rule-based reasoning for a traffic monitoring system," IEEE Transactions on Intelligent Transportation Systems, vol. 1, no. 2, pp. 119-130, 2000.
[3] Y. K. Jung, K. W. Lee and Y. S. Ho, "Content-based event retrieval using semantic scene interpretation for automated traffic surveillance," IEEE Transactions on Intelligent Transportation Systems, vol. 2, no. 3, pp. 151-163, 2001.
[4] F. Porikli and X. Li, "Traffic congestion estimation using HMM models without vehicle tracking," in Proc. of the IEEE Intelligent Vehicles Symposium, pp. 188-193, 2004.
[5] A. B. Chan and N. Vasconcelos, "Classification and retrieval of traffic video using auto-regressive stochastic processes," in Proc. of the IEEE Intelligent Vehicles Symposium, pp. 771-776, 2005.
[6] J. Stallkamp, M. Schlipsing, J. Salmen and C. Igel, "Introduction to the special issue on machine learning for traffic sign recognition," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 4, pp. 1481-1483, 2012.
[7] H. Kantz and T. Schreiber, Nonlinear Time Series Analysis, Cambridge University Press, Cambridge, 1997.
[8] F. Takens, "Detecting strange attractors in turbulence," in Dynamical Systems and Turbulence (D. A. Rand and L. S. Young, eds.), Lecture Notes in Mathematics, Springer, 1981.
[9] A. M. Fraser and H. L. Swinney, "Independent coordinates for strange attractors from mutual information," Physical Review A, vol. 33, no. 2, pp. 1134-1140, 1986.
[10] M. B. Kennel, R. Brown and H. D. I. Abarbanel, "Determining embedding dimension for phase-space reconstruction using a geometrical construction," Physical Review A, vol. 45, no. 6, pp. 3403-3411, 1992.
[11] Y. Cheng, "Mean shift, mode seeking, and clustering," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790-799, 1995.
[12] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002.
[13] Y. Rubner, C. Tomasi and L. Guibas, "The earth mover's distance as a metric for image retrieval," International Journal of Computer Vision, vol. 40, no. 2, pp. 99-121, 2000.
[14] D. Xu and S. F. Chang, "Visual event recognition in news video using kernel methods with multi-level temporal alignment," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[15] G. B. Dantzig, "Application of the simplex method to a transportation problem," in Activity Analysis of Production and Allocation, John Wiley and Sons, pp. 359-373, 1951.
[16] G. Doretto, A. Chiuso, Y. N. Wu and S. Soatto, "Dynamic textures," International Journal of Computer Vision, vol. 51, no. 2, pp. 91-109, 2003.
[17] R. Martin, "A metric for ARMA processes," IEEE Transactions on Signal Processing, vol. 48, no. 4, pp. 1164-7, 2000.
[18] N. Buch, S. A. Velastin and J. Orwell, "A review of computer vision techniques for the analysis of urban traffic," IEEE Transactions on Intelligent Transportation Systems, vol. 12, no. 3, pp. 920-939, 2011.
[19] M. Vargas, S. L. Toral, J. M. Milla and F. Barrero, "A shadow removal algorithm for vehicle detection based on reflectance ratio and edge density," in Proc. of the IEEE Conf. on Intelligent Transportation Systems, pp. 1123-1128, 2010.
[20] J. Lai, S. Huang and C. Tseng, "Image-based vehicle tracking and classification on the highway," in Proc. of the IEEE Int. Conf. on Green Circuits and Systems, pp. 666-670, 2010.
[21] K. Robert, "Night-time traffic surveillance: a robust framework for multi-vehicle detection, classification and tracking," in Proc. of the IEEE Conf. on Advanced Video and Signal Based Surveillance, pp. 1-6, 2009.
[22] G. Gritsch, N. Donath, B. Kohn and M. Litzenberger, "Night-time vehicle classification with an embedded vision system," in Proc. of the IEEE Conf. on Intelligent Transportation Systems, pp. 1-6, 2009.
[23] Y. Zou, G. Shi, H. Shi and H. Zhao, "Traffic incident classification at intersections based on image sequences by HMM/SVM classifiers," Multimedia Tools and Applications, vol. 52, no. 1, pp. 133-145, 2011.
[24] O. Akoz and M. E. Karsligil, "Severity detection of traffic accidents at intersections based on vehicle motion analysis and multiphase linear regression," in Proc. of the IEEE Conf. on Intelligent Transportation Systems, pp. 474-479, 2010.
[25] M. Pucher, D. Schabus, P. Schallauer, Y. Lypetskyy, F. Graf, H. Rainer, M. Stadtschnitzer, S. Sternig, J. Birchbauer, W. Schneider and B. Schalko, "Multimodal highway monitoring for robust incident detection," in Proc. of the IEEE Conf. on Intelligent Transportation Systems, pp. 837-842, 2010.
[26] H. Huang, Z. Cai, S. Shi, X. Ma and Y. Zhu, "Automatic detection of vehicle activities based on particle filter tracking," in Proc. of the Int. Symposium on Computer Science and Computational Technology, pp. 381-384, 2009.
[27] S. Ali, A. Basharat and M. Shah, "Chaotic invariants for human action recognition," in Proc. of the IEEE Int. Conf. on Computer Vision, pp. 1-8, 2007.
[28] S. Wu, B. Moore and M. Shah, "Chaotic invariants of Lagrangian particle trajectories for anomaly detection in crowded scenes," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2054-2060, 2010.
[29] N. Shroff, P. Turaga and R. Chellappa, "Moving vistas: exploiting motion for describing scenes," in Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 1911-1918, 2010.
[30] R. Ji, Y. Gao, R. Hong, Q. Liu, D. Tao and X. Li, "Spectral-spatial constraint hyperspectral image classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 3, pp. 1811-1824, 2013.
[31] R. Ji, H. Yao, Q. Tian, P. Xu, X. Sun and X. Liu, "Context-aware semi-local feature detector," ACM Transactions on Intelligent Systems and Technology, vol. 3, no. 3, pp. 44-71, 2012.
[32] M. Yasmin, M. Sharif, S. Mohsin and I. Irum, "Content based image retrieval using combined features of shape, color and relevance feedback," KSII Transactions on Internet and Information Systems, vol. 7, no. 12, pp. 3149-3165, 2013.
[33] H. H. Nguyen, G. Lee, S. Kim and H. J. Yang, "An effective orientation-based method and parameter space discretization for defined object segmentation," KSII Transactions on Internet and Information Systems, vol. 7, no. 12, pp. 3180-3199, 2013.