Multiple Person Tracking based on Spatial-temporal Information by Global Graph Clustering
KSII Transactions on Internet and Information Systems (TIIS). 2015. Jun, 9(6): 2217-2229
Copyright © 2015, Korean Society For Internet Information
  • Received : December 15, 2015
  • Accepted : May 07, 2015
  • Published : June 30, 2015
About the Authors
Yu-ting Su
School of Electronic Information Engineering, Tianjin University Tianjin, 300072, China
Xiao-rong Zhu
School of Electronic Information Engineering, Tianjin University Tianjin, 300072, China
Wei-Zhi Nie
School of Electronic Information Engineering, Tianjin University Tianjin, 300072, China

Multiple person tracking is a challenging task in computer vision because of illumination variations, irregular changes in human shape, and partial occlusions. In this paper, we propose a graph clustering method based on the spatio-temporal information of moving objects for multiple person tracking. First, a part-based model is used to localize individual foreground regions in each frame. Then, we heuristically leverage spatio-temporal constraints to generate a set of reliable tracklets. Finally, the graph shift method is applied to the tracklet association problem to generate the complete trajectory of each object. Extensive comparison experiments demonstrate the superiority of the proposed method.
1. Introduction
Multiple object tracking has been a popular research topic in recent years and is widely used in applications such as intelligent video surveillance, gesture and action recognition [1, 2, 3, 4, 5], and event recognition. However, reliable tracking is hard to guarantee because of irregular changes in human shape, partial occlusions, and variations in scene illumination [6].
Over the past two decades, a large number of tracking algorithms [7, 8] have been proposed. Traditional trackers such as mean-shift, Kalman filter, and particle filter [9, 10, 11] can be seen as optimization processes that need multiple iterations to find the best location of each tracked object. However, these traditional methods are better suited to single person tracking. In recent years, tracking-by-detection methods have become increasingly popular [12, 13, 14, 15]. Shu et al. [16] applied an online learning method to track objects, training an SVM decision model for each tracked object. Their method can judge whether an object has disappeared, but the decision model must be updated during tracking, which adds computational overhead. Nie et al. [8] applied DPM [17] to detect moving people in each frame and used a Bayesian model to handle the tracklet association problem, which can address occlusion during tracking. However, this kind of method relies heavily on the performance of the detector and of the data association. On the detection side, several existing approaches link detections with confidence scores to build tracklets in order to improve detection performance [18]. On the data association side, some existing methods apply Conditional Random Fields (CRF), Bayesian models, or convex optimization. However, these methods do not fully utilize the spatio-temporal information of the tracked objects.
To solve these problems, we propose a graph model based on the spatio-temporal information of tracked objects. First, we employ DPM [17] to detect people in each frame. Then, spatial and temporal constraints are used to generate reliable tracklets from these detection results. Each tracklet comprises a set of detection results. We apply mean pooling to fuse all detection results of a tracklet and extract HOG and HSV features for tracklet representation. Finally, we use the spatial and temporal information of the tracklets to build a graph model, and apply a graph clustering method to solve the data association problem for trajectory generation.
The main contributions are two-fold:
  • We formulate the tracklet association problem as a clustering problem and apply a graph clustering method to address it;
  • The proposed method is suitable for different scenes. It can also be used for object tracking in multi-view scenes.
The remainder of this paper is organized as follows. Section 2 reviews the state-of-the-art visual tracking methods related to this paper. Section 3 gives a system overview and the details of the proposed method. The experimental results are shown in Section 4. Finally, Section 5 concludes the paper.
2. Related Work
A vast amount of work has been published on multiple person tracking [19, 20, 21, 22]. Tracking-by-detection has been a popular tracking approach over the last decade. It associates detection candidates under spatio-temporal constraints to generate a set of tracklets, and then applies data association algorithms to obtain the final trajectories. Shu et al. [16] proposed an online learning method to track multiple people and applied a greedy algorithm to handle the data association problem. However, online learning does not provide stable detection results, and false samples can have catastrophic effects. Kaaniche and Bremond [24] used HOG features [23] as the detector in their tracking system. However, HOG features perform poorly on low-resolution videos. A large number of works [25, 26, 27, 28] have shown that good detection results greatly help the tracking process, so we apply a part-based model to detect people in each video frame in order to improve detection accuracy. The approach of Yang et al. [29] is similar to ours, but they used a classifier to predict the potential positions of the tracked person, whereas we first use optical flow to predict motion regions and then apply spatial and physical information to obtain the final predicted region as the detection result. Kalal et al. [7] proposed the TLD tracking algorithm, which combines a detector with a tracker to detect objects, and applied an online-learned classifier to judge the detection results during tracking. Inspired by this work, we extract features from the average image of the detection windows as the overall feature of the tracked object, aiming to find a more stable representation.
Classic multiple object trackers such as multi-hypothesis tracking [30] and joint probabilistic data association filters [31] jointly consider the data association from sensor measurements to multiple overlapping tracks. Although not restricted to Markov chains, they can keep only a few steps in memory because of the exponential complexity of the task, and they do not take physical exclusion constraints between object volumes into account. Jiang et al. [32] employed integer linear programming to handle the data association problem. However, the number of objects needs to be known a priori. To overcome this limitation, Berclaz et al. [33] introduced virtual source and sink locations to initiate and terminate trajectories. A common trait of these works is that they lead to global optimization problems, which are usually solved by linear programming. However, the global optimization must account for conditions such as objects entering, leaving, and being occluded, which increases the complexity and computational cost of the algorithm. In this paper, we draw on the idea of handling occlusion by linear programming. Our method differs from [33] in that we do not apply global optimization to the trajectory problem; instead, we use a greedy algorithm to generate the final trajectory of each tracked object.
3. System Overview
Our tracking system includes two steps. 1) Tracklet generation: we apply the DPM method to detect each person in each frame of the video sequence. DPM is a popular detection model that localizes the body parts of a person by dynamic programming over visual features, which is very useful for handling occlusion. Then, we use spatial and temporal constraints to generate a set of reliable tracklets [18]. 2) Data association: each tracklet consists of a set of detection results that represent the same person. We apply mean pooling to generate an average image from these detection results, and extract color [34] and HOG [35] features to represent the tracklet. Finally, a clustering method is used to solve the data association problem and generate the final trajectory of each tracked object.
Data association is the key step in this work, so we explain it further in Section 3.1. First, we introduce feature extraction from the average image. Then, we describe how the graph is built. Finally, we present the graph clustering method.
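The tracklet generation step can be pictured with a small greedy linker. This is only a sketch under stated assumptions: the `Detection` type, the thresholds `max_gap` and `max_dist`, and the nearest-neighbour rule are hypothetical, since the text does not spell out the exact spatio-temporal constraints taken from [18].

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Detection:
    frame: int   # frame index of the detection
    x: float     # box centre, x coordinate in pixels
    y: float     # box centre, y coordinate in pixels


def link_detections(detections, max_gap=1, max_dist=30.0):
    """Greedily chain detections that are close in both time and space.

    max_gap and max_dist are illustrative thresholds, not values from the paper.
    """
    tracklets = []
    for det in sorted(detections, key=lambda d: d.frame):
        best, best_dist = None, max_dist
        for tr in tracklets:
            last = tr[-1]
            gap = det.frame - last.frame
            dist = hypot(det.x - last.x, det.y - last.y)
            # Temporal constraint: strictly later, at most max_gap frames apart.
            # Spatial constraint: no farther than the closest candidate so far.
            if 0 < gap <= max_gap and dist <= best_dist:
                best, best_dist = tr, dist
        if best is None:
            tracklets.append([det])  # no compatible tracklet: start a new one
        else:
            best.append(det)
    return tracklets
```

With these thresholds, a person who is missed for more than one frame starts a new tracklet, which is exactly the fragmentation that the later data association step has to repair.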
- 3.1. Data Association
- 3.1.1. Feature Extraction
After tracklet generation, we have a set of reliable tracklets. Each tracklet is represented by a set of detection results, all of which belong to the same person.
We assume that each tracklet has N detection results T = { t_1, ..., t_N }. We obtain the average image from these detection results, as shown in Fig. 1. The average image shows that these detection results have similar feature representations, such as color or edge features. The trajectory of each person in a video sequence is formed by a set of tracklets, and these tracklets must have similar feature representations, since a moving person exhibits an average motion, especially over a long period of movement. We therefore extract color and HOG features from the average image as the feature of the tracklet. If several tracklets belong to one person, the similarities among them should be higher than their similarities to other tracklets.
Fig. 1. The information of one tracklet.
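The mean pooling that produces the average image can be sketched in a few lines. The sketch assumes grayscale patches stored as nested lists that have already been resized to a common size; the function name `mean_pool` is hypothetical, and a real pipeline would more likely use an array library.

```python
def mean_pool(patches):
    """Average equally sized grayscale patches pixel by pixel.

    Each patch is a list of rows; all patches are assumed to share the
    same height and width (for example after resizing the detections).
    """
    if not patches:
        raise ValueError("a tracklet needs at least one detection")
    n = len(patches)
    height, width = len(patches[0]), len(patches[0][0])
    return [
        [sum(p[r][c] for p in patches) / n for c in range(width)]
        for r in range(height)
    ]
```

Because every detection contributes equally, a single badly localized box shifts the average only by 1/N, which is the stability the feature extraction relies on.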
- 3.1.2. Graph Structure
In Fig. 2, we record the initial and terminal states of one tracklet, including its positions and spatio-temporal information. We then use t_i = { n_s, n_e, f, x_s, y_s, x_e, y_e } to represent each tracklet. Here n_s is the start time of the tracklet, n_e is its end time, f is its feature, (x_s, y_s) is its initial position, and (x_e, y_e) is its terminal position. This information is used to compute the similarity between two different tracklets as follows:
[Equation 1]
where S(i, j) is the similarity between tracklet i and tracklet j; p(n_e^i, n_s^j) is the relationship between the end time of tracklet i and the start time of tracklet j, computed by Eq. 2; d(i, j) and h(i, j) are the similarities in distance and in feature, computed by Eq. 3 and Eq. 4; and ε and τ are the weights of d(i, j) and h(i, j). In this work, we set ε = τ = 0.5.
[Equation 2]
[Equation 3]
Fig. 2. A sample of matching tracklets. Traditional methods often use the match between the initial and terminal states as the match between two tracklets. Our approach uses the match between the average images of two tracklets as the final matching result.
where (x_e^i, y_e^i) is the terminal position of tracklet i in the video frame and (x_s^j, y_s^j) is the initial position of tracklet j. We use the Euclidean distance to represent the spatial similarity between tracklet i and tracklet j.
[Equation 4]
where f_i is the feature of tracklet i and f_j is the feature of tracklet j. We apply Eq. 4 to compute the similarity between tracklet i and tracklet j in feature space. Finally, we fuse the feature and spatial similarities by Eq. 1 to obtain the final similarity between tracklet i and tracklet j. This similarity is used to build the graph G = (V, E), where V is the node set and E is the edge set. Each node V_i denotes one tracklet, and each edge denotes the relevance between two tracklets. The weight of an edge is computed by Eq. 1.
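The graph construction can be sketched as follows. The functional forms here are assumptions made for illustration: the temporal gate for p, the exponential distance mapping with the hypothetical scale `sigma` for d, the cosine similarity for h, and the gated weighted sum used to combine them are not confirmed by the text, which only names the ingredients and the weights ε = τ = 0.5.

```python
from dataclasses import dataclass
from math import exp, hypot, sqrt


@dataclass
class Tracklet:
    n_s: int      # start frame
    n_e: int      # end frame
    f: list       # appearance feature vector of the average image
    x_s: float    # start position
    y_s: float
    x_e: float    # end position
    y_e: float


def temporal_gate(a, b):
    # Assumed form of p: b can continue a only if it starts after a ends.
    return 1.0 if a.n_e < b.n_s else 0.0


def spatial_similarity(a, b, sigma=50.0):
    # Assumed form of d: Euclidean distance from a's end to b's start,
    # mapped into (0, 1] so that nearer means more similar.
    return exp(-hypot(a.x_e - b.x_s, a.y_e - b.y_s) / sigma)


def feature_similarity(a, b):
    # Assumed form of h: cosine similarity of the two feature vectors.
    dot = sum(u * v for u, v in zip(a.f, b.f))
    norm = sqrt(sum(u * u for u in a.f)) * sqrt(sum(v * v for v in b.f))
    return dot / norm if norm > 0 else 0.0


def similarity(a, b, eps=0.5, tau=0.5):
    # Assumed combination: weighted sum of d and h, gated by p.
    return temporal_gate(a, b) * (
        eps * spatial_similarity(a, b) + tau * feature_similarity(a, b)
    )


def adjacency(tracklets):
    """Symmetric weighted adjacency matrix with a zero diagonal."""
    n = len(tracklets)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a, b = tracklets[i], tracklets[j]
            # At most one ordering passes the temporal gate.
            A[i][j] = A[j][i] = max(similarity(a, b), similarity(b, a))
    return A
```

Tracklets that overlap in time get weight zero under this gate, so they can never be merged into one trajectory.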
- 3.1.3. Tracklet Association
Based on the graph model, each dense subgraph is considered as the trajectory of one person. To detect the dense subgraphs, we apply the pairwise clustering method of graph shift (GS) [35], which is a popular method for detecting dense subgraphs. The input of the GS method is the adjacency matrix A of the graph. A subgraph is represented by a probabilistic cluster x ∈ Δ^n, where Δ^n = { x | x ∈ R^n, x ≥ 0, |x|_1 = 1 }. In fact, x is a mapping vector with unit L1 norm whose i-th component is the probability that the subgraph contains the i-th vertex. In particular, x_i = 0 means that the i-th vertex is not included in the dense subgraph. The GS method measures the average connection strength of subgraph x by g(x) in Eq. 5 and efficiently finds all the local maxima x of g(x). Each local maximum indicates a dense subgraph of the graph, which is defined as the trajectory of one person.
[Equation 5]
Finding the dense subgraph is equivalent to maximizing
[Equation 6]
Since our aim is to find the dense subgraph, each time we only consider one subcluster. Without loss of generality, we denote the sub-cluster as xi . Since the problem is a constrained optimization problem, by adding Lagrangian multipliers λ for all i = 2,⋯, n , we obtain its Lagrangian function:
[Equation 7]
Any local maximizer x must satisfy the Karush-Kuhn-Tucker (KKT) condition. That is:
[Equation 8]
[Equation 9]
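The graph shift literature commonly states this optimization in the form below. The block is a sketch of that standard formulation, assuming a symmetric adjacency matrix A; the multiplier symbols μ_i are an assumption of this sketch, and an exact correspondence with Eqs. 5 to 9 is not guaranteed.

```latex
% Average connection strength of the probabilistic cluster x
g(x) = x^{\top} A x

% Dense subgraph detection as a constrained maximization
\max_{x}\; g(x) \quad \text{s.t.} \quad x \in \Delta^{n}

% Lagrangian with multipliers \lambda and \mu_i \ge 0
L(x, \lambda, \mu) = g(x) - \lambda \Big( \sum_{i=1}^{n} x_i - 1 \Big) + \sum_{i=1}^{n} \mu_i x_i

% KKT conditions at a local maximizer
2 (A x)_i - \lambda + \mu_i = 0 \quad (i = 1, \dots, n), \qquad \sum_{i=1}^{n} x_i \mu_i = 0
```

Since x_i ≥ 0 and μ_i ≥ 0, the second condition forces μ_i = 0 wherever x_i > 0. Every vertex inside the subgraph therefore has the same value (Ax)_i = λ/2, and every vertex outside it has (Ax)_i ≤ λ/2, which gives a simple test for a local maximizer.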
As pointed out by [36], we can optimize the problem in a pairwise way by Algorithm 1.
Algorithm 1 Dense subgraph
Input: weighted adjacency array A and an initialization x(0);
Set x = x(0);
Repeat
1: Update the partial derivatives of g(x) with respect to each variable x_i;
2: Find a pair (v_i, v_j);
3: Compute the best step size to maximize Eq. 5;
4: Until x is a local maximizer;
5: Output: a KKT point x
After this process, we obtain a set of sub-graphs (sub-clusters). Each sub-graph can be seen as the trajectory of one tracked object.
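The pairwise procedure can be sketched as below, assuming g(x) = x^T A x with a symmetric A. The pair selection rule (strongest vertex versus weakest supported vertex) and the peeling loop in `extract_trajectories` are simplifications chosen for this illustration; they are not necessarily the update or the multi-start strategy of the cited graph shift work, and all function names are hypothetical.

```python
def dense_subgraph(A, x0, tol=1e-9, max_iter=10000):
    """Pairwise ascent on g(x) = x^T A x over the simplex (A symmetric)."""
    n = len(A)
    x = list(x0)
    for _ in range(max_iter):
        # r[k] = (A x)_k, half the partial derivative of g with respect to x_k.
        r = [sum(A[k][m] * x[m] for m in range(n)) for k in range(n)]
        i = max(range(n), key=lambda k: r[k])
        j = min((k for k in range(n) if x[k] > 0), key=lambda k: r[k])
        gain = r[i] - r[j]
        if gain <= tol:
            break  # KKT point: no pairwise transfer can increase g
        # Moving `step` from x_j to x_i changes g by 2*gain*step + curv*step**2.
        curv = A[i][i] + A[j][j] - 2.0 * A[i][j]
        step = x[j] if curv >= 0 else min(x[j], gain / -curv)
        x[i] += step
        x[j] -= step
        if x[j] < 1e-12:  # drop numerically empty entries, keeping the total mass
            x[i] += x[j]
            x[j] = 0.0
    return x


def extract_trajectories(A, min_weight=1e-6):
    """Peel off one dense subgraph at a time (a simplification of graph shift)."""
    remaining = list(range(len(A)))
    clusters = []
    while remaining:
        m = len(remaining)
        sub = [[A[p][q] for q in remaining] for p in remaining]
        x = dense_subgraph(sub, [1.0 / m] * m)
        strength = sum(x[p] * sub[p][q] * x[q] for p in range(m) for q in range(m))
        if strength <= 0.0:
            # No edges left: every remaining tracklet is its own trajectory.
            clusters.extend([v] for v in remaining)
            break
        members = [remaining[k] for k, w in enumerate(x) if w > min_weight]
        clusters.append(members)
        remaining = [v for v in remaining if v not in members]
    return clusters
```

Each returned list holds the indices of the tracklets that form one trajectory; ordering them by start frame then yields the full path of a person.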
4. Experiments
- 4.1. Datasets
We run experiments on three video sets to demonstrate the effectiveness of the proposed method. The first set is a selection from the PETS 2012 video dataset, which is captured with stationary cameras. In this dataset, 7 cameras record several pedestrians from various angles. Four of the cameras are located relatively close to the scene and capture clear pictures of people. The other three are located farther from the scene, about 4-5 m above the ground, giving a wide-angle view. The frame rate of all cameras is about 7 fps.
The second set is the Town Center dataset. The frame resolution is 1920×1080 and the frame rate is 25 fps. In this dataset, long-term occlusions appear often, and the motion of pedestrians is mostly linear and predictable. The biggest advantage of this dataset is its high resolution, which is useful for our detection algorithm.
The third set is the CAVIAR video sequences, which comprise 26 video sequences of a walkway in a shopping center taken by a single camera with a frame size of 384×288 and a frame rate of 25 fps. The video sequences cover two view angles (front and corner). We selected the corner angle to test the effectiveness of our method because more occlusions occur at this angle.
- 4.2. Experimental Setting
In our implementation, we used color and HOG features to represent each tracked object. We divided the detection window evenly into eight parts to extract the features. The feature vector of each part consists of a 256-bin RGB color histogram using 10 bins for each channel, a 36-D HOG feature, and a 36-D color feature. We also normalized each part and concatenated the features of all eight parts into one feature vector. To improve the accuracy of human detection, we first applied a Gaussian Mixture Model [37] to obtain the motion regions in each video frame and then applied the part-based human detector to detect human bodies in the foreground. We detail the experimental results in the next subsection.
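The split, normalize, and concatenate scheme can be sketched for the color part of the feature. Several choices here are assumptions: the window is cut into horizontal strips, only a per-channel RGB histogram is computed (the HOG part is omitted), L1 normalization is used, and `bins` is a free parameter rather than a value taken from the paper.

```python
def part_color_feature(patch, parts=8, bins=10):
    """Concatenate per-part, L1-normalized RGB histograms.

    patch: list of rows, each row a list of (r, g, b) tuples in 0..255.
    The window is cut into `parts` horizontal strips; `bins` is per channel.
    """
    height = len(patch)
    feature = []
    for p in range(parts):
        top, bottom = p * height // parts, (p + 1) * height // parts
        hist = [0.0] * (3 * bins)
        for row in patch[top:bottom]:
            for pixel in row:
                for ch, value in enumerate(pixel):
                    # Map 0..255 to a bin index in 0..bins-1 for this channel.
                    hist[ch * bins + min(value * bins // 256, bins - 1)] += 1.0
        total = sum(hist)
        # Normalize each part separately so that no strip dominates.
        feature.extend(h / total if total else 0.0 for h in hist)
    return feature
```

Normalizing per part before concatenation keeps the descriptor comparable across detection windows of different sizes.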
We evaluated our tracking results using the standard CLEAR MOT metrics [38]: TA (tracking accuracy), DP (detection precision), and DA (detection accuracy). DA and DP evaluate the detection results; higher scores mean better results. TA evaluates the tracking results; again, the higher the score the better. TA also accounts for the number of lost targets, so the difference between TA and DA can be used to evaluate the matching result; the lower this difference, the better.
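The relation between the two accuracy scores can be made concrete with a short sketch. It assumes the commonly used unweighted form, in which misses, false positives, and identity switches each count once; the cited protocol [38] may weight these terms differently, and the function name `clear_mot` is hypothetical.

```python
def clear_mot(misses, false_positives, id_switches, num_ground_truth):
    """Return (MODA, MOTA) from error counts summed over all frames."""
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    # Detection accuracy ignores identities; tracking accuracy also
    # penalizes identity switches.
    moda = 1.0 - (misses + false_positives) / num_ground_truth
    mota = 1.0 - (misses + false_positives + id_switches) / num_ground_truth
    return moda, mota
```

Under this form the gap MODA - MOTA equals id_switches / num_ground_truth, so a smaller gap means that fewer identities were confused during matching.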
- 4.3. Experimental Results
PETS 2012 Datasets: The PETS 2012 dataset includes 8 video sequences with a resolution of 768×576. The dataset is challenging because of occlusions, crowded scenes, and cluttered backgrounds. All video sequences show the same scene from different angles, so the dataset can also be seen as a multi-camera tracking dataset. We use this dataset to test the performance of our new feature extraction. In the evaluation step, we use MOTA, MODA, and MODP to evaluate the tracking results. We compare our method with [8]; Nie's method is similar to ours in the detection step, so the comparison demonstrates the performance of our approach. The tracking results are shown in Table 1. Fig. 3 shows some tracking results in the single-camera scene.
Table 1. Tracking results on PETS 2012 View001 dataset
Fig. 3. Tracking results in PETS 2012 View001 dataset
These experimental results show that our tracking results are better than those of prior work. The frame rate of the PETS 2012 dataset is 7 fps, so human shapes change more between consecutive frames than in other video sequences. This can cause detections to be missed, which in turn produces a large number of tracklets. The experimental results demonstrate the better performance of our approach under these conditions.
Town Center Dataset: We also test our approach on the Town Center dataset. The resolution of this dataset is 1920×1080 and the frame rate is 25 fps. The dataset contains a street scene with long-term occlusions, and its high resolution is very useful for the part-based model. We compared our results with recently proposed methods [39, 40, 36, 41, 42, 16]. Under the same experimental setting, our matching method outperforms the others in MOTA and MODA. The experimental results are shown in Table 2. Fig. 4 shows some tracking results in the single-camera scene.
Table 2. Tracking results on Town Center dataset
Fig. 4. Tracking results in Town Center dataset
These results show that our detection results are similar to those of [8] and better than those of the HOG detector. Our tracking results are also better than those of prior work. Our feature extraction method provides better discrimination between different tracked objects. The experimental results again demonstrate the performance of the proposed method.
The CAVIAR Datasets: We also tested our approach on the CAVIAR dataset, which includes 26 video sequences, each typically depicting one event. The resolution is 384×288 and the frame rate is 25 fps. In practice, we selected 18 of these 26 video sequences to test our method. CAVIAR is a multi-camera dataset, and every video sequence includes videos from two view angles: a corner angle and a front angle. We selected 16 corner-angle video sequences because they contain more occlusions. The tracking results are shown in Table 3.
Table 3. Tracking results in CAVIAR dataset
Occlusion often appears in the video sequences of this dataset. Its resolution is low, so the detection results did not improve noticeably. However, the feature of the average image improves the tracking results. This dataset also has some limitations: occlusion often occurs between only two people, and the occlusions are short. In other words, the dataset is designed for event detection, and in most video sequences only 1 to 3 people appear. On this particular dataset, our approach achieves good tracking results.
5. Conclusion
In this paper, we proposed a data association method based on graph clustering for the object tracking problem. We leverage the visual, spatial, and temporal information of detection results to generate reliable tracklets. Based on these tracklets, we build a graph model and formulate the data association problem as a graph clustering problem. Then, the graph shift method is used to solve the clustering problem and generate the final trajectory of each tracked object. Experimental results demonstrate the superiority of this method.
This work was supported in part by the National Natural Science Foundation of China (61472275, 61170239), the Tianjin Research Program of Application Foundation and Advanced Technology (15JCYBJC16200), and the grant of the Elite Scholar Program of Tianjin University (2014XRG-0046).
Yu-ting Su received the B.S., M.S. and Ph.D. degrees in electronic engineering from Tianjin University of China, in 1995, 1998 and 2001, respectively. He is currently a Professor at the School of Electronic Information Engineering in Tianjin University. His research interests include digital video coding, digital watermarking and data hiding multimedia forensics, and multimedia retrieval.
Xiao-rong Zhu received the B.S. degree from Xidian University, China, and the M.S. degree from the School of Electronic Engineering, Tianjin University, China. Her research interests include computer vision, digital watermarking and data hiding multimedia forensics.
Wei-zhi Nie received the B.S. and M.S. degrees in electronic engineering from Tianjin University of China. He is currently pursuing the Ph.D. degree from Tianjin University. His research interests include multiple object tracking, computer vision and location-based social network.
References
[1] Liu An-An, Su Y-T, Jia P-P, Gao Zan, Hao Tong, Yang Z-X 2014 “Multiple/single-view human action recognition via part-induced multitask structural learning,” Cybernetics, IEEE Transactions on 1 -
[2] Gao Zan, Zhang Hua, Liu An-An, Xue Yan-bing, Xu Guang-ping 2014 “Human action recognition using pyramid histograms of oriented gradients and collaborative multi-task learning,” KSII Transactions on Internet and Information Systems (TIIS) 8 (2) 483-503. DOI: 10.3837/tiis.2014.02.009
[3] Liu An-An 2012 “Bidirectional integrated random fields for human behaviour understanding,” Electronics Letters 48 (5) 262-264. DOI: 10.1049/el.2011.3530
[4] Liu An-An 2011 “Human action recognition with structured discriminative random fields,” Electronics Letters 47 (11) 651-653. DOI: 10.1049/el.2011.0880
[5] Liu An-An, Li Kang, Kanade Takeo 2012 “A semi-markov model for mitosis segmentation in time-lapse phase contrast microscopy image sequences of stem cell populations,” Medical Imaging, IEEE Transactions on 31 (2) 359-369. DOI: 10.1109/TMI.2011.2169495
[6] Nie Weizhi, Liu Anan, Su Yuting, Luan Huanbo, Yang Zhaoxuan, Cao Liujuan, Ji Rongrong 2014 “Single/cross-camera multiple-person tracking by graph matching,” Neurocomputing 139 220-232. DOI: 10.1016/j.neucom.2014.02.040
[7] Kalal Z., Mikolajczyk K., Matas J. 2012 “Tracking-learning-detection,” IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) 1409-1422. DOI: 10.1109/TPAMI.2011.239
[8] Nie W., Liu A., Su Y. “Multiple person tracking by spatiotemporal tracklet association,” In AVSS 2012 481-486
[9] Tu J., Tao H., Huang T. S. “Online updating appearance generative mixture model for meanshift tracking,” In ACCV 2006 vol. 1 694-703
[10] Han Z., Ye Q., Jiao J. “Online feature evaluation for object tracking using kalman filter,” In ICPR 2008 1-4
[11] Bazzani L., Cristani M., Murino V. “Decentralized particle filter for joint individual-group tracking,” In CVPR 2012 1886-1893
[12] Guan Yaowen, Chen Xiaoou, Yang Deshun, Wu Yuqian “Multi-person tracking-by-detection with local particle filtering and global occlusion handling,” In IEEE International Conference on Multimedia and Expo 2014 1-6
[13] Azab Maha M., Shedeed Howida A., Hussein Ashraf Saad 2014 “New technique for online object tracking-by-detection in video,” IET Image Processing 8 (12) 794-803. DOI: 10.1049/iet-ipr.2014.0238
[14] Schumann Arne, Bauml Martin, Stiefelhagen Rainer “Person tracking-by-detection with efficient selection of part-detectors,” In 10th IEEE International Conference on Advanced Video and Signal Based Surveillance, AVSS 2013 2013 43-50
[15] Gao Z., Zhang H., Xu G.P, Xue Y.B 2015 “Multi-perspective and Multi-modality Joint Representation and Recognition Model for 3D Action Recognition,” Neurocomputing 151 554-564. DOI: 10.1016/j.neucom.2014.06.085
[16] Shu G., Dehghan A., Oreifej O., Hand E., Shah M. “Part-based multiple-person tracking with partial occlusion handling,” In CVPR 2012 1815-1821
[17] Felzenszwalb Pedro F., McAllester David A., Ramanan Deva “A discriminatively trained, multiscale, deformable part model,” In 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2008
[18] Henriques Joao F., Caseiro Rui, Martins Pedro, Batista Jorge “Exploiting the circulant structure of tracking-by-detection with kernels,” In Proc. of ECCV 2012, 12th European Conference on Computer Vision 2012 702-715
[19] Piatkowska E., Belbachir A. N., Schraml S., Gelautz M. “Spatiotemporal multiple persons tracking using dynamic vision sensor,” In CVPR Workshops 2012 35-40
[20] Zuriarrain I., Lerasle F., Arana-Arejolaleiba N., Devy M. “An mcmc-based particle filter for multiple person tracking,” In ICPR 2008 1-4
[21] Munoz-Salinas R. 2008 “A bayesian plan-view map based approach for multiple-person detection and tracking,” Pattern Recognition 41 (12) 3665-3676. DOI: 10.1016/j.patcog.2008.06.013
[22] Gao Zan, Zhang Long-fei, Chen Ming-yu, Hauptmann Alexander, Zhang Hua, Cai Anni 2014 “Enhanced and Hierarchical Structure Algorithm for Data Imbalance Problem in Semantic Extraction under Massive Video Dataset,” Multimedia Tools and Applications 68 (3)
[23] Dalal N., Triggs B. “Histograms of oriented gradients for human detection,” In CVPR 2005 886-893
[24] Kaaniche M. Becha, Bremond F. “Tracking hog descriptors for gesture recognition,” In AVSS 2009 140-145
[25] Mauthner Thomas, Roth Peter M., Bischof Horst “Learn to move: Activity specific motion models for tracking by detection,” In Proc. of Computer Vision - ECCV 2012 2012 183-192
[26] Jiang Xiaoyan, Rodner Erik, Denzler Joachim “Multi-person tracking-by-detection based on calibrated multi-camera systems,” In Proc. of Computer Vision and Graphics - International Conference 2012 743-751
[27] Bredereck Michael, Jiang Xiaoyan, Korner Marco, Denzler Joachim “Data association for multi-object tracking-by-detection in multi-camera networks,” In Proc. of Sixth International Conference on Distributed Smart Cameras 2012 1-6
[28] Gao Z., Zhang H., Xu G.P, Xue Y.B, Hauptmann A. G. “Multi-View Discriminative and Structured Dictionary Learning with Group Sparsity for Human Action Recognition,” Signal Processing 2015
[29] Yang B., Nevatia R. “Multi-target tracking by online learning of nonlinear motion patterns and robust appearance models,” In CVPR 2012 1918-1925
[30] Patzold M., Evangelio R. Heras, Sikora T. “Boosting multi-hypothesis tracking by means of instance-specific models,” In AVSS 2012 416-421
[31] Gehrig T., McDonough J. W. “Tracking multiple speakers with probabilistic data association filters,” In CLEAR 2006 137-150
[32] Jiang H., Fels S., Little J. J. “A linear programming approach for multiple object tracking,” In CVPR 2007
[33] Berclaz J., Fleuret F., Fua P. “Robust people tracking with global trajectory optimization,” In CVPR 2006 744-750
[34] Feng Liu, Xiaoyu Liu, Yi Chen “An efficient detection method for rare colored capsule based on RGB and HSV color space,” In Proc. of 2014 IEEE International Conference on Granular Computing 2014 175-178
[35] Alam Khan Mohammad Nazmul, Fan Guoliang, Heisterkamp Douglas R., Yu Liangjiang “Automatic target recognition in infrared imagery using dense HOG features and relevance grouping of vocabulary,” In Proc. of IEEE Conference on Computer Vision and Pattern Recognition 2014 293-298
[36] Pellegrini S., Ess A., Van Gool L. J. “Improving data association by joint modeling of pedestrian trajectories and groupings,” In ECCV 2010 452-465
[37] Gorur P., Amrutur B. “Speeded up gaussian mixture model algorithm for background subtraction,” In AVSS 2011 386-391
[38] Kasturi R., Goldgof D. B., Soundararajan P. 2009 “Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol,” IEEE Trans. Pattern Anal. Mach. Intell. 31 (2) 319-336. DOI: 10.1109/TPAMI.2008.57
[39] Benfold B., Reid I. “Stable multi-target tracking in real-time surveillance video,” In CVPR 2011 3457-3464
[40] Zhang L., Li Y., Nevatia R. “Global data association for multi-object tracking using network flows,” In CVPR 2008
[41] Yamaguchi K., Berg A. C., Ortiz L. E., Berg T. L. “Who are you with and where are you going?” In CVPR 2011 1345-1352
[42] Leal-Taixe L., Pons-Moll G., Rosenhahn B. “Everybody needs somebody: Modeling social and grouping behavior on a linear programming multiple people tracker,” In ICCV Workshops 2011 120-127