Video-based Height Measurements of Multiple Moving Objects
KSII Transactions on Internet and Information Systems (TIIS). 2014. Sep, 8(9): 3196-3210
Copyright © 2014, Korean Society For Internet Information
  • Received : October 10, 2013
  • Accepted : March 12, 2014
  • Published : September 28, 2014
About the Authors
Mingxin Jiang
School of Information & Communication Engineering, Dalian Nationalities University, Dalian, Liaoning, 116600, China
Hongyu Wang
School of Information & Communication Engineering, Dalian University of Technology, Dalian, Liaoning, 116024, China
Tianshuang Qiu
School of Information & Communication Engineering, Dalian University of Technology, Dalian, Liaoning, 116024, China

Abstract
This paper presents a novel video metrology approach based on robust tracking. From videos acquired by an uncalibrated stationary camera, the foreground likelihood map is obtained by using the Codebook background modeling algorithm, and the multiple moving objects are tracked by a combined tracking algorithm. Then, we compute the vanishing line of the ground plane and the vertical vanishing point of the scene, and extract the head feature points and the feet feature points in each frame of the video sequences. Finally, we apply a single view mensuration algorithm to each of the frames to obtain height measurements and fuse the multi-frame measurements using the RANSAC algorithm. Compared with other popular methods, our proposed algorithm does not require calibrating the camera, and it can track multiple moving objects when occlusion occurs. Therefore, it reduces the complexity of calculation and improves the accuracy of measurement simultaneously. The experimental results demonstrate that our method is effective and robust to occlusion.
1. Introduction
Metrology, the measurement of real-world metrics, has been investigated extensively in computer vision for many applications. The technique of measuring the geometric parameters of objects from video has developed into an interesting issue in the computer vision field in recent years [1-2]. With the increasing use of video surveillance systems [3], more and more crimes and incidents have been captured on video. When an incident has been captured, we need to gain an understanding of the events or identify a particular individual.
As height is an important parameter of a person, several methods have been presented for estimating height information from video [4-5]. They can be roughly divided into two categories: absolute measurement and relative measurement. Absolute measurement requires a fully calibrated camera, which is a complicated process [6]. Relative measurement requires only minimal calibration. Guo and Chellappa [7] presented a video metrology approach using an uncalibrated single camera that is either stationary or in planar motion. This paper also leverages object motion in videos to acquire calibration information for measurement; no constant-velocity motion is assumed. Furthermore, it incorporates all the measurements from individual video frames to improve the accuracy of the final measurement.
Several automatic mensuration algorithms have been developed to take advantage of tracking results from video sequences. Renno et al. [8] used projected sizes of pedestrians to estimate the vanishing line of a ground plane. Bose and Grimson [9] proposed a method that uses constant-velocity trajectories of objects to derive vanishing lines for recovering the reference plane and planar rectification. The basic idea of their algorithm is to use an additional constraint brought by the constant-velocity assumption, which is not always available in surveillance sequences. Shao et al. [10] proposed a minimally supervised algorithm based upon monocular videos and uncalibrated stationary cameras. The authors recovered the minimal calibration of the scene based upon tracking moving objects, then applied the single view metrology algorithm to each frame, and finally fused the multi-frame measurements using the LMedS as the cost function and the RMSA as the optimization algorithm.
However, most of the existing approaches are direct extensions of image-based algorithms, which do not consider occlusions between objects and lack robustness. Reliable tracking of multiple objects in complex situations is a challenging visual surveillance problem, since a high density of objects results in occlusion. When occlusion between multiple objects is common, it is extremely difficult to perform height measurements of the objects.
In this paper, we propose a new method for height measurement of multiple objects based on robust tracking. Firstly, the foreground likelihood map is obtained by using the Codebook background modeling algorithm. Secondly, tracking of multiple objects is performed by a combined tracking algorithm. Then, the vanishing line of the ground plane and the vertical vanishing point are computed, and the head feature points and the feet feature points are extracted in each frame of the video sequences. Finally, we obtain height measurements of the multiple objects according to the projective geometric constraint, and the multi-frame measurements are fused using the RANSAC algorithm.
Compared with other popular methods, our proposed algorithm does not require calibrating the camera, and it can track multiple moving objects in crowded scenes. Therefore, it reduces the complexity and improves the accuracy simultaneously. The experimental results demonstrate that our method is effective and robust in the occlusion case.
The organization of this paper is as follows. In Section 2, we introduce the multi-target detecting and tracking algorithm. Section 3 addresses video-based height measurements of multiple moving objects. Section 4 presents experimental results. Section 5 concludes this paper.
2. Multi-Target Detecting and Tracking Algorithm
- 2.1 Multi-Target Detecting Algorithm
The capability of extracting moving objects from a video sequence captured with a static camera is a typical first step in visual surveillance. A common approach for discriminating moving objects from the background is detection by background subtraction [11-12]. The idea of background subtraction is to subtract or difference the current image from a reference background model; the subtraction identifies non-stationary or new objects. The generalized mixture of Gaussians (MOG) has been used to model complex, non-static backgrounds, but MOG has some disadvantages: backgrounds with fast variations are not easily modeled accurately with just a few Gaussians, and the model may fail to provide sensitive detection.
In this paper, codebook algorithm has been used to model backgrounds. The algorithm is an adaptive and compact background model that can capture structural background motion over a long period of time under limited memory. This allows us to encode moving backgrounds or multiple changing backgrounds. At the same time, the algorithm has the capability of coping with local and global illumination changes.
A quantization/clustering technique is adopted to construct a background model in the codebook algorithm. Samples at each pixel are clustered into the set of codewords. The background is encoded on a pixel by pixel basis.
Let $X = \{x_1, x_2, ..., x_N\}$ be a training sequence for a single pixel, consisting of N RGB vectors. Let $C = (c_1, c_2, ..., c_L)$ represent the codebook for the pixel, consisting of L codewords. Each pixel has a different codebook size based on its sample variation. Each codeword $c_i$ ($i = 1, ..., L$) consists of an RGB vector $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$ and a 6-tuple $aux_i = \langle \check{I}_i, \hat{I}_i, f_i, \lambda_i, p_i, q_i \rangle$. The tuple $aux_i$ contains intensity (brightness) values and temporal variables described below.
  • $\check{I}_i$, $\hat{I}_i$: the min and max brightness, respectively, accepted for codeword i;
  • $f_i$: the frequency with which codeword i has occurred;
  • $\lambda_i$: the maximum negative run-length (MNRL), defined as the longest interval during the training period in which the codeword has NOT recurred;
  • $p_i$, $q_i$: the first and last access times, respectively, at which the codeword occurred.
In the training period, each value $x_t$ sampled at time t is compared to the current codebook to determine which codeword $c_m$ (if any) it matches, where m is the index of the matching codeword. We use the matched codeword as the sample's encoding approximation. To determine which codeword is the best match, we employ a color distortion measure and brightness bounds.
When we have an input pixel $x_t = (R, G, B)$ and a codeword $c_i$ with $v_i = (\bar{R}_i, \bar{G}_i, \bar{B}_i)$, let
$$\|x_t\|^2 = R^2 + G^2 + B^2, \qquad \|v_i\|^2 = \bar{R}_i^2 + \bar{G}_i^2 + \bar{B}_i^2, \qquad \langle x_t, v_i \rangle^2 = (\bar{R}_i R + \bar{G}_i G + \bar{B}_i B)^2.$$
The color distortion can be calculated by
$$\mathrm{colordist}(x_t, v_i) = \sqrt{\|x_t\|^2 - p^2}, \qquad p^2 = \frac{\langle x_t, v_i \rangle^2}{\|v_i\|^2}. \tag{1}$$
The logical brightness function is defined as
$$\mathrm{brightness}(I, \langle \check{I}_i, \hat{I}_i \rangle) = \begin{cases} \text{true} & \text{if } I_{low} \le \|x_t\| \le I_{hi} \\ \text{false} & \text{otherwise,} \end{cases} \tag{2}$$
where $I_{low} = \alpha \hat{I}_i$ and $I_{hi} = \min\{\beta \hat{I}_i, \check{I}_i / \alpha\}$, with $\alpha < 1$ and $\beta > 1$.
The detailed algorithm for constructing the codebook is given in [11].
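To make the construction concrete, the following Python/NumPy sketch trains a per-pixel codebook with the update rules of [11]; it is a minimal illustration, and the parameter names and default values (eps1, alpha, beta) are our own assumptions rather than values given in the paper.

```python
import numpy as np

def colordist(x, v):
    """Color distortion between RGB sample x and codeword mean v (Eq. (1))."""
    x, v = np.asarray(x, float), np.asarray(v, float)
    p2 = np.dot(x, v) ** 2 / max(np.dot(v, v), 1e-12)
    return np.sqrt(max(np.dot(x, x) - p2, 0.0))

def brightness_ok(I, Imin, Imax, alpha=0.5, beta=1.3):
    """Logical brightness test (Eq. (2)); alpha, beta are assumed tunables."""
    Ilow, Ihi = alpha * Imax, min(beta * Imax, Imin / alpha)
    return Ilow <= I <= Ihi

def train_codebook(samples, eps1=10.0):
    """Cluster the training samples of ONE pixel into codewords.
    Each codeword is [v, Imin, Imax, f, lam, p, q] as in Section 2.1."""
    codebook, N = [], len(samples)
    for t, x in enumerate(samples, start=1):
        x = np.asarray(x, float)
        I = np.linalg.norm(x)              # brightness; [11] uses the L2 norm
        for cw in codebook:
            v, Imin, Imax, f, lam, p, q = cw
            if colordist(x, v) <= eps1 and brightness_ok(I, Imin, Imax):
                cw[0] = (f * v + x) / (f + 1)           # running mean color
                cw[1], cw[2] = min(Imin, I), max(Imax, I)
                cw[3], cw[4] = f + 1, max(lam, t - q)   # frequency, MNRL
                cw[6] = t                               # last access time
                break
        else:  # no matching codeword: create a new one
            codebook.append([x, I, I, 1, t - 1, t, t])
    for cw in codebook:                    # wrap-around MNRL, as in [11]
        cw[4] = max(cw[4], N - cw[6] + cw[5] - 1)
    return codebook
```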
We segment the foreground by subtracting the current image from the background model. Given a new input pixel $x_i = (R, G, B)$ and its codebook $M$, the subtraction operation $BGS(x_i)$ for the pixel is defined by the following steps (a code sketch follows):
  • Step 1. Compute the brightness $I = R + G + B$. Define a boolean variable match.
  • Step 2. Find the codeword $c_m$ matching $x_i$ based on two conditions:
$$\mathrm{colordist}(x_i, v_m) \le \varepsilon \tag{3}$$
$$\mathrm{brightness}(I, \langle \check{I}_m, \hat{I}_m \rangle) = \text{true} \tag{4}$$
If the codeword $c_m$ is found, let match = 1; else let match = 0.
  • Step 3. Determine the foreground moving object pixel:
$$BGS(x_i) = \begin{cases} \text{foreground} & \text{if match} = 0 \\ \text{background} & \text{if match} = 1 \end{cases} \tag{5}$$
  • Step 4. Compute the likelihood of observation $x_i$ belonging to the foreground, which yields the foreground likelihood map.
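A matching sketch of the per-pixel subtraction $BGS(x_i)$ following Steps 1-3 above. The paper's foreground-likelihood formula of Step 4 survives only as an image in the source, so the soft score returned below, based on the distance to the closest codeword, is purely our illustrative stand-in.

```python
def bgs(x, codebook, eps2=12.0, alpha=0.5, beta=1.3):
    """Background subtraction for one pixel (Steps 1-3).
    Returns (is_foreground, soft_likelihood); eps2 is an assumed detection
    threshold, and the soft likelihood is our illustrative choice."""
    x = np.asarray(x, float)
    I = np.linalg.norm(x)
    best = np.inf
    for v, Imin, Imax, f, lam, p, q in codebook:
        d = colordist(x, v)
        best = min(best, d)
        if d <= eps2 and brightness_ok(I, Imin, Imax, alpha, beta):
            return False, 0.0                       # match found -> background
    return True, 1.0 - float(np.exp(-best / eps2))  # no match -> foreground
```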
Fig. 1 compares the foreground likelihood maps obtained using different methods on an indoor data set. Fig. 1(a) is an image extracted from an indoor video. Fig. 1(b) depicts the foreground likelihood map of the image using the mixture-of-Gaussians algorithm. Fig. 1(c) depicts the foreground likelihood map of the image using the codebook-based method.
Fig. 1. Comparison of foreground likelihood maps obtained using different methods
- 2.2 Multi-Target Tracking Algorithm
Tracking multiple people accurately in cluttered and crowded scenes is a challenging task, primarily due to occlusion between people [13-14]. A particle filter can work well when an object is occluded, but it has difficulty satisfying real-time computing requirements. Meanshift solves the speed problem easily, but it is not robust during mutual occlusion. Aiming at these problems, this section proposes a robust multi-target tracking algorithm that combines the particle filter with the meanshift method.
Particle filters provide an approximate Bayesian solution to the discrete-time recursive estimation problem by updating an approximate description of the posterior filtering density [15].
At time k, a measurement $z_k$ becomes available, giving $z_{1:k} = \{z_1, z_2, ..., z_k\}$. Assume that the probability density function $p(x_{k-1} | z_{1:k-1})$ is available at time k-1. According to Bayes' rule, the posterior probability function of the state vector can be calculated using the following equations.
$$p(x_k | z_{1:k-1}) = \int p(x_k | x_{k-1}) \, p(x_{k-1} | z_{1:k-1}) \, dx_{k-1} \tag{7}$$
This is the prior of the state $x_k$ at time k without knowledge of the measurement $z_k$, i.e., the probability given only previous measurements. The update step combines the likelihood of the current measurement with the predicted state:
$$p(x_k | z_{1:k}) = \frac{p(z_k | x_k) \, p(x_k | z_{1:k-1})}{p(z_k | z_{1:k-1})} \tag{8}$$
$p(z_k | z_{1:k-1})$ is a normalizing constant. It can be calculated by:
$$p(z_k | z_{1:k-1}) = \int p(z_k | x_k) \, p(x_k | z_{1:k-1}) \, dx_k \tag{9}$$
Because $p(z_k | z_{1:k-1})$ is a constant, (8) can be written as:
$$p(x_k | z_{1:k}) \propto p(z_k | x_k) \, p(x_k | z_{1:k-1}) \tag{10}$$
Suppose that at time step k there is a set of particles $\{x_k^i, i = 1, ..., N\}$ with associated weights $\{\omega_k^i, i = 1, ..., N\}$, randomly drawn from an importance density, where N is the total number of particles. The weight of particle i can be defined as:
$$\omega_k^i \propto \omega_{k-1}^i \, \frac{p(z_k | x_k^i) \, p(x_k^i | x_{k-1}^i)}{q(x_k^i | x_{k-1}^i, z_{1:k})} \tag{11}$$
We use the transition prior $p(x_k | x_{k-1})$ as the importance density function $q(x_k^i | x_{k-1}^i, z_{1:k})$. Then, we can simplify (11) as:
$$\omega_k^i \propto \omega_{k-1}^i \, p(z_k | x_k^i) \tag{12}$$
Furthermore, if we use Grenander's factored sampling algorithm, (12) can be modified as:
$$\omega_k^i = p(z_k | x_k^i) \tag{13}$$
The particle weights can then be normalized by using:
$$\tilde{\omega}_k^i = \frac{\omega_k^i}{\sum_{j=1}^{N} \omega_k^j} \tag{14}$$
to give a weighted approximation of the posterior density in the following form:
$$p(x_k | z_{1:k}) \approx \sum_{i=1}^{N} \tilde{\omega}_k^i \, \delta(x_k - x_k^i) \tag{15}$$
where $\delta$ is the Dirac delta function.
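Equations (12)-(15) together describe the standard bootstrap (SIR) particle filter. A minimal generic step is sketched below; `propagate` and `likelihood` are caller-supplied functions standing in for the transition prior $p(x_k | x_{k-1})$ and the observation model of Eq. (18).

```python
import numpy as np

def particle_filter_step(particles, weights, z, propagate, likelihood, rng):
    """One SIR step: propagate with the transition prior, weight by the
    measurement likelihood (Eqs. (12)-(13)), normalize (Eq. (14)), resample."""
    N = len(particles)
    particles = propagate(particles, rng)          # sample p(x_k | x_{k-1})
    weights = weights * likelihood(z, particles)   # w_k ∝ w_{k-1} p(z_k | x_k)
    weights = weights / weights.sum()              # Eq. (14)
    idx = rng.choice(N, size=N, p=weights)         # resample to {x_k^i, 1/N}
    return particles[idx], np.full(N, 1.0 / N)
```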
The meanshift algorithm was first analyzed in [16] and developed in [17]. Meanshift is a non-parametric statistical approach that seeks the mode of a density distribution in an iterative procedure [18]. Let X denote the current location; its new location X' after one iteration is:
$$X' = \frac{\sum_{i=1}^{N} g\left( \left\| \frac{X - a_i}{h} \right\|^2 \right) \omega(a_i) \, a_i}{\sum_{i=1}^{N} g\left( \left\| \frac{X - a_i}{h} \right\|^2 \right) \omega(a_i)} \tag{16}$$
where $\{a_i, i = 1, ..., N\}$ are normalized points within the rectangular area specified by the current location X, $\omega(a_i)$ is the weight associated with each pixel $a_i$, $g(x)$ is a kernel profile function, and h is the window radius used to normalize the coordinates $a_i$.
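A sketch of iterating Eq. (16) until convergence; for concreteness it assumes the flat kernel profile g(x) = 1 for ||x|| <= 1 (the profile of the Epanechnikov kernel), which is our choice rather than one stated in the paper.

```python
import numpy as np

def meanshift(X, points, weights, h, iters=10, eps=0.5):
    """Iterate Eq. (16) from start location X over weighted pixel
    coordinates `points` (M x 2) with per-pixel weights omega(a_i)."""
    X = np.asarray(X, float)
    for _ in range(iters):
        u = (points - X) / h
        g = (np.sum(u * u, axis=1) <= 1.0).astype(float)  # flat profile g
        wg = g * weights
        if wg.sum() == 0:
            break                                   # empty window: stop
        X_new = (wg[:, None] * points).sum(axis=0) / wg.sum()
        if np.linalg.norm(X_new - X) < eps:         # converged to a mode
            return X_new
        X = X_new
    return X
```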
In our tracking algorithm, we assume that the dynamics of the state transition correspond to the following second-order auto-regressive process:
$$x_k = A x_{k-1} + B x_{k-2} + C n_k \tag{17}$$
where A, B, C are the autoregression coefficients and $n_k$ is Gaussian noise.
We use an HSV color histogram to build the observation model. Given the current observation $z_k$, the candidate color histogram $Q(x_k)$ is calculated on $z_k$ in the region specified by $x_k$. The similarity between $Q(x_k)$ and the reference color histogram $Q^*$ is measured by the Bhattacharyya distance $d(\cdot,\cdot)$. The likelihood distribution is evaluated as
$$p(z_k | x_k) \propto \exp\left( -\lambda \, d^2(Q^*, Q(x_k)) \right) \tag{18}$$
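The observation model can be sketched as follows; the histogram bin counts and the scale lambda are assumed tuning values, since the paper does not specify them.

```python
import numpy as np

def hsv_histogram(patch_hsv, bins=(8, 8, 4)):
    """Normalized HSV color histogram of an image patch (H, S, V in [0, 1))."""
    hist, _ = np.histogramdd(patch_hsv.reshape(-1, 3),
                             bins=bins, range=((0, 1), (0, 1), (0, 1)))
    return hist.ravel() / max(hist.sum(), 1e-12)

def color_likelihood(q_ref, q_cand, lam=20.0):
    """p(z|x) ∝ exp(-lam * d^2) with the Bhattacharyya distance
    d = sqrt(1 - sum(sqrt(q_ref * q_cand))), as in Eq. (18);
    lam is an assumed tuning parameter."""
    bc = np.sum(np.sqrt(q_ref * q_cand))
    return float(np.exp(-lam * max(1.0 - bc, 0.0)))
```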
In our method, the meanshift algorithm is applied to every sample in the sample set, which greatly reduces the computational time of particle filtering. Meanshift alone might not capture the true location of the objects during mutual occlusion; the particle filter improves the robustness of the algorithm. We propagate the particles $\{x_{k-1}^i, i = 1, ..., N\}$ according to the state transition dynamics (17) to obtain the predicted set $\{\tilde{x}_k^i\}$. The predicted samples $\{\tilde{x}_k^i\}$ are then shifted to $\{\hat{x}_k^i\}$ by meanshift according to Eq. (16). With the meanshifted samples $\{\hat{x}_k^i\}$, we update their weights $\{\omega_k^i, i = 1, ..., N\}$ according to Eq. (14), where the likelihood distribution $p(z_k | x_k^i)$ is given by Eq. (18). Then we resample $\{\hat{x}_k^i, \omega_k^i\}$ and generate the unweighted sample set $\{x_k^i, 1/N\}_{i=1,...,N}$.
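Putting the pieces together, one time step of the combined tracker for a single target might look like the sketch below. Here `shift_fn` wraps the meanshift of Eq. (16) around the current frame and `like_fn` wraps the likelihood of Eq. (18); the AR(2) coefficients default to a constant-velocity model, which is an assumption on our part.

```python
import numpy as np

def combined_track_step(prev, prev2, rng, shift_fn, like_fn,
                        A=2.0, B=-1.0, C=4.0):
    """One step of the combined meanshift + particle filter tracker (sketch).
    prev, prev2: N x 2 particle centers at frames k-1 and k-2."""
    # 1. Propagate with the second-order AR model of Eq. (17).
    particles = A * prev + B * prev2 + C * rng.standard_normal(prev.shape)
    # 2. Shift every propagated sample toward a local mode (Eq. (16)).
    particles = np.array([shift_fn(x) for x in particles])
    # 3. Weight the meanshifted samples with the color likelihood (Eq. (18));
    #    factored sampling (Eq. (13)) drops the old weights, then normalize.
    w = np.array([like_fn(x) for x in particles])
    w = w / w.sum()
    # 4. Weighted-mean state estimate, then resample to {x_k^i, 1/N}.
    estimate = (w[:, None] * particles).sum(axis=0)
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], estimate
```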
In Fig. 2 the tracking results are demonstrated for outdoor video sequences in different frames.
Fig. 2. Tracking results for test video sequences
3. Video-based Height Measurements of Multiple Moving Objects
- 3.1 Projective Geometry
In this section, we introduce the main projective geometric ideas and notation required to understand our measurement algorithm. We use upper-case letters to indicate points in the world system and the corresponding lower-case letters for their images.
Fig. 3 shows the basic geometry of the scene. A line segment in space, orthogonal to the ground plane and identified by its top point $H_i$ and base point $F_i$, is denoted by $H_i F_i$, and its length is denoted by $d(H_i, F_i)$. $H_i F_i$ is projected onto the image plane as the line segment $h_i f_i$. The line l is the vanishing line of the ground plane, and v the vertical vanishing point. Given one reference height $d(H_1, F_1) = d_1$ in the scene, the height of any object on the ground plane (e.g., $d_2$) can be measured using the geometric method shown in Fig. 3(b).
Fig. 3. Basic geometry of the scene
The measurement is achieved in two steps. At the first stage, we map the length of the line segment $h_1 f_1$ onto the other segment $h_2 f_2$. The intersection of the line through the two base points $f_1$ and $f_2$ with the vanishing line l determines the point u, and the intersection of the line through $h_1$ and u with the line through v and $f_2$ determines the point i. Because v and u are vanishing points, in the scene $H_1 F_1$ and $H_1 I$ are parallel to $I F_2$ and $F_1 F_2$ respectively, so $H_1$, $I$, $F_2$, and $F_1$ form a parallelogram with $d(H_1, F_1) = d(I, F_2)$. We now have four collinear points v, $h_2$, i, $f_2$ on an imaged scene line, and thus a cross ratio is available. The distance ratio $d(h_2, f_2) : d(i, f_2)$, corrected by a 1-D projective transformation, gives the estimate of $d_2 : d_1$. At the second stage, we compute the ratio of lengths on the imaged scene line using the cross ratio [19].
The ratio between the two line segments $h_2 f_2$ and $i f_2$ can be written, using the cross ratio of the four collinear points v, $h_2$, i, $f_2$, as:
$$r = \frac{d(h_2, f_2) \, d(v, i)}{d(i, f_2) \, d(v, h_2)} \tag{19}$$
with $d_2 = r d_1$. The height of any object can be measured using this method.
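The whole two-stage construction reduces to a few homogeneous cross products. A sketch, assuming the vertical vanishing point v is finite so that all points can be dehomogenized:

```python
import numpy as np

def to_h(p):
    """Euclidean image point (x, y) -> homogeneous 3-vector."""
    return np.array([p[0], p[1], 1.0])

def to_e(p):
    """Homogeneous 3-vector -> Euclidean point (assumes p[2] != 0)."""
    return p[:2] / p[2]

def measure_height(h1, f1, d1, h2, f2, v, l):
    """Estimate the target height d2 from one reference of known height d1.
    h1, f1: image head/foot of the reference; h2, f2: image head/foot of the
    target; v: vertical vanishing point; l: vanishing line of the ground
    plane. All inputs are homogeneous 3-vectors."""
    u = np.cross(np.cross(f1, f2), l)               # vanishing point of f1-f2
    i = np.cross(np.cross(h1, u), np.cross(v, f2))  # d1 mapped onto f2's vertical
    h2e, f2e, ie, ve = to_e(h2), to_e(f2), to_e(i), to_e(v)
    # cross ratio of the four collinear points v, h2, i, f2 (Eq. (19))
    r = (np.linalg.norm(h2e - f2e) * np.linalg.norm(ve - ie)) / \
        (np.linalg.norm(ie - f2e) * np.linalg.norm(ve - h2e))
    return r * d1
```

Fusing the per-frame values returned by such a routine is exactly the role of the RANSAC step described in Section 3.3.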
With the assumption of perfect projection, e.g. with a pinhole camera, a set of parallel lines in the scene is projected onto a set of lines in the image which meet in a common point. This point of intersection, perhaps at infinity, is called the vanishing point. Different approaches are adopted to detect vanishing points for the reference direction, according to the environments of video data sets.
In the pinhole camera model, the vanishing line of the ground plane can be determined as the line through two or more vanishing points of the plane. If we have N vertical poles of the same height in the perspective view, the vertical vanishing point $V_Y$ can be computed simply by finding the intersection of two (or more) poles, and the vanishing line of the ground plane $V_L$ is the line formed by the intersection points of the lines connecting the tops and bottoms of the poles. Thus, we can fix the vanishing line through three (or more) non-coplanar poles. In this paper, we denote the poles by $\{(t_i, b_i)\}_{i=1,2,...,N}$, where $t_i$, $b_i$ represent the image positions of the top and bottom of pole i, respectively, and $\{(\Sigma_{t_i}, \Sigma_{b_i})\}_{i=1,2,...,N}$ are the associated covariance matrices. $V_Y$ can be fixed by finding the point v that minimizes the sum of distances from $t_i$ and $b_i$ to the line $(w_i, b_i)$ determined by v and the midpoint $m_i$ of $t_i$ and $b_i$:
$$V_Y = \arg\min_{v} \sum_{i=1}^{N} \left[ d^2\big(t_i, (w_i, b_i)\big) + d^2\big(b_i, (w_i, b_i)\big) \right] \tag{20}$$
$V_L$ can be computed by
$$(w_{V_L}, b_{V_L}) = \arg\min_{w_{V_L}, b_{V_L}} \sum_{i} d_{\Sigma_i}^2(x_i, V_L) \tag{21}$$
where $w_{V_L}$ is the unit direction vector of $V_L$, $b_{V_L}$ is a point on the vanishing line, and $d_{\Sigma_i}$ is the distance weighted by the covariance $\Sigma_i$ of the point $x_i$.
The point $x_i$ is the intersection of line $t_j b_j$ and line $t_k b_k$. The covariance matrix $\Sigma_i$ of $x_i$ can be computed by using the Jacobian as
$$\Sigma_i = J \, \mathrm{diag}(\Sigma_{t_j}, \Sigma_{b_j}, \Sigma_{t_k}, \Sigma_{b_k}) \, J^T \tag{22}$$
where $J = \partial x_i / \partial(t_j, b_j, t_k, b_k)$.
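For the calibration step itself, a common linear surrogate is sketched below: the vertical vanishing point is taken as the least-squares intersection of the pole lines, and the vanishing line is fitted through the intersection points $x_i$ by SVD. This algebraic formulation only approximates the geometric minimizations of Eqs. (20) and (21), and unlike the paper it ignores the covariance weighting; reading each $x_i$ as the intersection of a line connecting two pole tops with the line connecting the corresponding bottoms is likewise our interpretation.

```python
import numpy as np

def to_h(p):
    """Euclidean image point (x, y) -> homogeneous 3-vector."""
    return np.array([p[0], p[1], 1.0])

def vanishing_point(poles):
    """Vertical vanishing point from pole segments {(t_i, b_i)}: the unit
    vector v minimizing sum_i (l_i^T v)^2, an algebraic stand-in for Eq. (20)."""
    L = []
    for t, b in poles:
        l = np.cross(to_h(t), to_h(b))        # line through top and bottom
        L.append(l / np.linalg.norm(l[:2]))   # normalize for comparable residuals
    _, _, Vt = np.linalg.svd(np.array(L))
    return Vt[-1]                             # smallest right singular vector

def vanishing_line(xs_h):
    """Vanishing line through the intersection points x_i (homogeneous),
    an unweighted algebraic stand-in for Eq. (21)."""
    X = np.array([x / np.linalg.norm(x) for x in xs_h])
    _, _, Vt = np.linalg.svd(X)
    return Vt[-1]
```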
- 3.2 Extracting head and feet feature points from moving objects
Humans are roughly vertical while they stand or walk. However, because human walking is an articulated motion, the shape and height of the human vary across walking phases. As shown in Fig. 4, at the phase at which the two legs cross each other, the height measured from the video sequence is greatest, and it is also the most appropriate height to represent the human's static height.
Fig. 4. The height of a human varies periodically during the walking cycle
The phase at which the two feet cross each other (leg-crossing) is of particular interest in that the feet position is relatively easy to locate and the shape is relatively insensitive to viewpoint. Thus, we aim to extract the head and feet locations at leg-crossing phases. We first detect a walking human in a video sequence by change detection. Then, we extract the leg-crossing phases by temporal analysis of the object shape. Finally, we compute the principal axis of the human's body and locate the human's head and feet positions at those phases.
For every frame t, the head feature point $h_t^i$ of object i (i = 1, 2, ..., N) can be obtained using the following steps.
Step 1. Construct the target likelihood matrix $L_t^i$ corresponding to the foreground blob $B_t^i(w_i, h_i)$, where $w_i$ and $h_i$ denote the width and height of the foreground blob $B_t^i$, respectively.
Step 2. Compute the covariance matrix $C_t^i$ of the target likelihood matrix $L_t^i$:
$$C_t^i(m, n) = \mathrm{cov}\big( L_t^i(m), L_t^i(n) \big) \tag{23}$$
where $L_t^i(m)$ and $L_t^i(n)$ denote the m-th and n-th columns of the foreground target matrix at frame t.
Step 3. Compute the first eigenvector $e_t^i$ of the covariance matrix $C_t^i$. The centroid of the blob together with $e_t^i$ gives the principal axis $P_t^i$ of the target's body. The head feature point is assumed to be located on the principal axis.
Step 4. Project the target blob $B_t^i$ onto its corresponding principal axis $P_t^i$. Locate the head feature point $h_t^i$ by finding, from top to bottom, the first end point along the principal axis whose projection count is above a threshold.
Humans are roughly vertical at the different phases of a walking cycle. This means that the head feature point, the feet feature point, and the vertical vanishing point are collinear. We obtain the feet feature point $f_i$ of the target by applying this collinearity constraint: $f_i = (h_i \times V_Y) \times l_{b(i)}$, where $h_i$ denotes the head feature point of object i, $V_Y$ denotes the vertical vanishing point, and $l_{b(i)}$ denotes the bottom line of the blob.
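A sketch of Steps 1-4 together with the collinearity constraint, treating the likelihood map as a binary mask and using pixel-coordinate PCA for the principal axis; the bin count and projection-count threshold are assumed values.

```python
import numpy as np

def head_and_feet(mask, v, bins=50, thresh=3):
    """Head/feet feature points of one blob (Section 3.2 sketch).
    mask: boolean foreground blob; v: vertical vanishing point (homogeneous)."""
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    c = pts.mean(axis=0)                      # blob centroid
    C = np.cov((pts - c).T)                   # 2x2 covariance (cf. Eq. (23))
    evals, evecs = np.linalg.eigh(C)
    axis = evecs[:, np.argmax(evals)]         # principal axis of the body
    if axis[1] > 0:                           # image y grows downward: point "up"
        axis = -axis
    proj = (pts - c) @ axis                   # project blob pixels onto the axis
    hist, edges = np.histogram(proj, bins=bins)
    good = np.nonzero(hist >= thresh)[0]      # bins with enough support
    head = c + edges[good.max() + 1] * axis   # topmost supported end = head
    # feet by collinearity: f = (h x V_Y) x l_b, with l_b the blob bottom line
    l_b = np.array([0.0, 1.0, -float(ys.max())])
    f_h = np.cross(np.cross(np.array([head[0], head[1], 1.0]), v), l_b)
    return head, f_h[:2] / f_h[2]
```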
- 3.3 Multiple frame fusion
The measurements from multiple frames include outlier observations due to tracking errors, articulated motion, and occlusions, which makes the mean of the multi-frame estimates unreliable. The RANSAC technique has the well-known property of being less sensitive to outliers. Thus, in this paper, we use the RANSAC algorithm to estimate the height from a set of data contaminated by outliers.
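Because each height measurement is a single scalar, the RANSAC fusion is particularly simple: the minimal sample is one measurement, the consensus set is everything within a tolerance, and the final estimate is refit on the largest consensus set. The tolerance and iteration count below are assumed values, not ones given in the paper.

```python
import numpy as np

def ransac_height(measurements, tol=2.0, iters=100, rng=None):
    """Fuse per-frame height estimates with RANSAC (1-D sketch).
    tol (same unit as the measurements) and iters are assumed values."""
    rng = rng or np.random.default_rng()
    m = np.asarray(measurements, float)
    best_inliers = np.zeros(len(m), bool)
    for _ in range(iters):
        h = m[rng.integers(len(m))]       # minimal sample: one measurement
        inliers = np.abs(m - h) < tol     # consensus set
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return m[best_inliers].mean()         # refit on the largest consensus set
```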
4. Experimental Results and Analysis
In order to show the robustness of the height measurement algorithm discussed in this paper, we conducted several experiments with data that we collected from stationary cameras in different environments. The moving objects include vehicles and humans. Given the limited space, in this section we present only some of them to show the experimental results and the statistics. The number of particles used in our method is 300 for all experiments.
The algorithm was implemented on the Windows 7 operating system using MATLAB as the software platform, on a computer with an AMD Athlon(TM) X2 Dual-Core QL-62 2.00 GHz processor and 1.74 GB of memory.
The results of height measurement for test video 1 are shown in Fig. 5.
Fig. 5. The results of height measurement for test video 1
Statistics of the measured values for test video 1 are shown in Table 1.
Table 1. Statistics of the measured value for test video 1
The results of height measurement for test video 2 are shown in Fig. 6. The tracked blobs are object 1, object 2, and object 3 from right to left, respectively. The heights of the objects are shown on top of the blobs.
Fig. 6. The results of height measurement for test video 2
Statistics of the measured values for test video 2 are shown in Table 2.
Table 2. Statistics of the measured value for test video 2
From the experimental results, we can see that our algorithm shows better accuracy and robustness than the algorithm proposed in [10].
The results of height measurement for test video 3 are shown in Fig. 7.
Fig. 7. The results of height measurement for test video 3
Statistics of the measured values for test video 3 are shown in Table 3.
Table 3. Statistics of the measured value for test video 3
From the experimental results, we can see that our proposed algorithm does not require calibrating the camera and can track multiple moving objects when occlusion occurs. Therefore, it reduces the complexity of calculation and improves the accuracy of measurement simultaneously.
5. Conclusion
We have presented a new algorithm for estimating the heights of multiple moving objects. We first compute the vanishing line of the ground plane and the vertical vanishing point. Secondly, multiple moving objects are detected and tracked. Then, the head feature points and the feet feature points are extracted in each frame of the video sequences, and the height measurements of the multiple objects are obtained according to the projective geometric constraint. Finally, the multi-frame measurements are fused using the RANSAC algorithm. The experimental results demonstrate that our method is effective and robust to occlusion. This is a preliminary study, and further work remains to be done.
BIO
Mingxin Jiang received the M.S. degree in communication and information systems from Jilin University, Changchun, China, in 2005. She received the Ph.D. degree from Dalian University of Technology in 2013. Her research interests include multi-object tracking, video content analysis and visual metrology.
Hongyu Wang received his B.S. degree from Jilin University of Technology, Changchun, China, in 1990, and his M.S. degree from the Graduate School of the Chinese Academy of Sciences, Changchun, China, in 1993, both in Electronic Engineering. He received the Ph.D. in Precision Instrument and Optoelectronics Engineering from Tianjin University, Tianjin, China, in 1997. Currently, he is a Professor in the Institute of Information Science and Communication Engineering, Dalian University of Technology, China. His research interests include computer vision, video coding and wireless video sensor networks.
References
[1] Cai J., Walker R., "Height estimation from monocular image sequences using dynamic programming with explicit occlusions," IET Computer Vision, vol. 4, no. 4, pp. 149-161, 2010. DOI: 10.1049/iet-cvi.2009.0063
[2] Criminisi A., Accurate Visual Metrology from Single and Multiple Uncalibrated Images, Distinguished Dissertation Series, Springer-Verlag, New York, 2001.
[3] Hu W., Tan T., Wang L., Maybank S., "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol. 34, no. 3, pp. 334-352, 2004. DOI: 10.1109/TSMCC.2004.829274
[4] Criminisi A., Reid I., Zisserman A., "Single view metrology," International Journal of Computer Vision, vol. 40, no. 2, pp. 123-148, 2000. DOI: 10.1023/A:1026598000963
[5] Reid I., Zisserman A., "Goal-directed video metrology," in Proc. of 4th European Conference on Computer Vision (ECCV), pp. 647-658, April 15-18, 1996.
[6] Zhang Z., "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000. DOI: 10.1109/34.888718
[7] Guo F., Chellappa R., "Video metrology using a single camera," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1329-1335, 2010. DOI: 10.1109/TPAMI.2010.26
[8] Renno J., Orwell J., Jones G., "Learning surveillance tracking models for the self-calibrated ground plane," in Proc. of British Machine Vision Conference, pp. 607-616, 2002.
[9] Bose B., Grimson E., "Ground plane rectification by tracking moving objects," in Proc. of Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2003.
[10] Shao J., Zhou S. K., Chellappa R., "Robust height estimation of moving objects from uncalibrated videos," IEEE Transactions on Image Processing, vol. 19, no. 8, pp. 2221-2232, 2010. DOI: 10.1109/TIP.2010.2046368
[11] Kim K., Chalidabhongse T. H., Harwood D., Davis L. S., "Real-time foreground-background segmentation using codebook model," Real-Time Imaging, vol. 11, no. 3, pp. 172-185, 2005.
[12] Bao B. K., Liu G., Xu C., Yan S., "Inductive robust principal component analysis," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3794-3800, 2012. DOI: 10.1109/TIP.2012.2192742
[13] Khan Z. H., Gu I. Y.-H., "Nonlinear dynamic model for visual object tracking on Grassmann manifolds with partial occlusion handling," IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 2005-2019, 2013. DOI: 10.1109/TSMCB.2013.2237900
[14] Khatoonabadi S. H., Bajic I. V., "Video object tracking in the compressed domain using spatio-temporal Markov random fields," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 300-313, 2013. DOI: 10.1109/TIP.2012.2214049
[15] Du M., Nan X. M., Guan L., "Monocular human motion tracking by using DE-MC particle filter," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3852-3865, 2013. DOI: 10.1109/TIP.2013.2263146
[16] Comaniciu D., Meer P., "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002. DOI: 10.1109/34.1000236
[17] Wang L. F., Yan H. P., Wu H. Y., Pan C. H., "Forward-backward mean-shift for visual tracking with local-background-weighted histogram," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1480-1489, 2013. DOI: 10.1109/TITS.2013.2263281
[18] Comaniciu D., Ramesh V., Meer P., "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003. DOI: 10.1109/TPAMI.2003.1195991
[19] Hartley R., Zisserman A., Multiple View Geometry in Computer Vision, 2nd edition, Cambridge University Press, Cambridge, 2003.