This paper presents a novel video metrology approach based on robust tracking. From videos acquired by an uncalibrated stationary camera, the foreground likelihood map is obtained using the codebook background modeling algorithm, and multiple moving objects are tracked by a combined tracking algorithm. Then, we compute the vanishing line of the ground plane and the vertical vanishing point of the scene, and extract the head and feet feature points in each frame of the video sequences. Finally, we apply a single-view mensuration algorithm to each frame to obtain height measurements and fuse the multi-frame measurements using the RANSAC algorithm. Compared with other popular methods, our proposed algorithm does not require calibrating the camera and can track multiple moving objects when occlusion occurs. Therefore, it reduces the computational complexity and improves measurement accuracy simultaneously. The experimental results demonstrate that our method is effective and robust to occlusion.
1. Introduction
Metrology, the measurement of real-world metrics, has been investigated extensively in computer vision for many applications. The technique of measuring the geometric parameters of objects from video has developed into an interesting issue in the computer vision field in recent years [1,2]. With the increasing use of video surveillance systems [3], more and more crimes and incidents have been captured on video. When an incident has been captured, we need to gain an understanding of the events or identify a particular individual.
As height is an important parameter of a person, several methods have been presented for estimating height information from video [4,5]. They can be roughly divided into two categories: absolute measurement and relative measurement. Absolute measurement requires fully calibrating the camera, which is a complicated process [6]. Relative measurement requires only minimal calibration. Guo and Chellappa [7] presented a video metrology approach using an uncalibrated single camera that is either stationary or in planar motion. Their approach also leverages object motion in videos to acquire calibration information for measurement, and no constant-velocity motion is assumed. Furthermore, it incorporates all the measurements from individual video frames to improve the accuracy of the final measurement.
Several automatic mensuration algorithms have been developed to take advantage of tracking results from video sequences. Renno et al. [8] used the projected sizes of pedestrians to estimate the vanishing line of a ground plane. Bose and Grimson [9] proposed a method that uses constant-velocity trajectories of objects to derive vanishing lines for recovering the reference plane and planar rectification. The basic idea of their algorithm is to use an additional constraint brought by the constant-velocity assumption, which is not always available in surveillance sequences. Shao et al. [10] proposed a minimally supervised algorithm based on monocular videos and uncalibrated stationary cameras. The authors recovered the minimal calibration of the scene by tracking moving objects, then applied the single-view metrology algorithm to each frame, and finally fused the multi-frame measurements using the LMedS as the cost function and the RMSA as the optimization algorithm.
However, most existing approaches are direct extensions of image-based algorithms; they do not consider occlusions between objects and lack robustness. Reliable tracking of multiple objects in complex situations is a challenging visual surveillance problem, since a high density of objects results in occlusion. When occlusion between multiple objects is common, it is extremely difficult to perform height measurement of objects.
In this paper, we propose a new method for height measurement of multiple objects based on robust tracking. Firstly, the foreground likelihood map is obtained using the codebook background modeling algorithm. Secondly, tracking of multiple objects is performed by a combined tracking algorithm. Then, the vanishing line of the ground plane and the vertical vanishing point are computed, and the head and feet feature points are extracted in each frame of the video sequences. Finally, we obtain height measurements of multiple objects according to the projective geometric constraint, and the multi-frame measurements are fused using the RANSAC algorithm.
Compared with other popular methods, our proposed algorithm does not require calibrating the camera and can track multiple moving objects in crowded scenes. Therefore, it reduces the complexity and improves the accuracy simultaneously. The experimental results demonstrate that our method is effective and robust in the occlusion case.
The organization of this paper is as follows. In Section 2, we introduce the multi-target detecting and tracking algorithm. Section 3 addresses video-based height measurement of multiple moving objects. Section 4 presents experimental results. Section 5 concludes this paper.
2. Multi-Target Detecting and Tracking Algorithm
2.1 Multi-Target Detecting Algorithm
The capability of extracting moving objects from a video sequence captured with a static camera is a typical first step in visual surveillance. A common approach for discriminating moving objects from the background is detection by background subtraction [11,12]. The idea of background subtraction is to subtract, or difference, the current image from a reference background model; the subtraction identifies non-stationary or new objects. The generalized mixture of Gaussians (MOG) has been used to model complex, non-static backgrounds, but MOG does have some disadvantages: backgrounds with fast variations are not easily modeled accurately with just a few Gaussians, and it may fail to provide sensitive detection.
In this paper, the codebook algorithm is used to model backgrounds. The algorithm is an adaptive and compact background model that can capture structural background motion over a long period of time under limited memory. This allows us to encode moving backgrounds or multiple changing backgrounds. At the same time, the algorithm can cope with local and global illumination changes.
A quantization/clustering technique is adopted to construct the background model in the codebook algorithm: samples at each pixel are clustered into a set of codewords, and the background is encoded on a pixel-by-pixel basis.
Let X = {x_1, x_2, ..., x_N} be a training sequence for a single pixel consisting of N RGB vectors, and let C = {c_1, c_2, ..., c_L} represent the codebook for the pixel consisting of L codewords. Each pixel has a different codebook size based on its sample variation. Each codeword c_i (i = 1, ..., L) consists of an RGB vector v_i = (R̄_i, Ḡ_i, B̄_i) and a 6-tuple aux_i = ⟨Ǐ_i, Î_i, f_i, λ_i, p_i, q_i⟩. The tuple aux_i contains intensity (brightness) values and temporal variables, described below:
- Ǐ_i, Î_i: the minimum and maximum brightness, respectively, accepted for codeword i;
- f_i: the frequency with which codeword i has occurred;
- λ_i: the maximum negative run-length (MNRL), defined as the longest interval during the training period in which the codeword has NOT recurred;
- p_i, q_i: the first and last access times, respectively, at which the codeword has occurred.
In the training period, each value x_t sampled at time t is compared to the current codebook to determine which codeword c_m (if any) it matches, where m is the index of the matching codeword. We use the matched codeword as the sample's encoding approximation. To determine which codeword is the best match, we employ a color distortion measure and brightness bounds.
Consider an input pixel x_t = (R, G, B) and a codeword c_i with RGB vector v_i = (R̄_i, Ḡ_i, B̄_i), where

‖x_t‖² = R² + G² + B²,  ‖v_i‖² = R̄_i² + Ḡ_i² + B̄_i²,  ⟨x_t, v_i⟩² = (R̄_iR + Ḡ_iG + B̄_iB)².

The color distortion can be calculated by

colordist(x_t, v_i) = √(‖x_t‖² − p²),  with p² = ⟨x_t, v_i⟩² / ‖v_i‖².

The logical brightness function is defined as

brightness(I, ⟨Ǐ_i, Î_i⟩) = true if I_low ≤ ‖x_t‖ ≤ I_hi, and false otherwise,

where the bounds I_low and I_hi are derived from Ǐ_i and Î_i as in [11].
The detailed algorithm for constructing the codebook is given in [11].
We segment the foreground by subtracting the current image from the background model. Given a new input pixel x = (R, G, B) and its codebook M, the subtraction operation BGS(x) for the pixel is defined as follows.
Step 1. Compute the brightness I = R + G + B, and define a boolean variable match.
Step 2. Find the codeword c_m in M matching x based on two conditions: colordist(x, c_m) ≤ ε, and brightness(I, ⟨Ǐ_m, Î_m⟩) = true. If such a codeword c_m is found, let match = 1; otherwise, let match = 0.
Step 3. Determine the foreground moving-object pixels: BGS(x) = foreground if match = 0, and background otherwise.
Step 4. Compute the likelihood that the observation x belongs to the foreground.
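The matching test and the per-pixel subtraction above can be sketched in plain Python as follows (a minimal sketch: the codeword representation, the ε parameter, and the explicit (I_low, I_hi) brightness bounds are our simplifications of the model in [11]):

```python
import math

def colordist(x, v):
    """Color distortion of pixel x = (R, G, B) from codeword RGB vector v:
    the distance from x to the line through the origin and v ([11])."""
    x2 = sum(c * c for c in x)               # ||x||^2
    v2 = sum(c * c for c in v)               # ||v||^2
    dot = sum(a * b for a, b in zip(x, v))   # <x, v>
    if v2 == 0:
        return math.sqrt(x2)
    p2 = dot * dot / v2                      # squared projection of x onto v
    return math.sqrt(max(x2 - p2, 0.0))

def bgs(x, codewords, eps):
    """Background subtraction for one pixel: returns 1 (foreground) when no
    codeword matches x in both color and brightness, else 0 (background).
    Each codeword is a tuple (v, I_low, I_hi) -- our simplified layout."""
    I = sum(x)  # brightness I = R + G + B
    for v, I_low, I_hi in codewords:
        if colordist(x, v) <= eps and I_low <= I <= I_hi:
            return 0  # match found -> background pixel
    return 1          # no match -> foreground pixel
```

A pixel close in chromaticity and brightness to a stored codeword is classified as background; anything else becomes foreground, which is what produces the likelihood maps in Fig. 1.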
Fig. 1 depicts a comparison of foreground likelihood maps obtained using different methods on an indoor data set. Fig. 1(a) is an image extracted from an indoor video. Fig. 1(b) depicts the foreground likelihood map of the image using the mixture-of-Gaussians algorithm, and Fig. 1(c) depicts the foreground likelihood map using the codebook-based method.
Comparison of foreground likelihood maps obtained using different methods
2.2 Multi-Target Tracking Algorithm
Tracking multiple people accurately in cluttered and crowded scenes is a challenging task, primarily due to occlusion between people [13,14]. The particle filter works well when an object is occluded, but it has difficulty satisfying real-time computing requirements. Mean-shift solves this problem easily, but it lacks robustness during mutual occlusion. To address these problems, this section proposes a robust multi-target tracking algorithm that combines the particle filter with the mean-shift method.
Particle filters provide an approximate Bayesian solution to the discrete-time recursive estimation problem by updating an approximate description of the posterior filtering density [15].
At time k, a measurement z_k becomes available, giving the measurement history z_{1:k} = {z_1, z_2, ..., z_k}. Assume that the probability density function p(x_{k−1} | z_{1:k−1}) is available at time k−1. According to Bayes' rule, the posterior probability function of the state vector can be calculated in two steps. The prediction step computes

p(x_k | z_{1:k−1}) = ∫ p(x_k | x_{k−1}) p(x_{k−1} | z_{1:k−1}) dx_{k−1}.

This is the prior of the state x_k at time k without knowledge of the measurement z_k, i.e., the probability given only previous measurements. The update step combines the likelihood of the current measurement with the predicted state:

p(x_k | z_{1:k}) = p(z_k | x_k) p(x_k | z_{1:k−1}) / p(z_k | z_{1:k−1}),

where p(z_k | z_{1:k−1}) is a normalizing constant. It can be calculated by

p(z_k | z_{1:k−1}) = ∫ p(z_k | x_k) p(x_k | z_{1:k−1}) dx_k.

Because p(z_k | z_{1:k−1}) is a constant, the update step can be written as

p(x_k | z_{1:k}) ∝ p(z_k | x_k) p(x_k | z_{1:k−1}).
Suppose that at time step k there is a set of particles {x_k^i, i = 1, ..., N} with associated weights {ω_k^i, i = 1, ..., N} drawn randomly by importance sampling, where N is the total number of particles. The weight of particle i can be defined as

ω_k^i ∝ ω_{k−1}^i · p(z_k | x_k^i) p(x_k^i | x_{k−1}^i) / q(x_k^i | x_{k−1}^i, z_{1:k}).

We use the transition prior p(x_k | x_{k−1}) as the importance density function q(x_k^i | x_{k−1}^i, z_{1:k}). Then the weight update simplifies to

ω_k^i ∝ ω_{k−1}^i · p(z_k | x_k^i).

Furthermore, if we use Grenander's factored sampling algorithm, this can be modified to

ω_k^i = p(z_k | x_k^i).

The particle weights can then be normalized using

ω̃_k^i = ω_k^i / Σ_{j=1}^{N} ω_k^j

to give a weighted approximation of the posterior density in the form

p(x_k | z_{1:k}) ≈ Σ_{i=1}^{N} ω̃_k^i δ(x_k − x_k^i),

where δ is the Dirac delta function.
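One propagate–reweight–normalize–resample cycle of the filter described above can be sketched as follows (a minimal sketch; the generic `transition` and `likelihood` callables stand in for the dynamic model and observation model defined later in this section):

```python
import random

def particle_filter_step(particles, weights, transition, likelihood, z):
    """One SIR step using the transition prior as importance density.
    particles: list of states x_{k-1}^i; weights: their weights w_{k-1}^i;
    transition: draws x_k^i ~ p(x_k | x_{k-1}^i); likelihood: p(z_k | x_k^i)."""
    # predict: propagate each particle through the dynamic model
    proposed = [transition(x) for x in particles]
    # update: w_k^i proportional to w_{k-1}^i * p(z_k | x_k^i)
    w = [wp * likelihood(z, x) for wp, x in zip(weights, proposed)]
    s = sum(w)
    w = [wi / s for wi in w]
    # resample (multinomial) to get the unweighted set {x_k^i, 1/N}
    n = len(proposed)
    resampled = random.choices(proposed, weights=w, k=n)
    return resampled, [1.0 / n] * n
```

Resampling with replacement concentrates particles on high-likelihood states, which is what keeps the tracker locked on during brief occlusions.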
The mean-shift algorithm was first analyzed in [16] and developed in [17]. Mean-shift is a nonparametric statistical approach that seeks the mode of a density distribution in an iterative procedure [18]. Let X denote the current location; its new location X′ after one iteration is

X′ = Σ_{i=1}^{N} a_i ω(a_i) g(‖(X − a_i)/h‖²) / Σ_{i=1}^{N} ω(a_i) g(‖(X − a_i)/h‖²),

where {a_i, i = 1, ..., N} are the normalized points within the rectangular area specified by the current location X, ω(a_i) is the weight associated with each pixel a_i, g(x) is a kernel profile function, and h is the window radius used to normalize the coordinates a_i.
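A single iteration of this update can be written directly; in the sketch below, `weight` and `g` are the per-pixel weight ω(a_i) and the kernel profile from the formula, passed in as callables (the function name is ours):

```python
def mean_shift_step(X, points, weight, g, h):
    """One mean-shift iteration: the new location is the weighted mean of the
    points a_i, each weighted by weight(a_i) * g(||(X - a_i)/h||^2)."""
    num = [0.0, 0.0]
    den = 0.0
    for a in points:
        d2 = sum(((xc - ac) / h) ** 2 for xc, ac in zip(X, a))
        k = weight(a) * g(d2)   # combined kernel-and-pixel weight
        num[0] += k * a[0]
        num[1] += k * a[1]
        den += k
    return (num[0] / den, num[1] / den)
```

Iterating this step until the shift falls below a small threshold moves X to the nearest density mode.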
In our tracking algorithm, we assume that the state transition dynamics follow the second-order autoregressive process

x_k = A x_{k−1} + B x_{k−2} + C n_k,

where A, B, C are the autoregression coefficients and n_k is Gaussian noise.
We use an HSV color histogram to build the observation model. Given the current observation z_k, the candidate color histogram Q(x_k) is calculated on z_k in the region specified by x_k. The similarity between Q(x_k) and the reference color histogram Q* is measured by the Bhattacharyya distance d(·,·), and the likelihood distribution is evaluated as

p(z_k | x_k) ∝ exp(−λ d²(Q(x_k), Q*)).
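For normalized histograms, the Bhattacharyya distance and the resulting likelihood can be sketched as follows (the λ value is illustrative; the paper does not specify one):

```python
import math

def bhattacharyya_distance(q, q_ref):
    """d(Q, Q*) = sqrt(1 - sum_i sqrt(q_i * q*_i)) for normalized histograms."""
    bc = sum(math.sqrt(a * b) for a, b in zip(q, q_ref))  # Bhattacharyya coefficient
    return math.sqrt(max(1.0 - bc, 0.0))

def observation_likelihood(q, q_ref, lam=20.0):
    """p(z_k | x_k) proportional to exp(-lambda * d^2(Q(x_k), Q*))."""
    d = bhattacharyya_distance(q, q_ref)
    return math.exp(-lam * d * d)
```

Identical histograms give distance 0 and the maximal likelihood 1; disjoint histograms give distance 1 and a likelihood near 0, so particles over the wrong object are rapidly down-weighted.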
In our method, the mean-shift algorithm is applied to every sample in the sample set, which greatly reduces the computational time of particle filtering; although mean-shift alone may fail to capture the true location of the objects during mutual occlusion, the particle filter improves the robustness of the algorithm. We propagate the particles {x_{k−1}^i, i = 1, ..., N} according to the state transition dynamics to obtain the predicted sample set, and then move each predicted sample by the mean-shift iteration. With the mean-shifted samples, we update the weights {ω_k^i, i = 1, ..., N} according to the weight update rule, where the likelihood distribution p(z_k | x_k^i) is given by the HSV color observation model. Then we resample to generate the unweighted sample set {x_k^i, 1/N}_{i=1,...,N}. In Fig. 2 the tracking results are demonstrated for outdoor video sequences in different frames.
Tracking results for test video sequences
3. Video-Based Height Measurements of Multiple Moving Objects
 3.1 Projective Geometry
In this section, we introduce the main projective geometric ideas and notation required for understanding our measurement algorithm. We use upper-case letters to indicate points in the world system and the corresponding lower-case letters for their images. Fig. 3 shows the basic geometry of the scene. A line segment in space, orthogonal to the ground plane and identified by its top point H_i and base point F_i, is denoted by H_iF_i, and its length is denoted by d(H_i, F_i). H_iF_i is projected onto the image plane as the line segment h_if_i. The line l is the vanishing line of the ground plane, and v is the vertical vanishing point. Given one reference height d(H_1, F_1) = d_1 in the scene, the height of any object on the ground plane (e.g., d_2) can be measured using the geometric method shown in Fig. 3(b).
Basic geometry of the scene
The measurement is achieved in two steps. In the first step, we map the length of the line segment h_1f_1 onto the other segment h_2f_2. The intersection of the line through the two base points f_1 and f_2 with the vanishing line l determines the point u, and the intersection of the line through h_1 and u with the line through v and f_2 determines the point i. Because v and u are vanishing points, h_1f_1 and h_1i are parallel to if_2 and f_1f_2, respectively; hence h_1, i, f_2, and f_1 form a parallelogram with d(h_1, f_1) = d(i, f_2). We now have four collinear points v, h_2, i, f_2 on an imaged scene line, and thus a cross ratio is available: the distance ratio d(h_2, f_2) : d(i, f_2) is the computed estimate of d_2 : d_1, obtained by applying a 1-D projective transformation. In the second step, we compute the ratio of lengths on the imaged scene line using the cross ratio [19]. The ratio r between the two line segments h_2f_2 and if_2 can be written as

r = ( d(h_2, f_2) · d(v, i) ) / ( d(i, f_2) · d(v, h_2) ),

with d_2 = r·d_1. The height of any object can be measured using this method.
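In homogeneous image coordinates, the whole two-step construction reduces to a handful of cross products. The sketch below uses our own helper names and unsigned image distances, which assumes v lies outside the measured segment:

```python
def cross(a, b):
    """Homogeneous cross product: line through two points, or point of
    intersection of two lines."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dehom(p):
    """Dehomogenize (x, y, w) -> (x/w, y/w)."""
    return (p[0] / p[2], p[1] / p[2])

def dist(p, q):
    (px, py), (qx, qy) = dehom(p), dehom(q)
    return ((px - qx) ** 2 + (py - qy) ** 2) ** 0.5

def measure_height(h1, f1, h2, f2, l, v, d1):
    """Two-step measurement of Sec. 3.1, all points/lines homogeneous:
    1) transfer: u = (f1 x f2) x l, then i = (h1 x u) x (v x f2);
    2) cross ratio of v, h2, i, f2 on the imaged vertical line gives d2/d1."""
    u = cross(cross(f1, f2), l)            # intersection with the vanishing line
    i = cross(cross(h1, u), cross(v, f2))  # reference top transferred onto segment 2
    r = (dist(h2, f2) * dist(v, i)) / (dist(i, f2) * dist(v, h2))
    return r * d1
```

Because the cross ratio is a projective invariant, the same code works for any camera view in which the vanishing line and vertical vanishing point are known.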
With the assumption of perfect projection, e.g., with a pinhole camera, a set of parallel lines in the scene is projected onto a set of lines in the image that meet in a common point. This point of intersection, perhaps at infinity, is called the vanishing point. Different approaches are adopted to detect vanishing points for the reference direction, depending on the environment of the video data sets.
In the pinhole camera model, the vanishing line of the ground plane can be determined as the line through two or more vanishing points of the plane. If we have N vertical poles of the same height in the perspective view, the vertical vanishing point V_Y can be computed simply by finding the intersection of two (or more) poles, and the vanishing line of the ground plane V_L is the line through the points of intersection of the lines connecting the tops and bottoms of the poles. Thus, we can fix the vanishing line from three (or more) non-coplanar poles. In this paper, we denote the poles by {(t_i, b_i)}_{i=1,2,...,N}, where t_i and b_i represent the image positions of the top and bottom of pole i, respectively, and {(Σ_{t_i}, Σ_{b_i})}_{i=1,2,...,N} are the associated covariance matrices. V_Y can be fixed by finding the point v that minimizes the sum of distances from t_i and b_i to the line linking m_i and v, where m_i is the midpoint of t_i and b_i and (w_i, b_i) parameterizes the line determined by m_i and v. V_L can then be obtained by fitting a line to the intersection points, where w_{V_L} is the unit vector of V_L and b_{V_L} is a point on the vanishing line. The point x_i is the intersection of line t_jb_j and line t_kb_k, and its covariance matrix Σ_i can be computed by propagating the endpoint covariances through the Jacobian of the intersection.
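For the minimal two-pole case, the intersection reduces to two cross products in homogeneous coordinates (a sketch with our own helper names; a robust estimate over N poles would minimize the point-to-line distances as described in the text):

```python
def cross3(a, b):
    """Homogeneous cross product of two 3-vectors (points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def vanishing_point(pole_a, pole_b):
    """Intersect the image lines of two vertical poles, each given as
    ((tx, ty), (bx, by)) top/bottom pixel positions."""
    (t1, b1), (t2, b2) = pole_a, pole_b
    l1 = cross3((*t1, 1.0), (*b1, 1.0))   # line through pole 1
    l2 = cross3((*t2, 1.0), (*b2, 1.0))   # line through pole 2
    x, y, w = cross3(l1, l2)              # their intersection
    return (x / w, y / w)
```

With more than two poles, each pair contributes one intersection estimate, and a least-squares or RANSAC fit over all pairs gives a stabler V_Y.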
3.2 Extracting Head and Feet Feature Points from Moving Objects
Humans are roughly vertical while they stand or walk. However, because human walking is an articulated motion, the shape and height of the human body vary across walking phases. As shown in Fig. 4, at the phase in which the two legs cross each other, the height measured from the video sequence is greatest, and it is also the most appropriate height to represent the human's static height.
The height of human varies periodically during walking cycle
The phase at which the two feet cross each other (leg-crossing) is of particular interest in that the feet position is relatively easy to locate and the shape is relatively insensitive to viewpoint. Thus, we aim to extract the head and feet locations at leg-crossing phases. We first detect a walking human from a video sequence by change detection. Then, we extract the leg-crossing phases by temporal analysis of the object shape. Finally, we compute the principal axis of the human body and locate the head and feet positions at those phases.
For every single frame t, the head feature point h_i^t of object i (i = 1, 2, ..., N) can be obtained using the following steps.
Step 1. Construct the target likelihood matrix L_i^t corresponding to the foreground blob B_i^t(w_i, h_i), where w_i and h_i denote the width and height of foreground blob B_i^t, respectively.
Step 2. Compute the covariance matrix C_i^t of the target likelihood matrix L_i^t from its columns, where L_i^t(m) and L_i^t(n) denote the m-th and n-th columns of the foreground target matrix at frame t.
Step 3. Compute the first eigenvector e_i^t of the covariance matrix C_i^t. The centroid of the blob and e_i^t give the principal axis P_i^t of the target's body. The head feature point is assumed to lie on the principal axis.
Step 4. Project the target blob B_i^t onto its corresponding principal axis P_i^t. Locate the head feature point h_i^t as the first end point, scanning along the principal axis from top to bottom, whose projection count exceeds a threshold.
Humans are roughly vertical at the different phases of a walking cycle. This means that the head feature point, the feet feature point, and the vertical vanishing point are collinear. We obtain the feet feature point f_i of target i by applying this collinearity constraint; it can be computed as

f_i = (h_i × V_Y) × l_{b(i)},

where h_i denotes the head feature point of object i, V_Y denotes the vertical vanishing point, and l_{b(i)} denotes the bottom line of the blob.
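Steps 2 and 3 above amount to a 2×2 eigen-decomposition of the blob's pixel-coordinate covariance, which has a closed form. The sketch below (our own helper, operating on a list of foreground pixel coordinates rather than the likelihood matrix) returns the centroid and the unit principal-axis direction:

```python
def principal_axis(pixels):
    """Centroid and dominant eigenvector of the 2x2 covariance of a blob's
    pixel coordinates (closed form for the 2x2 symmetric case)."""
    n = len(pixels)
    mx = sum(p[0] for p in pixels) / n
    my = sum(p[1] for p in pixels) / n
    sxx = sum((p[0] - mx) ** 2 for p in pixels) / n
    syy = sum((p[1] - my) ** 2 for p in pixels) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pixels) / n
    # largest eigenvalue of [[sxx, sxy], [sxy, syy]]
    tr, det = sxx + syy, sxx * syy - sxy * sxy
    lam = tr / 2 + ((tr / 2) ** 2 - det) ** 0.5
    # matching eigenvector; fall back to an axis direction if sxy == 0
    if abs(sxy) > 1e-12:
        vx, vy = sxy, lam - sxx
    else:
        vx, vy = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = (vx * vx + vy * vy) ** 0.5
    return (mx, my), (vx / norm, vy / norm)
```

For an upright pedestrian blob the returned direction is close to vertical, and the head point is then found by scanning projection counts along this axis as in Step 4.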
3.3 Multiple Frame Fusion
The measurements from multiple frames include outlier observations due to tracking errors, articulated motion, and occlusions, which makes the mean of the multi-frame estimates unreliable. The RANSAC technique has the well-known property of being less sensitive to outliers. Thus, in this paper, we use the RANSAC algorithm to estimate the height from a set of data contaminated by outliers.
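A 1-D RANSAC fusion of per-frame height estimates can be sketched as follows (the inlier tolerance, iteration count, and sampling scheme are illustrative choices, not the paper's exact settings):

```python
import random

def ransac_height(measurements, tol=0.05, iters=100, seed=0):
    """Fuse per-frame height estimates with 1-D RANSAC: repeatedly pick one
    sample as a candidate model, collect measurements within tol of it, and
    return the mean of the largest consensus (inlier) set."""
    rng = random.Random(seed)
    best = []
    for _ in range(iters):
        m = rng.choice(measurements)                       # candidate height
        inliers = [x for x in measurements if abs(x - m) <= tol]
        if len(inliers) > len(best):
            best = inliers
    return sum(best) / len(best)                           # refit on inliers
```

Frames corrupted by occlusion or mid-stride articulation fall outside the consensus set, so the fused height tracks the dominant cluster of per-frame estimates.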
4. Experimental Results and Analysis
In order to show the robustness of the height measurement algorithm discussed in this paper, we conducted several experiments with data that we collected from stationary cameras under different environments. Moving objects include vehicles and humans. Given the limited space, in this section we list only a few of them to show the experimental results and the forms of data statistics. The number of particles used for our method is 300 in all experiments.
The algorithm is implemented on the Windows 7 operating system using MATLAB as the software platform. The computer configuration is an AMD Athlon(TM) X2 Dual-Core QL-62 at 2.00 GHz with 1.74 GB of memory.
The results of height measurements for test video 1 are shown in Fig. 5.
The results of height measurement for test video 1
Statistics of the measured value for test video 1 are shown in Table 1.
Statistics of the measured value for test video 1
The results of height measurement for test video 2 are shown in Fig. 6. The tracking blobs are object 1, object 2, and object 3 from right to left, respectively. The heights of the objects are shown on top of the blobs.
The results of height measurement for test video 2
Statistics of the measured value for test video 2 are shown in Table 2.
Statistics of the measured value for test video 2
From the experimental results, we can see that our algorithm shows better accuracy and robustness than the algorithm proposed in [10].
The results of height measurement for test video 3 are shown in Fig. 7.
The results of height measurement for test video 3
Statistics of the measured value for test video 3 are shown in Table 3.
Statistics of the measured value for test video 3
From the experimental results, we can see that our proposed algorithm does not require calibrating the camera and can track multiple moving objects when occlusion occurs. Therefore, it reduces the computational complexity and improves measurement accuracy simultaneously.
5. Conclusion
We have presented a new algorithm for estimating the heights of multiple moving objects. We first compute the vanishing line of the ground plane and the vertical vanishing point. Secondly, we detect and track multiple moving objects. Then, the head and feet feature points are extracted in each frame of the video sequences, and the height measurements of multiple objects are obtained according to the projective geometric constraint. Finally, the multi-frame measurements are fused using the RANSAC algorithm. The experimental results demonstrate that our method is effective and robust to occlusion. This is a preliminary study, and further work is required.
BIO
Mingxin Jiang received the M.S. degree in communication and information systems from Jilin University, Changchun, China, in 2005. She received the Ph.D. from Dalian University of Technology in 2013. Her research interests include multi-object tracking, video content analysis, and visual metrology.
Hongyu Wang received his B.S. degree from Jilin University of Technology, Changchun, China, in 1990, and his M.S. degree from the Graduate School of the Chinese Academy of Sciences, Changchun, China, in 1993, both in Electronic Engineering. He received the Ph.D. in Precision Instrument and Optoelectronics Engineering from Tianjin University, Tianjin, China, in 1997. Currently, he is a Professor in the Institute of Information Science and Communication Engineering, Dalian University of Technology, China. His research interests include computer vision, video coding, and wireless video sensor networks.
References
[1] J. Cai and R. Walker, "Height estimation from monocular image sequences using dynamic programming with explicit occlusions," IET Computer Vision, vol. 4, no. 4, pp. 149-161, 2010. DOI: 10.1049/iet-cvi.2009.0063
[2] A. Criminisi, Accurate Visual Metrology from Single and Multiple Uncalibrated Images, Distinguished Dissertation, Springer-Verlag, New York, 2001.
[3] W. Hu, T. Tan, L. Wang and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Transactions on Systems, Man, and Cybernetics—Part C: Applications and Reviews, vol. 34, no. 3, pp. 334-352, 2004. DOI: 10.1109/TSMCC.2004.829274
[4] A. Criminisi, I. Reid and A. Zisserman, "Single view metrology," International Journal of Computer Vision, vol. 40, no. 2, pp. 123-148, 2000. DOI: 10.1023/A:1026598000963
[5] I. Reid and A. Zisserman, "Goal-directed video metrology," in Proc. of 4th European Conference on Computer Vision (ECCV), pp. 647-658, April 15-18, 1996.
[6] Z. Zhang, "A flexible new technique for camera calibration," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-1334, 2000. DOI: 10.1109/34.888718
[7] F. Guo and R. Chellappa, "Video metrology using a single camera," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 7, pp. 1329-1335, 2010. DOI: 10.1109/TPAMI.2010.26
[8] J. Renno, J. Orwell and G. Jones, "Learning surveillance tracking models for the self-calibrated ground plane," in Proc. of British Machine Vision Conference, pp. 607-616, 2002.
[9] B. Bose and E. Grimson, "Ground plane rectification by tracking moving objects," in Proc. of Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance, 2003.
[10] J. Shao, S. K. Zhou and R. Chellappa, "Robust height estimation of moving objects from uncalibrated videos," IEEE Transactions on Image Processing, vol. 19, no. 8, pp. 2221-2232, 2010. DOI: 10.1109/TIP.2010.2046368
[11] K. Kim, T. H. Chalidabhongse, D. Harwood and L. S. Davis, "Real-time foreground-background segmentation using codebook model," Real-Time Imaging, vol. 11, no. 5, pp. 167-256, 2005.
[12] B. K. Bao, G. Liu, C. Xu and S. Yan, "Inductive robust principal component analysis," IEEE Transactions on Image Processing, vol. 21, no. 8, pp. 3794-3800, 2012. DOI: 10.1109/TIP.2012.2192742
[13] Z. H. Khan and I. Y. H. Gu, "Nonlinear dynamic model for visual object tracking on Grassmann manifolds with partial occlusion handling," IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 2005-2019, 2013. DOI: 10.1109/TSMCB.2013.2237900
[14] S. H. Khatoonabadi and I. V. Bajic, "Video object tracking in the compressed domain using spatio-temporal Markov random fields," IEEE Transactions on Image Processing, vol. 22, no. 1, pp. 300-313, 2013. DOI: 10.1109/TIP.2012.2214049
[15] M. Du, X. M. Nan and L. Guan, "Monocular human motion tracking by using DE-MC particle filter," IEEE Transactions on Image Processing, vol. 22, no. 10, pp. 3852-3865, 2013. DOI: 10.1109/TIP.2013.2263146
[16] D. Comaniciu and P. Meer, "Mean shift: a robust approach toward feature space analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603-619, 2002. DOI: 10.1109/34.1000236
[17] L. F. Wang, H. P. Yan, H. Y. Wu and C. H. Pan, "Forward-backward mean-shift for visual tracking with local-background-weighted histogram," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1480-1489, 2013. DOI: 10.1109/TITS.2013.2263281
[18] D. Comaniciu, V. Ramesh and P. Meer, "Kernel-based object tracking," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 5, pp. 564-577, 2003. DOI: 10.1109/TPAMI.2003.1195991
[19] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd Edition, Cambridge University Press, Cambridge, 2003.