Recent years have witnessed a growing interest in the fields of video surveillance and mobile object tracking. This paper proposes a mobile object tracking algorithm. First, several parameters such as object window, object area, and expansion-contraction (E-C) parameter are defined. Then, a modified E-C algorithm for multiple-object tracking is presented. The proposed algorithm tracks moving objects by expansion and contraction of the object window. In addition, it includes methods for updating the background image and avoiding occlusion of the target image. The validity of the proposed algorithm is verified experimentally. For example, the first scenario traces the path of two people walking in opposite directions in a hallway, whereas the second one is conducted to track three people in a group of four walkers.
1. Introduction
In recent years, an increasing number of studies have investigated video surveillance and mobile object tracking algorithms. The application areas of object tracking include
- Motion-based recognition of humans,
- Automated surveillance for monitoring a scene to detect suspicious activities or unlikely events, and
- Traffic monitoring for real-time collection of traffic statistics in order to direct traffic flow.
To detect a mobile object, the target object should be separated from the background. This can be done by using the background subtraction method or the frame differencing method for adjacent frames. The method used for object tracking depends on the representation of the target object as a point, silhouette, etc. [1]. Typical object tracking methods are point-based tracking, kernel tracking, and silhouette tracking [1, 2]. Recent years have witnessed the growing use of probabilistic approaches, such as the use of a probability distribution to represent the position and color distribution of an object, for object tracking [3].
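As a minimal sketch of the two separation methods just mentioned, the following NumPy code thresholds the absolute difference between a frame and a reference image. The function name `foreground_mask`, the threshold value of 30, and the toy images are illustrative assumptions and are not taken from the paper. Passing a background image as the reference corresponds to background subtraction, while passing the previous frame corresponds to frame differencing.

```python
import numpy as np

def foreground_mask(frame, reference, threshold=30):
    """Return a 0/1 mask of pixels that differ from `reference` by more than `threshold`.

    reference = background image  -> background subtraction
    reference = previous frame    -> frame differencing
    (The threshold of 30 is an arbitrary illustrative value.)
    """
    diff = np.abs(frame.astype(np.int32) - reference.astype(np.int32))
    if diff.ndim == 3:
        # Color input: use the largest per-channel difference at each pixel.
        diff = diff.max(axis=2)
    return (diff > threshold).astype(np.uint8)

# Toy example: an empty 6x8 background and a frame with a bright 2x3 "object".
background = np.zeros((6, 8), dtype=np.uint8)
frame = background.copy()
frame[2:4, 3:6] = 200
mask = foreground_mask(frame, background)
```

In practice the threshold would need to be tuned to the lighting and noise level of the scene.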
Several multiple-object tracking algorithms such as the Kalman filter [4], particle filter [5-8], and mean shift [9, 10] are also available. Furthermore, a vector Kalman predictor [11] has been proposed for tracking objects. In that work, separate methods for occlusion and merging are applied to resolve ambiguous situations. Moreover, states of the corresponding moving objects are searched using a spiral searching technique prior to tracking. Recently, Czyzewski and Dalka [12] used a Kalman filter with an RGB color-based approach to measure the similarity between moving objects. Zhang et al. [13] presented a particle swarm optimization-based approach for multiple-object tracking based on histogram matching. Jiang et al. [14] suggested a linear programming approach, whereas Huang and Essa [15] presented an algorithm for tracking multiple objects through occlusions.
The basic expansion-contraction (E-C) algorithm has been presented in previous papers [16, 17]. The problems discussed in these papers include
- Changes in lighting conditions,
- Failure to track fast-moving objects, and
- Difficulty in separating adjacent objects.
In this paper, a modified E-C algorithm for multiple-object tracking is presented. Modifications are made to the method of expansion and contraction for an object window in order to separate the target object from the surrounding objects and the background. The proposed algorithm includes a method for avoiding occlusion of the target image. Finally, the validity of the proposed algorithm is verified through several experiments.
2. Problem Formulation and Definitions
- 2.1 Summary of Some Definitions Proposed in [18, 19]
Several parameters defined in [18, 19], such as the object window, object area, and expansion and contraction parameter, are reintroduced in this paper. The binary image is denoted by I, and Ix and Iy are defined as
$$I_x = \mathrm{diag}(I^T I), \qquad I_y = \mathrm{diag}(I I^T) \tag{1}$$
where Ix and Iy represent the density of non-zero pixels in the x-direction and y-direction, respectively. The object window is defined as a minimized image box that includes a target object. The object area can be computed as
$$\text{object area} = \sum_{x} I_x(x) = \sum_{y} I_y(y) \tag{2}$$
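The center position (xp, yp) can be calculated as
$$x_p = \frac{\sum_{x} x \, I_x(x)}{\sum_{x} I_x(x)}, \qquad y_p = \frac{\sum_{y} y \, I_y(y)}{\sum_{y} I_y(y)} \tag{3}$$
where xp is the center of mass in the x-direction and yp is the center of mass in the y-direction.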
In the case of object tracking with a video stream, the size of the target object changes according to its distance from the camera. Thus, the size of the object window must change with the size of the target object. To carry out this operation, the expansion and contraction parameter is defined as
$$EC_{par} = \frac{\text{area of the object window}}{\text{area of the target object}} \tag{4}$$
which is the ratio of the object window to the target object. Note that the object window must include the target object, and ECpar must be greater than 1.
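The definitions in this subsection can be illustrated with a short NumPy sketch. It assumes that the rows of the binary image index the y-direction and the columns index the x-direction, so that Ix is the per-column count of non-zero pixels and Iy the per-row count. For a 0/1 image these counts coincide with the diagonals of the two matrix products used in the code. The function names and the toy window are hypothetical and only serve the illustration.

```python
import numpy as np

def projections(binary):
    """Return (Ix, Iy): non-zero pixel counts per column and per row."""
    b = (np.asarray(binary) != 0).astype(np.int64)
    ix = np.diag(b.T @ b)  # for a 0/1 image this equals b.sum(axis=0)
    iy = np.diag(b @ b.T)  # for a 0/1 image this equals b.sum(axis=1)
    return ix, iy

def center_of_mass(ix, iy):
    """Center of mass (xp, yp) computed from the two projections."""
    xp = (np.arange(ix.size) * ix).sum() / ix.sum()
    yp = (np.arange(iy.size) * iy).sum() / iy.sum()
    return float(xp), float(yp)

def ec_parameter(binary):
    """Ratio of the window area to the object area (non-zero pixel count)."""
    b = np.asarray(binary) != 0
    return b.size / int(b.sum())

# Toy object window: 4 rows x 6 columns containing a 2x3 object.
window = np.zeros((4, 6), dtype=np.uint8)
window[1:3, 2:5] = 1

ix, iy = projections(window)
xp, yp = center_of_mass(ix, iy)
ecpar = ec_parameter(window)
```

The sketch does not guard against an empty window; a real implementation would have to handle the case in which no non-zero pixel is found.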
- 2.2 Separable, Partially Separable, and Inseparable Objects
It is important to separate the target object from other objects in order to ensure that the resulting object window contains only the target object. Figure 1 shows a group of people walking together (left) and the corresponding Ix (top-right) and Iy (bottom-right). As shown in the left figure, it is difficult to separate the encircled person entirely as a vertical or horizontal strip. As shown in the top-right figure, however, the encircled person can be separated as a vertical strip, i.e., the person is partially separable on the x-coordinate. In contrast, the woman indicated by the white arrow cannot be separated on any coordinate because her object area is relatively small and is thus absorbed into a different object's area when Ix and Iy are computed.
Even in this case, the target object lies between 100 and 220 pixels on the y-axis, and the finally separated object window is shown in Figure 1(b). Further, the corresponding Ix and Iy are shown in the center figure and the bottom figure, respectively. As shown in the center figure of Figure 1(b), the target object window is well separated and contains the target object.
Let us now consider another example where the aim is to separate the encircled image shown in the top-left figure of Figure 2. As shown in the middle and bottom figures, the target object (people) is partially separable on the x coordinate but is inseparable on the y coordinate. Thus, using the information obtained from the middle figure, i.e., that the target object lies in 120~160 pixels across, the image can be reduced to a strip image that contains pixels 120~160 in the x-direction and all pixels in the y-direction. The resulting strip is shown by the strip box in the top-right figure of Figure 2(a). The next step is to recalculate Ix and Iy for the strip image obtained previously; these are shown in the top-right and second-right figures in Figure 2(b). The top-left figure in Figure 2(b) shows the strip image, the top-right figure is Ix, and the second-right figure is Iy. The strip image, i.e., the top-left figure, shows that there is some noise at the top of the strip, which cannot be separated on the x coordinate anymore. However, as shown in the strip image or the second-right figure, the target object can be separated from the noise on the y axis.
Figure 1. (a) Group of people walking together (left), corresponding Ix (top-right) and Iy (bottom-right). (b) The stripped image on y-coordinate (top), corresponding Ix (center) and Iy (bottom).
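The strip-based separation described in this subsection can be sketched as follows. The code first finds the column ranges in which the x-projection is non-zero, cuts out the strip that contains the target, and then repeats the search on the y-projection of that strip to drop residual noise. The helper `runs_above`, its `min_count` parameter, the rule of keeping the run with the most pixels, and the toy image are all assumptions made for illustration; the paper does not specify these details.

```python
import numpy as np

def runs_above(profile, min_count=1):
    """Return half-open (start, stop) index ranges where profile >= min_count."""
    active = np.asarray(profile) >= min_count
    padded = np.concatenate(([False], active, [False]))
    # Indices where the active flag switches on or off.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(changes[0::2].tolist(), changes[1::2].tolist()))

# Toy binary image with two objects and one noise pixel above the second one.
image = np.zeros((6, 10), dtype=np.uint8)
image[1:5, 1:3] = 1   # object A
image[2:4, 6:9] = 1   # object B (the target in this example)
image[0, 7] = 1       # noise inside B's column range

# Step 1: separate on the x coordinate using the column projection.
x_runs = runs_above(image.sum(axis=0))
x0, x1 = x_runs[1]            # strip that contains the target
strip = image[:, x0:x1]

# Step 2: recompute the row projection for the strip and separate on y.
y_runs = runs_above(strip.sum(axis=1))
# Keep the run holding the most non-zero pixels (an illustrative rule).
y0, y1 = max(y_runs, key=lambda r: int(strip[r[0]:r[1]].sum()))
object_window = image[y0:y1, x0:x1]
```

Choosing which strip contains the target (here simply the second run) would in practice rely on the predicted center of mass position.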
3. Modified E-C Method
The entire process of object tracking is described in this section, including the overall system flow and an algorithm for updating the background image. A method for expansion and contraction of the object window and the process of selecting an object by color information are also described in this section.
Figure 2. (a) The original image frame (top-left), binary image (top-right), Ix (middle), and Iy (bottom). (b) The strip image (top-left), the final object window (bottom-left), Ix (top-right) and Iy (second-right) for the strip image, and Ix (third-right) and Iy (bottom-right) for the final object window.
- 3.1 Object Tracking Procedure
The overall process of object tracking is shown in Figure 3. The first step in object tracking is the initialization process. This step involves
- Computation of the initial position of the target object,
- Selection of an extended initial object window,
- Selection of Δp0(Δx0, Δy0), which is the initial value of the variation of the center of mass point of the target object, and
- Computation of the predicted center of mass position.
The process then moves to the first frame. The second step is extracting the sub-images from the background frame and the current frame. In this step, the predicted center of mass position is taken as the center, and the size of the window is three or four times that of the previously selected object window. In the next step, the absolute difference between the two sub-images is calculated and converted into a binary image using a threshold operation. The fourth step involves calculating diag(I I^T) and diag(I^T I), contracting the extended object window, and extracting the target object. The area of the target object, the actual center of mass position p1(x1, y1), and the expansion and contraction parameter ECpar are calculated in this step. In the final step, the predicted center of mass position is computed, and the process moves to the next frame.
The target tracking process described above can be summarized in three key stages: prediction, operation, and update. In the prediction stage, the predicted center of mass positions of the target objects are computed using information obtained from previous frames, and an expanded object window, centered at the predicted center of mass and three or four times larger than the target object, is selected for each target. The primary role of the operation stage is extraction of the target objects. This stage includes extraction of the sub-image, conversion of the sub-image into a binary image, calculation of Ix and Iy, and contraction of the object window. If the target object must be separated from other objects, the separation process described in Section 2.2 is performed. In the update stage, the actual center of mass position and ECpar are computed for each target.
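The prediction, operation, and update stages can be sketched for a single target as below. This is a simplified illustration rather than the authors' implementation: it assumes a constant-displacement prediction (previous center plus the last displacement), a fixed search window given by the hypothetical parameter `half_size` instead of a window driven by ECpar, and an arbitrary threshold. The synthetic frames are invented for the example.

```python
import numpy as np

def track_single(frames, background, p0, half_size=6, threshold=30):
    """Track one object and return its measured (x, y) center in each frame."""
    height, width = background.shape
    prev = np.array(p0, dtype=float)   # last known center (x, y)
    delta = np.zeros(2)                # last displacement of the center
    path = []
    for frame in frames:
        # Prediction: expected center and an expanded window around it.
        pred = prev + delta
        cx, cy = (int(v) for v in np.rint(pred))
        x0, x1 = max(cx - half_size, 0), min(cx + half_size + 1, width)
        y0, y1 = max(cy - half_size, 0), min(cy + half_size + 1, height)

        # Operation: difference of sub-images, thresholding, projections.
        sub_frame = frame[y0:y1, x0:x1].astype(np.int32)
        sub_back = background[y0:y1, x0:x1].astype(np.int32)
        binary = (np.abs(sub_frame - sub_back) > threshold).astype(np.int64)
        ix, iy = binary.sum(axis=0), binary.sum(axis=1)

        # Update: measured center of mass in full-image coordinates.
        if ix.sum() > 0:
            x = x0 + (np.arange(ix.size) * ix).sum() / ix.sum()
            y = y0 + (np.arange(iy.size) * iy).sum() / iy.sum()
            current = np.array([x, y])
        else:
            current = pred             # nothing detected: keep the prediction
        delta = current - prev
        prev = current
        path.append((float(current[0]), float(current[1])))
    return path

# Synthetic sequence: a 3x3 object moving 2 pixels to the right per frame.
background = np.zeros((20, 30), dtype=np.uint8)
frames = []
for t in range(1, 6):
    frame = background.copy()
    frame[8:11, 3 + 2 * t : 6 + 2 * t] = 255
    frames.append(frame)

path = track_single(frames, background, p0=(4, 9))
```

For several targets, the same loop would be run per target, with the separation step of Section 2.2 applied whenever two windows overlap.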
- 3.2 Expansion and Contraction of Object Window
Figure 3. Overview of the system flow.
The center of mass position pk(xk, yk) for the kth frame is described by
where ηx and ηy are noise terms.
For the (k+1)th frame, the predicted center of mass position is
is then selected as the three-step weighted average, i.e.,
Eqs. (8a) and (8b) are described, in terms of measured values, as follows:
For the case of multiple-target tracking, the predicted position of the jth object is
The calculations used in this paper to predict the center of mass point of a target object are very simple and adequate for target tracking in an indoor environment. Of course, the Kalman filtering method or the particle filter algorithm can also be used instead of Eq. (5).
Figure 4. Background image (top-left), base frame (top-right), kth frame (bottom-left), and (k+1)th frame (bottom-right) of the expansion and contraction procedure for a person walking, shown at 60-frame intervals.
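As one possible reading of the three-step weighted average mentioned above, the sketch below predicts the next center of mass by adding a weighted average of the three most recent displacements to the last measured center. The weights (0.5, 0.3, 0.2), the function name `predict_next`, and the handling of short histories are assumptions for illustration only; the actual weights used in the paper are not given in the text.

```python
import numpy as np

def predict_next(centers, weights=(0.5, 0.3, 0.2)):
    """Predict the next (x, y) center from a list of measured centers.

    The most recent displacement gets weights[0], the one before it
    weights[1], and so on. The weights are hypothetical values.
    """
    pts = np.asarray(centers, dtype=float)
    if len(pts) < 2:
        return tuple(pts[-1])                          # no motion history yet
    steps = np.diff(pts, axis=0)[::-1][:len(weights)]  # most recent first
    w = np.asarray(weights[:len(steps)], dtype=float)
    w = w / w.sum()                                    # renormalize if history is short
    delta = (w[:, None] * steps).sum(axis=0)
    return tuple(pts[-1] + delta)

# Measured centers of one target over four frames (uniform motion).
centers = [(0, 0), (2, 1), (4, 2), (6, 3)]
predicted = predict_next(centers)

# For several targets, the same rule can be applied to each track separately.
tracks = {"A": [(0, 0), (1, 0)], "B": [(10, 5), (10, 4), (10, 3)]}
predictions = {name: predict_next(track) for name, track in tracks.items()}
```

A Kalman or particle filter could replace `predict_next` without changing the rest of the tracking loop.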
The expansion and contraction procedure, a part of the main result of this paper, is shown in Figures 4 and 5. For comparison with other studies, all video materials are taken from the Context Aware Vision using Image-based Active Recognition (CAVIAR) project [17].
Figure 4 shows how to extend and contract the object window. In this figure, the first image is the background image and the remaining three images show a woman walking, sampled at 60-frame intervals. The top-right image in this figure shows the expansion and contraction procedure of an object window. The first step is to calculate p0(x0, y0) by reducing the initially selected object window and using Eq. (3), as illustrated in the top-left and top-right figures in Figure 5. Then, the predicted center of mass shown in the top-right figure of Figure 4 is computed. In the current (kth) frame, sub-images are extracted from the background and the kth frame, and the binary image shown in the mid-left figure of Figure 5 is calculated. The object window is then obtained by contraction (white arrow), and the predicted center of mass is computed.
The operation of expansion and contraction of the object window is very simple because the actual operation is performed on Ix and Iy and not on the image frame itself. These operations are shown in the middle and bottom figures, respectively. The procedure is a two-step process that involves extending and contracting the object window first on the Ix axis and then on the Iy axis.
Figure 5. Expansion and contraction of object window (top-right), same operation on Ix (middle), and same operation on Iy (bottom).
The expansion and contraction parameter (ECpar) plays an important role in the contraction operation. Initially, the value of this parameter is greater than 1. It is 2 when the object area is 50% of the total area of the object window, and approximately 3 when this ratio drops to about 30%. A value of ECpar close to 1 implies that the object is too large compared to the object window, whereas a value of approximately 3 or 4 implies that the object is very small compared to the object window. Thus, it is reasonable to maintain ECpar at around 2. When ECpar is close to 1, the object window must be extended, and when it is much greater than 2, the object window must be contracted. In order to maintain the performance of the system, an appropriate ECpar value is around 1.5 to 2.
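The rule of thumb above can be turned into a small decision helper. The sketch below is only an illustration: the band limits of 1.5 and 2.0 are read from the range suggested in the text, while the function names, the target value of 2.0, and the idea of scaling both window sides by the same factor are assumptions that do not come from the paper.

```python
import math

def ec_parameter(window_width, window_height, object_area):
    """Ratio of the object window area to the object area."""
    return (window_width * window_height) / object_area

def window_action(ecpar, low=1.5, high=2.0):
    """Decide how to adjust the window for a given ECpar (illustrative band)."""
    if ecpar < low:
        return "expand"      # object nearly fills the window
    if ecpar > high:
        return "contract"    # object is small compared to the window
    return "keep"

def resize_window(window_width, window_height, object_area, target=2.0):
    """Scale both sides equally so that ECpar moves close to `target`."""
    current = ec_parameter(window_width, window_height, object_area)
    scale = math.sqrt(target / current)
    return round(window_width * scale), round(window_height * scale)

# Example: a 40x60 window around an object covering 600 pixels.
ecpar = ec_parameter(40, 60, 600)
action = window_action(ecpar)
new_width, new_height = resize_window(40, 60, 600)
new_ecpar = ec_parameter(new_width, new_height, 600)
```

Because the sides are rounded to whole pixels, the resulting ECpar lands near the target rather than exactly on it.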
- 3.3 Selection of Object by Color Information [15]
If an occlusion has occurred, the color information of the target object just before and after the occlusion is very useful. Tracking can be continued successfully if the two objects do not have identical or similar colors. Studies on occlusion can be divided into two kinds: those that use color and shape information [15], and those that use movement information of the target object obtained by a particle filter or Kalman filter algorithm [16, 20]. However, when both objects have identical or similar colors and are of similar shape, the tracking may fail. Such a scenario requires further investigation.
Figure 6. When occlusion has occurred (a, b), just before (c) and after (d).
In order to solve this problem, this paper uses both kinds of information, i.e., the color and shape of the target object and its velocity. Figure 6 shows frames during an occlusion (a and b), just before it (c), and just after it (d). The middle and bottom figures of (a) are Ix and Iy, respectively. The middle and bottom figures of (b), (c), and (d) are also Ix and Iy, respectively, but each is computed using the color matrices, i.e., the RGB matrices. The bottom figures of (b), (c), and (d) show very similar patterns, even though the positions of the two objects A and B are exchanged, whereas the middle figures of (b), (c), and (d) differ in shape from one another. In addition, the two objects can be separated at about 275 pixels for (c) and at about 265 pixels for (d). The separated objects can then be identified using their color distribution or shape.
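Identification by color distribution after an occlusion could, for instance, be done by comparing color histograms stored before the occlusion with those of the separated objects afterwards. The sketch below uses per-channel RGB histograms and histogram intersection; this particular signature, the bin count of 8, the function names, and the synthetic colors are assumptions for illustration and are not the projection-based computation shown in Figure 6.

```python
import numpy as np

def color_signature(rgb, mask, bins=8):
    """Concatenated per-channel histograms of the masked pixels, normalized to sum 1."""
    pixels = rgb[np.asarray(mask, dtype=bool)]          # shape (n, 3)
    hists = [np.histogram(pixels[:, c], bins=bins, range=(0, 256))[0]
             for c in range(3)]
    signature = np.concatenate(hists).astype(float)
    return signature / signature.sum()

def similarity(sig_a, sig_b):
    """Histogram intersection: close to 1 for matching distributions, 0 for disjoint ones."""
    return float(np.minimum(sig_a, sig_b).sum())

def identify(candidate, references):
    """Return the label of the stored signature most similar to `candidate`."""
    return max(references, key=lambda label: similarity(candidate, references[label]))

def solid(color, shape=(4, 4)):
    """Synthetic patch filled with one RGB color (test data only)."""
    return np.tile(np.array(color, dtype=np.uint8), shape + (1,))

mask = np.ones((4, 4), dtype=np.uint8)

# Signatures stored just before the occlusion.
references = {
    "A": color_signature(solid((200, 30, 30)), mask),   # reddish object
    "B": color_signature(solid((30, 30, 200)), mask),   # bluish object
}

# An object separated just after the occlusion, with slightly changed colors.
after = color_signature(solid((210, 20, 25)), mask)
label = identify(after, references)
```

As noted in the text, such a comparison cannot distinguish objects with nearly identical colors, in which case shape or velocity information would be needed.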
4. Experimental Results and Discussion
To verify the validity of the algorithm presented in this paper, several experiments were performed using video sequences provided by CAVIAR [17]. The first experiment involved tracking one person walking from the bottom-right corner of the lobby towards the top-left corner. The second experiment involved tracking two people walking in opposite directions and one person walking in a crowd. The last experiment involved tracking three people walking together and another person walking in the opposite direction.
- 4.1 Scenario 1: Tracking One Person Walking in the Lobby
The first experiment involves tracking one person walking from the bottom-right corner of the lobby towards the top-left corner. The tracking results of this experiment are shown in Figure 7, in which the frames are sampled at 10-frame intervals. The calculated target position in each frame is marked by “*”. As shown in Figure 7, the target is tracked successfully.
- 4.2 Scenario 2: Tracking Two People Walking With Other People
The second scenario consists of tracking two people walking in opposite directions and one person walking in a crowd. In Figure 8, the tracking procedure is shown at 13-frame intervals. In each image, the person walking in the upward direction is marked by a red cross, and the person walking downward from the center with a group of three people is marked by a yellow cross. As shown in the second and fourth rows, accurate tracking is maintained even when these two people come very close to each other.
- 4.3 Scenario 3: Tracking Three People Walking With Other People
The third experimental scenario is to track three people walking together and another person walking in the opposite direction. This scenario is the same as scenario 2, except that one more person is added to the targets. This scenario shows that the computational complexity increases in comparison with scenario 2; however, the run-time is not significantly affected. The procedure is shown in Figure 9.
5. Conclusion
This paper investigated multi-human tracking in an indoor environment and presented a modified E-C method. The proposed algorithm provides the advantages of the mean-shift algorithm as well as the useful properties of particle swarm optimization and filter-based algorithms for multi-object tracking. Some useful new variables were defined, such as the object window, the E-C parameter (i.e., the ratio of the object window area to the object area), Ix, defined as the distribution of non-zero pixels in the horizontal direction (x-direction), and Iy, defined as the distribution of non-zero pixels in the vertical direction (y-direction). The center of mass of a human object is computed using Ix and Iy. To show that the proposed object tracking method can be efficiently applied to a variety of environments, several experiments were carried out. As stated in the experimental section, the proposed method works well for every scenario. As the computational load is very low, the proposed method is expected to be useful for more complex environments as well. However, when two objects have identical or similar colors and similar shapes, the tracking may fail, and such a scenario requires further research.
Figure 7. Tracking result for one person. The frames in this figure are sampled at 10-frame intervals.
Figure 8. Tracking result for two people walking in opposite directions.
Figure 9. Tracking results for three people when four people are walking. Three people are walking together, and one is walking in the opposite direction.
- Conflict of Interest
No potential conflict of interest relevant to this article was reported.
Acknowledgements
This research was supported by the 2013 Scientific Promotion Program funded by Jeju National University.
References
Comaniciu D., Ramesh V., Meer P. (2003) “Kernel-based object tracking,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5), 564-577. DOI: 10.1109/TPAMI.2003.1195991
Takala V., Pietikäinen M. (2007) “Multi-object tracking using color, texture and motion,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, June 17-22, article number 4270504. DOI: 10.1109/CVPR.2007.383506
Khan S. M., Shah M. (2009) “Tracking multiple occluding people by localizing on multiple scene planes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(3), 505-519. DOI: 10.1109/TPAMI.2008.102
Arulampalam M. S., Maskell S., Gordon N., Clapp T. (2002) “A tutorial on particle filters for online nonlinear non-Gaussian Bayesian tracking,” IEEE Transactions on Signal Processing, 50(2), 174-188. DOI: 10.1109/78.978374
Hue C., Le Cadre J. P., Pérez P. (2002) “Tracking multiple objects with particle filtering,” IEEE Transactions on Aerospace and Electronic Systems, 38(3), 791-812. DOI: 10.1109/TAES.2002.1039400
Maskell S., Gordon N. (2001) “A tutorial on particle filters for on-line nonlinear/non-Gaussian Bayesian tracking,” IEE Target Tracking: Algorithms and Applications (Ref No. 2001/174), 2/1-2/15. DOI: 10.1049/ic:20010246
Kwon J., Lee K. M., Park F. C. (2009) “Visual tracking via geometric particle filtering on the affine group with optimal importance functions,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Miami, FL, June 20-25, 991-998. DOI: 10.1109/CVPRW.2009.5206501
Comaniciu D., Meer P. (1999) “Mean shift analysis and applications,” in Proceedings of the 1999 7th IEEE International Conference on Computer Vision, Kerkyra, Greece, September 20-27, 1197-1203. DOI: 10.1109/ICCV.1999.790416
Comaniciu D., Ramesh V. (2000) “Mean shift and optimal prediction for efficient object tracking,” in International Conference on Image Processing, Vancouver, Canada, September 10-13, 70-73. DOI: 10.1109/ICIP.2000.899297
Vigus S. A., Bull D. R., Canagarajah C. N. (2001) “Video object tracking using region split and merge and a Kalman filter tracking algorithm,” in Proceedings of the International Conference on Image Processing, October 7-10, 650-653. DOI: 10.1109/ICIP.2001.959129
Czyzewski A., Dalka P. (2008) “Examining Kalman Filters Applied to Tracking Objects in Motion,” in 9th International Workshop on Image Analysis for Multimedia Interactive Services, Klagenfurt, Austria, May 7-9, 175-178. DOI: 10.1109/WIAMIS.2008.23
Zhang X., Hu W., Maybank S., Li X., Zhu M. (2008) “Sequential particle swarm optimization for visual tracking,” in Proceedings of the 26th IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23-28, article number 4587512. DOI: 10.1109/CVPR.2008.4587512
Jiang H., Fels S., Little J. J. (2007) “A linear programming approach for multiple object tracking,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, June 17-22, article number 4270205. DOI: 10.1109/CVPR.2007.383180
Huang Y., Essa I. (2005) “Tracking multiple objects through occlusions,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, June 20-25, 1051-1058. DOI: 10.1109/CVPR.2005.350
Ko K. E., Park J. H., Park S. M., Kim J. Y., Sim K. B. (2012) “Occluded object motion estimation system based on particle filter with 3D reconstruction,” International Journal of Fuzzy Logic and Intelligent Systems, 12(1), 60-65. DOI: 10.5391/IJFIS.2012.12.1.60
“CAVIAR: Context Aware Vision using Image-based Active Recognition,” http://homepages.inf.ed.ac.uk/rbf/CAVIAR/
Kang J. S. (2013) “A new mobile object tracking approach in video surveillance. Part I: Indoor environment,” in The 14th International Symposium on Advanced Intelligence Systems, Daejeon, Korea, November 13-16, 1097-1102.
Kim S. W., Kang J. S. (2013) “A new mobile object tracking approach in video surveillance. Part II: Outdoor environment,” in The 14th International Symposium on Advanced Intelligence Systems, Daejeon, Korea, November 13-16, 1103-1108.
Park S. M., Park J. H., Kim H. B., Sim K. B. (2011) “Specified object tracking problem in an environment of multiple moving objects,” International Journal of Fuzzy Logic and Intelligent Systems, 11(2), 118-123. DOI: 10.5391/IJFIS.2011.11.2.118