A real-time multiple vehicle tracking method for traffic congestion identification
KSII Transactions on Internet and Information Systems (TIIS). 2016. Jun, 10(6): 2483-2503
Copyright © 2016, Korean Society For Internet Information
  • Received : July 17, 2015
  • Accepted : April 20, 2016
  • Published : June 30, 2016
About the Authors
Xiaoyu, Zhang
School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, 200240, China
Shiqiang, Hu
School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, 200240, China
Huanlong, Zhang
College of electric and information engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, China
Xing, Hu
School of Aeronautics and Astronautics, Shanghai Jiao Tong University, Shanghai, 200240, China

Abstract
Traffic congestion is a severe problem in many modern cities around the world. Real-time and accurate traffic congestion identification can provide advanced traffic management systems with a reliable basis for taking measures. The most commonly used data sources for traffic congestion identification are loop detectors, GPS data, and video surveillance. Video-based traffic monitoring systems have gained much attention due to their advantages, such as low cost, the flexibility to redesign the system, and a rich information source for human understanding. In general, most existing video-based systems for monitoring road traffic rely on stationary cameras and multiple vehicle tracking methods. However, the most commonly used multiple vehicle tracking methods lack effective track initiation schemes. Based on the observation that vehicle motion usually obeys a constant velocity model, a novel vehicle recognition method is proposed. The state of each recognized vehicle is sent to the GM-PHD filter as a birth target. In this way, we relieve the insensitivity of the GM-PHD filter to newly entering vehicles. Combined with advanced vehicle detection and data association techniques, this multiple vehicle tracking method is used to identify traffic congestion. It can be implemented in real time with high accuracy and robustness. The advantages of the proposed method are validated on four real traffic data sets.
Keywords
1. Introduction
With the high-speed development of city construction, even modern societies with well-planned road management systems and sufficient transportation infrastructure still face the problem of traffic congestion. This results in lost travel time and huge societal and economic costs. It has recently been realized [1] that mere infrastructure expansion cannot provide a complete solution to those problems, for economic and environmental reasons or, in metropolitan areas, simply due to lack of space. An alternative is to develop an intelligent transportation system that makes more efficient use of the existing infrastructure [2 , 3] . Many approaches have emerged with the aim of limiting traffic congestion. One approach is to provide aggregate traffic information (such as velocity, the intensity of traffic volumes and lane occupancy rates) to transportation authorities, which in turn feed this information into advanced traffic management systems to take effective measures [4] .
Detection via video processing technique is one of the most attractive new technologies as it offers opportunities for performing substantially more complex tasks and providing more information [5] . For example, video based traffic monitoring systems are easy to install and upgrade, which offer the flexibility to redesign the system and their functionality by simply changing the system algorithms. Furthermore, they provide a rich information source for human understanding. Therefore, developing real-time traffic parameter surveillance systems based on video aiming to derive reliable and robust traffic state information has attracted a lot of attention during the past decade [6 , 7] .
At present, traffic congestion can be identified by classification methods. Antoni B. Chan and Nuno Vasconcelos [8] proposed a framework for the classification of visual processes that are best modeled with spatio-temporal autoregressive models. The proposed framework combines the modeling power of a family of models known as dynamic textures with the generalization guarantees, for classification, of the support vector machine classifier. This combination is achieved by deriving a new probabilistic kernel based on the Kullback-Leibler divergence between Gauss-Markov processes. The kernels cover a large variety of video classification problems, including cases where classes differ in both appearance and motion and cases where appearance is similar for all classes and only motion is discriminant. In addition, Chan and Vasconcelos [9] proposed to model the entire motion field as a dynamic texture, which is an autoregressive stochastic process with both a spatial and a temporal component. They use this representation to achieve better performance under environmental conditions such as variable lighting and shadows, without requiring segmentation or tracking. A drawback of those approaches is that the large computational load of fitting the model might make them impractical for real-time traffic monitoring [10] . Based on [9] , Derpanis and Wildes [10] proposed a system that achieves good performance and is amenable to computationally efficient realization. Alternatively, most extant approaches classify traffic videos using a combination of segmentation and tracking. In this paper, we focus on tracking every vehicle from the moment it enters the road until it leaves. To meet this requirement, the adopted methods must be capable of track initiation and termination. Up to now, these methods can be grouped into two sets: vehicle recognition before tracking (VRT) methods and association based tracking (ABT) methods.
The VRT methods first recognize each vehicle on the road and then track the vehicles individually. Sivaraman et al. [11] proposed a general active-learning framework for on-road vehicle recognition and tracking. After each vehicle is recognized, an extended condensation algorithm [12] is used to track multiple vehicles as they enter and leave the field of view. Xia et al. [13] proposed an improved CAMShift approach to multi-vehicle tracking using video cameras. It uses a background subtraction method to initialize vehicle tracking automatically, and an improved CAMShift method then tracks each potential vehicle. Rad and Jamzad [14] proposed a real-time method for classifying and tracking multiple vehicles on highways. They use sequential frame differencing, moving edge detection, and a hybrid approach that combines features from both contour-based and region-based methods to achieve vehicle recognition. A Kalman filter is then used to track each recognized vehicle. Those methods perform well once the initial tracks are correct. However, the drawback of [11] is that it needs an off-line training step. In real-world traffic scenes, the passing vehicles not only vary in color, shape and style, but also appear in different views according to their distance from the processor. It is difficult to construct a detector that can detect all vehicles in real scenes. That is to say, not every passing vehicle can be tracked by this method. The drawback of [13 , 14] is that they might track background regions that the detector reports as vehicles. The reason is that, despite the development of detection techniques such as background subtraction algorithms [15 - 17] , deformable part models [18] , and histograms of sparse codes for object detection [19] , perfect detection remains challenging in real applications. In other words, missed detections and false alarms are inevitable. If the initialization is imprecise, or is even a false alarm, the adopted tracker becomes unreliable.
The ABT methods usually first localize objects in each frame and then link these object hypotheses into trajectories without any initialization labeling. Semertzidis et al. [20] proposed a video sensor network for real-time traffic monitoring and surveillance. In its tracking unit, the multiple hypothesis tracking (MHT) [21 , 22] algorithm is implemented to recognize and track vehicles from detection results. MHT has mechanisms for track initiation, maintenance and pruning. It allows measurements that arrive in the future to resolve the current uncertainty in the association of measurements and targets. However, the number of hypotheses grows exponentially over time, and the required computational cost could render the implementation of MHT infeasible. Garcia et al. [23] proposed a visual feature tracking method based on the PHD filter. Comparing the PHD filter with the widely used MHT algorithm in multiple target tracking, Panta et al. [24] showed that the PHD filter outperforms MHT in tracking accuracy and computation. However, one limitation of the PHD filter is that it is insensitive to birth targets with unknown positions [25 , 26] . In other words, it needs a long delay to initialize the track of a birth target, or may even fail to initialize the track at all while the birth target moves across the scene. The detection guided GM-PHD (D-GMPHD) filter [25] was proposed to relieve the insensitivity of the PHD filter to birth targets. It first uses measurement information to form many track hypotheses. It then confirms or deletes each track with a score based track initiation method. Finally, each confirmed track hypothesis is used to provide the state of a birth target to the GM-PHD filter.
Based on the characteristics of the D-GMPHD filter stated above, we introduce it into image based multiple vehicle tracking. The motion of a vehicle usually obeys the constant velocity (CV) model. This property is used to distinguish vehicles from noisy measurements. Subsequently, the state of each recognized vehicle is passed to the GM-PHD filter as a birth target. In this way, vehicles on the road can be tracked earlier than with a GM-PHD filter that is not aided by vehicle recognition. This early tracking ability enables timely traffic congestion identification, which can provide advanced traffic management with the necessary traffic information.
The rest of the paper is organized as follows. Section 2 briefly reviews the GM-PHD filter and the track score based track initiation method. Section 3 presents the overall system for traffic congestion identification. Experimental results of the proposed system on real scenes are provided in Section 4, which also compares the performance of the proposed method with the original GM-PHD filter and the Boosted Particle Filter.
2. GM-PHD filter and track score based track initiation technique
- 2.1 GM-PHD filter
In a multiple-target environment, the number of targets changes due to targets appearing and disappearing. Usually, not all of the existing targets are detected by the sensor. Moreover, the sensor also receives clutter not originating from any target. The objective of multiple-target tracking is to jointly estimate, at each time step, the number of targets and their states from a sequence of noisy and cluttered observation sets. The random finite sets (RFS) approach to multi-target tracking is an emerging and promising alternative to traditional association-based methods [27 , 28] . In the RFS formulation, the collection of individual targets is treated as set-valued state, and the collection of individual observations is treated as a set-valued observation. Modeling set-valued states and set-valued observations as RFSs allows the problem of dynamically estimating multiple targets in the presence of clutter and association uncertainty to be cast in a Bayesian filtering framework. In order to alleviate the computational intractability in multi-target Bayes filter, PHD filter, a recursion that propagates the first-order statistical moment of the RFS of states in time is proposed [27] . Under the linear and Gaussian assumptions on the target dynamics and birth process, GM-PHD filter shows its excellent state estimation ability [28] .
For an RFS $X$ on $\mathcal{X}$ with probability distribution $P$, its first-order moment is a nonnegative function $v$ on $\mathcal{X}$, called the intensity, such that for each region $S \subseteq \mathcal{X}$,
$$\int |X \cap S| \, P(dX) = \int_S v(x) \, dx \tag{1}$$
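Let $v_k$ and $v_{k|k-1}$ denote the posterior intensity and the predicted intensity, respectively. Under the following three assumptions A.1-A.3, it can be shown that $v_k$ can be propagated in time via the PHD recursion given by Eqs. (2) and (3).
A.1: Each target evolves and generates observations independently of one another.
A.2: Clutter is Poisson and independent of target-originated measurements.
A.3: The predicted multiple-target RFS governed by predicted density is Poisson.
$$v_{k|k-1}(x) = \int p_{S,k}(\zeta) f_{k|k-1}(x|\zeta) v_{k-1}(\zeta) \, d\zeta + \int \beta_{k|k-1}(x|\zeta) v_{k-1}(\zeta) \, d\zeta + \gamma_k(x) \tag{2}$$
$$v_k(x) = [1 - p_{D,k}(x)] v_{k|k-1}(x) + \sum_{z \in Z_k} \frac{p_{D,k}(x) g_k(z|x) v_{k|k-1}(x)}{\kappa_k(z) + \int p_{D,k}(\xi) g_k(z|\xi) v_{k|k-1}(\xi) \, d\xi} \tag{3}$$
where $\gamma_k(\cdot)$ is the intensity of the birth RFS at time $k$; $\beta_{k|k-1}(\cdot|\zeta)$ is the intensity of the RFS spawned at time $k$ by a target with previous state $\zeta$; $p_{S,k}(\zeta)$ is the probability that a target still exists at time $k$ given that its previous state is $\zeta$; $p_{D,k}(x)$ is the probability of detection given a state $x$ at time $k$; $\kappa_k(\cdot)$ is the intensity of the clutter RFS at time $k$; $f_{k|k-1}(\cdot|\zeta)$ is the single-target transition density; $g_k(\cdot|x)$ is the single-target measurement likelihood; and $Z_k$ is the measurement set at time $k$.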
Under the assumption of a linear Gaussian multiple-target model, Vo and Ma [28] derived closed-form recursions for the weights and states of the constituent Gaussian components of the posterior intensity. The recursion uses the state of each target to construct a Gaussian component with each measurement. For example, if the state of only one of two targets is known, that state is updated by each of the two measurements in turn. If a derived state lies in the vicinity of the target whose state is unknown, the weight of the derived state rises quickly; otherwise, it rises slowly. As the position of a birth target is not necessarily near the state of any existing target, the GM-PHD filter is insensitive to birth targets.
- 2.2 Score based track initiation method
In order to reduce the time needed to track all targets in the field of view, a track initiation technique is used to boost the sensitivity of the GM-PHD filter to birth targets in the presence of spurious measurements. One such technique, the detection guided GM-PHD (D-GMPHD) filter [25] , constructs or updates track hypotheses together with a track score and uses the track score to confirm, keep or delete each track hypothesis. Each confirmed track hypothesis becomes the state of a birth target for the GM-PHD filter.
- 2.2.1 Track score in point target tracking
The track score $L$ proposed in [29] is the log-likelihood ratio of $P_T$ to $P_F$, where $P_T$ is the probability that the received data originate from a true target and $P_F$ is the probability that they are false alarms. The likelihood ratio can be partitioned into a product of three terms, $L_0$, $L_K$ and $L_S$, which represent the prior, kinematic and signal-related contributions, respectively, so the log-likelihood ratio is the sum of their logarithms. Given $K$ scans of data, $L(K)$ is given by
$$L(K) = \ln\frac{P_T}{P_F} = \ln L_0 + \sum_{k=1}^{K} \left[ \ln L_K(k) + \ln L_S(k) \right] \tag{4}$$
First, consider the kinematic term. Assume a Gaussian distribution for the true target and a uniform distribution over the measurement volume $V_c$ for the false target. Then,
$$L_K(k) = \frac{V_c \exp\left(-d(k)^2/2\right)}{(2\pi)^{M/2} \sqrt{|S(k)|}} \tag{5}$$
$$d(k)^2 = \tilde{y}(k)^T S(k)^{-1} \tilde{y}(k), \quad \tilde{y}(k) = z(k) - H F \hat{x}(k-1), \quad S(k) = H \left[ F P(k-1) F^T + Q \right] H^T + R \tag{6}$$
where $M$ is the measurement dimension, $S(k)$ is the measurement residual covariance matrix at time $k$, and $d(k)^2$ is the normalized statistical distance of the measurement, defined in terms of $S(k)$ and the measurement residual vector $\tilde{y}(k)$ at time $k$. $F$ is the state transition matrix, $H$ is the measurement matrix, $\hat{x}(k-1)$ and $P(k-1)$ are the estimated state and its covariance at time $k-1$, $Q$ is the process noise covariance, and $R$ is the observation noise covariance.
Next, consider the signal-related term. If a detection is received at time $k$, then
$$L_S(k) = \frac{P_D}{P_{FA}} \tag{7}$$
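Using Eq. (5) and Eq. (7) gives a recursive form for the computation of the track score defined by Eq. (4):
$$L(k) = L(k-1) + \Delta L(k) \tag{8}$$
where
$$\Delta L(k) = \begin{cases} \ln(1 - P_D), & \text{no measurement update at time } k \\ \ln\dfrac{P_D V_c}{P_{FA} (2\pi)^{M/2} \sqrt{|S(k)|}} - \dfrac{d(k)^2}{2}, & \text{measurement update at time } k \end{cases} \tag{9}$$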
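The recursive score update can be illustrated with a short sketch. It assumes the commonly used form of the increment from the track-score literature: $\ln(1 - P_D)$ when no measurement updates the track, and the sum of the kinematic and signal-related log-likelihood ratios otherwise. The function names and default parameter values are illustrative assumptions, not values taken from the experiments.

```python
import math

def score_increment(detected, d2=0.0, det_S=1.0, M=2,
                    P_D=0.9, P_FA=0.01, V_c=1.0):
    """Log-likelihood-ratio increment for one scan (assumed standard form).

    detected : True if a gated measurement updated the track on this scan
    d2       : normalized statistical distance d(k)^2 of that measurement
    det_S    : determinant |S(k)| of the residual covariance
    M        : measurement dimension
    P_D, P_FA, V_c : illustrative (hypothetical) defaults
    """
    if not detected:
        # Missed detection: only the signal-related penalty applies.
        return math.log(1.0 - P_D)
    # Kinematic term: Gaussian true-target density over uniform clutter density.
    kinematic = (math.log(V_c) - 0.5 * d2
                 - 0.5 * M * math.log(2.0 * math.pi)
                 - 0.5 * math.log(det_S))
    # Signal-related term for a received detection.
    signal = math.log(P_D / P_FA)
    return kinematic + signal

def update_score(prev_score, detected, **kwargs):
    """Recursive update: L(k) = L(k-1) + delta L(k)."""
    return prev_score + score_increment(detected, **kwargs)
```

A measurement close to the prediction (small $d(k)^2$) raises the score more than a distant one, and each missed scan lowers it, which is what lets the score separate CV-like vehicle motion from randomly scattered clutter.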
- 2.2.2 Score-based track confirmation and deletion
The sequential probability ratio test (SPRT), which Wald [30] showed to require far fewer observations than competing tests, is used to classify track hypotheses according to their track scores. Given the upper and lower thresholds $\eta_2$ and $\eta_1$, the decision to confirm, delete, or continue a track hypothesis at time $k$ is made as follows:
$$\begin{cases} L(k) \ge \eta_2, & \text{confirm the track hypothesis} \\ \eta_1 < L(k) < \eta_2, & \text{continue the track hypothesis} \\ L(k) \le \eta_1, & \text{delete the track hypothesis} \end{cases} \tag{10}$$
Following the standard SPRT formulation, the thresholds are defined as $\eta_1 = \ln\frac{\beta}{1-\alpha}$ and $\eta_2 = \ln\frac{1-\beta}{\alpha}$, where $\alpha$ and $\beta$ are the probabilities of false track confirmation and true track deletion, respectively. Note that the SPRT threshold values are chosen on the assumption that the initial track score is zero. The initial track score is based entirely on the first observation in the track. Thus, there is no kinematic contribution to the initial track score. As the signal-related datum is that a detection (or a miss) occurred, $L$ at track initiation is
[Equation (11)]
where $P_0(H_1)$ is the prior probability that the measurement originates from a true target. Thus, if the initial track score is nonzero, this initial value should be added to $\eta_1$ and $\eta_2$.
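The confirmation logic can be sketched as follows, assuming the standard Wald thresholds $\eta_1 = \ln[\beta/(1-\alpha)]$ and $\eta_2 = \ln[(1-\beta)/\alpha]$, shifted by a nonzero initial score as described above. The function names are hypothetical, and the numerical values of $\alpha$ and $\beta$ in the usage note are illustrative only.

```python
import math

def sprt_thresholds(alpha, beta, initial_score=0.0):
    """Return (eta1, eta2) for the SPRT.

    alpha : probability of false track confirmation
    beta  : probability of true track deletion
    initial_score : added to both thresholds when the initial score is nonzero
    """
    eta1 = math.log(beta / (1.0 - alpha)) + initial_score
    eta2 = math.log((1.0 - beta) / alpha) + initial_score
    return eta1, eta2

def classify_track(score, eta1, eta2):
    """Confirm, delete, or continue a track hypothesis from its score."""
    if score >= eta2:
        return "confirm"
    if score <= eta1:
        return "delete"
    return "continue"
```

For example, with $\alpha = 0.01$ and $\beta = 0.1$ (hypothetical values), `sprt_thresholds(0.01, 0.1)` gives a negative deletion threshold and a positive confirmation threshold, so a hypothesis starts in the "continue" band and leaves it only after enough consistent evidence.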
- 2.2.3 Vehicle recognition in image based tracking
In image based tracking, assume that there are $N_k$ measurements at time $k$. The centroids of the $N_k$ measurements at time $k$ are treated as the measurement set $\{z_i(k)\}_{i=1}^{N_k}$. Hence the kinematic term in image based tracking is the same as in point target tracking. In radar applications, $P_D$ is defined as the probability that a target is declared when a target is in fact present, and $P_{FA}$ is defined as the probability that a target is declared when no target is in fact present [31] . Hence, $P_D$ and $P_{FA}$ in image based tracking can be obtained by Eq. (12):
$$P_D = \frac{TP}{TP + FN}, \quad P_{FA} = \frac{FP}{FP + TN} \tag{12}$$
where the number of true positives ( TP ) counts the correctly detected foreground pixels, the number of false positives ( FP ) counts the background pixels incorrectly classified as foreground, the number of true negatives ( TN ) counts the correctly classified background pixels, and the number of false negatives ( FN ) counts the foreground pixels incorrectly classified as background.
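As a small worked example, the two rates can be computed directly from pixel counts. The sketch assumes the usual definitions, $P_D$ as the true positive rate and $P_{FA}$ as the false positive rate; the function name and the counts in the usage note are hypothetical.

```python
def detection_rates(tp, fp, tn, fn):
    """Estimate (P_D, P_FA) from pixel-level detection counts.

    P_D  = TP / (TP + FN): fraction of foreground pixels that were detected
    P_FA = FP / (FP + TN): fraction of background pixels flagged as foreground
    """
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one foreground and one background pixel")
    p_d = tp / (tp + fn)
    p_fa = fp / (fp + tn)
    return p_d, p_fa
```

With 90 of 100 foreground pixels detected and 5 of 100 background pixels misclassified, `detection_rates(90, 5, 95, 10)` yields a detection probability of 0.9 and a false alarm probability of 0.05.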
Generally speaking, the motion of vehicles on the road matches the CV model, while spurious measurements match a random motion model. We form a set of track hypotheses based on the CV model and the measurements. Instead of using prior information about vehicle shape and color, we recognize vehicles among the noisy measurements by setting H and F in (6) according to the CV model. Before describing how a track hypothesis is formed and updated, we give the state transition and measurement models.
Assume that the state $x(k) = [p_x(k), p_y(k), \dot{p}_x(k), \dot{p}_y(k)]^T$ of each vehicle at time $k$ consists of the position $(p_x(k), p_y(k))$ and the velocity $(\dot{p}_x(k), \dot{p}_y(k))$, while the measurements of vehicles are noisy versions of the positions. Each vehicle follows a linear Gaussian dynamical model, and the measurement model is also linear Gaussian, i.e.,
$$x(k) = F x(k-1) + G w(k-1), \quad z(k) = H x(k) + v(k) \tag{13}$$
where $F$ is the state transition matrix, $w(k-1)$ is the zero-mean Gaussian white process noise with standard deviation $\sigma_w$, $H$ is the measurement matrix, and $v(k)$ is the zero-mean Gaussian white observation noise with standard deviation $\sigma_v$. $F$, $G$, $H$, $Q$ and $R$ are given by
$$F = \begin{bmatrix} I_2 & \Delta I_2 \\ 0_2 & I_2 \end{bmatrix}, \quad G = \begin{bmatrix} \frac{\Delta^2}{2} I_2 \\ \Delta I_2 \end{bmatrix}, \quad H = \begin{bmatrix} I_2 & 0_2 \end{bmatrix}, \quad Q = \sigma_w^2 G G^T, \quad R = \sigma_v^2 I_2 \tag{14}$$
where $I_n$ and $0_n$ denote, respectively, the $n \times n$ identity and zero matrices, and $\Delta$ is the sampling period.
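To make the CV model concrete, the sketch below builds the model matrices with plain Python lists and performs one state prediction. It assumes the usual constant velocity parameterization, with state ordering [position x, position y, velocity x, velocity y], noise gain $G$ and $Q = \sigma_w^2 G G^T$; the function names are hypothetical, and a sampling period of one frame is an assumption made for the usage note.

```python
def cv_model(delta, sigma_w, sigma_v):
    """Build F, G, H, Q, R for a 2-D constant velocity model.

    State ordering (assumed): [px, py, vx, vy]; measurements are positions.
    """
    F = [[1, 0, delta, 0],
         [0, 1, 0, delta],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
    G = [[delta ** 2 / 2, 0],
         [0, delta ** 2 / 2],
         [delta, 0],
         [0, delta]]
    H = [[1, 0, 0, 0],
         [0, 1, 0, 0]]
    # Q = sigma_w^2 * G * G^T (4 x 4)
    Q = [[sigma_w ** 2 * sum(G[i][m] * G[j][m] for m in range(2))
          for j in range(4)] for i in range(4)]
    # R = sigma_v^2 * I_2
    R = [[sigma_v ** 2, 0],
         [0, sigma_v ** 2]]
    return F, G, H, Q, R

def predict_state(F, x):
    """One-step CV prediction: x(k|k-1) = F x(k-1)."""
    return [sum(F[i][j] * x[j] for j in range(4)) for i in range(4)]
```

For instance, with `F, G, H, Q, R = cv_model(1.0, 5.0, 15.0)`, a vehicle at (10, 20) moving at (2, -1) pixels per frame is predicted at (12, 19) with unchanged velocity.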
Given the track hypotheses set surviving from time $k-1$, the measurements at time $k$ that are used for updating the track hypotheses set, and the measurements at time $k$ that are not used for updating it, the vehicle recognition method can be described as follows:
Step 1. Update the track hypotheses set surviving from time $k-1$ to form the new surviving track hypotheses set by Eq. (15). Meanwhile, update the corresponding track score set to form a new surviving track score set according to Eq. (8).
[Equation (15)]
  • If there is a measurement $z_i(k)$ that falls into the ellipsoidal gate [29] of $T_j(k-1)$
  • $T_j(k-1)$ is updated according to Eq. (16)
[Equation (16)]
  • Else
  • $T_j(k-1)$ is updated according to Eq. (17)
[Equation (17)]
  • EndIf
Step 2. Create a new track hypotheses set by the two-point differencing method [32] . The new track hypotheses set at time $k$ is formed according to Eq. (18), together with the corresponding new track score set.
[Equation (18)]
Step 3. Combining the updated track hypotheses set and the newly formed track hypotheses set, the track hypotheses set at time $k$ is obtained by Eq. (19). Combining the updated track scores and the newly formed track scores, the track scores at time $k$ are obtained by Eq. (20).
[Equation (19)]
[Equation (20)]
Step 4. Confirm, keep or delete each track hypothesis according to Eq. (10). Only the kept track hypotheses continue to be tested. Together with their track scores, they are represented as follows:
[Equation (21)]
Step 5. The state and covariance of confirmed track hypotheses in step 4 are recognized as vehicles.
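Step 2 relies on two-point differencing to start a track hypothesis from two consecutive measurements. A minimal sketch, assuming the usual form of the technique: the initial position is the later measurement and the initial velocity is the difference of the two measurements divided by the sampling period. The function name and the state ordering [position x, position y, velocity x, velocity y] are assumptions for illustration.

```python
def two_point_init(z_prev, z_curr, delta=1.0):
    """Initialize a CV state from two consecutive position measurements.

    z_prev, z_curr : (x, y) centroids at times k-1 and k
    delta          : sampling period between the two measurements
    Returns [px, py, vx, vy].
    """
    vx = (z_curr[0] - z_prev[0]) / delta
    vy = (z_curr[1] - z_prev[1]) / delta
    return [z_curr[0], z_curr[1], vx, vy]
```

A hypothesis started this way is then scored scan by scan; only hypotheses whose later measurements keep agreeing with the CV prediction accumulate enough score to be confirmed as vehicles.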
3. Traffic congestion identification method
In order to realize traffic congestion identification via video processing technique, the whole system mainly consists of four parts: a projective transformation, vehicle detection, multiple vehicle tracking, and traffic congestion identification. The block diagram of the proposed system in traffic congestion identification is shown in Fig. 1 .
Fig. 1. Traffic congestion identification system
Traffic activity collected by a camera is a projection of a three-dimensional scene onto a two-dimensional image plane. To estimate the velocity of a vehicle, it is necessary to obtain its actual movement. A projective transformation [33 , 34] makes it possible to use pixel distances to reflect true distances in the real scene.
The vehicle detection method used here is ViBe [17] , followed by morphological filtering. In this way, the need of prior information about the shape, color and other features of vehicles is avoided.
Vehicle recognition by their motion model is utilized to assist GM-PHD filter in tracking new entering vehicle earlier.
As the D-GMPHD filter has no mechanism for maintaining trajectories, an ellipsoidal gate [29] is used to form the trajectory of each vehicle. Ellipsoidal gating provides multiple-target data association with a reliable basis. The major axis of the gate lies along the direction of estimated movement. The ellipsoidal validation gate is optimal for a linear observation model with additive noise. The validity of a measurement $z(k)$ for a state $x(k-1)$ is determined from its residual with respect to the predicted observation. The residual vector is given in Eq. (6). If the measurement $z(k)$ satisfies Eq. (22), then $z(k)$ is associated with $x(k-1)$.
$$d(k)^2 \le \gamma \tag{22}$$
The threshold $\gamma$, for a measurement dimension $M$, can be computed efficiently since $d(k)^2$ follows a chi-square probability density function with $M$ degrees of freedom. Thus, for $c\%$ of true associations to be accepted, $\gamma$ is obtained from Eq. (23):
$$\frac{c}{100} = P\left(\frac{M}{2}, \frac{\gamma}{2}\right) \tag{23}$$
where
$$P(a, x) = \frac{1}{\Gamma(a)} \int_0^x e^{-t} t^{a-1} \, dt \tag{24}$$
is the incomplete gamma function.
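For the two-dimensional position measurements used here ($M = 2$), the chi-square distribution function has the closed form $1 - e^{-\gamma/2}$, so the gate threshold can be computed without evaluating the incomplete gamma function numerically. The following sketch uses that special case and then applies the gate test to a 2 × 2 residual covariance; the function names are hypothetical, and the closed form holds only for $M = 2$.

```python
import math

def gate_threshold_2d(c_percent):
    """Gate threshold gamma for M = 2 such that c% of true measurements pass.

    Uses the chi-square CDF with 2 degrees of freedom: 1 - exp(-gamma / 2).
    """
    if not 0.0 < c_percent < 100.0:
        raise ValueError("c_percent must lie strictly between 0 and 100")
    return -2.0 * math.log(1.0 - c_percent / 100.0)

def in_gate(residual, S, gamma):
    """Ellipsoidal gate test d^2 = r^T S^{-1} r <= gamma for a 2 x 2 S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    rx, ry = residual
    # S^{-1} = (1 / det) * [[d, -b], [-c, a]]
    d2 = (rx * (d * rx - b * ry) + ry * (-c * rx + a * ry)) / det
    return d2 <= gamma
```

A 99% gate gives $\gamma \approx 9.21$; with a unit residual covariance, a residual of 3 pixels passes the gate while a residual of 4 pixels does not.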
The velocity $vs_i(k)$ of the $i$ th vehicle at time $k$ is obtained from its trajectory, i.e., as the Euclidean distance between the estimated positions in two consecutive frames multiplied by the frame rate $F_n$, as shown in Eq. (25).
$$vs_i(k) = F_n \sqrt{\left[ p_{x,i}(k) - p_{x,i}(k-1) \right]^2 + \left[ p_{y,i}(k) - p_{y,i}(k-1) \right]^2} \tag{25}$$
Traffic congestion is identified from the average traffic speed in the detection area. In order to identify traffic congestion in a timely manner, we identify it only in the exit road area. Assume that there are $N$ vehicles in the detection area and that $T_1$ is the threshold that distinguishes traffic congestion. Traffic congestion is then identified as follows:
$$\text{traffic state} = \begin{cases} \text{congestion}, & \dfrac{1}{N} \sum_{i=1}^{N} vs_i(k) < T_1 \\ \text{no congestion}, & \text{otherwise} \end{cases} \tag{26}$$
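The speed estimate and the congestion rule can be sketched in a few lines. The sketch assumes that positions have already been mapped to real-scene coordinates by the projective transformation, that the comparison with $T_1$ is strict, and that an empty detection area is reported as not congested; these choices and the function names are assumptions for illustration.

```python
import math

def vehicle_speed(p_prev, p_curr, frame_rate):
    """Speed from estimated positions in two consecutive frames.

    Distance travelled between the frames multiplied by the frame rate.
    """
    return frame_rate * math.hypot(p_curr[0] - p_prev[0],
                                   p_curr[1] - p_prev[1])

def is_congested(speeds, t1):
    """Congestion if the average speed in the detection area is below t1."""
    if not speeds:
        # Assumption: no tracked vehicle in the area means no congestion.
        return False
    return sum(speeds) / len(speeds) < t1
```

For example, a vehicle that moves 5 units between consecutive frames at 25 frames per second has a speed of 125 units per second, and a detection area with speeds [1.0, 2.0, 3.0] is flagged as congested for a threshold of 5.0.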
4. Experimental results and analysis
This section reports on a set of qualitative and quantitative experiments comparing the D-GMPHD filter with the GM-PHD filter and the Boosted Particle Filter (BPF) [35] . BPF combines the strengths of Adaboost for object detection with those of mixture particle filters for multiple-object tracking. The combination is achieved by forming the proposal distribution for the particle filter from a mixture of the Adaboost detections in the current frame and the dynamic model predicted from the previous time step. Because Adaboost detection needs much prior information about the vehicle, we use motion detection instead. When one or more vehicles enter the scene, they are detected by their motion and automatically initialized with an observation model. When a vehicle leaves the scene, its observation likelihood drops rapidly, and its tracking is terminated. The number of particles for each vehicle is 30. All test images are resized to 320 × 240.
Four video sequences representing traffic with and without congestion are used to test the effectiveness of the proposed method in traffic congestion identification. The first three test videos were collected from a single stationary traffic camera in Shanghai, China. The fourth test video comes from the NGSIM database [36] . The important parameters used in all experiments are presented in Table 1 , where PD and PFA used in each test are obtained by Eq. (12). The measurements in our tests contain only the position of the vehicle, so the measurement dimension is set to 2. The other parameters may fluctuate slightly. α and β determine the upper and lower thresholds by which a track is confirmed, continued or deleted. In our tests, we do not want clutter to be confirmed, so α is set very low. We want true tracks to be confirmed rather than deleted, so β is set low. More details can be found in [29] . σw is chosen according to the difference between the assumed motion model and the actual movement of vehicles. We set σw to 5 in both directions to ensure that the constant velocity model can adapt to deceleration, acceleration and stopping. σv is chosen according to the difference between the measurements and the actual positions of vehicles. σv could be set close to 0 if the detection results were good enough. However, this is often not the case, so σv is set to 15 in both directions here. As a tracked vehicle usually still exists in the next frame, PS is set very high.
Table 1. Important parameters used in the experiments
- 4.1 Qualitative evaluation
In this section, we use tracking results at some typical frames and trajectories of tracked vehicles for a period of time to show the tracking ability of three compared methods.
The first test is a traffic video without congestion. The surviving target is assumed to be at the top center of the field of view for the D-GMPHD filter and the GM-PHD filter. The tracking results at the 24th frame are shown in Fig. 2 , with differently colored stars indicating the estimated positions of vehicle centers, yellow rectangles indicating the regions where targets are detected, the red number near each vehicle indicating the vehicle speed, and the yellow text at the top left of the figure indicating the number of vehicles on the road. We are only interested in the area within the red polygon, i.e., vehicles can only be detected inside the red polygon. In order to identify traffic congestion in a timely manner, we choose the exit road, which is marked with the green polygon, as the congestion identification area. In addition, next to this area we mark a jam in red and no jam in green to indicate the road condition. The colored rectangles in Fig. 2 (c) indicate tracked targets; six of them are clearly false. From Fig. 2 (a) and (b), we can see that the D-GMPHD filter gives more complete trajectories of tracked vehicles than the GM-PHD filter. In addition, the trajectory of the last vehicle is initialized and tracked by the D-GMPHD filter, while the GM-PHD filter still treats it as clutter. Fig. 2 (c) indicates that BPF not only tracks all vehicles on the road, but also tracks some background areas that are treated as vehicles.
Fig. 2. Test 1: Tracking results of three methods at the 24th frame
Fig. 3 gives the position measurements of the vehicle detection method, the ground truth positions of vehicles, and the tracking results of the D-GMPHD filter, the GM-PHD filter and BPF. For the vehicle that appears at the 5th frame, the D-GMPHD filter initializes its track at the 9th frame, the GM-PHD filter at the 10th frame, and BPF at the 8th frame. For the vehicle that appears at the 7th frame, the D-GMPHD filter initializes its track at the 9th frame, the GM-PHD filter at the 11th frame, and BPF at the 9th frame. For the vehicle that appears at the 15th frame, the D-GMPHD filter initializes its track at the 18th frame, the GM-PHD filter at the 33rd frame, and BPF at the 17th frame. In addition, BPF tracks false positives. For the vehicle that appears at the 28th frame, the D-GMPHD filter initializes its track at the 32nd frame. However, the GM-PHD filter fails to track it, and BPF initializes its track only at the 36th frame, with a large error. For the vehicle that appears at the 36th frame, the D-GMPHD filter initializes its track at the 38th frame. However, the GM-PHD filter fails to track it, and BPF initializes its track only at the 40th frame. In this case, none of the methods detects traffic congestion.
Fig. 3. Test 1: Tracking results of D-GMPHD, GM-PHD filter and BPF
The second test is a traffic video with congestion, where the surviving target is assumed to be at the top center of the field of view. The D-GMPHD and GM-PHD filters identify the traffic congestion, whereas BPF does not. Fig. 4 shows that the GM-PHD filter still produces less complete trajectories than the D-GMPHD filter, i.e., the D-GMPHD filter initializes the track of the white vehicle, while the other methods still treat it as clutter. From Fig. 5 , we can see that for the trajectory that starts at the 222nd frame, the D-GMPHD filter initializes the track at the 225th frame, the GM-PHD filter only at the 237th frame, and BPF at the 227th frame. For the vehicle that appears at the 242nd frame, the D-GMPHD filter initializes its track at the 245th frame, while the GM-PHD filter initializes it only at the 270th frame. BPF initializes its track at the 254th frame with a large tracking error. For the vehicle that appears at the 252nd frame, the D-GMPHD filter initializes its track at the 255th frame, the GM-PHD filter at the 271st frame, and BPF at the 257th frame.
Fig. 4. Test 2: Tracking results of three methods at the 250th frame
Fig. 5. Test 2: Tracking results of D-GMPHD, GM-PHD filter and BPF
The third test is also a traffic video with congestion, where the surviving target is assumed to be at the bottom center of the field of view. Fig. 6 shows that the D-GMPHD filter has already identified the traffic congestion, as indicated by the red word "jam" marked beside the parked vehicle, while the GM-PHD filter still treats the vehicle as clutter. BPF produces many false targets while tracking the vehicle. According to Eq. (26), only the D-GMPHD filter identifies the traffic congestion. The congestion is identified from the vehicle trajectory shown in Fig. 7 . From Fig. 7 , we can see that for the vehicle that appears at the 1st frame, the D-GMPHD filter initializes its track at the 5th frame, while the GM-PHD filter initializes it only at the 20th frame. BPF succeeds in tracking this vehicle from the 1st frame. However, it also tracks false positives from the 3rd frame onward.
Fig. 6. Test 3: tracking results of three methods at the 19th frame
Fig. 7. Test 3: tracking results of D-GMPHD, GM-PHD filter and BPF
The fourth test is a traffic video with congestion, in which the surviving target is assumed to lie at the center of the intersection. Fig. 8 shows that all three methods can identify the traffic congestion. However, the GM-PHD filter tracks only one vehicle on the road, while the D-GMPHD filter tracks all of the vehicles in the region of interest; in other words, the GM-PHD filter fails to track vehicles No. 144 and No. 159. In this case, BPF performs even better than the other methods: it does not produce persistent false-positive tracks, and it initializes a track as soon as a detection is obtained. Fig. 9 shows that the measurements contain clutter but no missed detections. The clutter arises because the vehicle detection method classifies some parts of a vehicle as background; since the remaining parts are separated from each other, one vehicle produces several measurements. For example, the detector reports the vehicle that appears at frame 553 as two or three vehicles at frames 588, 591, 592 and 594. Nevertheless, the D-GMPHD filter still treats them as one vehicle and tracks it well. For the vehicle that appears at frame 553, the D-GMPHD filter initializes the track at frame 557, whereas the GM-PHD filter initializes it at frame 594; BPF shows its advantage here by initializing the track one frame after the vehicle appears. For the vehicles that appear at frames 603 and 635, the GM-PHD filter fails to initialize tracks until frame 649, whereas BPF initializes them as soon as they appear. The D-GMPHD filter initializes the track of the vehicle that appears at frame 603 at frame 604, and that of the vehicle that appears at frame 635 at frame 638.
Fig. 8. Test 4: tracking results of three methods at the 649th frame
Fig. 9. Test 4: tracking results of D-GMPHD, GM-PHD filter and BPF
- 4.2 Quantitative evaluations
In this section, the overall performance of the three methods is summarized in terms of early tracking ability, performance evaluation, and time complexity analysis.
- 4.2.1 Early tracking ability
A quantitative comparison of the D-GMPHD filter with the GM-PHD filter and BPF in terms of track initiation for each vehicle is given in Table 2. In each test, the values indicate the number of frames needed to initialize the track of a vehicle after it appears. Table 2 shows that the GM-PHD filter takes longer than the D-GMPHD filter to start tracking a birth target. The largest difference between the D-GMPHD filter and the GM-PHD filter reaches 37 frames. In test 2, the largest difference between them is 24 frames; since the frame rate of this video is 8 frames/second, the D-GMPHD filter can start tracking vehicles up to 3 seconds earlier than the GM-PHD filter. This leaves more time for advanced traffic management systems to take effective action. Table 2 also shows that BPF performs better in some tests. The reason is that BPF treats detection results that do not overlap with existing tracks directly as newly entering vehicles. The reason why it does not perform well in all tests is that some vehicles change appearance between frames, which causes their observation likelihood to drop rapidly. A track can persist only when the appearance of the vehicle is stable.
Table 2. Frames needed by three methods to track the vehicle after it appears
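The delay figures discussed above follow from simple frame arithmetic. The sketch below is a minimal illustration, not the authors' evaluation code: the helper names `initiation_delay` and `frames_to_seconds` are hypothetical, and the frame numbers are the ones quoted in the text for one vehicle of test 4 and for the largest gap of test 2.

```python
def initiation_delay(appear_frame, first_tracked_frame):
    """Frames elapsed between a vehicle's first appearance and its first tracked frame."""
    if first_tracked_frame < appear_frame:
        raise ValueError("a track cannot start before the vehicle appears")
    return first_tracked_frame - appear_frame


def frames_to_seconds(n_frames, fps):
    """Convert a frame count to seconds for a video at `fps` frames/second."""
    return n_frames / fps


# Test 4, vehicle appearing at frame 553 (frame numbers quoted in the text).
d_gmphd_delay = initiation_delay(553, 557)   # D-GMPHD filter: 4 frames
gm_phd_delay = initiation_delay(553, 594)    # GM-PHD filter: 41 frames
gap_frames = gm_phd_delay - d_gmphd_delay    # 37 frames

# Test 2: a 24-frame gap in a video recorded at 8 frames/second.
gap_seconds = frames_to_seconds(24, 8)       # 3.0 seconds
```

The same two helpers can be applied to every row of a delay table to convert frame counts into the lead time available to a traffic management system.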
- 4.2.2 Performance evaluation
To make the comparison more convincing, the two intuitive metrics described below [37] are used.
(1) The multiple object tracking precision (MOTP):
$$\mathrm{MOTP} = \frac{\sum_{i,k} d_k^i}{\sum_k c_k}$$
where $c_k$ is the number of matches found at time $k$. A match is an object-hypothesis pair whose distance lies within the threshold $T$. The distance $d_k^i$ between object $i$ and its corresponding hypothesis is calculated for each of these matches. The MOTP is the total error in estimated position for matched object-hypothesis pairs over all frames, averaged by the total number of matches made. It shows the ability of the tracker to estimate precise object positions, independent of its skill at recognizing object configurations, keeping consistent trajectories, and so forth. The lower the value, the better the performance.
(2) The multiple object tracking accuracy (MOTA):
$$\mathrm{MOTA} = 1 - \frac{\sum_k \left( m_k + fp_k + mme_k \right)}{\sum_k g_k}$$
where $m_k$, $fp_k$ and $mme_k$ are the numbers of misses, false positives, and mismatches, respectively, at time $k$, and $g_k$ is the number of objects present at time $k$. The MOTA accounts for all object configuration errors made by the tracker (misses, false positives, and mismatches) over all frames. It gives a very intuitive measure of the tracker's performance at detecting objects and keeping their trajectories, independent of the precision with which the object locations are estimated. The higher the value, the better the performance.
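As a minimal sketch of how the two metrics above can be computed from per-frame counts, assuming the standard CLEAR MOT definitions of [37]: the function names `motp` and `mota` and the three-frame toy data are hypothetical and are not taken from the experiments.

```python
def motp(distances_per_frame, matches_per_frame):
    """Total matched distance over all frames divided by the total number of matches."""
    total_matches = sum(matches_per_frame)
    if total_matches == 0:
        raise ValueError("MOTP is undefined when no matches were made")
    total_distance = sum(sum(frame) for frame in distances_per_frame)
    return total_distance / total_matches


def mota(misses, false_positives, mismatches, ground_truth):
    """One minus the ratio of all errors to the total number of ground-truth objects."""
    total_objects = sum(ground_truth)
    if total_objects == 0:
        raise ValueError("MOTA is undefined when no objects are present")
    errors = sum(misses) + sum(false_positives) + sum(mismatches)
    return 1.0 - errors / total_objects


# Hypothetical three-frame sequence (distances in pixels).
distances = [[1.0, 2.0], [1.5], [0.5, 1.0]]   # d_k^i for each match in frame k
matches = [2, 1, 2]                            # c_k
toy_motp = motp(distances, matches)            # 6.0 / 5 = 1.2 pixels

toy_mota = mota(misses=[0, 1, 0],
                false_positives=[1, 0, 0],
                mismatches=[0, 0, 0],
                ground_truth=[2, 2, 2])        # 1 - 2/6
```

Note that the MOTA can become negative when the tracker makes more errors than there are objects in the sequence.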
The MOT comparison among the three methods is shown in Table 3. Under the MOTP metric, the GM-PHD filter performs best except in test 4. However, its advantage over the D-GMPHD filter is negligible, as the maximum difference between them is less than 1 pixel in all tests. Under the MOTA metric, the D-GMPHD filter performs best, as its minimum value over all tests is still above 72%. This is due to the earlier track initiation of the D-GMPHD filter. The values of the other methods, except for BPF in test 4, are all below 60%. Although BPF performs best in test 4, its advantage over the D-GMPHD filter is not significant, as the difference between them is only 3%. Because BPF treats every measurement that does not overlap with an existing track as a new trajectory, it performs well in test 4, in which fewer noisy measurements and appearance changes exist. Overall, the D-GMPHD filter is the best choice for multiple vehicle tracking on the road among the compared methods.
Table 3. Evaluating performance (MOTP in pixels)
To further improve the D-GMPHD filter for scenes with high missed-detection rates or a large number of vehicles, the GM-PHD filter inside the D-GMPHD filter can be seamlessly replaced by the GM-CPHD filter. The PHD recursion propagates cardinality information with only a single parameter (the mean of the cardinality distribution) and thus effectively approximates the cardinality distribution by a Poisson distribution. Since the mean and variance of a Poisson distribution are equal, the PHD filter estimates the cardinality with a correspondingly high variance when the number of targets is high. To address this problem, the cardinalized PHD (CPHD) filter was proposed by Mahler [38] and implemented in [39, 40]. However, the PHD filter has a computational complexity of $O(mn)$, while the CPHD filter has a computational complexity of $O(m^3 n)$ [41], where $n$ is the number of targets in the scene and $m$ is the number of observations in the current measurement set. Hence, if the tracking performance of the GM-PHD filter meets the requirements of the application, there is no need to use the GM-CPHD filter; otherwise, the GM-CPHD filter is worth its additional cost.
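To give a feel for this trade-off, the sketch below compares the two growth rates and the Poisson cardinality spread for an illustrative scene. It is only an order-of-growth proxy that ignores constant factors; the function names and the values of `m` and `n` are hypothetical.

```python
import math


def phd_cost(m, n):
    """Order-of-growth proxy for one PHD update: O(m * n)."""
    return m * n


def cphd_cost(m, n):
    """Order-of-growth proxy for one CPHD update: O(m^3 * n)."""
    return m ** 3 * n


m, n = 20, 10                               # hypothetical: 20 measurements, 10 targets
ratio = cphd_cost(m, n) / phd_cost(m, n)    # equals m**2, i.e. 400 here

# Under the Poisson approximation the cardinality variance equals its mean,
# so the standard deviation of the estimated target number grows as sqrt(n).
cardinality_std = math.sqrt(n)
```

Under this proxy the relative cost of the CPHD update grows with the square of the number of measurements, which is why heavy clutter makes the choice between the two filters more consequential.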
- 4.2.3 Time complexity analysis
The proposed system contains four parts: projective transformation, vehicle detection, multiple target tracking, and traffic congestion identification. Most of the processing time is spent on vehicle detection and multiple target tracking.
  • (1) Vehicle detection results are obtained by ViBe [17]. Barnich et al. state that it can reach nearly 200 frames per second for 640 × 480 pixel images with three color channels on their platform (2.67 GHz Core i7 CPU, 6 GB of RAM, C implementation).
  • (2) Table 4 shows the frame rate on a Pentium Dual-Core CPU @ 2.6 GHz with a Matlab implementation. All test images are normalized to 320 × 240 pixels for a fair comparison with BPF. The D-GMPHD filter reaches at least 6.67 frames/second.
Table 4. Frame rate (average frames per second) of three methods for all tests
5. Conclusion
This work presents a real-time method for video based traffic congestion identification. The proposed technique includes projective transformation, vehicle detection, multiple vehicle tracking, and traffic congestion identification. After all vehicles on the road are detected, the adopted tracking method should be able to track them from their appearance to their disappearance. The GM-PHD filter can track an unknown and time-varying number of targets in the presence of data association uncertainty, clutter, noise, and detection uncertainty. However, the GM-PHD filter is insensitive to birth targets with unknown positions. Vehicle recognition based on the vehicle's motion model is first used to relieve this shortcoming of the GM-PHD filter. The GM-PHD filter and an ellipsoidal gate are then used to form the trajectory of each vehicle, from which the speed of each vehicle is obtained. Finally, traffic congestion is identified from the average traffic speed. Experimental results on real data indicate the advantage of the proposed method in giving a more complete trajectory of each vehicle in the field of view. The identified traffic congestion can provide advanced traffic management systems with an earlier and more reliable basis for taking further measures.
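The last two steps of this pipeline (speed from trajectory, congestion from average speed) can be sketched as follows. This is a simplified illustration rather than the decision rule of Eq. (26): the function names, the rule that congestion holds when the mean speed falls below `speed_threshold`, and the toy trajectories are all assumptions.

```python
import math


def vehicle_speed(trajectory, fps):
    """Mean speed along a list of (x, y) positions sampled at consecutive frames.

    The result is in position units per second.
    """
    if len(trajectory) < 2:
        return 0.0
    path = sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))
    return path * fps / (len(trajectory) - 1)


def is_congested(trajectories, fps, speed_threshold):
    """Flag congestion when the average vehicle speed is below a threshold (assumed rule)."""
    speeds = [vehicle_speed(t, fps) for t in trajectories if len(t) >= 2]
    if not speeds:
        return False  # no usable trajectories, so no evidence of congestion
    return sum(speeds) / len(speeds) < speed_threshold


# Two hypothetical trajectories in ground-plane coordinates, video at 8 frames/second.
trajectories = [
    [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],   # 1 unit per frame, i.e. 8 units/second
    [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0)],   # 0.5 unit per frame, i.e. 4 units/second
]
congested_high_threshold = is_congested(trajectories, fps=8, speed_threshold=10.0)
congested_low_threshold = is_congested(trajectories, fps=8, speed_threshold=5.0)
```

With these toy values the average speed is 6 units/second, so the scene is flagged as congested for a threshold of 10 but not for a threshold of 5. In practice the positions would come from the projectively transformed image so that speeds are comparable across the field of view.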
BIO
Xiaoyu Zhang is a PhD candidate in the School of Aeronautics and Astronautics at Shanghai Jiao Tong University, Shanghai, China. He received his BE degree in Metallic Materials Engineering and MS degree in Detection Technology and Automatic Equipment from Hebei University of Science and Technology, Shijiazhuang, China, in 2006 and 2009, respectively. His research interests include object detection and visual tracking.
Shiqiang Hu received his BS degree from Hebei University of Science and Technology, Shijiazhuang, China, and MS and PhD degrees from Beijing Institute of Technology, Beijing, China, in 1991, 1998, and 2002, respectively, all in Electronics and Information Technology. Currently, he is a professor at the School of Aeronautics and Astronautics, Shanghai Jiao Tong University, China. His research interests include intelligent information processing, information fusion, nonlinear filtering, and aerospace control.
Huanlong Zhang received his BS degree in Automation and MS degree in Computer Application from Henan University, Kaifeng, China, in 2004 and 2007, respectively. He received his PhD degree in Control Theory and Control Engineering from Shanghai Jiao Tong University, Shanghai, China, in 2015. Currently, he is a lecturer at Zhengzhou University of Light Industry. His research interests include pattern recognition and computer vision.
Xing Hu is a PhD candidate in the School of Aeronautics and Astronautics at Shanghai Jiao Tong University. He received his BE degree in computer science and technology from Anhui University of Technology, Maanshan, China, in 2007. He received his MS degree in computer science and technology from Guangxi Normal University, Guilin, China, in 2010. His research interests include video anomaly detection and video scene analysis.
References
Papageorgiou M. , Kotsialos A. 2002 "Freeway ramp metering: an overview," IEEE Transactions on Intelligent Transportation Systems 3 (4) 271 - 281    DOI : 10.1109/TITS.2002.806803
Baskar L. D. , Schutter B. , Hellendoorn J. , Papp Z. 2011 "Traffic control and intelligent vehicle highway systems: A survey," IET Intelligent Transport Systems 5 (1) 38 - 52    DOI : 10.1049/iet-its.2009.0001
Coifman B. , Beymer D. , McLauchlan P. , Malik J. 1998 "A real-time computer vision system for vehicle tracking and traffic surveillance," Transportation Research Part C: Emerging Technologies 6 (4) 271 - 288    DOI : 10.1016/S0968-090X(98)00019-9
Marfia G. , Roccetti M. 2011 "Vehicular congestion detection and short-term forecasting: a new model with results," IEEE Transactions on Vehicular Technology 60 (7) 2936 - 2948    DOI : 10.1109/TVT.2011.2158866
Wu B.-F. , Juang J.-H. 2011 "Real-time vehicle detector with dynamic segmentation and rule-based tracking reasoning for complex traffic conditions," KSII Transactions on Internet and Information Systems (TIIS) 5 (12) 2355 - 2373
Wang G. , Xiao D. , Gu J. "Review on vehicle detection based on video for traffic surveillance," in Proc. of IEEE Conference on Automation and Logistics September 1-3, 2008 2961 - 2966
Wang Y. , Hu S. 2014 "Chaotic Features for Traffic Video Classification," KSII Transactions on Internet and Information Systems (TIIS) 8 (8) 2833 - 2850
Chan A. B. , Vasconcelos N. "Probabilistic kernels for the classification of auto-regressive visual processes," in Proc. of IEEE Conference on Computer Vision and Pattern Recognition June 20-25, 2005 846 - 851
Chan A. B. , Vasconcelos N. "Classification and retrieval of traffic video using auto-regressive stochastic processes," in Proc. of IEEE Intelligent Vehicles Symposium June 6-8, 2005 771 - 776
Derpanis K. G. , Wildes R. P. "Classification of traffic video based on a spatiotemporal orientation analysis," in Proc. of IEEE Workshop on Applications of Computer Vision January 5-7, 2011 606 - 613
Sivaraman S. , Trivedi M. M. 2010 "A general active-learning framework for on-road vehicle recognition and tracking," IEEE Transactions on Intelligent Transportation Systems 11 (2) 267 - 276    DOI : 10.1109/TITS.2010.2040177
Koller-Meier E. B. , Ade F. 2001 "Tracking multiple objects using the condensation algorithm," Robotics and Autonomous Systems 34 (2) 93 - 105    DOI : 10.1016/S0921-8890(00)00114-7
Xia J. , Rao W. , Huang W. , Lu Z. 2013 "Automatic multi-vehicle tracking using video cameras: An improved CAMShift approach," KSCE Journal of Civil Engineering 17 (6) 1462 - 1470    DOI : 10.1007/s12205-013-0263-7
Rad R. , Jamzad M. 2005 "Real time classification and tracking of multiple vehicles in highways," Pattern Recognition Letters 26 (10) 1597 - 1607    DOI : 10.1016/j.patrec.2005.01.010
Stauffer C. , Grimson W. E. L. "Adaptive background mixture models for real-time tracking," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition June 23-25,1999 246 - 252
Kim K. , Chalidabhongse T. H. , Harwood D. , Davis L. 2005 "Real-time foreground-background segmentation using codebook model," Real-time imaging 11 (3) 172 - 185    DOI : 10.1016/j.rti.2004.12.004
Barnich O. , Van Droogenbroeck M. 2011 "ViBe: A universal background subtraction algorithm for video sequences," IEEE Transactions on Image Processing 20 (6) 1709 - 1724    DOI : 10.1109/TIP.2010.2101613
Felzenszwalb P. F. , Girshick R. B. , McAllester D. , Ramanan D. 2010 "Object detection with discriminatively trained part-based models," Pattern Analysis and Machine Intelligence, IEEE Transactions on 32 (9) 1627 - 1645    DOI : 10.1109/TPAMI.2009.167
Ren X. , Ramanan D. "Histograms of sparse codes for object detection," in Proc. of Conference on Computer Vision and Pattern Recognition Portland, Oregon, USA 2013 3246 - 3253
Semertzidis T. , Dimitropoulos K. , Koutsia A. , Grammalidis N. 2010 "Video sensor network for real-time traffic monitoring and surveillance," IET Intelligent Transport Systems 4 (2) 103 - 112    DOI : 10.1049/iet-its.2008.0092
Reid D. 1979 "An algorithm for tracking multiple targets," IEEE Transactions on Automatic Control 24 (6) 843 - 854    DOI : 10.1109/TAC.1979.1102177
Blackman S. S. 2004 "Multiple hypothesis tracking for multiple target tracking," IEEE Aerospace and Electronic Systems Magazine 19 (1) 5 - 18    DOI : 10.1109/MAES.2004.1263228
Garcia F. , Prioletti A. , Cerri P. , Broggi A. , de la Escalera A. , Armingol J. "Visual feature tracking based on PHD filter for vehicle detection," in Proc. of IEEE Conference on Information Fusion July 7-10, 2014 1 - 6
Panta K. , Vo B.-N. , Singh S. , Doucet A. "Probability hypothesis density filter versus multiple hypothesis tracking," in Proc. of Defense and Security April 12, 2004 284 - 295
Wang Y. , Jing Z. , Hu S. , Wu J. 2012 "Detection-guided multi-target Bayesian filter," Signal Processing 564 - 574    DOI : 10.1016/j.sigpro.2011.09.002
Ristic B. , Clark D. , Vo B.-N. , Vo B.-T. 2012 "Adaptive target birth intensity for phd and cphd filters," IEEE Transactions on Aerospace and Electronic Systems 48 (2) 1656 - 1668    DOI : 10.1109/TAES.2012.6178085
Mahler R. P. S. 2003 "Multitarget Bayes filtering via first-order multitarget moments," IEEE Transactions on Aerospace and Electronic Systems 39 (4) 1152 - 1178    DOI : 10.1109/TAES.2003.1261119
Vo B. N. , Ma W. K. 2006 "The Gaussian mixture probability hypothesis density filter," Signal Processing, IEEE Transactions on 54 (11) 4091 - 4104    DOI : 10.1109/TSP.2006.881190
Blackman S. S. , Popoli R. 1999 Design and analysis of modern tracking systems. Artech House Boston
Wald A. 1945 "Sequential tests of statistical hypotheses," The Annals of Mathematical Statistics 117 - 186    DOI : 10.1214/aoms/1177731118
Echard J. D. 1991 "Estimation of radar detection and false alarm probability," IEEE Transactions on Aerospace and Electronic Systems 27 (2) 255 - 260    DOI : 10.1109/7.78300
Bar-Shalom Y. , Li X. R. , Kirubarajan T. 2001 Estimation with applications to tracking and navigation: theory algorithms and software. Wiley-Interscience Canada
Carlbom I. , Paciorek J. 1978 "Planar geometric projections and viewing transformations," ACM Computing Surveys (CSUR) 10 (4) 465 - 502    DOI : 10.1145/356744.356750
Sonka M. , Hlavac V. , Boyle R. 2008 Image processing, analysis, and machine vision 3rd ed. Thomson Toronto
Okuma K. , Taleghani A. , De Freitas N. , Little J. J. , Lowe D. G. "A boosted particle filter: Multitarget detection and tracking," in Proc. of European Conference on Computer Vision May 11-14, 2004 28 - 39
NGSIM NGSIM [Online]. Available:
Bernardin K. , Stiefelhagen R. 2008 "Evaluating multiple object tracking performance: the CLEAR MOT metrics," EURASIP Journal on Image and Video Processing 2008 1 - 10
Mahler R. 2007 "PHD filters of higher order in target number," IEEE Transactions on Aerospace and Electronic Systems 43 (4) 1523 - 1543    DOI : 10.1109/TAES.2007.4441756
Vo B. T. , Vo B. N. , Cantoni A. 2007 "Analytic implementations of the cardinalized probability hypothesis density filter," IEEE Transactions on Signal Processing 55 (7) 3553 - 3567    DOI : 10.1109/TSP.2007.894241
Ulmke M. , Erdinc O. , Willett P. "Gaussian mixture cardinalized PHD filter for ground moving target tracking," in Proc. of IEEE Conference on Information Fusion 2007 1 - 8
Mahler R. P. 2007 Statistical multisource-multitarget information fusion Artech House Boston vol. 685