Background subtraction is the first processing stage in video surveillance. It is a general term for a process that aims to separate foreground objects from the background. The goal is to construct and maintain a statistical representation of the scene that the camera sees; the output of background subtraction then serves as input to a higher-level process. Background subtraction in dynamic environments in video sequences is one such complex task, and it is an important research topic in the image analysis and computer vision domains. This work deals with background modeling based on a modified adaptive Gaussian mixture model (GMM) combined with a three temporal differencing (TTD) method in dynamic environments. The results of background subtraction on several sequences in various testing environments show that the proposed method is efficient and robust for dynamic environments and achieves good accuracy.
1. Introduction
Background modeling is used in many applications to model the background and then detect moving objects, for example in video surveillance, optical motion capture, human-computer interaction, multimedia, and content-based video coding. One of the most popular methods for extracting moving objects from a video frame is background subtraction. Background subtraction is one of the first low-level processing operations in any intelligent video surveillance system. It is the operation of identifying and segmenting moving objects in video frames by separating the still areas, called the background, from the moving objects, called the foreground. Any background subtraction algorithm first constructs a representation of the background, called the background model; each subsequent frame is then subtracted from this background model to give the resulting foreground. Adaptive background subtraction algorithms also update the model along the sequence to compensate for subsequent changes in the background. There are many challenges in background subtraction, such as dynamic backgrounds, illumination changes, motion changes, high-frequency background objects, shadow, camouflage, and non-stationary backgrounds. To address these problems and challenges, many background modeling methods have been developed.
The background modeling methods can be classified into different categories: basic background modeling, statistical background modeling, fuzzy background modeling, and background estimation. All these modeling approaches are used in a background subtraction context, which comprises the following steps: background modeling, initialization, maintenance, and foreground detection. Many works have been proposed in the literature for background subtraction over the past decades in search of a reliable and efficient method. The mixture of Gaussians is a widely used approach for background modeling to detect moving objects in video sequences taken from static cameras. A Gaussian mixture model (GMM) was proposed by Friedman and Russell [1] and was refined for real-time tracking by Stauffer and Grimson [2-4]. Numerous improvements of the original method developed by Stauffer and Grimson have been proposed in recent years. Friedman and Russell [1] proposed modeling each background pixel using a mixture of three Gaussians corresponding to road, vehicle, and shadow in a traffic surveillance system. The Gaussians are then manually labeled: the darkest component is labeled as shadow; of the remaining two components, the one with the larger variance is labeled as vehicle and the other as road. This labeling remains fixed for the whole process, giving no adaptation to changes over time. For foreground detection, each pixel is compared with each Gaussian and classified according to its corresponding Gaussian. Maintenance is performed using an incremental expectation-maximization (EM) algorithm for real-time operation. An important problem of the EM algorithm is that it can converge to a poor local maximum if not properly initialized. Stauffer and Grimson generalized this concept by modeling the recent history of the color features of each pixel with a mixture of K Gaussians [3]. McKenna et al. proposed a derivation for training mixture models to track video objects; however, their method required memory proportional to the temporal window size, making it unsuitable for applications where mixture modeling is pixel-based over a long temporal window [5]. Sato and Ishii proposed a mechanism for adding, deleting, and splitting Gaussians to handle dynamic distributions, similar to Gaussian reassignment, and suggested that temporal adaptation could be achieved by manipulating a discount factor. However, it is unclear in their method how to define the schedule of the discount factor to implement the behavior required in surveillance applications [6].
The adaptive GMM has a few problems under special conditions. It often suffers from a trade-off between robustness to background changes and sensitivity to foreground abnormalities, and it manages this trade-off inefficiently across different surveillance scenarios. When the likelihood factor ρ has a small value, parameter adjustment is slow, leading to low precision in the early frames. The GMM also does not distinguish between background elements and their shadows. To overcome these problems, modification of the adaptive GMM is necessary. This work deals with background subtraction based on a modified adaptive GMM with TTD. The results of background subtraction on several sequences in various testing environments show that the proposed method is efficient and robust for dynamic environments and achieves good accuracy. Related works are described in Section 2. The proposed background subtraction method is described in Section 3, and the modified adaptive GMM with TTD is described in Section 4. Experimental results are presented in Section 5, and conclusions are drawn in Section 6.
2. Related Works
For foreground object detection in a video surveillance system, background modeling plays a vital role. A simple approach is to represent the gray-level or color intensity of each pixel in the image as an independent, unimodal distribution [7-10]. Huang et al. proposed region-based background modeling using the partial directed Hausdorff distance and MRFs [11, 12]. Zivkovic et al. proposed equations to constantly update the parameters and to select an appropriate number of components for each pixel [13]. Klare et al. proposed a method that uses many image features together with intensity to enhance background modeling techniques; a clear improvement was achieved compared to using color intensities alone under varying illumination [14]. Jung proposed a background subtraction algorithm with shadow identification suited for monochromatic video sequences using luminance information. The background image was modeled using robust statistical descriptors, and a noise estimate was obtained; foreground pixels were extracted, and a statistical approach combined with geometrical constraints was adopted to detect and remove shadows [15].
Lin et al. proposed a background subtraction method for grayscale video, motivated by criteria for what a general and reasonable background model should be, and realized it with a practical classification technique [16]. However, the methods of Jung and Lin et al. are efficient only for grayscale video and are not suitable for RGB video sequences. The main difficulty in designing a robust background subtraction algorithm is the selection of a detection threshold. McHugh et al. proposed a background subtraction method that adapts this threshold to varying video statistics. They suggested a foreground model based on a small spatial neighbourhood to improve discrimination sensitivity and a Markov model on the change labels to improve the spatial coherence of the detections [17]. Based on motion-based perceptual grouping, a spatiotemporal saliency algorithm was proposed to perform background subtraction in scenes with highly dynamic backgrounds. Comparison with other techniques shows that it comes at the cost of prohibitive processing time: it takes several seconds to process a single frame, which makes the algorithm unsuitable for real-time applications [18]. Saliency detection techniques have also been employed to detect moving objects in video sequences while effectively suppressing irrelevant backgrounds [19, 20], but these methods often fail to adapt quickly to various background motions.
Chiu et al. suggested an algorithm to extract primary color backgrounds from surveillance videos using a probability-based background extraction algorithm. Intrusive objects are segmented by a robust object segmentation algorithm that investigates the threshold values of the background subtraction from the prior frame to obtain good quality while minimizing execution time and maximizing detection accuracy [21]. Cheng et al. examined the problem of segmenting foreground objects in live video when background scene textures change over time. They proposed a series of discriminative online learning algorithms with kernels for modeling the spatial-temporal characteristics of the background and, by exploiting the parallel nature of the proposed algorithms, developed an implementation that runs efficiently on a highly parallel graphics processing unit (GPU) [22]. A robust vision-based system for vehicle tracking and classification was proposed by Unzueta et al. and tested under different weather conditions, including rainy and sunny days [23].
Hati et al. proposed an intensity-range-based object detection scheme for videos with a fixed background and static cameras. They suggested two different algorithms: the first models the background from the initial few frames, and the second extracts the objects based on local thresholding [24]. A multilayer codebook-based background subtraction (MCBS) model was proposed by Guo et al. for detecting moving objects in video sequences. Combining a multilayer block-based strategy with adaptive feature extraction from blocks of various sizes, the method can remove most of the dynamic background. Pixel-based classification is adopted to refine the results of the block-based background subtraction and can further classify pixels as foreground, shadow, or highlight [25]. The classic Gaussian mixture model is based on the statistical information of every pixel and is not robust to lighting changes. A method combining video coding and the Gaussian mixture model was proposed by Huang et al. They use intra mode and motion vectors to find foreground macroblocks and add one overhead flag in the compressed video to indicate them; in the decoder, they decode possible foreground areas and detect moving objects in these areas [26].
3. Background Subtraction
Moving object detection can be achieved in video surveillance by building a background model. Any significant change in an image region with respect to the background model signifies a moving object. The most common paradigm for performing background subtraction is to build an explicit model of the background; foreground objects are then detected by calculating the difference between the current frame and this background model. A binary foreground mask is obtained by classifying each pixel whose absolute difference exceeds a threshold as belonging to a foreground object. The basic idea is to classify a pixel as background or foreground by computing the difference between the background image and the current image. In view of the importance of real-time computation in surveillance systems, the background subtraction method is highly significant.
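As a minimal illustration of this thresholding paradigm, the following NumPy sketch builds a binary foreground mask from an absolute difference; the threshold value and array contents are arbitrary choices for the example:

```python
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose absolute difference from the background
    model exceeds a threshold as foreground (1) vs. background (0)."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy example: a flat background and a frame with one bright object pixel
bg = np.full((4, 4), 100, dtype=np.uint8)
frame = bg.copy()
frame[1, 2] = 200
mask = foreground_mask(frame, bg)
```

The cast to a signed type before subtracting avoids unsigned-integer wraparound, a common pitfall when differencing 8-bit images.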
The proposed method is illustrated in Fig. 1. The input video is first converted into frames. Each input frame is partitioned into small blocks, and a classification decision is made for each block as to whether it has changed with respect to the background. Invariant features are extracted from the image blocks, and the blocks are then classified into two categories, foreground and background, based on these features. The background model and the classifier must be adaptive when the background is dynamic. Foreground objects are obtained by comparing the background image with the current video frame. By applying this approach to each frame in a video sequence, tracking of any moving objects can be achieved effectively. This technique, commonly known as background subtraction or change detection, is very useful in video surveillance applications. Under noisy conditions, or when an object has the same color as the background, it is difficult to separate foreground objects from the background. While processing the image, noise can be removed using suitable filters; in this work a median filter is used. Background modeling using the GMM with TTD is explained in Section 4.
Fig. 1. Proposed method
4. Gaussian Mixture Model with TTD for Background Modeling
We use a modified adaptive Gaussian mixture model (GMM) with a three temporal differencing (TTD) method for background modeling. A Gaussian mixture model was proposed by Friedman and Russell [1] and was refined for real-time tracking by Stauffer and Grimson [2-4]. The adaptive GMM approach can handle challenging situations such as sudden or gradual illumination changes, slow lighting changes, long-term scene changes, periodic motion in a cluttered background, and repetitive motion in clutter. In this method a different threshold is selected for each pixel, and these pixel-wise thresholds adapt over time. Objects are allowed to become part of the background without destroying the existing background model. The Gaussians are initialized using median filtering: assuming that the background is more likely to appear in a scene, we can use the median of the previous n frames as the background model.
B(u,v,t) = median{ I(u,v,t-i) },  i = 0, 1, 2, ..., n-1,
where I(u,v,t) is the image at time t and B(u,v,t) is the background image at time t. The foreground mask is then generated by applying a threshold Th to the absolute difference: |I(u,v,t) - B(u,v,t)| > Th.
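The median background model and the thresholded difference can be sketched as follows (a minimal NumPy illustration; frame sizes, pixel values, and the threshold are assumptions made for the example):

```python
import numpy as np

def median_background(frames):
    """Estimate the background as the per-pixel median of the last n frames."""
    return np.median(np.stack(frames, axis=0), axis=0).astype(np.uint8)

# A moving object (value 255) is visible in only one of five frames of a
# static scene (value 50); the per-pixel median recovers the static background.
frames = [np.full((3, 3), 50, dtype=np.uint8) for _ in range(5)]
frames[2][1, 1] = 255                 # object appears in frame 2 only
bg = median_background(frames)
mask = np.abs(frames[2].astype(int) - bg.astype(int)) > 30
```

Because the object occupies each pixel in a minority of the frames, the median is insensitive to it, which is exactly why the median is a reasonable initializer for the Gaussians.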
The values of a particular pixel are modeled as a mixture of adaptive Gaussians. The Gaussians are evaluated in every iteration to determine which ones most likely correspond to the background; pixels that do not match the background Gaussians are classified as foreground, and foreground pixels are clustered using 2D connected component analysis. At any time t, what is known about a specific pixel (u_0, v_0) is its record:
{U_1, ..., U_t} = { I(u_0, v_0, i) : 1 <= i <= t }.
This record is modeled by a mixture of K Gaussian distributions. The probability of an observed pixel with intensity value U_t at time t is modeled as:
P(U_t) = sum_{i=1}^{K} w_{i,t} * eta(U_t; mu_{i,t}, Sigma_{i,t}),
where w_{i,t} is the i-th Gaussian mixture weight and eta(U_t; mu_{i,t}, Sigma_{i,t}) is the component Gaussian density, given by:
eta(U_t; mu, Sigma) = (2*pi)^(-d/2) |Sigma|^(-1/2) exp( -(1/2) (U_t - mu)^T Sigma^(-1) (U_t - mu) ).
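As a minimal illustration of this mixture density for scalar (grayscale) intensities, the sketch below evaluates the weighted sum of component Gaussians for one pixel; the component weights, means, and variances are invented for the example:

```python
import math

def gaussian(u, mu, var):
    """1-D Gaussian density eta(u; mu, sigma^2)."""
    return math.exp(-0.5 * (u - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_probability(u, weights, means, variances):
    """P(U_t) = sum_i w_i * eta(U_t; mu_i, sigma_i^2) for one pixel."""
    return sum(w * gaussian(u, m, v)
               for w, m, v in zip(weights, means, variances))

# K = 3 components; the weights sum to one (illustrative values)
weights = [0.5, 0.3, 0.2]
means = [50.0, 120.0, 200.0]
variances = [25.0, 25.0, 100.0]
p = mixture_probability(50.0, weights, means, variances)
```

A pixel value near one of the heavily weighted means yields a high mixture probability, which is the basis for deciding how well the value is explained by the current model.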
A K-means approximation is used to update the Gaussians. If a new pixel value U_{t+1} can be matched to one of the existing Gaussians (within 2.5 sigma), that Gaussian's mean and variance are updated as follows:
mu_{i,t+1} = (1 - rho) mu_{i,t} + rho U_{t+1}
sigma^2_{i,t+1} = (1 - rho) sigma^2_{i,t} + rho (U_{t+1} - mu_{i,t+1})^2
where rho = alpha * eta(U_{t+1}; mu_{i,t}, sigma_{i,t}) and alpha is a learning rate.
The prior weights of all Gaussians are adjusted as follows:
w_{i,t+1} = (1 - alpha) w_{i,t} + alpha * M_{i,t+1},
where M_{i,t+1} = 1 for the matching Gaussian and M_{i,t+1} = 0 for all the others.
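These update rules can be sketched for a single grayscale pixel. Note one simplification flagged in the code: here rho is set equal to alpha for brevity, whereas the text defines rho = alpha * eta(U_{t+1}; mu_i, sigma_i); all numeric values are illustrative:

```python
def update_matched_gaussian(u, mu, var, w, alpha):
    """Update the mean, variance, and weight of the Gaussian matched by
    pixel value u (Stauffer-Grimson style updates for one component)."""
    rho = alpha  # simplification: the text uses rho = alpha * eta(u; mu, sigma)
    mu_new = (1 - rho) * mu + rho * u
    var_new = (1 - rho) * var + rho * (u - mu_new) ** 2
    w_new = (1 - alpha) * w + alpha      # M = 1 for the matched component
    return mu_new, var_new, w_new

def update_unmatched_weight(w, alpha):
    """M = 0: only the prior weight decays for unmatched components."""
    return (1 - alpha) * w

# One update step with illustrative values
mu2, var2, w2 = update_matched_gaussian(u=120.0, mu=100.0, var=400.0,
                                        w=0.5, alpha=0.05)
```

The matched component drifts toward the new observation while unmatched components slowly lose weight, which is how the model forgets stale background modes.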
If U_{t+1} does not match any of the K existing Gaussians, the least probable distribution is replaced with a new one having mu_{t+1} = U_{t+1}, a high initial variance, and a low prior weight. The Gaussians with the most supporting evidence and the least variance should correspond to the background. With mean mu_{i,t} and covariance matrix Sigma_{i,t} = sigma^2_{i,t} I, the weight parameter w_{i,t} determines the time duration for which the i-th distribution exists in the background. The weights are positive and their sum is one. The K distributions are ordered by the fitness parameter w_k / sigma_k, and the number of active Gaussian components is calculated by assuming that the background contains the B most probable colors. The first B distributions in this ordering are selected as the background model:
B = argmin_b ( sum_{k=1}^{b} w_k > T ),
where T is the minimum prior probability that the background is in the scene. Background subtraction is then performed by marking pixels that lie more than 2.5 sigma away from all of the B distributions as foreground moving objects.
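The ordering and background selection just described can be sketched as follows; the component weights, means, and sigmas are invented for illustration:

```python
def select_background(weights, sigmas, T):
    """Order components by fitness w/sigma and keep the first B whose
    cumulative weight first exceeds the minimum prior probability T."""
    order = sorted(range(len(weights)),
                   key=lambda i: weights[i] / sigmas[i], reverse=True)
    cum, background = 0.0, []
    for i in order:
        background.append(i)
        cum += weights[i]
        if cum > T:
            break
    return background

def is_foreground(u, means, sigmas, background):
    """A pixel is foreground if it lies more than 2.5 sigma away
    from every background distribution."""
    return all(abs(u - means[i]) > 2.5 * sigmas[i] for i in background)

weights = [0.6, 0.3, 0.1]
means = [50.0, 120.0, 220.0]
sigmas = [5.0, 6.0, 20.0]
bg = select_background(weights, sigmas, T=0.7)   # keeps components 0 and 1
```

With these values a pixel of intensity 52 falls inside a background mode and is kept as background, while 220 lies far from both selected modes and is marked foreground.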
However, the adaptive GMM has a few problems under special conditions. First, if the initial pixel value belongs to the foreground, there is only one distribution, with weight equal to unity; if subsequent pixels belong to the background with the same color, it takes log_{1-alpha}(T) frames before this pixel is added to the background. Second, when the likelihood factor rho has a small value, parameter adjustment is slow, leading to low precision in the early frames. Third, the method does not distinguish between background elements and their shadows. To overcome these problems, the updating equations of the distribution parameters are modified, with the new weights, the mean at time t, and the covariance at time t computed over n_{rs}, the number of recent frames.
To model significant variance in the background, the pixel intensity is modeled by a mixture of K Gaussian distributions. In most existing work on background subtraction, K is a fixed number from 3 to 7; in our approach K is not fixed. This paper proposes a flexible modified adaptive GMM for background subtraction. Since the Gaussian mixture model alone does not cope well with illumination changes and noise, we propose an algorithm that links the GMM with TTD to achieve good accuracy in background subtraction.
Three temporal differencing is based on continuously subtracting image pixels over time. Traditional temporal differencing produces internal cavities in the difference image, so the shape of the moving object is incomplete and cannot provide full information for subsequent tracking and identification. The traditional image subtraction method obtains motion information by subtracting the previous image from the current image. In this paper we use a three-consecutive-image subtraction method. Let the three successive images be I_{n-1}(u,v), I_n(u,v), and I_{n+1}(u,v). To obtain the combined difference image I_c(u,v), we apply a threshold; this threshold removes noise and can be set for different lighting conditions.
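A minimal sketch of three-frame differencing follows. The exact rule for combining the two differences into I_c is not specified above; a logical AND of the two thresholded differences is one common choice, used here as an assumption:

```python
import numpy as np

def three_frame_difference(prev_f, curr_f, next_f, threshold=25):
    """Threshold |I_n - I_{n-1}| and |I_{n+1} - I_n| and AND them,
    keeping only pixels that changed in both consecutive pairs."""
    d1 = np.abs(curr_f.astype(int) - prev_f.astype(int)) > threshold
    d2 = np.abs(next_f.astype(int) - curr_f.astype(int)) > threshold
    return (d1 & d2).astype(np.uint8)

# An object (value 255) moves one column per frame over a flat background
f = [np.full((3, 5), 40, dtype=np.uint8) for _ in range(3)]
f[0][1, 1] = 255
f[1][1, 2] = 255
f[2][1, 3] = 255
mask = three_frame_difference(f[0], f[1], f[2])
```

The AND keeps the object's current position (which changed in both frame pairs) while suppressing the trailing ghost that plain two-frame differencing leaves behind.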
5. Experimental Results
We evaluated the proposed background subtraction method mainly on the following public video datasets: Waving Trees, Fountain, Campus, and Water Surface. For each pixel, we counted a true positive (TP) for a correctly classified foreground pixel, a true negative (TN) for a correctly classified background pixel, a false positive (FP) for a background pixel incorrectly classified as foreground, and a false negative (FN) for a foreground pixel incorrectly classified as background. After every pixel had been assigned to one of these four groups, sensitivity, precision, F1, and similarity were calculated using Eqs. (17), (18), (19), and (20): Sensitivity (recall) = TP / (TP + FN), Precision = TP / (TP + FP), F1 = 2 * Precision * Recall / (Precision + Recall), and Similarity = TP / (TP + FP + FN). Sensitivity, also known as the detection rate, measures the proportion of actual positives that are correctly identified, i.e., the percentage of detected true positives relative to the total number of true positives in the ground truth. Precision measures the proportion of detected pixels that are truly foreground. F1 is the weighted harmonic mean of precision and recall.
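The four pixel-wise counts and the metrics of Eqs. (17)-(20) can be computed as follows; the flattened masks are toy values for illustration:

```python
def evaluate(mask, truth):
    """Pixel-wise confusion counts and the four evaluation metrics:
    recall (sensitivity), precision, F1, and similarity (Jaccard)."""
    tp = sum(m == 1 and t == 1 for m, t in zip(mask, truth))
    tn = sum(m == 0 and t == 0 for m, t in zip(mask, truth))
    fp = sum(m == 1 and t == 0 for m, t in zip(mask, truth))
    fn = sum(m == 0 and t == 1 for m, t in zip(mask, truth))
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    similarity = tp / (tp + fp + fn)
    return recall, precision, f1, similarity

# Flattened toy masks: detected foreground vs. ground truth
mask  = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
r, p, f1, sim = evaluate(mask, truth)
```

Similarity penalizes both false positives and false negatives in a single score, which makes it a convenient summary when comparing methods across sequences.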
Background subtraction was carried out in this way with several existing methods and with the proposed method. Experimental results of the proposed method on the different dataset test video sequences are shown in Figs. 2, 3, 4, and 5. Quantitative comparisons of the proposed method with other background subtraction methods are given in Tables 1, 2, 3, and 4. The experimental results on several sequences in various environments show that the proposed method achieves good accuracy and is efficient and robust for dynamic environments.
Results with Waving Trees test Sequences: (a) Background reference image; (b) Current frame; (c) Proposed method; (d) Ground truth image
Results with Fountain test Sequences: (a) Background reference image; (b) Current frame; (c) Proposed method; (d) Ground truth image
Results with Campus test Sequences: (a) Background reference image; (b) Current frame; (c) Proposed method; (d) Ground truth image
Results with Water Surface test Sequences: (a) Background reference image; (b) Current frame; (c) Proposed method; (d) Ground truth image
Performance comparisons using test sequence Waving Trees
Performance comparisons using test sequence Fountain
Performance comparisons using test sequence Campus
Performance comparisons using test sequence Water Surface
6. Conclusion
The modified adaptive GMM with TTD method is applied to detect foreground objects in video sequences and gives good, stable detection. Experimental results show that our algorithm leads to better background subtraction. The results on several sequences show that the algorithm is efficient and robust for dynamic environments with new objects in them. Compared with traditional background subtraction methods on real-world sequences, the experimental results show that the proposed approach is more accurate than other classical algorithms. For future work, we intend to evaluate new background models with updating capacity, allowing the system to adapt to luminosity changes, sudden scene configuration changes, shadow, and camouflage. We also intend to implement the proposed background subtraction method on an FPGA.
BIO
A. Niranjil Kumar received the M.E. degree, specializing in Applied Electronics, from Anna University, Chennai, Tamil Nadu, India, in 2006. He is working as an Assistant Professor in the Department of ECE, P.S.R. Rengasamy College of Engineering for Women, Sivakasi, Tamil Nadu. He has published many papers on video surveillance, background subtraction, image quality measures, and image segmentation. His research is mainly focused on video surveillance.
C. Sureshkumar received the Ph.D. in Computer Science and Engineering in 2011 from Anna University, Tamil Nadu, India. He is working as Principal of J. K. K. Nattraja College of Engineering and Technology, Namakkal, Tamil Nadu. He has published many papers on computer vision applied to automation, motion analysis, image matching, image classification, and view-based object recognition, as well as management-oriented empirical and conceptual papers in leading journals and magazines. His present research is focused on statistical learning and its application to computer vision and image understanding, pattern recognition, and video surveillance.
References
[1] Friedman N., Russell S., "Image segmentation in video sequences: a probabilistic approach," Proc. 13th Conf. on Uncertainty in Artificial Intelligence, 1997, pp. 175-181.
[2] Grimson W., Stauffer C., Romano R., Lee L., "Using Adaptive Tracking to Classify and Monitor Activities in a Site," IEEE CVPR, 1998.
[3] Stauffer C., Grimson W., "Adaptive Background Mixture Models for Real-time Tracking," IEEE Conf. Comput. Vis. Pattern Recognit., 1999, pp. 246-252.
[4] Stauffer C., Grimson W., "Learning Patterns of Activity Using Real-time Tracking," IEEE Transactions on PAMI, vol. 22, no. 8, pp. 747-757, 2000. DOI: 10.1109/34.868677
[5] McKenna S. J., Raja Y., Gong S., "Object Tracking Using Adaptive Color Mixture Models," Proc. Asian Conf. Computer Vision, vol. 1, pp. 615-622, 1998.
[6] Sato M. A., Ishii S., "On-line EM Algorithm for the Normalized Gaussian Network," Neural Computation, vol. 12, pp. 407-432, 1999.
[7] Haritaoglu I., Harwood D., Davis L. S., "W4: Real-time surveillance of people and their activities," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 8, pp. 809-830, 2000. DOI: 10.1109/34.868683
[8] Stringa E., Regazzoni C. S., "Real-time video-shot detection for scene surveillance applications," IEEE Transactions on Image Processing, vol. 9, no. 1, pp. 69-79, 2000. DOI: 10.1109/83.817599
[9] Gupte S., Masoud O., Martin R. F. K., Papanikolopoulos N. P., "Detection and classification of vehicles," IEEE Transactions on Intelligent Transportation Systems, vol. 3, no. 1, pp. 37-47, 2002. DOI: 10.1109/6979.994794
[10] Chien S.-Y., Ma S.-Y., Chen L.-G., "Efficient moving object segmentation algorithm using background registration technique," IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, no. 7, pp. 577-586, 2002. DOI: 10.1109/TCSVT.2002.800516
[11] Huang S. S., Fu L. C., Hsiao P. Y., "A region-based background modeling and subtraction using partial directed Hausdorff distance," IEEE Int. Conf. Robotics and Automation, 2004.
[12] Huang S. S., Fu L. C., Hsiao P. Y., "A region-level motion-based background modeling and subtraction using MRFs," IEEE Int. Conf. Robotics and Automation, 2005.
[13] Zivkovic Z., van der Heijden F., "Efficient adaptive density estimation per image pixel for the task of background subtraction," Pattern Recognit. Lett., vol. 27, no. 7, pp. 773-780, 2006. DOI: 10.1016/j.patrec.2005.11.005
[14] Klare B., Sarkar S., "Background subtraction in varying illuminations using an ensemble based on an enlarged feature set," IEEE Comput. Soc. Conf. on Comput. Vision and Pattern Recognit. Workshops, 2009.
[15] Jung C. R., "Efficient Background Subtraction and Shadow Removal for Monochromatic Video Sequences," IEEE Transactions on Multimedia, vol. 11, no. 3, Apr. 2009.
[16] Lin H.-H., Liu T.-L., Chuang J.-H., "Learning a Scene Background Model via Classification," IEEE Transactions on Signal Processing, vol. 57, no. 5, May 2009.
[17] McHugh J. M., Konrad J., Saligrama V., Jodoin P.-M., "Foreground-Adaptive Background Subtraction," IEEE Signal Processing Letters, vol. 16, no. 5, 2009.
[18] Mahadevan V., Vasconcelos N., "Spatiotemporal Saliency in Dynamic Scenes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 1, pp. 171-177, 2010. DOI: 10.1109/TPAMI.2009.112
[19] Guo C., Zhang L., "A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression," IEEE Transactions on Image Processing, vol. 19, no. 1, pp. 185-198, 2010. DOI: 10.1109/TIP.2009.2030969
[20] Mahadevan V., Vasconcelos N., Jacobson N., Lee Y.-L., Nguyen T. Q., "A Novel Approach to FRUC using Discriminant Saliency and Frame Segmentation," IEEE Trans. Image Process., vol. 19, no. 11, pp. 2924-2934, 2010. DOI: 10.1109/TIP.2010.2050928
[21] Ku M.-Y., Liang L.-W., "A Robust Object Segmentation System Using a Probability-Based Background Extraction Algorithm," IEEE Transactions on Circuits and Systems for Video Technology, vol. 20, no. 4, 2010.
[22] Cheng L., Gong M., Schuurmans D., "Real-time discriminative background subtraction," IEEE Trans. Image Process., vol. 20, no. 5, pp. 1401-1414, 2011. DOI: 10.1109/TIP.2010.2087764
[23] Unzueta L., Nieto M., Cortés A., Barandiaran J., Otaegui O., Sánchez P., "Adaptive Multicue Background Subtraction for Robust Vehicle Counting and Classification," IEEE Transactions on Intelligent Transportation Systems, vol. 13, no. 2, 2012.
[24] Hati K. K., Sa P. K., Majhi B., "Intensity Range Based Background Subtraction for Effective Object Detection," IEEE Signal Processing Letters, vol. 20, no. 8, 2013.
[25] Guo J.-M., Hsia C.-H., Liu Y.-F., Shih M.-H., Chang C.-H., Wu J.-Y., "Fast Background Subtraction Based on a Multilayer Codebook Model for Moving Object Detection," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, no. 10, 2013.
[26] Huang Z., Hu R., Wang Z., "Background Subtraction With Video Coding," IEEE Signal Processing Letters, vol. 20, no. 11, 2013.