A Noisy Videos Background Subtraction Algorithm Based on Dictionary Learning
KSII Transactions on Internet and Information Systems (TIIS). 2014. Jun, 8(6): 1946-1963
Copyright © 2014, Korean Society For Internet Information
  • Received : January 23, 2014
  • Accepted : April 28, 2014
  • Published : June 28, 2014
About the Authors
Huaxin Xiao
College of Information System and Management, National University of Defense Technology, Changsha, PR China
Yu Liu
College of Information System and Management, National University of Defense Technology, Changsha, PR China
Shuren Tan
College of Information System and Management, National University of Defense Technology, Changsha, PR China
Jiang Duan
School of Economic Information Engineering, Southwestern University of Finance and Economics, Chengdu, PR China
Maojun Zhang
College of Information System and Management, National University of Defense Technology, Changsha, PR China

Abstract
Most background subtraction methods focus on dynamic and complex scenes without considering robustness against noise. This paper proposes a background subtraction algorithm based on dictionary learning and sparse coding for handling low light conditions. The proposed method formulates background modeling as a linear and sparse combination of atoms in the dictionary. Background subtraction is then treated as the difference between the sparse representations of the current frame and the background model. The assumption that the projection of the noise over the dictionary is irregular and random guarantees the adaptability of the approach in scenes with large noise. Experimental results, divided into simulated large noise and realistic low light conditions, show the promising robustness of the proposed approach compared with other competing methods.
1. Introduction
The continuous improvement of equipment manufacturing and computer processing capability has led to the wide application of intelligent video surveillance technology in industry, defense, transportation, and other fields. One of the goals of this technology is to simulate the functions of the human visual system, such as object tracking, classification, and behavior understanding, in an arbitrary scene. However, these smart applications are based on motion detection that correctly detects moving targets and exactly segments them. Three methods of motion detection have been developed in previous literature: optic flow, frame difference, and background subtraction.
The optic flow method [1, 23, 24] assigns a velocity vector to each pixel of the image, forming an optic flow field. The optic flow field is an approximate estimate of the true motion field that reflects the changing trend of the gray levels of pixels. This method can detect moving objects without any information about the scene or with a stationary camera. However, its sensitivity to lighting and its computational complexity restrict its application in video surveillance systems [2].
Frame difference is based on thresholding the difference between the previous and the current frames. This method is computationally efficient and detects object motion between two frames promptly [3]. Nevertheless, it suffers from two well-known drawbacks [4] caused by frame rate and object speed: foreground aperture and ghosting. Moreover, it lacks the flexibility to handle dynamic and complex scenes.
Background subtraction [5-10, 14-16, 30] establishes a background model of the monitored scene through a suitable method and then calculates the difference between the current frame and the background model, which segments the foreground area from the scene. It can solve the issues of frame difference and perform robustly in dynamic scenes by using a background update procedure. A large number of algorithms have been developed to represent the statistical model of the background [25]. These methods operate at the pixel level and ignore spatial correlation. Wren et al. [5] independently modeled the background at each pixel location with a Gaussian probability density function. Later, Friedman and Russell [6] used three Gaussian distributions corresponding to the road, shadow, and vehicle to model a traffic surveillance scene. Stauffer and Grimson [7] extended this idea by employing a mixture of multiple Gaussian distributions to model the pixels in the scene, which has proved to be a popular solution for modeling complex backgrounds. When the assumptions imposed by the selected parametric model fail, non-parametric approaches are a better choice. In non-parametric approaches, a kernel is created around each of the previous samples, and the density is estimated as an average over the kernels. Elgammal et al. [8] proposed a normal kernel that can deal with arbitrary shapes of the density function. Unlike the statistical background models, Oliver et al. [9] considered the spatial configuration and captured eigenbackgrounds by eigenvalue decomposition of the whole image. Later, Monnet et al. [10] proposed an incremental principal component analysis (PCA) method to predict model states, which can be used to capture the motion characteristics of backgrounds. For practical applications, robust PCA approaches [11-13] were proposed that are more effective than the incremental PCA method.
The spatially correlated approaches can effectively deal with brightness changes and other global changes. In addition, compressive sensing theory has been applied successfully to background subtraction in recent years. Cevher et al. [14] assumed that the majority of the pixels in a frame belong to the background, so that the foreground is sparse after background subtraction. Subsequently, Zhao et al. [15] developed this idea further by adding the assumption that the background also has a sparse representation and by learning a dictionary to characterize the background changes.
- 1.1 Contribution and Organization
The aforementioned methods mainly target complex and dynamic backgrounds, such as rain, snow, waves, and shaking trees, without considering low light or noisy environments. Large noise, low grey values, and small differences in grey level are the typical characteristics of low light images. Excessive noise and low grey values have a negative influence on detection, which causes the existing motion detection methods to perform poorly.
In this paper, we propose a robust background subtraction method based on dictionary learning and sparse coding to handle the large noise condition. Firstly, this paper formulates the background modeling step as a sparse representation problem and regards the background subtraction as the sparse projection over the dictionary. Then it detects the foreground as the difference between the reconstructed image and the background model.
Secondly, different from the assumptions of [14, 15], we put forward the significant assumption that statistical noise is typically distributed anisotropically through the larger space. We analyze and verify this assumption in a later section. Based on this assumption and as a result of the sparse representation, the proposed method can clearly remove the influence of large noise and perform robustly under different large noise and low light environments.
The rest of this paper is organized as follows: Section 2 describes the basic principle of the proposed method based on three assumptions. Section 3 presents the mathematical formulation of the proposed method. Section 4 compares the experimental results with those of existing methods on public testing datasets with simulated noise and on realistic low light videos. Section 5 concludes and discusses possible future research directions.
2. Basic principle
Following the outline given in Section 1, the proposed approach can be divided into three parts: background modeling by dictionary learning and sparse coding, sparse representation of the current frame, and foreground detection. This section introduces the principles of these parts based on the following assumptions, which provide a theoretical explanation of the proposed method.
In the framework of background subtraction, the current frame I can be linearly decomposed as follows:
$$ I = I_B + I_F \tag{1} $$
where IB is the background model and IF is the foreground candidate.
Building the background model IB is the most critical step for the success of a background subtraction approach. This model is established as a linear and sparse combination of the atoms in the dictionary D, which is based on the idea of background modeling with basis vectors [9]:
$$ I_B = D\alpha \tag{2} $$
where α is the sparse coefficient.
Compared with the eigenvalue decomposition [9], the sparse decomposition over a redundant dictionary is more effective in signal processing applications. The background can be represented sparsely by projecting it on the atoms of the dictionary. This leads to the first assumption, which is similar to that of [15]:
Assumption 1: The background of an arbitrary scene can be sparsely and linearly represented by the atoms of the dictionary.
Sparse representation aims to make the reconstructed signal as close as possible to the original one. When a moving target enters the scene, it changes the structure of the background, and the original sparse representation no longer holds. In other words, when a test frame with moving objects is represented in the subspace spanned by pure background bases, the unchanged area of the scene can be well recovered. By contrast, the changed area is reconstructed with a deviation from its projection on that subspace. Measuring this deviation serves the purpose of detection. The second assumption is proposed based on this analysis:
Assumption 2: The foreground leads to the changing of the background and greatly transforms the projection over the dictionary.
Fig. 1. Process of sparse coefficients changing when a foreground enters the scene. The dictionary D and sparse coefficients α are employed to represent the pure background model and the scene with foreground. Fig. 1A is the sparse representation of the pure background as described in Assumption 1. Then, the foreground breaks the original equation, as shown in Fig. 1B. The dictionary D can be obtained by employing learning algorithms such as K-SVD and Online Dictionary Learning, and the coefficient α is obtained by solving a sparse coding problem.
Fig. 1 shows the process of sparse coefficients changing when a foreground enters into the scene. Fig. 1 A shows the sparse representation of the pure background as described in Assumption 1. Then, the foreground breaks the original equation, as shown in Fig. 1 B.
The two predominant sources of noise in digital image acquisition are the stochastic nature of photon counting and the intrinsic thermal and electronic fluctuations of the acquisition devices [17]. Under normal illumination, the second source is the primary component. When the light decreases, the rapid increase of the first source brings a large amount of noise to the captured images. When the noise level is very large, the existing detection methods become ineffective. To guarantee the adaptability of the proposed method under low light conditions, a noise assumption is proposed:
Assumption 3: The projection of the noise over the dictionary is irregular and random.
Fig. 2. Comparison of sparse coefficients. The red curve in Fig. 2A is the sparse coefficient of the original background projected on the dictionary, and the blue one is the scene with foreground. The other three curves in Fig. 2B-D are the background with Gaussian white noise (σ = 400), Poisson noise (α = 500), and a mixture of both (σ = 250, α = 250).
Fig. 2 compares the sparse coefficients under different circumstances for a given background. The red curve in Fig. 2A represents the sparse coefficients of the original background projected on the learned dictionary, and the blue one is a case with foreground entering. Comparing these two curves shows that the foreground changes the sparse coefficients significantly and regularly. Regardless of its type, the foreground always has a certain structure that yields regular coefficients over the bases. The other three curves in Fig. 2B-D are the background with Gaussian white noise (σ = 400), Poisson noise (α = 500), and a mixture of both (σ = 250, α = 250). These three curves show that the sparse coefficients of the noise are distributed randomly and irregularly over the dictionary, as described in Assumption 3. Regardless of the type of noise, its randomness and anisotropy make its distribution over the whole dictionary disordered. Thus, when an image is reconstructed through the sparse model, only a few atoms in the dictionary are selected to represent the original signal, and most of the noise is effectively removed. These factors make the proposed method suitable for handling large noise environments.
3. Proposed method
The three assumptions described in Section 2 are the bases of the proposed method. First, according to Assumption 1, dictionary learning is applied to obtain the basis vectors of the scene. Then, sparse coding is combined with dictionary learning to model the background of the scene. For an arbitrary frame, the proposed method projects it on the learned dictionary to acquire the sparse representation. Finally, the difference between the sparse representations of the background model and the current frame is regarded as the detection criterion. Fig. 3 shows a flowchart of the detailed process of the proposed method.
Fig. 3. Flowchart of the proposed method.
- 3.1 Background modeling
In (2), the background model is formulated as the linear and sparse combination of the atoms in the dictionary D. Dictionary learning has proved very effective for signal reconstruction and classification in the audio and image processing domains [18]. Compared with traditional signal decomposition methods such as wavelets and PCA, dictionary learning does not require the bases to be orthogonal. Thus, it represents signals with better adaptability and flexibility.
Fig. 4. (a) Training set established with N samples. Each image is divided into m×l blocks of size n×n pixels. (b) Learned dictionary with 256 atoms by Online Dictionary Learning [18].
The background frames without foreground are extracted from the surveillance video to form a training set of N samples. As shown in Fig. 4(a), each collected image is divided into m × l blocks of n × n pixels. The j-th image block of the i-th sample is vectorized as $x_i^j$. The j-th blocks of all samples are then combined into the training set $X_j = [x_1^j, \dots, x_N^j]$. Its dictionary $D_j$ satisfies the following formula [18, 19]:
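$$ D_j = \arg\min_{D,\,\alpha} \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2} \left\| x_i^j - D\alpha_i \right\|_2^2 + \lambda \left\| \alpha_i \right\|_1 \right) \tag{3} $$
where $\alpha_i$ is the i-th sparse coefficient and λ is a regularization parameter.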
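The construction of the per-block training sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_into_blocks` and `build_training_sets` are hypothetical, images are plain lists of grey-level rows, blocks do not overlap, and any rows or columns that do not fill a whole block are dropped.

```python
def split_into_blocks(image, n):
    """Split a 2-D image (list of rows) into non-overlapping n x n blocks.

    Returns the blocks as flattened vectors in row-major block order.
    Rows/columns that do not fill a whole block are dropped (an assumption).
    """
    rows, cols = len(image), len(image[0])
    m, l = rows // n, cols // n  # number of block rows and block columns
    blocks = []
    for br in range(m):
        for bc in range(l):
            block = [image[br * n + r][bc * n + c]
                     for r in range(n) for c in range(n)]
            blocks.append(block)
    return blocks


def build_training_sets(samples, n):
    """Group the j-th block of every sample into one training set X_j."""
    per_sample = [split_into_blocks(img, n) for img in samples]
    num_blocks = len(per_sample[0])
    return [[blocks[j] for blocks in per_sample] for j in range(num_blocks)]


# Two tiny 4x4 "background frames" split into 2x2 blocks (N = 2, m = l = 2).
frame_a = [[r * 4 + c for c in range(4)] for r in range(4)]
frame_b = [[v + 100 for v in row] for row in frame_a]
training_sets = build_training_sets([frame_a, frame_b], n=2)
print(len(training_sets))   # number of block positions
print(training_sets[0])     # X_0: first block of each sample
```

Each entry of `training_sets` plays the role of one X_j, so a separate dictionary can be learned per block position.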
The Online Dictionary Learning algorithm [18] is used in this study to solve Formula (3). The learned dictionary with 256 atoms is shown in Fig. 4(b). In the manner of stochastic gradient descent, the algorithm draws in each iteration a vector $x_i^j$ from $X_j$, which is denoted $x_t$, where t is the iteration count. It then applies sparse coding with the dictionary obtained in the previous t−1 iterations to compute the t-th decomposition coefficient $\alpha_t$. The formula is as follows:
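$$ \alpha_t = \arg\min_{\alpha} \frac{1}{2} \left\| x_t - D_{t-1}\alpha \right\|_2^2 + \lambda \left\| \alpha \right\|_1 \tag{4} $$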
Sparse coding is a class of methods that automatically choose good basis vectors for unlabeled data. The Least Angle Regression algorithm [20] can solve Formula (4) and is particularly suitable when the solution is sufficiently sparse. Furthermore, the solution is precise and does not depend on the correlation of the atoms in the dictionary unless the solution is not unique. Then the dictionary $D_{t-1} = [d_1, \dots, d_k]$ is updated column by column and a new dictionary $D_t$ is obtained. The update rules are as follows:
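$$ u_q \leftarrow \frac{1}{A_{qq}} \left( b_q - D a_q \right) + d_q, \qquad d_q \leftarrow \frac{1}{\max\left( \| u_q \|_2, 1 \right)} u_q, \qquad q = 1, \dots, k \tag{5} $$
where, following [18],
$$ A = [a_1, \dots, a_k] = \sum_{i=1}^{t} \alpha_i \alpha_i^T, \qquad B = [b_1, \dots, b_k] = \sum_{i=1}^{t} x_i \alpha_i^T $$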
When the background of a scene changes, the above-mentioned update rules can be used to update the background model, thereby ensuring robustness.
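As an illustration of a column-by-column dictionary update, the following Python sketch follows the block-coordinate update used in online dictionary learning [18]. It is a sketch under assumptions, not the authors' implementation: the function name `update_dictionary` is hypothetical, and `A` and `B` are assumed to be the accumulated matrices Σ α αᵀ and Σ x αᵀ over the samples seen so far.

```python
import math


def update_dictionary(D, A, B):
    """One pass of column-wise dictionary updates.

    D: m x k dictionary (list of m rows), A: k x k, B: m x k.
    Returns a new dictionary; the inputs are left untouched.
    """
    m, k = len(D), len(D[0])
    D = [row[:] for row in D]  # work on a copy
    for j in range(k):  # j is the column (atom) index here
        if A[j][j] == 0:
            continue  # atom not used by any sample so far: keep it as is
        # u = (b_j - D a_j) / A_jj + d_j
        u = []
        for r in range(m):
            Da_j = sum(D[r][c] * A[c][j] for c in range(k))
            u.append((B[r][j] - Da_j) / A[j][j] + D[r][j])
        # Rescale so that the atom has at most unit l2 norm.
        scale = max(math.sqrt(sum(v * v for v in u)), 1.0)
        for r in range(m):
            D[r][j] = u[r] / scale
    return D


# One sample x = [0.3, 0.4] coded with alpha = [1, 0] over an identity dictionary.
D0 = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 0.0], [0.0, 0.0]]   # alpha * alpha^T
B = [[0.3, 0.0], [0.4, 0.0]]   # x * alpha^T
D_new = update_dictionary(D0, A, B)
print(D_new)
```

In this toy case the first atom moves to the sample itself, because that atom alone was used to code it, while the unused second atom is left unchanged.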
Sparse coding and dictionary updating are performed alternately until the preset number of iterations is reached. This algorithm is simple, fast, and suitable for large-scale image processing. Mairal et al. [18] have shown that the algorithm converges to a fixed point. The above-mentioned method is applied to each block, which yields the dictionary D and the sparse coefficients α of the whole image. The process of background modeling is then completed with (2), as described in Algorithm 1.
[Algorithm 1: background modeling; listing not available]
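The sparse coding step of Formula (4) can be illustrated with a small solver. The paper uses Least Angle Regression [20]; the sketch below swaps in the iterative shrinkage-thresholding algorithm (ISTA) because it fits in a few lines of plain Python. The function names and the fixed iteration count are assumptions made for this illustration.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t (proximal operator of the l1 norm)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def ista(D, x, lam, step, n_iter=200):
    """Approximately minimise 0.5*||x - D a||^2 + lam*||a||_1 with ISTA.

    D is an m x k list of rows and x has length m. `step` should not
    exceed 1 / (largest eigenvalue of D^T D) for the iteration to converge.
    """
    m, k = len(D), len(D[0])
    a = [0.0] * k
    for _ in range(n_iter):
        # Gradient of the quadratic term: D^T (D a - x)
        residual = [sum(D[r][c] * a[c] for c in range(k)) - x[r]
                    for r in range(m)]
        grad = [sum(D[r][c] * residual[r] for r in range(m))
                for c in range(k)]
        # Gradient step followed by soft thresholding
        a = [soft_threshold(a[c] - step * grad[c], step * lam)
             for c in range(k)]
    return a


# With an identity dictionary the solution is the soft-thresholded signal.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
alpha = ista(identity, [3.0, -0.5, 1.5], lam=1.0, step=1.0)
print(alpha)
```

The identity-dictionary case makes the effect of the regularization parameter λ visible: the small coefficient is set exactly to zero, which is the mechanism that discards weak, unstructured components such as noise.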
- 3.2 Foreground detection
After the background model is established, the next step is to detect the foreground. Similar to the process of background modeling, the sparse coefficients α′ on the dictionary D for an arbitrary frame I can be obtained by sparse coding. Referring to the idea of the background subtraction method, the foreground is detected by the differences between the sparse representation of the current frame I and the background model IB . Thus, the foreground that enters the monitored scene can be presented as follows:
$$ I_F = D\alpha' - I_B = D\alpha' - D\alpha \tag{6} $$
The differences of IF are calculated in blocks, and they are summed up as vector Δ:
$$ \Delta(j) = \sum_{p=1}^{n \times n} I_F^j(p) \tag{7} $$
where $I_F^j(p)$ is the p-th pixel of the j-th block in $I_F$.
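The per-block difference can be sketched as follows. This is a minimal illustration that assumes the difference is the signed sum over the pixels of a block of the two sparse reconstructions; an absolute-value sum would be an equally plausible variant. The helper names `reconstruct` and `block_difference` are hypothetical.

```python
def reconstruct(D, alpha):
    """Return D @ alpha for an m x k dictionary stored as a list of rows."""
    return [sum(d * a for d, a in zip(row, alpha)) for row in D]


def block_difference(D, alpha_frame, alpha_background):
    """Sum of pixel differences between the two sparse reconstructions."""
    frame_block = reconstruct(D, alpha_frame)
    background_block = reconstruct(D, alpha_background)
    return sum(f - b for f, b in zip(frame_block, background_block))


# A 3-pixel block with 2 atoms: the frame activates an extra atom.
D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
delta_j = block_difference(D, alpha_frame=[2.0, 3.0], alpha_background=[2.0, 0.0])
print(delta_j)
```

A block whose coefficients are unchanged gives a difference of zero, so only blocks whose sparse code has moved contribute to the vector Δ.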
Then the threshold region T is used to judge Δ(j). If Δ(j) falls within this region, the structure of the j-th block has not changed, i.e., no foreground has entered. By contrast, if an object enters the scene, then Δ(j) is set to 0. This study assumes that the data in Δ approximately follow a Gaussian distribution. Therefore, the upper and lower limits of the threshold region T are set with the 3σ criterion:
$$ T = [\mu - 3\sigma,\ \mu + 3\sigma] \tag{8} $$
where μ and σ are the mean and standard deviation, respectively, of the differences between the images in the training set and the background model.
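A threshold region of this kind can be computed directly from the training differences. The sketch below is an illustration under assumptions: the function names are hypothetical, and the population standard deviation is used for σ.

```python
import statistics


def three_sigma_region(training_deltas):
    """Return (lower, upper) limits mu - 3*sigma and mu + 3*sigma."""
    mu = statistics.mean(training_deltas)
    sigma = statistics.pstdev(training_deltas)  # population standard deviation
    return mu - 3 * sigma, mu + 3 * sigma


def is_background(delta, region):
    """A block is treated as unchanged when its difference lies inside T."""
    low, high = region
    return low <= delta <= high


# Differences measured between training frames and the background model.
region = three_sigma_region([-1.0, 0.0, 1.0, 0.0])
print(region)
print(is_background(1.5, region), is_background(6.0, region))
```

Under the Gaussian assumption, roughly 99.7% of background differences fall inside this region, so values outside it are unlikely to be caused by background fluctuation alone.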
However, a single threshold test can hardly avoid isolated points in the detection results. The results of the previous threshold judgment are therefore post-processed with weight coefficients. Given that the pixels of an object are connected, whether an image block belongs to the foreground can be determined from its neighbors:
$$ \Delta'(j) = (1 - SSIM_j) \sum_{i \in neighbour(j)} \Delta(i) \tag{9} $$
where neighbour(j) is the 3×3 neighborhood of the j-th block and SSIMj is the value of the Structural Similarity Index Measurement [21] between the j-th block of the current frame and that of the background model. 1−SSIMj is the weight coefficient, which makes full use of the structural information. If the block of the current frame is similar to the one in the background model, 1−SSIMj is very small. Δ′(j) is then low, and the block is not regarded as foreground. Formula (9) can therefore improve the foreground segmentation. The detailed foreground detection algorithm is described in Algorithm 2.
[Algorithm 2: foreground detection; listing not available]
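The neighborhood post-processing can be sketched as below. The exact form of the weighting is an assumption here: the refined value of a block is taken as (1 − SSIM) of that block multiplied by the sum of the thresholded differences over its 3×3 neighborhood, with the neighborhood clipped at the image border. The function name `refine_deltas` is hypothetical, and the SSIM values are supplied as precomputed inputs.

```python
def refine_deltas(delta, ssim):
    """Weight the 3x3 neighbourhood sum of each block by 1 - SSIM.

    delta, ssim: m x l grids (lists of rows) holding one value per block.
    """
    m, l = len(delta), len(delta[0])
    refined = [[0.0] * l for _ in range(m)]
    for r in range(m):
        for c in range(l):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < m and 0 <= cc < l:  # clip at the border
                        total += delta[rr][cc]
            refined[r][c] = (1.0 - ssim[r][c]) * total
    return refined


# A 2x2 group of changed blocks in the top-left corner of a 3x3 block grid.
delta = [[1.0, 1.0, 0.0],
         [1.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
ssim = [[0.2, 0.2, 1.0],
        [0.2, 0.2, 1.0],
        [1.0, 1.0, 1.0]]
refined = refine_deltas(delta, ssim)
print(refined[0][0], refined[2][2])
```

Blocks that are structurally identical to the background (SSIM of 1) are suppressed even when a changed neighbor is nearby, while blocks inside a connected changed region are reinforced.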
4. Experiments
To show its qualitative and quantitative performance, the proposed method has been tested under different levels of light and noise. The experiments are carried out in two parts: on the public testing datasets [22] and on realistic low light videos. The realistic videos are converted to a size of 360×240, similar to the size of the datasets in [22], to provide a fair comparison.
- 4.1 Implementation details
Different levels of Gaussian noise, Poisson noise, or mixture of both are added to the images in the public testing dataset [22] . The following equation presents the process of artificial noise:
[Equation (10): artificial noise model; equation image not available]
where nI is the pixel value of the noisy image and α is the scale factor. y and n follow the Poisson distribution P(λ) and the Gaussian distribution N(μ, σ²), respectively. λ is the pixel value of the original image. μ and σ² are the mean and variance of the Gaussian noise. Images with different degrees of noise can be obtained by adjusting the parameters of (10). For the low light video, the illumination of the environment was recorded when the video was taken.
In this study, the image block is treated as the basic processing unit, and the block size affects the computing speed, the detection results, and the quality of the recovered image. The performance for different block sizes is shown in Fig. 5. Smaller blocks maintain a better precision of the detection results, whereas a larger block size guarantees the accuracy, as shown in the second row of Fig. 5. Precision and accuracy trade off against each other, and keeping both at a high level simultaneously is difficult. After testing and comparison, 12×12 is chosen as the block size. With this block size, the MATLAB implementation of the proposed algorithm processes a frame in 1.5 to 2.0 seconds. The execution time was measured on a machine with a 2.2GHz Pentium E2200 processor and 2GB of RAM.
Fig. 5. Comparison of different block sizes. Noise was added to the Pedestrians dataset [22] as a mixture of additive Gaussian white noise (σ = 50) and Poisson noise (α = 50). The detection results are presented with blue boxes. Fig. 5A-C: Detection results with 8×8, 12×12, and 20×20 block sizes.
- 4.2 Results on public testing dataset
The proposed method is compared with the competing background subtraction algorithms: Mixture of Gaussian model [7] , Non-parametric model [8] , and ViBe [28] . The robustness of the proposed approach is then confirmed under different types of large noise.
Fig. 6. Detection results of different methods with different types of noise on the Backdoor dataset [22]. Different levels of Gaussian white noise (σ = 50 and 250), Poisson noise (α = 50 and 250), and a mixture of both (σ = 50, α = 50 and σ = 250, α = 250) are added to the original image. Fig. 6A: Test images with different types of noise. Fig. 6B-E: Detection results of the mixture of Gaussian model [7], the non-parametric model [8], ViBe [28], and the proposed method.
Fig. 6 shows the comparison between the competing background subtraction algorithms and the proposed method. Different levels of Gaussian white noise (σ = 50 and 250), Poisson noise (α = 50 and 250), and a mixture of both (σ = 50, α = 50 and σ = 250, α = 250) are added to the original image. The results show that in the noise-free condition, the non-parametric model [8] and ViBe [28] produce more exact detection results. However, once a certain degree of noise is added to the original image, the results of the compared methods are seriously affected. As the noise continues to rise, the compared methods fail completely because their background model assumptions no longer hold. By contrast, the proposed method performs robustly and handles the different types of noise well. The proposed method was tested on various datasets in [22]; the detection results on one of them are presented here.
Fig. 7. Detection results under different types of noise on the Bus Station dataset [22]. Fig. 7A: Original images with ground truth. Fig. 7B-D: Images and results with Gaussian white noise (σ = 250), Poisson noise (α = 250), and the mixture of both (σ = 150, α = 250).
In Fig. 7, different types of noise are added to the original test images. Regardless of the noise type, the proposed approach performs stable and robust detection. However, since the dictionary is learned from the background images, the recovered blocks are close to the background when the foreground has similar colors, which makes detection more difficult. In the second row of Fig. 7, the color of the right-most pedestrian's clothes is identical to the color of the dustbin, which leads to a failed detection. By contrast, if the background and foreground colors are visibly different (the third row of Fig. 7), the proposed method properly detects the person or object.
To evaluate the quantitative performance of the proposed method, three quantitative metrics were adopted in this study [30] :
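$$ Recall = \frac{tp}{tp + fn} \tag{11} $$
$$ Precision = \frac{tp}{tp + fp} \tag{12} $$
$$ F\text{-}measure = \frac{2 \cdot Recall \cdot Precision}{Recall + Precision} \tag{13} $$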
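where tp is the number of pixels correctly classified as foreground, fn is the number of foreground pixels that are missed, and fp is the number of background pixels wrongly classified as foreground. Thus, tp + fn and tp + fp are the numbers of pixels marked as foreground by the ground truth and by the proposed method, respectively.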
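The three metrics can be computed from the pixel counts as in the following sketch. The function name `detection_metrics` is hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this illustration.

```python
def detection_metrics(tp, fp, fn):
    """Return (recall, precision, f_measure) from pixel counts."""
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = recall + precision
    f_measure = 2 * recall * precision / denom if denom else 0.0
    return recall, precision, f_measure


# A detector that marks everything as foreground: perfect recall, low precision.
recall, precision, f_measure = detection_metrics(tp=50, fp=50, fn=0)
print(recall, precision, round(f_measure, 4))
```

The example shows why Recall alone can be misleading under noise: false detections raise Recall while lowering Precision, and the F-measure penalizes the imbalance.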
One hundred frames with foreground from the Backdoor and Bus Station datasets [22] were selected to calculate the quantitative metrics. The results are shown in Tables 1, 2, and 3. In Table 1, the compared methods have a higher Recall because of false detections caused by noise, whereas the proposed method achieves a better Precision in Table 2. The F-measure in (13) is a single measure that combines both, namely the weighted harmonic mean of Recall and Precision. Table 3 shows that the proposed method has a satisfactory quantitative performance regardless of the dataset or noise level.
Table 1. Quantitative metric of Recall
Table 2. Quantitative metric of Precision
Table 3. Quantitative metric of F-measure
- 4.3 Results on our low light video
The proposed method is applied to the low light video taken in this study. The image sensor is a SONY IMX 104 CMOS. As in Section 4.2, the methods of [7], [8], and [28] are compared with the proposed method, and the detection results are shown under different low illumination environments.
Fig. 8. Detection results of different methods under a 0.1-0.5 lx environment. Fig. 8A: Test frames extracted from the low light video. Fig. 8B-E: Detection results of the mixture of Gaussian model [7], the non-parametric model [8], ViBe [28], and the proposed method.
In Fig. 8, the mixture of Gaussian model [7], the non-parametric model [8], and ViBe [28] are compared with the proposed method under realistic low light conditions. The illumination of the environment in Fig. 8 is about 0.1-0.5 lx, and obvious noise appears in the captured images. The noise in Fig. 8 corresponds approximately to a mixture of Gaussian (σ = 50) and Poisson (α = 50) noise. As the illumination drops, the noise captured in the low light video increases exponentially. Because the light decreases from the right to the left of the scene, the performance of the compared methods degrades toward the left. When the moving object is in the right of the scene (the third and fourth rows of Fig. 8), the methods of [7], [8], and [28] can detect the object. However, they perform poorly when it appears in the left of the scene (the first and fifth rows of Fig. 8). Meanwhile, the proposed method remains robust no matter where the foreground is. The method proposed in this study is robust under both artificial large noise and realistic low light circumstances, as shown in Fig. 6 and Fig. 8.
Fig. 9 shows the detection results of the different methods under even lower light. The illumination of the environment in Fig. 9 is about 0.01-0.05 lx. To facilitate observation, the brightness of the displayed images is increased, which also increases the noise level. As in Fig. 8, the methods of [7], [8], and [28] are compared with the proposed approach. The proposed method is still more robust than the compared methods.
Fig. 9. Detection results of different methods under a 0.01-0.05 lx environment. Fig. 9A: Test frames extracted from the low light video. Fig. 9B-E: Detection results of the mixture of Gaussian model [7], the non-parametric model [8], ViBe [28], and the proposed method.
5. Conclusion
Most background subtraction methods highlight the capability of handling dynamic scenes [10, 15, 29] but ignore low light circumstances. Large noise caused by low light greatly affects the traditional algorithms and leads to poor performance, as shown in Fig. 6 and Fig. 8. This paper proposes a robust background subtraction algorithm based on dictionary learning and sparse coding to handle the large noise condition. The proposed method achieves a satisfactory detection performance that is not influenced by large statistical noise of different types and scales. The proposed method works poorly when the variance of the noise is larger than 500. In this case, distinguishing the foreground from the rest of the scene is difficult even for the human visual system.
In the proposed method, the whole image is divided into a group of blocks, and motion detection is carried out independently within each block. Thus, the result is a coarse, mosaic-like output compared with that of pixel-level background subtraction methods. Future work will focus on refining the precision of the detection produced by the proposed method.
BIO
Huaxin Xiao received his BS degree in automation from the University of Electronic Science and Technology of China. He is currently pursuing his MS degree in control science and engineering from the National University of Defense Technology, Changsha, China. His research interests include sparse representation and computer vision.
Yu Liu received his BS degree from Northwestern Polytechnical University, Xi’an, China in 2005. He then received his MSc on image processing and PhD on computer graphics from the University of East Anglia, Norwich, UK, in 2007 and 2011, respectively. He is currently a lecturer in the department of system engineering, National University of Defense Technology. His research interests include image/video processing, computer graphics, and visual haptic technology.
Shuren Tan received the Bachelor, MS and PhD degrees from the Department of System Engineering at the National University of Defense Technology, Changsha, China in 1993, 1996 and 2011, respectively. He is currently an Associate Professor of System Engineering at the National University of Defense Technology. His research interests include computational imaging, computer vision, and signal processing.
Jiang Duan received his BS degree from Southwest Jiaotong University, Chengdu, China in 2002. He then received his PhD on image processing from the University of Nottingham, England, UK in 2006. He is currently a professor in the school of economic information engineering, Southwestern University of Finance and Economics. His research interests include image processing, computer vision, and information engineering.
Maojun Zhang received his BS and PhD degrees in system engineering from the National University of Defense Technology, Changsha, China, in 1992 and 1997, respectively. He is currently a professor in the department of system engineering, National University of Defense Technology. His research interests include computer vision, information system engineering, system simulation, and virtual reality technology.
References
Barron J. L. , Fleet D. J. , Beauchemin S. S. 1994 “Performance of optical flow techniques” International journal of computer vision Article (CrossRef Link) 12 (1) 43 - 77    DOI : 10.1007/BF01420984
Beauchemin S. S. , Barron J. L. 1995 “The computation of optical flow” ACM Computing Surveys (CSUR) Article (CrossRef Link) 27 (3) 433 - 466    DOI : 10.1145/212094.212141
Migliore D. A. , Matteucci M. , Naccari M M. 2006 “A revaluation of frame difference in fast and robust motion detection” in Proc. of ACM VSSN Oct. Article (CrossRef Link) 215 - 218
Lee H. , Hong S. , Kim E. 2011 “Probabilistic Background Subtraction in a Video-based Recognition System” KSII Transactions on Internet & Information Systems 5 (4)
Article (CrossRef Link)
Wren C. R., Azarbayejani A., Darrell T., Pentland A. P. 1997 “Pfinder: Real-time tracking of the human body” IEEE Transactions on Pattern Analysis and Machine Intelligence 19 (7) 780-785 DOI: 10.1109/34.598236
Friedman N., Russell S. 1997 “Image segmentation in video sequences: A probabilistic approach” in Proc. of the 13th Conference on Uncertainty in Artificial Intelligence Aug. 175-181
Stauffer C., Grimson W. E. L. 1999 “Adaptive background mixture models for real-time tracking” in Proc. of IEEE CVPR Jun. vol. 2
Elgammal A., Harwood D., Davis L. 2000 “Non-parametric model for background subtraction” in Proc. of Computer Vision-ECCV Jul. 751-767
Oliver N. M., Rosario B., Pentland A. P. 2000 “A Bayesian computer vision system for modeling human interactions” IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (8) 831-843 DOI: 10.1109/34.868684
Monnet A., Mittal A., Paragios N., Ramesh V. 2003 “Background modeling and subtraction of dynamic scenes” in Proc. of IEEE ICCV Oct. 1305-1312
De La Torre F., Black M. J. 2003 “A framework for robust subspace learning” International Journal of Computer Vision 54 (1-3) 117-142 DOI: 10.1023/A:1023709501986
Ke Q., Kanade T. 2005 “Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming” in Proc. of IEEE CVPR Jun. vol. 1, 739-746
Candès E. J., Li X., Ma Y., Wright J. 2011 “Robust principal component analysis?” Journal of the ACM 58 (3) 11 DOI: 10.1145/1970392.1970395
Cevher V., Sankaranarayanan A., Duarte M. F., Reddy D., Baraniuk R. G., Chellappa R. 2008 “Compressive sensing for background subtraction” in Proc. of Computer Vision-ECCV Oct. 155-168
Cong Z., Xiaogang W., Wai-Kuen C. 2011 “Background subtraction via robust dictionary learning” EURASIP Journal on Image and Video Processing
Sivalingam R., D'Souza A., Bazakos M., Miezianko R., Morellas V., Papanikolopoulos N. 2011 “Dictionary learning for robust background modeling” in Proc. of IEEE ICRA 4234-4239
Luisier F., Blu T., Unser M. 2011 “Image denoising in mixed Poisson-Gaussian noise” IEEE Transactions on Image Processing 20 (3) 696-708 DOI: 10.1109/TIP.2010.2073477
Mairal J., Bach F., Ponce J., Sapiro G. 2010 “Online learning for matrix factorization and sparse coding” The Journal of Machine Learning Research 11 19-60
Aharon M., Elad M., Bruckstein A. 2006 “K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation” IEEE Transactions on Signal Processing 54 (11) 4311-4322 DOI: 10.1109/TSP.2006.881199
Efron B., Hastie T., Johnstone I., Tibshirani R. 2004 “Least angle regression” The Annals of Statistics 32 (2) 407-499 DOI: 10.1214/009053604000000067
Wang Z., Bovik A. C., Sheikh H. R., Simoncelli E. P. 2004 “Image quality assessment: From error visibility to structural similarity” IEEE Transactions on Image Processing 13 (4) 600-612 DOI: 10.1109/TIP.2003.819861
Dataset available from:
Horn B. K. P., Schunck B. G. 1981 “Determining optical flow” Artificial Intelligence 17 (1) 185-203 DOI: 10.1016/0004-3702(81)90024-2
Koenderink J. J. 1986 “Optic flow” Vision Research 26 (1) 161-179 DOI: 10.1016/0042-6989(86)90078-7
Li L., Huang W., Gu I. Y. H., Tian Q. 2004 “Statistical modeling of complex backgrounds for foreground object detection” IEEE Transactions on Image Processing 13 (11) 1459-1472 DOI: 10.1109/TIP.2004.836169
Jodoin P. M., Mignotte M., Konrad J. 2007 “Statistical background subtraction using spatial cues” IEEE Transactions on Circuits and Systems for Video Technology 17 (12) 1758-1763 DOI: 10.1109/TCSVT.2007.906935
Huang J., Huang X., Metaxas D. 2009 “Learning with dynamic group sparsity” in Proc. of IEEE ICCV Sep. 64-71
Barnich O., Droogenbroeck M. V. 2011 “ViBe: A universal background subtraction algorithm for video sequences” IEEE Transactions on Image Processing 20 (6) 1709-1724 DOI: 10.1109/TIP.2010.2101613
Wu B. F., Juang J. H. 2011 “Real-Time Vehicle Detector with Dynamic Segmentation and Rule-based Tracking Reasoning for Complex Traffic Conditions” KSII Transactions on Internet & Information Systems 5 (12)
Maddalena L., Petrosino A. 2008 “A self-organizing approach to background subtraction for visual surveillance applications” IEEE Transactions on Image Processing 17 (7) 1168-1177 DOI: 10.1109/TIP.2008.924285