Most background subtraction methods focus on dynamic and complex scenes without considering robustness against noise. This paper proposes a background subtraction algorithm based on dictionary learning and sparse coding for handling low light conditions. The proposed method formulates background modeling as the linear and sparse combination of atoms in a dictionary, and treats background subtraction as the difference between the sparse representations of the current frame and the background model. The assumption that the projection of noise over the dictionary is irregular and random guarantees the adaptability of the approach in scenes with large noise. Experimental results, divided into simulated large noise and realistic low light conditions, show the promising robustness of the proposed approach compared with other competing methods.
1. Introduction
The continuous improvement of equipment manufacturing and computer processing capability has led to the wide application of intelligent video surveillance technology in industry, defense, transportation, and other fields. One goal of this technology is to simulate functions of the human visual system, such as object tracking, classification, and behavior understanding, in an arbitrary scene. These smart applications rely on motion detection that correctly detects moving targets and exactly segments them. Three motion detection methods have been developed in the previous literature: optic flow, frame difference, and background subtraction.
The optic flow method [1, 23, 24] assigns a velocity vector to each pixel of the image, which forms an optic flow field. The optic flow field is an approximate estimate of the true motion field that reflects the gray-level changing trend of the pixels. This method can detect moving objects without any prior information about the scene or a stationary camera. However, its sensitivity to light and its computational complexity restrict its application in video surveillance systems [2].
Frame difference is based on a thresholded difference between the previous and the current frames. This method is computationally efficient and grants prompt detection of object motion between two frames [3]. Nevertheless, it suffers from two well-known drawbacks [4] caused by the frame rate and object speed: foreground aperture and ghosting. Moreover, it lacks the flexibility to handle dynamic and complex scenes.
Background subtraction [5-10, 14-16, 30] establishes a background model of the monitored scene through a suitable method and then calculates the difference between the current frame and the background model, which segments the foreground area from the scene. It avoids the issues of frame difference and performs robustly in dynamic scenes by using a background update procedure. A large number of algorithms have been developed to represent the statistical model of the background [25]. These methods operate at the pixel level and ignore the correlation of spatial information. Wren et al. [5] independently modeled the background at each pixel location with a Gaussian probability density function. Later, Friedman and Russell [6] used three Gaussian distributions corresponding to the road, shadow, and vehicle to model a traffic surveillance system. Stauffer and Grimson [7] extended this idea by employing a mixture of multiple Gaussian distributions to model the pixels in the scene, which has proved to be a popular solution for modeling complex backgrounds. When the assumptions imposed by the selected model in parametric methods fail, nonparametric approaches are a better choice. In nonparametric approaches, a kernel is created around each of the previous samples, and the density is estimated as an average over the kernels. Elgammal et al. [8] proposed a normal kernel that can deal with arbitrary shapes of the density function. Unlike statistical background model methods, Oliver et al. [9] considered the spatial configuration and captured eigenbackgrounds by eigenvalue decomposition over the whole image. Later, Monnet et al. [10] proposed an incremental principal component analysis (PCA) method to predict model states, which can capture the motion characteristics of backgrounds. In practical applications, robust PCA approaches [11-13] have proved more effective than the incremental PCA method. These spatially correlated approaches can effectively deal with brightness and other global changes. In addition, compressive sensing theory has been successfully applied to background subtraction in recent years. Cevher et al. [14] assumed that the majority of the pixels in a frame belong to the background; thus, the foreground is sparse after background subtraction. Subsequently, Zhao et al. [15] further developed this idea by assuming that the background also has a sparse representation and learning a dictionary to characterize the background changes.
 1.1 Contribution and Organization
The aforementioned methods mainly target complex and dynamic backgrounds, such as rain, snow, waves, and shaking trees, without considering low light or noisy environments. Large noise, low grey values, and small differences in grey level are the typical characteristics of low light images. Excessively large noise and low grey values negatively influence detection, which causes the existing motion detection methods to perform poorly.
In this paper, we propose a robust background subtraction method based on dictionary learning and sparse coding to handle large noise conditions. Firstly, this paper formulates the background modeling step as a sparse representation problem and regards background subtraction as the sparse projection over the dictionary. It then detects the foreground as the difference between the reconstructed image and the background model.
Secondly, different from the assumptions of [14, 15], we put forward a significant assumption that statistical noise is typically distributed anisotropically through the larger space; this assumption is analyzed and verified in a later section. Based on it, the proposed method can distinctly remove the influence of large noise and perform robustly under different large noise and low light environments as a result of sparse representation.
The rest of this paper is organized as follows: Section 2 describes the basic principle of the proposed method based on three assumptions. Section 3 presents the mathematical formulation of the proposed method. Section 4 compares the experimental results with those of existing methods on public testing datasets with simulated noise and on realistic low light videos. Section 5 concludes and discusses possible future research directions.
2. Basic principle
According to the overview of the proposed method in Section 1, the proposed approach can be divided into three parts: background modeling by dictionary learning and sparse coding, sparse representation of the current frame, and foreground detection. This section introduces the principles of these parts based on the following assumptions, which theoretically provide a reasonable explanation of the proposed method.
In the framework of background subtraction, the current frame I can be linearly decomposed as follows:

I = I_{B} + I_{F},    (1)

where I_{B} is the background model and I_{F} is the foreground candidate.
The background model I_{B} is the most critical component for the success of a background subtraction approach. This model is established as the linear and sparse combination of the atoms in the dictionary D, based on the idea of background modeling with basis vectors [9]:

I_{B} = Dα,    (2)

where α is the sparse coefficient vector.
Compared with eigenvalue decomposition [9], sparse decomposition over a redundant dictionary is more effective in signal processing applications. The background can be represented sparsely by projecting it onto the atoms of the dictionary. This leads to the first assumption, similar to [15]:

Assumption 1: The background of an arbitrary scene can be sparsely and linearly represented by the atoms of the dictionary.
Sparse representation aims to make the reconstructed signal as close as possible to the original one. When a moving target enters the scene, it changes the structure of the background, and the original sparse representation no longer holds. In other words, when a test frame with moving objects is represented by the subspace spanned by pure background bases, the unchanged area of the scene can be well recovered, whereas the changed area is reconstructed with a deviation in its projection on that subspace. Measuring this deviation serves the purpose of detection. The second assumption is proposed based on the above analysis:
Assumption 2:
The foreground leads to the changing of the background and greatly transforms the projection over the dictionary.
Process of the sparse coefficients changing when a foreground enters the scene. The dictionary D and sparse coefficients α represent the pure background model and the scene with foreground. Fig. 1A: sparse representation of the pure background as described in Assumption 1. Fig. 1B: the foreground breaks the original equation. The dictionary D can be obtained by learning algorithms such as K-SVD and Online Dictionary Learning, and finding the coefficients α is a sparse coding problem.
Fig. 1 shows the process of the sparse coefficients changing when a foreground enters the scene. Fig. 1A shows the sparse representation of the pure background as described in Assumption 1. Then, the foreground breaks the original equation, as shown in Fig. 1B.
The two predominant sources of noise in digital image acquisition are the stochastic nature of photon counting and the intrinsic thermal and electronic fluctuations of the acquisition devices [17]. Under normal illumination, the second source is the primary component. When the light decreases, the rapid increase of the first source introduces a large amount of noise into the captured images. When the noise level is very large, the existing detection methods become ineffective. To guarantee the adaptability of the proposed method under low light conditions, a noise assumption is proposed:
Assumption 3:
The projection of the noise over the dictionary is irregular and random.
Comparison of sparse coefficients. The red curve in Fig. 2A is the sparse coefficient of the original background projected on the dictionary, and the blue one is the scene with foreground. The other three curves in Fig. 2B-D are the background with Gaussian white noise (σ = 400), Poisson noise (α = 500), and a mixture of both (σ = 250, α = 250).
Fig. 2 shows the comparison of the sparse coefficients under different circumstances in a certain background. The red curve in Fig. 2A represents the sparse coefficients of the original background projected on the learned dictionary, and the blue one is a case with a foreground entering. Comparing these two curves shows that the foreground significantly and regularly changes the sparse coefficients. Regardless of its type, the foreground always has a certain structure that produces regular coefficients over the bases. The other three curves in Fig. 2B-D are the background with Gaussian white noise (σ = 400), Poisson noise (α = 500), and a mixture of both (σ = 250, α = 250). These three curves show that the sparse coefficients of the noise distribute randomly and chaotically over the dictionary, as described in Assumption 3. Regardless of the type of noise, its randomness and anisotropy determine the disorder of its distribution over the whole dictionary. Thus, when an image is reconstructed through the sparse model, only several atoms in the dictionary are selected to represent the original signal, and most of the noise is effectively removed. These factors ensure that the proposed method is suitable for handling large noise environments.
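The denoising behaviour behind Assumptions 2 and 3 can be sketched numerically: a sparse coder keeps only the few atoms that carry structure, so noise, whose projection spreads thinly over the whole dictionary, is mostly discarded. Below is a minimal Python sketch with a toy random dictionary and a hand-rolled orthogonal matching pursuit; all sizes, coefficients, and noise levels are hypothetical and not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dictionary: 64-dim signals, 128 unit-norm atoms (a hypothetical
# stand-in for a learned background dictionary).
D = rng.standard_normal((64, 128))
D /= np.linalg.norm(D, axis=0)

def omp(D, x, n_nonzero=3):
    """Plain orthogonal matching pursuit: greedily pick the atom most
    correlated with the residual, then refit by least squares."""
    residual, support = x.copy(), []
    for _ in range(n_nonzero):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

# A "background" signal built from exactly 3 atoms, plus heavy noise.
background = D[:, [3, 40, 90]] @ np.array([3.0, -2.5, 2.0])
noisy = background + rng.standard_normal(64) * 0.4

alpha = omp(D, noisy, n_nonzero=3)
recon = D @ alpha
# The 3-atom reconstruction keeps the structured part and drops most
# of the noise, which has no preferred direction in the dictionary.
print(np.linalg.norm(noisy - background))   # error before sparse coding
print(np.linalg.norm(recon - background))   # error after sparse coding
```

The second printed error is typically much smaller than the first, illustrating why restricting the reconstruction to a few atoms suppresses large, unstructured noise.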
3. Proposed method
The three assumptions described in Section 2 are the bases of the proposed method. First, according to Assumption 1, dictionary learning is applied to obtain the basis vectors of the scene. Then, sparse coding is combined with dictionary learning to model the background of the scene. For an arbitrary frame, the proposed method projects it onto the learned dictionary to acquire the sparse representation. Finally, the difference between the sparse representations of the background model and the current frame is regarded as the detection criterion.
Fig. 3 shows the flowchart of the detailed process of the proposed method.
Flowchart of the proposed method.
 3.1 Background modeling
In (2), the background model is formulated as the linear and sparse combination of the atoms in the dictionary D. Dictionary learning has proved very effective for signal reconstruction and classification in the audio and image processing domains [18]. Compared with traditional signal decomposition methods such as wavelets and PCA, dictionary learning does not require the orthogonality of the bases; thus, it represents the signal with better adaptability and flexibility.
(a) Training set established with N samples. Each image is divided into m×l blocks of size n×n pixels. (b) Learned dictionary with 256 atoms by Online Dictionary Learning [18].
The background frames without foreground are extracted from the surveillance video to form a training set with N samples. Fig. 4(a) shows that each of the collected images is divided into m×l blocks of size n×n pixels. The j-th image block of the i-th sample can be vectorized as x^{j}_{i}. The j-th image blocks of all samples are then combined into a training set X_{j} = [x^{j}_{1}, ⋯, x^{j}_{N}]. Its dictionary D_{j} satisfies the following formula [18, 19]:

min_{D_{j}, α_{i}} Σ_{i=1}^{N} ( (1/2)||x^{j}_{i} − D_{j}α_{i}||^{2}_{2} + λ||α_{i}||_{1} ),    (3)

where α_{i} is the i-th sparse coefficient and λ is a regularization parameter.
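As a sketch of Formula (3), the same l1-regularized objective can be minimized with scikit-learn's MiniBatchDictionaryLearning, which implements the online algorithm of Mairal et al. [18]. The random training matrix, the reduced dictionary of 64 atoms (the paper uses 256), and the parameter values below are illustrative assumptions only, not the paper's setup:

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

# Stand-in training set: N = 200 vectorized 12x12 background blocks for
# one block position j (random data here; in the paper these come from
# foreground-free frames).
X_j = rng.standard_normal((200, 144))

# alpha below plays the role of lambda in Formula (3); 64 atoms are used
# for speed, whereas the paper's dictionary has 256 atoms (Fig. 4(b)).
dl = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                 batch_size=10, random_state=0)
codes = dl.fit_transform(X_j)    # sparse coefficients, one row per sample
D_j = dl.components_.T           # learned dictionary, one atom per column

print(D_j.shape)    # (144, 64)
print(codes.shape)  # (200, 64)
```

Each row of `codes` is the sparse coefficient vector α_{i} of one training block, and the columns of `D_j` are the learned atoms.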
The Online Dictionary Learning algorithm [18] is used in this study to solve Formula (3). The learned dictionary with 256 atoms is shown in Fig. 4(b). In each loop, the algorithm follows a stochastic gradient descent scheme and draws a vector x_{t} from X_{j}, where t is the iteration count. It applies sparse coding with the dictionary obtained from the previous t−1 loops to compute the t-th decomposition coefficient α_{t}:

α_{t} = argmin_{α} (1/2)||x_{t} − D_{t−1}α||^{2}_{2} + λ||α||_{1}.    (4)
Sparse coding is a class of methods that automatically choose good basis vectors for unlabeled data. The Least Angle Regression algorithm [20] can solve Formula (4), especially when the solution is sufficiently sparse. Furthermore, the solution is precise and does not rely on the correlation of atoms in the dictionary unless the solution is not unique. Then the dictionary D_{t−1} = [d_{1}, ⋯, d_{k}] is updated column by column to obtain a new dictionary D_{t}. Following [18], the update rules are:

u_{j} = (1/A_{t}[j,j])(b_{j} − D_{t−1}a_{j}) + d_{j},    d_{j} = u_{j} / max(||u_{j}||_{2}, 1),    (5)

where A_{t} = Σ_{i=1}^{t} α_{i}α_{i}^{T} = [a_{1}, ⋯, a_{k}] and B_{t} = Σ_{i=1}^{t} x_{i}α_{i}^{T} = [b_{1}, ⋯, b_{k}] accumulate the information of the past iterations. When the background of a scene changes, the abovementioned update rules can be used to update the background model, thereby ensuring robustness.
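The column-by-column dictionary update can be sketched as follows. This is the standard update step of [18] written in NumPy, with A and B the accumulation matrices defined above; the random data in the usage example is hypothetical:

```python
import numpy as np

def update_dictionary(D, A, B):
    """One pass of the column-by-column dictionary update of Online
    Dictionary Learning [18]. A = sum_t alpha_t alpha_t^T and
    B = sum_t x_t alpha_t^T accumulate the past sparse codes."""
    D = D.copy()
    for j in range(D.shape[1]):
        if A[j, j] == 0:          # atom never used so far; leave it as is
            continue
        u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
        D[:, j] = u / max(np.linalg.norm(u), 1.0)   # project to unit ball
    return D

# Hypothetical usage: 64-dim signals, 32 atoms, 100 sparse codes.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 32))
D /= np.linalg.norm(D, axis=0)
alpha = rng.standard_normal((32, 100)) * (rng.random((32, 100)) < 0.1)
X = D @ alpha                     # signals consistent with the codes
A = alpha @ alpha.T
B = X @ alpha.T
D_new = update_dictionary(D, A, B)
print(D_new.shape)                # (64, 32); every column has norm <= 1
```

Projecting each updated column onto the unit ball keeps the atoms bounded, so the sparse coefficients, rather than the atom norms, absorb the signal energy.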
Sparse coding and dictionary updating are performed alternately until the maximum number of iterations is reached. This algorithm is simple, fast, and suitable for large-scale image processing. Mairal et al. [18] have shown that the algorithm converges to a fixed point. The abovementioned method is applied to each block, and then the whole-image dictionary D and sparse coefficients α are obtained. The process of background modeling is then completed with (2), as described in Algorithm 1.
 3.2 Foreground detection
After the background model is established, the next step is to detect the foreground. Similar to the process of background modeling, the sparse coefficients α′ on the dictionary D for an arbitrary frame I can be obtained by sparse coding. Following the idea of background subtraction, the foreground is detected from the difference between the sparse representations of the current frame I and the background model I_{B}. Thus, the foreground that enters the monitored scene can be presented as follows:

I_{F} = Dα′ − Dα = D(α′ − α).    (6)
The differences of I_{F} are calculated in blocks and summed into a vector Δ:

Δ(j) = Σ_{p} |I^{j}_{F}(p)|,    (7)

where I^{j}_{F}(p) is the p-th pixel of the j-th block in I_{F}.
The threshold region T is then used to judge Δ(j). If Δ(j) falls inside T, the structure of the j-th block has not changed, i.e., no foreground has entered, and Δ(j) is set to 0. On the contrary, if an object enters the scene, Δ(j) lies outside T. This study assumes that the data in Δ approximately follow a Gaussian distribution. Therefore, the upper and lower limits of the threshold region T are set with the 3σ criterion:

T = [μ − 3σ, μ + 3σ],    (8)

where μ and σ are the mean and standard deviation, respectively, of the differences between the images in the training set and the background model.
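The block-difference and 3σ test of Formulas (7) and (8) can be sketched as follows, with hypothetical per-block reconstructions standing in for the sparse-coding outputs (here the threshold statistics are estimated from the blocks themselves for illustration, rather than from a training set):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-block reconstructions: recon_bg from the background
# coefficients, recon_cur from the current frame's coefficients
# (one row per block, vectorized pixels in columns).
recon_bg = rng.standard_normal((30, 144))
recon_cur = recon_bg + rng.standard_normal((30, 144)) * 0.01
recon_cur[5] += 2.0          # simulate a foreground object in block 5

# Formula (7): sum the absolute per-pixel differences inside each block.
delta = np.abs(recon_cur - recon_bg).sum(axis=1)

# Formula (8): blocks whose difference falls outside the 3-sigma region
# are flagged as foreground; the rest are treated as unchanged background.
mu, sigma = delta.mean(), delta.std()
foreground_blocks = np.where((delta < mu - 3 * sigma) |
                             (delta > mu + 3 * sigma))[0]
print(foreground_blocks)     # the perturbed block should stand out
```

Only the block with the simulated foreground deviates far enough from the mean block difference to leave the threshold region.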
However, a single threshold judgment with a fixed threshold may leave isolated points in the detection results. The results of the threshold judgment are therefore post-processed with weight coefficients. Given that the pixels of an object are monolithic, an image block can be judged according to its neighbors:

Δ′(j) = (1 − SSIM_{j}) Σ_{k ∈ neighbour(j)} Δ(k),    (9)

where neighbour(j) is the 3×3 neighborhood of the j-th block, and SSIM_{j} is the value of the Structural Similarity Index Measurement [21] between the j-th block of the current frame and the background model. The weight coefficient 1 − SSIM_{j} adequately exploits the structure information: if the block of the current frame is similar to the corresponding block in the background model, 1 − SSIM_{j} is very small, so Δ′(j) is low and the block is not regarded as foreground. Formula (9) enhances the effect of foreground segmentation. The detailed foreground detection algorithm is described in Algorithm 2.
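The SSIM weighting of Formula (9) can be sketched as follows. For self-containment, the snippet uses a simplified single-window SSIM computed from global block statistics rather than the sliding-window reference implementation of [21]; the block data and the neighbourhood sum are hypothetical:

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Simplified single-window SSIM [21] from global block statistics
    (the reference implementation averages over a sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2) /
            ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2)))

rng = np.random.default_rng(0)
bg_block = rng.random((12, 12))                        # background model block
cur_block = bg_block + rng.normal(0, 0.02, (12, 12))   # nearly identical block

ssim_j = global_ssim(bg_block, cur_block)

# Formula (9) weights the summed neighbourhood differences by (1 - SSIM_j):
# blocks similar to the background get a weight near 0 and are suppressed.
delta_neighbourhood = 5.0    # hypothetical sum of Delta over the 3x3 blocks
delta_prime = (1.0 - ssim_j) * delta_neighbourhood
print(delta_prime < delta_neighbourhood)   # True: similar block is damped
```

A genuinely changed block would yield a low SSIM_{j}, leaving Δ′(j) close to the raw neighbourhood sum.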
4. Experiments
To show the qualitative and quantitative performance of the proposed method, it has been tested under different levels of light and noise. The experiments are implemented in two parts: on the public testing datasets [22] and on realistic low light videos. The realistic videos are converted to 360×240, similar to the size of the datasets in [22], to provide a fair comparison.
 4.1 Implementation details
Different levels of Gaussian noise, Poisson noise, or a mixture of both are added to the images in the public testing dataset [22]. The artificial noise is generated as follows:

nI = αy + n,    (10)

where nI is the pixel value of the noise image and α is the scale factor. y and n obey the Poisson distribution P(λ) and the Gaussian distribution N(μ, σ²), respectively, where λ is the pixel value of the original image, and μ and σ are the mean and standard deviation of the Gaussian noise. Different degrees of noise can be obtained by adjusting the parameters of (10). For the low light video, the illumination of the environment was recorded when taking the video.
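The corruption process of (10) can be sketched as follows. Since the paper does not spell out how the Poisson term is normalized, this snippet draws y ~ P(λ/α) and rescales by α, a common convention that keeps the mean at the clean pixel value while letting the shot-noise variance grow with α; the flat test frame is purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_mixed_noise(image, alpha=50.0, mu=0.0, sigma=50.0):
    """Sketch of the mixed Poisson-Gaussian corruption in (10).
    Assumption: y ~ P(lambda/alpha) rescaled by alpha (the paper's exact
    normalization is not specified), plus Gaussian N(mu, sigma^2) noise."""
    y = rng.poisson(np.maximum(image, 0.0) / alpha)   # signal-dependent part
    n = rng.normal(mu, sigma, image.shape)            # Gaussian sensor noise
    return np.clip(alpha * y + n, 0, 255)

clean = np.full((240, 360), 128.0)      # flat mid-grey test frame
noisy = add_mixed_noise(clean, alpha=50.0, sigma=50.0)
print(noisy.shape)                      # (240, 360)
print(abs(noisy.mean() - 128.0) < 15)   # mean roughly preserved
```

Raising `alpha` or `sigma` reproduces the increasingly severe noise levels used in the experiments.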
In this study, the image block is treated as the basic processing unit, and the block size has a certain effect on the computing speed, detection results, and recovered image quality. The performance of different block sizes is shown in Fig. 5. Smaller blocks maintain better precision of the detection results, whereas a larger block size guarantees accuracy, as shown in the second row of Fig. 5. Precision and accuracy are trade-off parameters, and simultaneously keeping both at a high level is difficult. After testing and comparison, 12×12 is chosen as the block size. With this block size, the MATLAB implementation of the proposed algorithm processes a frame in 1.5 to 2.0 seconds on a machine with a 2.2 GHz Pentium E2200 processor and 2 GB of RAM.
Comparison of different block sizes. Noise was added to the Pedestrians dataset [22] with a mixture of additive Gaussian white noise (σ = 50) and Poisson noise (α = 50). The detection results are presented with blue boxes. Fig. 5A-C: Detection results with 8×8, 12×12, and 20×20 block sizes.
 4.2 Results on public testing dataset
The proposed method is compared with competing background subtraction algorithms: the mixture of Gaussians model [7], the nonparametric model [8], and ViBe [28]. The robustness of the proposed approach is then confirmed under different types of large noise.
Detection results of different methods with different types of noise on the Backdoor dataset [22]. Different levels of Gaussian white noise (σ = 50 and 250), Poisson noise (α = 50 and 250), and a mixture of both (σ = 50, α = 50 and σ = 250, α = 250) are added to the original image. Fig. 6A: Test images with different types of noise. Fig. 6B-E: Detection results of the mixture of Gaussians model [7], the nonparametric model [8], ViBe [28], and the proposed method.
Fig. 6 shows the comparison between the competing background subtraction algorithms and the proposed method. Different levels of Gaussian white noise (σ = 50 and 250), Poisson noise (α = 50 and 250), and a mixture of both (σ = 50, α = 50 and σ = 250, α = 250) are added to the original image. The results show that in the noise-free condition, the nonparametric model [8] and ViBe [28] yield more exact detection results. However, when a certain degree of noise is added to the original image, the results of the compared methods are seriously affected. As the noise continues to rise, the compared methods fail completely because their background model assumptions no longer hold. By contrast, the proposed method performs robustly and handles the different types of noise well. The proposed method was tested on various datasets in [22], and the detection results are presented on one of the datasets.
Detection results under different types of noise on the Bus Station dataset [22]. Fig. 7A: Original images with ground truth. Fig. 7B-D: Images and results with Gaussian white noise (σ = 250), Poisson noise (α = 250), and a mixture of both (σ = 150, α = 250).
In Fig. 7, different types of noise are added to the original test images. Regardless of the noise type, the proposed approach performs stable and robust detection. However, since the dictionary is learned from background images, the recovered blocks are close to the background when the foreground has similar colors, which increases the difficulty of the detection process. In the second row of Fig. 7, the color of the rightmost pedestrian's clothing is identical to the color of the dustbin, which leads to a failed detection. By contrast, if the background and foreground colors differ visibly (the third row of Fig. 7), the proposed method can properly detect the person or object.
To evaluate the quantitative performance of the proposed method, three quantitative metrics were adopted in this study [30]:

Recall = tp / (tp + fn),    (11)
Precision = tp / (tp + fp),    (12)
F-measure = (2 × Recall × Precision) / (Recall + Precision),    (13)

where tp is the number of pixels correctly classified as foreground, and tp + fn and tp + fp are the numbers of foreground pixels according to the ground truth and the proposed method, respectively.
One hundred frames with foreground from the Backdoor and Bus Station datasets [22] were selected to calculate the quantitative metrics. The results are shown in Tables 1, 2, and 3. In Table 1, the compared methods have higher Recall values because of false detections caused by noise, whereas the Precision of the proposed method obtains a better performance in Table 2. The F-measure is considered as a single measure, that is, the weighted harmonic mean of Recall and Precision in (13). Table 3 shows that the proposed method has a satisfactory quantitative performance regardless of the dataset or noise level.
Table 1. Quantitative metric of Recall
Table 2. Quantitative metric of Precision
Table 3. Quantitative metric of F-measure
 4.3 Results on our low light video
The proposed method is employed on the low light video taken in this study. The image sensor is a SONY IMX104 CMOS. As in Section 4.2, the methods of [7], [8], and [28] are compared with the proposed method, and the detection results are shown under different low illumination environments.
Detection results of different methods under a 0.1-0.5 lx environment. Fig. 8A: Test frames extracted from the low light video. Fig. 8B-E: Detection results of the mixture of Gaussians model [7], the nonparametric model [8], ViBe [28], and the proposed method.
In Fig. 8, the mixture of Gaussians [7], the nonparametric model [8], and ViBe [28] are compared with the proposed method under realistic low light conditions. The illumination of the environment in Fig. 8 is about 0.1-0.5 lx, and obvious noise appears in the captured images. The noise in Fig. 8 approximately corresponds to the mixture of Gaussian (σ = 50) and Poisson (α = 50) noise. As the illumination decreases, the noise captured in the low light video increases exponentially. Since the light of the scene decreases from right to left, the performance of the compared methods degenerates accordingly. When the moving object is in the brighter right part of the scene (the third and fourth rows of Fig. 8), the methods of [7], [8], and [28] can detect the object. However, the compared methods perform poorly when it appears in the darker left part of the scene (the first and fifth rows of Fig. 8). Meanwhile, the proposed method behaves robustly no matter where the foreground is. The method proposed in this study is thus robust under both artificial large noise and realistic low light circumstances, as shown in Figs. 6 and 8.
Fig. 9 shows the detection results of the different methods under even lower light. The illumination of the environment in Fig. 9 is about 0.01-0.05 lx. To facilitate observation, the brightness intensity is increased, which also increases the noise level. As in Fig. 8, the methods of [7], [8], and [28] are compared with the proposed approach. The proposed method still behaves more robustly than the compared methods.
Detection results of different methods under a 0.01-0.05 lx environment. Fig. 9A: Test frames extracted from the low light video. Fig. 9B-E: Detection results of the mixture of Gaussians model [7], the nonparametric model [8], ViBe [28], and the proposed method.
5. Conclusion
Most background subtraction methods highlight the capability of handling dynamic scenes [10, 15, 29] but ignore low light circumstances. The large noise caused by low light greatly affects the traditional algorithms and leads to their poor performance, as shown in Figs. 6 and 8. This paper proposes a robust background subtraction algorithm based on dictionary learning and sparse coding to handle the large noise condition. The proposed method achieves a satisfactory detection performance that is not influenced by large statistical noise of different types and scales. The proposed method works poorly when the variance of the noise is larger than 500; in this case, even the human visual system has difficulty distinguishing the foreground.
In the proposed method, the whole image is divided into a group of blocks within which motion detection is handled independently. Thus, the result is a comparatively coarse, mosaic-like output when compared with pixel-level background subtraction methods. Future work will focus on refining the proposed method toward more precise detection, which remains a promising topic of investigation.
BIO
Huaxin Xiao received his BS degree in automation from the University of Electronic Science and Technology of China. He is currently pursuing his MS degree in control science and engineering from the National University of Defense Technology, Changsha, China. His research interests include sparse representation and computer vision.
Yu Liu received his BS degree from Northwestern Polytechnical University, Xi’an, China in 2005. He then received his MSc on image processing and PhD on computer graphics from the University of East Anglia, Norwich, UK, in 2007 and 2011, respectively. He is currently a lecturer in the department of system engineering, National University of Defense Technology. His research interests include image/video processing, computer graphics, and visual haptic technology.
Shuren Tan received the Bachelor, MS, and PhD degrees from the Department of System Engineering at the National University of Defense Technology, Changsha, China, in 1993, 1996, and 2011, respectively. He is currently an Associate Professor of System Engineering at the National University of Defense Technology. His research interests include computational imaging, computer vision, and signal processing.
Jiang Duan received his BS degree from Southwest Jiaotong University, Chengdu, China in 2002. He then received his PhD on image processing from the University of Nottingham, England, UK in 2006. He is currently a professor in the school of economic information engineering, Southwestern University of Finance and Economics. His research interests include image processing, computer vision, and information engineering.
Maojun Zhang received his BS and PhD degrees in system engineering from the National University of Defense Technology, Changsha, China, in 1992 and 1997, respectively. He is currently a professor in the department of system engineering, National University of Defense Technology. His research interests include computer vision, information system engineering, system simulation, and virtual reality technology.
References
Barron J. L., Fleet D. J., Beauchemin S. S., "Performance of optical flow techniques," International Journal of Computer Vision, 12(1), pp. 43-77, 1994. DOI: 10.1007/BF01420984
Beauchemin S. S., Barron J. L., "The computation of optical flow," ACM Computing Surveys (CSUR), 27(3), pp. 433-466, 1995. DOI: 10.1145/212094.212141
Migliore D. A., Matteucci M., Naccari M., "A revaluation of frame difference in fast and robust motion detection," in Proc. of ACM VSSN, pp. 215-218, Oct. 2006.
Lee H., Hong S., Kim E., "Probabilistic Background Subtraction in a Video-based Recognition System," KSII Transactions on Internet & Information Systems, 5(4), 2011.
Wren C. R., Azarbayejani A., Darrell T., Pentland A. P., "Pfinder: Real-time tracking of the human body," IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 780-785, 1997. DOI: 10.1109/34.598236
Friedman N., Russell S., "Image segmentation in video sequences: A probabilistic approach," in Proc. of the 13th Conference on Uncertainty in Artificial Intelligence, pp. 175-181, Aug. 1997.
Stauffer C., Grimson W. E. L., "Adaptive background mixture models for real-time tracking," in Proc. of IEEE CVPR, vol. 2, Jun. 1999.
Elgammal A., Harwood D., Davis L., "Non-parametric model for background subtraction," in Proc. of Computer Vision - ECCV, pp. 751-767, Jul. 2000.
Oliver N. M., Rosario B., Pentland A. P., "A Bayesian computer vision system for modeling human interactions," IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), pp. 831-843, 2000. DOI: 10.1109/34.868684
Monnet A., Mittal A., Paragios N., Ramesh V., "Background modeling and subtraction of dynamic scenes," in Proc. of IEEE ICCV, pp. 1305-1312, Oct. 2003.
De La Torre F., Black M. J., "A framework for robust subspace learning," International Journal of Computer Vision, 54(1-3), pp. 117-142, 2003. DOI: 10.1023/A:1023709501986
Ke Q., Kanade T., "Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming," in Proc. of IEEE CVPR, vol. 1, pp. 739-746, Jun. 2005.
Candès E. J., Li X., Ma Y., Wright J., "Robust principal component analysis?," Journal of the ACM, 58(3), Article 11, 2011. DOI: 10.1145/1970392.1970395
Cevher V., Sankaranarayanan A., Duarte M. F., Reddy D., Baraniuk R. G., Chellappa R., "Compressive sensing for background subtraction," in Proc. of Computer Vision - ECCV, pp. 155-168, Oct. 2008.
Zhao C., Wang X., Cham W. K., "Background subtraction via robust dictionary learning," EURASIP Journal on Image and Video Processing, 2011.
Sivalingam R., D'Souza A., Bazakos M., Miezianko R., Morellas V., Papanikolopoulos N., "Dictionary learning for robust background modeling," in Proc. of IEEE ICRA, pp. 4234-4239, 2011.
Luisier F., Blu T., Unser M., "Image denoising in mixed Poisson-Gaussian noise," IEEE Transactions on Image Processing, 20(3), pp. 696-708, 2011. DOI: 10.1109/TIP.2010.2073477
Mairal J., Bach F., Ponce J., Sapiro G., "Online learning for matrix factorization and sparse coding," The Journal of Machine Learning Research, 11, pp. 19-60, 2010.
Aharon M., Elad M., Bruckstein A., "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, 54(11), pp. 4311-4322, 2006. DOI: 10.1109/TSP.2006.881199
Efron B., Hastie T., Johnstone I., Tibshirani R., "Least angle regression," The Annals of Statistics, 32(2), pp. 407-499, 2004. DOI: 10.1214/009053604000000067
Wang Z., Bovik A. C., Sheikh H. R., Simoncelli E. P., "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, 13(4), pp. 600-612, 2004. DOI: 10.1109/TIP.2003.819861
Li L., Huang W., Gu I. Y. H., Tian Q., "Statistical modeling of complex backgrounds for foreground object detection," IEEE Transactions on Image Processing, 13(11), pp. 1459-1472, 2004. DOI: 10.1109/TIP.2004.836169
Jodoin P. M., Mignotte M., Konrad J., "Statistical background subtraction using spatial cues," IEEE Transactions on Circuits and Systems for Video Technology, 17(12), pp. 1758-1763, 2007. DOI: 10.1109/TCSVT.2007.906935
Huang J., Huang X., Metaxas D., "Learning with dynamic group sparsity," in Proc. of IEEE ICCV, pp. 64-71, Sep. 2009.
Barnich O., Van Droogenbroeck M., "ViBe: A universal background subtraction algorithm for video sequences," IEEE Transactions on Image Processing, 20(6), pp. 1709-1724, 2011. DOI: 10.1109/TIP.2010.2101613
Wu B. F., Juang J. H., "Real-Time Vehicle Detector with Dynamic Segmentation and Rule-based Tracking Reasoning for Complex Traffic Conditions," KSII Transactions on Internet & Information Systems, 5(12), 2011.
Maddalena L., Petrosino A., "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Transactions on Image Processing, 17(7), pp. 1168-1177, 2008. DOI: 10.1109/TIP.2008.924285