High-frame-rate Video Denoising for Ultra-low Illumination
KSII Transactions on Internet and Information Systems (TIIS). 2014. Nov, 8(11): 4170-4188
Copyright © 2014, Korean Society For Internet Information
  • Received : June 27, 2014
  • Accepted : September 21, 2014
  • Published : November 30, 2014
About the Authors
Xin Tan
College of Information System and Management, National University of Defense Technology Changsha, Hunan, P. R. China, 410073
Yu Liu
College of Information System and Management, National University of Defense Technology Changsha, Hunan, P. R. China, 410073
Zheng Zhang
College of Information System and Management, National University of Defense Technology Changsha, Hunan, P. R. China, 410073
Maojun Zhang
College of Information System and Management, National University of Defense Technology Changsha, Hunan, P. R. China, 410073

Abstract
In this study, we present a denoising algorithm for high-frame-rate videos captured in ultra-low illumination. The algorithm is based on a Kalman filtering model and a new motion segmentation scheme. The Kalman filter removes temporal noise from signals by propagating error covariance statistics. Motion, which is regarded as the process noise of imaging, is important in Kalman filtering. We propose a new motion estimation scheme that is suitable for severe noise. This scheme exploits the small motion vector characteristic of high-frame-rate videos. Small changing patches are intentionally neglected because distinguishing details from large-scale noise is difficult and unimportant. Finally, a spatial bilateral filter is used to improve denoising capability in the motion area. Experiments are performed on videos with both synthetic and real noise. Results show that the proposed algorithm outperforms other state-of-the-art methods in both the objective peak signal-to-noise ratio (PSNR) evaluation and visual quality.
1. Introduction
A camera unavoidably suffers from noise in low illumination. A corrupted signal influences the perceptual quality and efficiency of subsequent processing tasks such as video compression and pattern recognition. Thus, denoising is an important issue in image and video processing.
A common technique to enhance image quality in low illumination in photography is to prolong the exposure time. However, motion blurring occurs in images with long exposure time, particularly when moving objects are present in the scene, such as during recording of sports competitions or vehicular collision experiments. Blurring and noise are key challenges in ensuring good video quality in low illumination. Denoising is required when the exposure time is short, whereas deblurring is needed when the exposure time is long. Deblurring is more complex and time consuming for computer vision than denoising, so the latter can be regarded as the preferable option.
A short exposure time produces high-frame-rate videos with crisp and fluid imagery. A judder is also absent unlike in a normal-frame-rate video with a short exposure time. Fig. 1 shows the comparison between normal-frame-rate and high-frame-rate videos. In this figure, two high-frame-rate cameras were used to simultaneously record a waving hand. The normal video frame rate is approximately 25 fps. However, such a rate is still unable to provide clear images of fast-moving objects. Depending on the velocity of motion, a high frame rate, such as 100 fps, 250 fps, or even higher, is necessary.
Fig. 1. Comparison between the 25 fps video and the 250 fps video. (a) The 25 fps video with a blurred waving hand. (b) The 250 fps video with a clear waving hand.
Numerous studies on video denoising have been conducted in recent decades. To a certain extent, denoising methods such as those presented in [1 - 7] work effectively in low illumination. We take VBM3D [2] as an example. Fig. 2 shows the denoising result of VBM3D for additive white Gaussian noise (AWGN) with standard deviation $\sigma_n = 30$, which represents light noise in low illumination. The result looks excellent compared with the ground truth frame (Fig. 1(b)), except for some details, such as the text in the top left corner.
Fig. 2. Denoising performance of VBM3D for AWGN with standard deviation $\sigma_n = 30$. (a) Noisy frame whose ground truth is Fig. 1(b) (PSNR = 18.59 dB). (b) Denoising result of VBM3D (PSNR = 33.57 dB).
However, the performance dramatically degrades in ultra-low illumination. Fig. 3 shows the denoising result of VBM3D for a video captured in a real noisy environment in ultra-low illumination. The video is captured by a high-frame-rate sensor, i.e., Viimagic 9222B. An image in ultra-low illumination has minimal chrominance components. Hence, only the luminance part is taken. Fig. 3 depicts that VBM3D removes noise to a certain degree, but can visual quality still be improved? This task is difficult in ultra-low illumination because a useful signal is nearly submerged in noise.
Fig. 3. Denoising performance of VBM3D in ultra-low illumination. (a) Noisy frame. (b) Denoising result of VBM3D.
In this study, we address the video denoising problem in ultra-low illumination. The target video has a stationary background. This type of video has extensive applications, such as in ubiquitous surveillance cameras.
Our work is based on the Kalman filtering framework. The Kalman filter was proposed by Rudolf E. Kalman [8] in 1960. This set of mathematical equations provides an efficient recursive means to estimate the state of a process such that the mean squared error is minimized. The Kalman filter is widely used in denoising and other video processing tasks [9 - 12]. For video denoising, this algorithm removes noise from a signal by propagating error covariance statistics. In this study, motion is modeled as imaging process noise, so the quality of motion estimation determines the denoising capability of the Kalman filter. We employ two characteristics of noisy high-frame-rate videos to obtain reliable motion estimation. The first characteristic is the small motion vector. The motion area between two frames with a small motion vector is contained within the motion area between two frames with a large motion vector. Flickering noise does not exhibit this feature. The second characteristic is the loss of small objects and details in large-scale noise in ultra-low illumination. The movement of such minute objects and details is difficult to detect. Therefore, we sacrifice them to improve the estimation of the major parts, which is valuable for overall performance.
The remainder of this paper is organized as follows. Section 2 discusses related works on video denoising. Section 3 presents our Kalman filtering framework. Section 4 proposes a motion estimation scheme for ultra-low illumination. Section 5 discusses the experiments. Finally, Section 6 summarizes the study.
2. Related Work
Video denoising can exploit redundant information from nearby frames. Thus, a better denoising capability compared with single-image processing can be expected. Determining how to deal with the temporal relationship of frames is the key to video denoising.
In recent years, many algorithms have used 2D similar patch clusters to implicitly estimate motion information [2 , 4 , 13 - 15] . Similar patches have been matched across several frames in spatiotemporal space. In [13 - 15] , weighted averaging on selected patches was conducted after patch matching to denoise the reference patch. In [4] , patch matching was followed by an adaptive threshold approach, i.e., SURE-LET. In VBM3D [2] , a two-step Wiener filtering framework was used to handle similar patch clusters.
Meanwhile, some researchers [3 , 6 , 16 , 17] found that 3D patches are more appropriate for video denoising than 2D patches. The former can possibly better characterize motion-related temporal dependency than the latter. In [6] and [16] , 3D patches were used as atoms of a sparse dictionary. In [17] , a Bayesian framework was proposed to process 3D patch clusters. The basic blocks of VBM4D [3] , which is an extension of VBM3D [2] , are spatiotemporal 3D volumes that form a 4D group.
Some researchers directly conducted 3D transformations on video data without using 3D patches [5 , 18 - 21] . In [18 , 19] , two 3D complex wavelet transforms were proposed. In [5 , 20] , a 2D discrete shearlet transform [21] was extended to have a 3D version.
Motion information is implicitly represented regardless of whether 2D patches, 3D patches, or 3D domain transforms are used for video denoising. However, many researchers have also expected to explicitly estimate motion [7 , 22 - 26] . In [22] , a block-based multiple hypotheses motion estimation method was proposed. In [23] and [24] , an optical flow method was used to estimate motion. In [25] , a hierarchical motion estimation method was discussed. The basic idea of this method is to track matching blocks and filters along the motion trajectory. Although a 2D patch was used in [22] and [25] , motion information was explicitly obtained. In [26] , motion estimation was performed in the wavelet transform domain. In [7] , a cross-correlation algorithm that is robust to Fourier domain noise, called spatiotemporal Gaussian scale mixture (ST-GSM), was proposed for motion estimation. Similarly, we employed the explicit representation approach to estimate motion in the current work.
All of the aforementioned algorithms can be directly applied to high-frame-rate videos. In this study, however, we exploited several new characteristics to improve denoising performance. The details are discussed in Section 4.
3. The Kalman Filtering Framework for Video Denoising
The discrete instant of time is denoted as $k$. The system state of the previous time step is $x_{k-1}$. The optional control input is $u_k$. The system working process noise is $w_k$. $A$ is the state transition model that operates on $x_{k-1}$. $B$ is the control input model that operates on $u_k$. The predicted state of the current time step is $x_k = A x_{k-1} + B u_k + w_k$. The actual measurement $z_k$ is $z_k = H x_k + v_k$, where $H$ is the observation model that maps the true state space into the observed space, $x_k$ is the state of the current time step, and $v_k$ is the observation noise.
In video imaging systems, a camera directly records input light rays. Thus, the state transition model $A = I$ and the observation model $H = I$, where $I$ is the identity matrix. No control input is available for video capturing; hence, $u_k = 0$. Consequently, the predicted state of the video imaging system is $x_k = x_{k-1} + w_k$, and the actual measurement is $z_k = x_k + v_k$.
The working process noise $w$ and the observation noise $v$ are independent Gaussian random processes. We model $w$ as caused by motion and $v$ as caused by camera noise. Both are assumed to be zero-mean Gaussian with covariances $Q$ and $R$, i.e., $w \sim N(0, Q)$ and $v \sim N(0, R)$.
The Kalman filter works in two steps: the priori (predicted) state estimation and the updated posteriori state estimation. For the video denoising system, the former is given by
$$\hat{x}_k^- = \hat{x}_{k-1} \tag{1}$$
This equation indicates that if process noise is absent (i.e., $Q = 0$), then the predicted state is equal to the system state of the previous time step. The updated state is expressed as follows:
$$\hat{x}_k = \hat{x}_k^- + K_k \left( z_k - \hat{x}_k^- \right) \tag{2}$$
where $K_k$ is the optimal Kalman gain that minimizes the posteriori error covariance. This gain is computed as follows:
$$K_k = P_k^- \left( P_k^- + R \right)^{-1} \tag{3}$$
where $P_k^-$ is the predicted priori estimate covariance. This covariance is computed as follows:
$$P_k^- = P_{k-1} + Q \tag{4}$$
where $P_{k-1}$ is the posteriori estimate covariance of the previous time step. The updated posteriori estimate covariance of the current time step is expressed as follows:
$$P_k = \left( I - K_k \right) P_k^- \tag{5}$$
If the noise covariance of a single image pixel is $\sigma_n^2$, then $R$ can be written as $R = \sigma_n^2 I$ for the 2D image matrix. The covariance $Q$ of the imaging process noise $w$ is estimated from the motion-caused deviation between the current frame $k$ and the last frame $k - 1$. The computation of this deviation is further discussed in Section 4.
Regarding initialization, the measurement of the first noisy frame $z_1$ functions as the posteriori state estimate, i.e., $\hat{x}_1 = z_1$. Kalman filtering starts at the second frame. The posteriori estimate covariance of the first frame is set as $P_1 = R$.
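The recursion above can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation: it assumes the filter is applied independently to each pixel (so all quantities are scalars), and the function and parameter names are illustrative. With zero process noise and $P_1 = R$, the recursion reduces to a running mean of the frames, which is why a still background keeps improving as more frames arrive.

```python
def kalman_denoise_pixel(measurements, r, q_per_step=None):
    """Temporal Kalman filter for one pixel with A = H = 1 and no control input.

    measurements: noisy intensities z_1, z_2, ... of one pixel over time.
    r: observation-noise variance R (camera noise), assumed > 0.
    q_per_step: process-noise variances Q for frames 2, 3, ... (motion);
                None means Q = 0 everywhere, i.e. a static pixel.
    """
    n = len(measurements)
    if q_per_step is None:
        q_per_step = [0.0] * (n - 1)

    x_hat = float(measurements[0])  # initialization: first frame is the estimate
    p = float(r)                    # initial posteriori covariance P_1 = R
    estimates = [x_hat]

    for z, q in zip(measurements[1:], q_per_step):
        x_prior = x_hat                      # predicted (priori) state
        p_prior = p + q                      # predicted (priori) covariance
        k = p_prior / (p_prior + r)          # Kalman gain
        x_hat = x_prior + k * (z - x_prior)  # updated (posteriori) state
        p = (1.0 - k) * p_prior              # updated (posteriori) covariance
        estimates.append(x_hat)
    return estimates


# Static pixel (Q = 0): the estimates are the running means 10, 15, 20, 25.
print(kalman_denoise_pixel([10.0, 20.0, 30.0, 40.0], r=100.0))
# Strong motion (very large Q): the gain approaches 1 and the estimate follows z.
print(kalman_denoise_pixel([10.0, 20.0], r=100.0, q_per_step=[1e12]))
```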
Moreover, a large motion estimate results in a large value of the Kalman gain $K_k$ in the motion area, according to Eqs. (3) and (4). In extreme cases, such as when $K_k = I$, the updated posteriori state is $\hat{x}_k = z_k$ based on Eq. (2), i.e., the noisy measurement passes through unchanged. Thus, spatial denoising is required for the motion area to improve denoising performance. In our work, the classical edge-preserving filter, i.e., the bilateral filter [27], is employed. It can be calculated as
$$\hat{x}_k^{S}(i) = \frac{\sum_{s \in N(i)} G_s\left(\| s - i \|\right) G_I\left(\| z_s - z_i \|\right) z_s}{\sum_{s \in N(i)} G_s\left(\| s - i \|\right) G_I\left(\| z_s - z_i \|\right)}$$
where $\hat{x}_k^{S}(i)$ is the spatial bilateral denoising result of frame $k$ at spatial coordinates $i$, and $z_s$, $z_i$ are the pixel values of noisy frame $z_k$ in positions $s$ and $i$. $N(i)$ is a $(2r + 1) \times (2r + 1)$ block centered at $i$, and $s$ is the coordinate of a pixel in the block $N(i)$, i.e., $s_u \in [i_u - r, i_u + r]$ in horizontal direction $u$ and $s_v \in [i_v - r, i_v + r]$ in vertical direction $v$. Two Gaussian filter kernels are used for the bilateral filter. The first, which is the spatial distance kernel $G_s(\cdot)$ with standard deviation $\sigma_s$, is the same as the Gaussian filter. The second, which is the pixel intensity difference kernel $G_I(\cdot)$ with standard deviation $\sigma_I$, is used to preserve edges. A large intensity difference $\| z_s - z_i \|$ results in a small weight $G_I$. Thus, pixels on different sides of an edge can be distinguished.
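The bilateral filter described above can be sketched as follows. This is a minimal illustration on plain Python lists to stay dependency-free; the function name is illustrative, and clipping the window at the image border is an assumption, since the text does not state how borders are handled.

```python
import math


def bilateral_filter(img, radius, sigma_s, sigma_i):
    """Bilateral filter on a 2D list of intensities, window (2r+1) x (2r+1)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for v in range(h):
        for u in range(w):
            centre = img[v][u]
            num = den = 0.0
            # Assumption: the window is clipped where it leaves the image.
            for sv in range(max(0, v - radius), min(h, v + radius + 1)):
                for su in range(max(0, u - radius), min(w, u + radius + 1)):
                    dist2 = (sv - v) ** 2 + (su - u) ** 2
                    g_s = math.exp(-dist2 / (2.0 * sigma_s ** 2))  # spatial kernel
                    diff2 = (img[sv][su] - centre) ** 2
                    g_i = math.exp(-diff2 / (2.0 * sigma_i ** 2))  # intensity kernel
                    num += g_s * g_i * img[sv][su]
                    den += g_s * g_i
            out[v][u] = num / den  # den >= 1 because the centre pixel has weight 1
    return out


# A vertical step edge: pixels across the edge get a negligible intensity
# weight, so the edge is preserved instead of being blurred.
step = [[0.0, 0.0, 100.0, 100.0] for _ in range(4)]
smoothed = bilateral_filter(step, radius=2, sigma_s=3.0, sigma_i=5.0)
print(smoothed[1])
```

The call uses a 5 x 5 window with a spatial standard deviation of 3 and an intensity standard deviation of 5, matching the settings reported in Section 5.1.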
The spatial denoising result is denoted as $\hat{x}_k^{S}$. Kalman temporal denoising $\hat{x}_k^{T}$, i.e., the updated state of Eq. (2), and bilateral spatial denoising $\hat{x}_k^{S}$ are mixed by weighted averaging. As the Kalman gain $K_k$ approaches zero, the reliability of the actual measurement $z_k$ decreases, whereas trust in the predicted estimate $\hat{x}_k^-$ increases. Accordingly, we use the Kalman gain $K_k$, which reflects motion degree, as the weight. Thus, the final denoising result is
$$\hat{x}_k^{F} = \left( I - K_k \right) \hat{x}_k^{T} + K_k\, \hat{x}_k^{S}$$
For the predicted priori state estimate, i.e., Eq. (1), this equation is revised as $\hat{x}_k^- = \hat{x}_{k-1}^{F}$.
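The weighted averaging described above can be sketched per pixel. The exact formula is not preserved in this copy of the article, so the convex combination below is an assumption inferred from the text: a gain near 0 (still area) keeps the temporal result, and a gain near 1 (motion area) keeps the spatial result. The function name is illustrative.

```python
def blend_temporal_spatial(temporal, spatial, gain):
    """Per-pixel blend of two 2D lists, weighted by a gain map in [0, 1]."""
    return [
        [(1.0 - k) * t + k * s for t, s, k in zip(t_row, s_row, k_row)]
        for t_row, s_row, k_row in zip(temporal, spatial, gain)
    ]


# Still pixel (gain 0), moving pixel (gain 1), and an in-between pixel.
print(blend_temporal_spatial([[10.0, 10.0, 10.0]],
                             [[20.0, 20.0, 20.0]],
                             [[0.0, 1.0, 0.25]]))
```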
Fig. 4 shows the Kalman filtering performance without motion estimation ($Q = 0$). The video is the same as that in Fig. 3. A total of 350 frames are used. The noise standard deviation $\sigma_n$ is set to 100. In the figure, the still background is clearer than that of VBM3D (Fig. 3(b)). Our Kalman filtering framework is suitable for videos with a fixed background because it employs all frames by propagating the estimate error covariance $P_k$, so the estimation of the original signal improves as the number of frames increases. By contrast, only a few adjacent frames are employed in VBM3D. Thus, the Kalman framework obtains a better estimate of the background.
Fig. 4. Denoising performance of the Kalman filtering framework without motion estimation.
However, blurring occurs without motion estimation. Almost no moving objects can be observed in Fig. 4, although a moving man is visible at the left of the visual field in Fig. 3(b). This blurring phenomenon indicates the importance of motion estimation.
4. Motion Estimation for High-frame-rate Videos in Ultra-low Illumination
Our motion estimation method is based on frame difference. In ultra-low illumination, the noise is so large that the direct frame difference cannot provide a good estimation. This problem significantly influences motion estimation [Fig. 5(e) and Fig. 6(b)]. A Gaussian filter is therefore employed to preprocess each noisy frame. In Fig. 5(f) and Fig. 6(c), a large Gaussian kernel with a $20 \times 20$ window and a standard deviation $\sigma_G = 5$ is used to suppress noise.
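The prefiltered frame difference can be sketched as below. This is an illustrative, dependency-free version rather than the authors' code: the separable implementation, the edge replication at the borders, and all function names are assumptions. Because the window size is even, the kernel is centered between two pixels, which shifts the result by half a pixel; both frames receive the same shift, so their difference stays aligned.

```python
import math


def gaussian_kernel_1d(size, sigma):
    """Normalized 1D Gaussian taps (size=20, sigma=5 mirrors the text)."""
    centre = (size - 1) / 2.0
    taps = [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2)) for i in range(size)]
    total = sum(taps)
    return [t / total for t in taps]


def _blur_rows(img, taps):
    """Filter every row with taps, replicating edge pixels at the borders."""
    w = len(img[0])
    offset = len(taps) // 2
    out = []
    for row in img:
        out.append([
            sum(t * row[min(max(u + j - offset, 0), w - 1)] for j, t in enumerate(taps))
            for u in range(w)
        ])
    return out


def gaussian_prefilter(img, size=20, sigma=5.0):
    """Separable Gaussian blur: filter rows, transpose, filter rows, transpose."""
    taps = gaussian_kernel_1d(size, sigma)
    rows_done = _blur_rows(img, taps)
    transposed = [list(col) for col in zip(*rows_done)]
    cols_done = _blur_rows(transposed, taps)
    return [list(col) for col in zip(*cols_done)]


def frame_difference(frame_a, frame_b):
    """Absolute per-pixel difference between two equally sized frames."""
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(frame_a, frame_b)]


# A +/-50 checkerboard stands in for pixel-scale noise; prefiltering shrinks it.
noisy = [[50.0 if (u + v) % 2 else -50.0 for u in range(30)] for v in range(30)]
smooth = gaussian_prefilter(noisy)
print(max(abs(val) for row in smooth for val in row))
```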
Fig. 5. Difference between two successive frames ($k - 1$ and $k$). (a) Noise-free frames. (b) Noisy frames of (a) polluted by AWGN with standard deviation $\sigma_n = 50$. (c) Gaussian prefiltered frames of (b) with a $20 \times 20$ Gaussian kernel ($\sigma_G = 5$). (d) Frame difference of (a), which functions as the ground truth. (e) Frame difference of (b). (f) Frame difference of (c).
Fig. 6. Difference between two spaced frames (frames $k - 5$ and $k$). (a) Frame difference of noise-free frames. (b) Frame difference of noisy frames. (c) Frame difference of Gaussian prefiltered frames.
Although the influence of noise is decreased to a considerable extent by the large Gaussian kernel, several undesired patches still affect reliable motion estimation. We further improve our method by applying the small motion vector characteristic and intentionally sacrificing small changing patches.
We first define the small motion vector. Suppose the length of the object along the motion direction is $L$, and the movement distance between two successive frames is $M$. Both variables are measured in pixels. Let $M = \alpha L$. We define the video to have a small motion vector characteristic when $\alpha \le 0.5$, i.e., the interframe motion distance is at most half of the object size. This is a rough definition. In reality, several moving objects with different lengths and velocities may simultaneously exist in a scene. In such a case, a user often concentrates on only one moving object within a period of time. We call the movement of this object of greatest concern the main movement. To simplify the problem, we consider only the main movement for evaluation. Under this condition, the selection of the main movement is subjective. For example, suppose a slowly walking man and a rushing car appear in the same scene. The movement of the man satisfies $\alpha \le 0.5$, whereas the movement of the car does not. If the man is highlighted, the video can be considered as having a small motion vector characteristic. If the car is selected, the video is considered not to have such a characteristic. When $\alpha \le 0.1$, we can assume that the small motion vector characteristic is obvious.
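The definition can be made concrete with a small helper. The sketch below uses illustrative function names, and the object length and displacements are hypothetical numbers chosen only to show how the frame rate changes $\alpha$ for the same object.

```python
def motion_ratio(move_px, length_px):
    """alpha = M / L: inter-frame displacement relative to object length."""
    return move_px / length_px


def has_small_motion_vector(move_px, length_px, limit=0.5):
    """True when alpha <= limit (0.5 by definition, 0.1 for the 'obvious' case)."""
    return motion_ratio(move_px, length_px) <= limit


# Hypothetical object, 60 px long along its motion direction.
print(motion_ratio(3, 60))              # 3 px per frame, e.g. at a high frame rate
print(has_small_motion_vector(3, 60, limit=0.1))
print(has_small_motion_vector(45, 60))  # 45 px per frame: not a small motion vector
```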
- Small motion vector of high-frame-rate videos
Motion estimation becomes more difficult for a small motion vector than for a large motion vector with large-scale noise, as shown in Fig. 5 . The video in the figure recorded a moving man at 250 fps. Nearly no motion information can be obtained from the difference of the noisy frames [ Fig. 5 (e)]. Even the Gaussian prefilter loses its effect [ Fig. 5 (f)]. However, a large motion vector is easier to detect than a small one ( Fig. 6 ). Frame k in Fig. 6 is also frame k in Fig. 5 . The interval is five frames in Fig. 6 .
Fig. 5 and Fig. 6 also show another feature of high-frame-rate videos, i.e., based on the same reference frame, the motion area between two close frames is nearly contained between two relatively far frames. This feature results from a small motion vector. Close frames imply a small motion area. Fig. 7 further illustrates the aforementioned feature with several typical motion modes. The major part of the motion area conforms to this feature. Only some minor parts at the edge of the motion area (in red) violate this rule. When the frame rate is high, this containment rule is obviously reliable. However, the flickering noise does not satisfy this rule.
Fig. 7. Motion area containment of several typical motion modes. Only the small red regions violate the rule. (a) Translation. (b) Rotation. (c) Scale variation.
- Neglecting small changing patches in ultra-low illumination
Most details and small objects are submerged in noise in ultra-low illumination. From the perspective of frequency-domain analysis, these details and small objects are the high-frequency part of the image. Noise also exists in the high-frequency part of the image, and, thus, these elements overlap with one another. The only object that can be easily recognized in ultra-low illumination is the main structure of the image, which is located at the low-frequency part. From the perspective of principal components analysis, the main structure is the principal component of the image, which represents the main feature of such an image. When the main structure changes, it can be easily perceived. When the details change, however, it is difficult to be detected because the change caused by noise seriously interferes with the change caused by motion. This reason also explains why motion estimation is difficult for the small motion vector in Fig. 5 .
In this study, we do not detect small motions, and we neglect small changing patches. To provide an extreme example, detecting a fly in ultra-low illumination is senseless. Our objective is to sacrifice the minor part to improve the major part. Thus, the red regions in Fig. 7 that violate the containment rule can also be ignored.
Our motion estimation scheme is based on the two aforementioned assumptions. Suppose that the last denoised frame is $x_{k-1}$ and the current frame is $z_k$. The future adjacent frames $z_{k+i}$ ($i = 1, 2, \cdots$) are used to help in motion segmentation. A large motion area of $N_1$ frames is initially segmented with $N_2$ extra frames. Segmentation relies on the containment rule of small motion vectors. Then, motion estimation for each pair of successive images within the $N_1$ frames is performed in the segmented area. Subsequently, large motion area segmentation is conducted between frames $k - 1 + N_1$ and $k - 1 + 2N_1$, as shown in Fig. 8.
Fig. 8. Extra $N_2$ frames help the large motion area segmentation of $N_1$ frames.
The motion area is initially segmented as a black-and-white map by hard thresholding as follows:
$$D_i = \begin{cases} 1, & \left| G(z_{k-1+i}) - G(x_{k-1}) \right| \ge t \\ 0, & \text{otherwise} \end{cases}$$
where $D_i$ is the binary motion map between frames $k - 1$ and $k - 1 + i$, with $i = N_1, N_1 + 1, \cdots, N_1 + N_2$; $G(\cdot)$ is the Gaussian prefilter operation, and $t$ is the threshold. An intensity difference below $t$ is ignored. For the intensity range 0 to 255, the threshold $t$ is fixed at 5 because the human eye can hardly discriminate an intensity difference below 5.
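The refinement of the motion area between $k - 1$ and $k - 1 + N_1$ is obtained with an AND operation among the other $N_2$ adjacent motion segmentation areas according to the containment rule of small motion vectors as follows:
$$\tilde{D} = D_{N_1} \wedge D_{N_1 + 1} \wedge \cdots \wedge D_{N_1 + N_2}$$
where $D_i$ ($i = N_1, N_1 + 1, \cdots, N_1 + N_2$) are the thresholded motion maps and $\tilde{D}$ is the refined motion map.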
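The thresholding and AND refinement can be sketched on a toy one-row example. The function names are illustrative, the inputs are assumed to be already prefiltered, and treating a difference exactly equal to $t$ as motion is an assumption. The example shows the containment rule at work: the motion area for the nearer frame is contained in that of the farther frame, whereas a flicker that appears in only one map is removed by the AND.

```python
def threshold_map(ref, other, t=5):
    """Binary map: 1 where the absolute difference is at least t, else 0."""
    return [[1 if abs(a - b) >= t else 0 for a, b in zip(ra, rb)]
            for ra, rb in zip(ref, other)]


def and_maps(maps):
    """Pixel-wise AND across a list of binary maps of equal size."""
    result = maps[0]
    for m in maps[1:]:
        result = [[a & b for a, b in zip(ra, rb)] for ra, rb in zip(result, m)]
    return result


# A 4-px-wide bright object on a dark row, moving right by 1 px per frame.
reference = [[100, 100, 100, 100, 0, 0, 0, 0]]
shifted_2 = [[0, 0, 100, 100, 100, 100, 0, 40]]  # last pixel: flicker noise
shifted_3 = [[0, 0, 0, 100, 100, 100, 100, 0]]

maps = [threshold_map(reference, f) for f in (shifted_2, shifted_3)]
print(maps[0])         # [[1, 1, 0, 0, 1, 1, 0, 1]]  flicker marked as motion
print(and_maps(maps))  # [[1, 1, 0, 0, 1, 1, 0, 0]]  flicker removed
```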
Several small changing patches still exist after the AND operation. We ignore these remaining patches by
$$\tilde{D}(\Omega) = 0 \quad \text{if } \mathrm{Area}(\Omega) < S, \qquad \Omega \in \mathrm{ConnRegion}(\tilde{D})$$
where the connected regions $\Omega$ of the refined motion map $\tilde{D}$ are obtained with a ConnRegion(·) operation. The area of each connected region is then calculated as the number of pixels in the region. If the area is smaller than $S$, the region is set to 0 and is thereby neglected.
We choose the classical connected region detection algorithm in the book [28] for the ConnRegion(·) operation. The algorithm is based on 8-connectivity, and its main process is as follows. First, the black-and-white binary image, whose pixel values are 0 or 1, is scanned pixel by pixel from left to right and top to bottom. Let $x_{(m,n)} \in \{0, 1\}$ denote the pixel currently being processed at position $(m, n)$. For each white pixel ($x = 1$), the left, top-left, top, and top-right neighbors are inspected. If all of these neighbors are 0, a new label is assigned to the pixel at $(m, n)$ in a label map $L$. If one or more neighbors are nonzero, the smallest label among them is assigned to $L_{(m,n)}$, and the other labels of the nonzero neighbors are recorded as equivalent to it. After the scan, the equivalence relations among labels are resolved according to reflexivity, symmetry, and transitivity, and each label is replaced by the smallest label in its equivalence class. For example, if label 1 is equivalent to label 2 and label 2 is equivalent to label 6, then label 1 is also equivalent to label 6, and all three labels are set to 1. After this step, the labels are renumbered in ascending order with consecutive natural numbers. Finally, the original label map is replaced by the new labels. Pixels that have the same label are in the same connected region.
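The two-pass labelling and the small-patch removal can be sketched as follows. This is an illustrative version, not the code from [28]: equivalences are tracked with a union-find array, which is one common way to resolve them, and the function names are assumptions.

```python
def label_regions(binary):
    """Two-pass 8-connectivity labelling; returns a label map (0 = background)."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    parent = [0]  # parent[label] for union-find; index 0 is the background

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # First pass: provisional labels and equivalences.
    for m in range(h):
        for n in range(w):
            if not binary[m][n]:
                continue
            neigh = []
            # Already-scanned neighbours: left, top-left, top, top-right.
            for dm, dn in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                mm, nn = m + dm, n + dn
                if 0 <= mm < h and 0 <= nn < w and labels[mm][nn]:
                    neigh.append(labels[mm][nn])
            if not neigh:
                parent.append(len(parent))  # new label, its own root
                labels[m][n] = len(parent) - 1
            else:
                roots = [find(a) for a in neigh]
                smallest = min(roots)
                labels[m][n] = smallest
                for r in roots:             # record the equivalences
                    parent[r] = smallest

    # Second pass: resolve equivalences and renumber as 1, 2, 3, ...
    remap = {}
    for m in range(h):
        for n in range(w):
            if labels[m][n]:
                root = find(labels[m][n])
                labels[m][n] = remap.setdefault(root, len(remap) + 1)
    return labels


def remove_small_regions(binary, min_area):
    """Zero out connected regions whose pixel count is below min_area."""
    labels = label_regions(binary)
    area = {}
    for row in labels:
        for lab in row:
            if lab:
                area[lab] = area.get(lab, 0) + 1
    return [[1 if lab and area[lab] >= min_area else 0 for lab in row]
            for row in labels]


# A U-shaped region (two provisional labels that merge) and one isolated pixel.
demo = [[1, 0, 1, 0, 0],
        [1, 0, 1, 0, 0],
        [1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0]]
print(label_regions(demo))
print(remove_small_regions(demo, min_area=2))
```

In the article's setting the area threshold would be $S = 100$; the toy example uses 2 only so that the isolated pixel is removed.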
Fig. 9 shows the process of our motion segmentation method. The video in Fig. 9 is the same as that in Fig. 5 and Fig. 6. The large motion area contains five frames ($N_1 = 5$), and another four frames are used for segmentation ($N_2 = 4$). $N_1$ and $N_2$ are selected according to the motion vector. In our work, $N_1 + N_2 \le 1/\alpha$ needs to be satisfied. In the definition of the small motion vector, $\alpha \le 0.5$; at the limit $\alpha = 0.5$, this gives $N_1 + N_2 \le 2$. $N_1$ and $N_2$ should each be at least 1. If $N_1$ is too small, detecting the motion area is difficult because of the small motion vector characteristic of high-frame-rate videos, as in Fig. 5. If $N_2$ is too small, the noise influence cannot be effectively suppressed. We choose $N_1$ and $N_2$ as [equation not recoverable from the source]. If the velocity is low or the frame rate is high, then a small motion vector is obtained, and a large $N_1$ and $N_2$ can be used to improve the accuracy of the motion segmentation results. The small patch threshold $S$ is set to 100, which is approximately a $10 \times 10$ patch, in line with the noise scale.
Fig. 9. Motion segmentation process.
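The constraint on the frame counts can be checked with a short helper. The sketch below only tests the stated inequality $N_1 + N_2 \le 1/\alpha$ with $\alpha = M/L$; it does not reproduce the authors' selection rule, which is not preserved here. The function name and the pixel values are hypothetical.

```python
def frames_allowed(n1, n2, move_px, length_px):
    """Check N1 >= 1, N2 >= 1 and N1 + N2 <= 1/alpha, with alpha = M / L.

    The inequality is evaluated as (N1 + N2) * M <= L so that integer
    inputs are compared exactly, without rounding 1/alpha.
    """
    return n1 >= 1 and n2 >= 1 and (n1 + n2) * move_px <= length_px


# Hypothetical 60-px object moving 5 px per frame (alpha = 1/12).
print(frames_allowed(6, 5, move_px=5, length_px=60))   # 11 frames fit
print(frames_allowed(6, 5, move_px=10, length_px=60))  # too fast for 11 frames
print(frames_allowed(1, 1, move_px=30, length_px=60))  # limit case alpha = 0.5
```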
Finally, motion estimation for each pair of successive images within the $N_1$ frames is calculated inside the segmented motion area, for the frame pairs indexed by $j = 0, 1, \cdots, N_1 - 1$.
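One plausible form of this last step is a frame difference restricted to the segmented area. The sketch below is an assumption based on the statements that the method is built on frame difference and that estimation is performed within the segmented area; it is not the article's equation, and the function name is illustrative. The inputs are assumed to be Gaussian-prefiltered frames and a binary motion map.

```python
def masked_frame_difference(prev_frame, cur_frame, mask):
    """Absolute difference of two successive frames, kept only where mask == 1."""
    return [
        [abs(c - p) if m else 0 for p, c, m in zip(row_p, row_c, row_m)]
        for row_p, row_c, row_m in zip(prev_frame, cur_frame, mask)
    ]


# The third pixel changes strongly but lies outside the mask, so it is ignored.
print(masked_frame_difference([[10, 10, 10]], [[10, 40, 90]], [[1, 1, 0]]))
```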
5. Experiments and Analysis
The performance of the proposed algorithm is evaluated in this section. The experiments are divided into two parts. The first part is on synthetic noisy videos, whereas the second part is on real noisy videos captured in ultra-low illumination. The experiments are conducted in the luminance channel of the video because minimal color can be captured in ultra-low illumination. Three state-of-the-art video denoising methods are chosen for comparison. These methods are the 2D block matching method VBM3D [2], the 3D domain transformation method 3D shearlets [5], and the explicit motion estimation method ST-GSM [7]. The toolboxes of these methods can be downloaded from the websites of the authors [29, 30, 31]. The objective criterion peak signal-to-noise ratio (PSNR) is employed to provide quantitative evaluation, which is defined as
$$\mathrm{PSNR} = 10 \log_{10} \frac{L^2}{\mathrm{MSE}}$$
where $L$ is the dynamic range of the image. In our experiments, $L$ equals 255 for 8-bit images. MSE is the mean squared error between the original and the corrupted or denoised images. In our experiments, a PSNR score is computed for each frame in the noisy or restored video. The final PSNR for a video is the average of the per-frame scores. The denoised pixel intensity range is 0 to 255.
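The PSNR evaluation can be sketched as follows, using the standard definition with a dynamic range of 255 and averaging the per-frame scores. The function names are illustrative, and returning infinity for identical frames is a convention chosen here.

```python
import math


def mse(ref, test):
    """Mean squared error between two equally sized 2D lists."""
    n = len(ref) * len(ref[0])
    return sum((a - b) ** 2 for ra, rb in zip(ref, test) for a, b in zip(ra, rb)) / n


def psnr(ref, test, dynamic_range=255.0):
    """PSNR in dB for one frame; infinite when the frames are identical."""
    err = mse(ref, test)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(dynamic_range ** 2 / err)


def video_psnr(ref_frames, test_frames):
    """Average of the per-frame PSNR scores."""
    scores = [psnr(r, t) for r, t in zip(ref_frames, test_frames)]
    return sum(scores) / len(scores)


# An error of 30 grey levels at every pixel gives MSE = 900.
clean = [[0.0, 0.0], [0.0, 0.0]]
off_by_30 = [[30.0, 30.0], [30.0, 30.0]]
print(round(psnr(clean, off_by_30), 2))  # 18.59
```

As a sanity check, AWGN with $\sigma_n = 30$ has an expected MSE of about 900 when clipping is ignored, which corresponds to the 18.59 dB reported for the noisy frame in Fig. 2(a).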
- 5.1 Synthetic video
The noise-free videos, which function as ground truth, are captured by our high-frame-rate camera (JVC GC-P100). The frame rate is 250 fps, the frame resolution is $640 \times 360$, and the duration is 250 frames. Three videos are captured, namely, (1) a man moving from right to left (MovR2L), (2) a man moving from far to near (MovF2N), and (3) a waving hand (Waving). The noise is AWGN with standard deviation $\sigma_n$. Three noise levels are chosen to simulate large-scale noise in ultra-low illumination.
A common parameter among all methods, including ours, is the noise level estimator. In our experiments, this parameter is equal to $\sigma_n$. The parameters of the other algorithms are set to their default values. For our algorithm, the Gaussian prefilter is a $20 \times 20$ kernel with $\sigma_G = 5$. The intensity threshold is $t = 5$, and the small patch threshold is $S = 100$. $N_1 = 6$ and $N_2 = 5$ frames are used for motion segmentation. The spatial distance Gaussian kernel of the spatial bilateral filter is $5 \times 5$ with a standard deviation of 3. The standard deviation of the pixel intensity difference kernel is 5.
The PSNR comparison is shown in Table 1. It indicates that the proposed algorithm is superior to the other methods in the overall evaluation. The processing time is shown in Table 2. The computer is configured with an Intel Core i5-2430M CPU at 2.40 GHz, 8 GB RAM, and a 64-bit Windows 7 OS. Table 2 shows that our algorithm runs faster than the other methods.
Table 1. PSNR (dB) comparison for three large noise levels.
Table 2. Processing time (seconds) comparison for three large noise levels.
In detail, the per-frame PSNR is shown in Fig. 10 . Our algorithm outperforms the other methods after an initialization of approximately 50 frames. The advantage of our method can be attributed to the excellent denoising performance on the background. Even the outline of a bunch of balloons, which is shown in the enlarged patch in Fig. 11 , is restored. Our robust motion segmentation method ensures good temporal denoising of the Kalman filter. Several segmentation examples are provided in Fig. 12 .
Fig. 10. PSNR comparison for the synthetic videos. (a1) to (a3) MovR2L, MovF2N, and Waving with $\sigma_n = 80$. (b1) to (b3) MovR2L, MovF2N, and Waving with $\sigma_n = 100$. (c1) to (c3) MovR2L, MovF2N, and Waving with $\sigma_n = 120$.
Fig. 11. Denoising result for frame 250 of the “MovR2L” video. (a) Noisy input. (b) Ground truth. (c) VBM3D. (d) ST-GSM. (e) 3D shearlets. (f) Proposed method.
Fig. 12. Several examples of our motion segmentation method. The first row is from the MovR2L video, the second row is from the MovF2N video, and the third row is from the Waving video.
One shortcoming of our algorithm lies in the motion area because only a few frames can be employed for Kalman filtering. Using only the spatial bilateral filter is insufficient to obtain a good result. When the object moves fast or when the frame rate is low, the motion vector or motion area is large. Consequently, the denoising performance of our algorithm is poor. The moving leg in Fig. 13 has a small motion vector. Thus, the influence of noise is not obvious. In Fig. 14 (a), the hand stops for an instant when it is turning back, so the motion vector becomes zero, and a good denoising result is achieved. However, the influence of noise appears in a relatively large motion area when the hand is moving, as shown in Fig. 14 (b). The peaks in Fig. 10 (a3), (b3), and (c3) result from this instant stop-and-move switch. In the extreme case, when the entire visual field is moving, the algorithm degrades to spatial bilateral filtering.
Fig. 13. Denoising result for frame 215 of the “MovF2N” video. (a) Noisy input. (b) Ground truth. (c) VBM3D. (d) ST-GSM. (e) 3D shearlets. (f) Proposed method.
Fig. 14. Denoising result of our algorithm for the “Waving” video. (a) Frame 150. (b) Frame 170.
- 5.2 Real video
We also test our algorithm on real videos captured in ultra-low illumination by using a high-frame-rate sensor (Viimagic 9222B). The frame rate is 240 fps, and the frame resolution is $720 \times 480$. The illumination during video capture is approximately 0.01 lux. We increase the global gain of the sensor to obtain a bright image. Three videos are also used for the experiments, namely, (1) a man moving from left to right (MovL2R), (2) a man moving from far to near (MovF2N), and (3) a waving hand (Waving). The MovL2R video is the same as the one in Fig. 3.
The noise estimator for all methods is set to 100. The settings of the other parameters are the same as those in Subsection 5.1. A sample denoising result is presented in Fig. 15. Our algorithm again yields the clearest background without artifacts while preserving the moving objects.
Denoising results for videos with real noise captured under an illumination of 0.01 lux. (a1) to (a5) MovL2R video. (b1) to (b5) MovF2N video. (c1) to (c5) Waving video.
The experiments indicate that our algorithm is suitable for videos with a still background and a small motion vector. They also demonstrate that neglecting small changing patches is worthwhile in videos with serious noise: through this procedure, good motion estimation is obtained, and the main bodies of moving objects are maintained.
6. Conclusion
In this study, we propose a denoising algorithm for high-frame-rate videos in ultra-low illumination. Kalman filtering is used as the temporal denoising method. The process noise of imaging is modeled as a result of motion, whereas the observation noise is caused by the camera. A bilateral filter assists denoising in the motion area. Kalman temporal denoising and bilateral spatial denoising are combined through the Kalman gain, which indicates the degree of motion. Kalman filtering provides a good estimate of the background, but moving objects would be blurred if no motion estimation were used. Reliable motion estimation is therefore the key to an effective Kalman filtering framework.
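The combination described above can be sketched as follows. This is a minimal illustration under several assumptions, not the authors' implementation: each pixel is treated as a scalar state with an identity transition, the process noise takes one of two hypothetical values (`q_motion`, `q_static`) selected by a given motion mask, a 3 × 3 mean filter (`box_smooth`) stands in for the bilateral filter, and the blending rule `(1 - K) * temporal + K * spatial` is one plausible reading of "combined by the Kalman gain". All function and parameter names are invented for this sketch.

```python
from typing import List, Sequence

Frame = List[List[float]]
Mask = List[List[bool]]


def box_smooth(frame: Frame) -> Frame:
    """3x3 mean filter with edge clamping (stand-in for the bilateral filter)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += frame[yy][xx]
            out[y][x] = total / 9.0
    return out


def kalman_blend(frames: Sequence[Frame], motion_masks: Sequence[Mask],
                 r: float = 1.0, q_motion: float = 1e6,
                 q_static: float = 0.0) -> List[Frame]:
    """Per-pixel temporal Kalman filter blended with a spatial filter.

    r is the (assumed) observation-noise variance; the process-noise
    variance is q_motion inside the motion mask and q_static elsewhere.
    The mask of the first frame is ignored.
    """
    h, w = len(frames[0]), len(frames[0][0])
    x = [row[:] for row in frames[0]]      # state estimate per pixel
    p = [[r] * w for _ in range(h)]        # error variance per pixel
    outputs = [box_smooth(frames[0])]      # no history yet: spatial only
    for z, mask in zip(frames[1:], motion_masks[1:]):
        s = box_smooth(z)
        out = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                q = q_motion if mask[i][j] else q_static
                p_pred = p[i][j] + q                 # predict
                k = p_pred / (p_pred + r)            # Kalman gain
                x[i][j] += k * (z[i][j] - x[i][j])   # update state
                p[i][j] = (1.0 - k) * p_pred         # update variance
                # Large gain (motion) leans on the spatial result,
                # small gain (still background) on the temporal one.
                out[i][j] = (1.0 - k) * x[i][j] + k * s[i][j]
        outputs.append(out)
    return outputs
```

With `q_static = 0`, the gain at a still pixel decays as 1/(n + 1), so the state converges to the running mean of the observations; inside the motion mask the gain stays close to 1 and the output falls back to the spatially filtered frame, which mirrors the degradation to spatial filtering noted for fully moving scenes.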
We exploit two features of high-frame-rate videos in ultra-low illumination for motion segmentation. The first feature is the small motion vector of moving objects in high-frame-rate videos. As a result, the motion area between two close frames is contained within the motion area between two relatively distant frames (Fig. 7). This containment rule is valid when the motion vector is small. The second feature is that small changing patches can reasonably be neglected in ultra-low illumination, because detail changes are lost in noise and only the main structure of the image can be detected. A robust motion estimation method for high-frame-rate videos in ultra-low illumination is derived from these two features. A large motion area between two spaced frames is first segmented with the use of several extra frames, under the containment rule and with small patches neglected [Eqs. (8) and (9)]. Then, the motion between two successive images is estimated within the segmented areas [Eq. (10)].
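The two-stage idea can be sketched in a few lines. This is only an illustration under assumptions and does not reproduce Eqs. (8) to (10): motion is detected by thresholding absolute frame differences, "small patches" are taken to be 4-connected components below a hypothetical `min_area`, and the containment rule is applied by restricting the successive-frame difference to the coarse mask. The names `diff_mask`, `drop_small_patches`, `segment_motion`, `thresh`, and `min_area` are invented for this sketch.

```python
from collections import deque
from typing import List

Frame = List[List[float]]
Mask = List[List[bool]]


def diff_mask(a: Frame, b: Frame, thresh: float) -> Mask:
    """Mark pixels whose absolute difference exceeds the threshold."""
    return [[abs(pa - pb) > thresh for pa, pb in zip(ra, rb)]
            for ra, rb in zip(a, b)]


def drop_small_patches(mask: Mask, min_area: int) -> Mask:
    """Keep only 4-connected components with at least min_area pixels."""
    h, w = len(mask), len(mask[0])
    keep = [[False] * w for _ in range(h)]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            patch = []
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(patch) >= min_area:
                for y, x in patch:
                    keep[y][x] = True
    return keep


def segment_motion(prev_far: Frame, prev_near: Frame, current: Frame,
                   thresh: float, min_area: int) -> Mask:
    """Coarse mask from distant frames, then motion of successive frames inside it."""
    far = drop_small_patches(diff_mask(current, prev_far, thresh), min_area)
    near = diff_mask(current, prev_near, thresh)
    # Containment rule: near-frame motion is accepted only inside the far mask.
    return [[f and n for f, n in zip(rf, rn)] for rf, rn in zip(far, near)]
```

In this sketch, an isolated noisy pixel that differs from both earlier frames is rejected because its component in the far-frame mask is too small, whereas the leading and trailing edges of a moving block survive because they lie inside a large connected region.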
The experiments on videos with synthetic and real noise demonstrate that our algorithm performs better than other state-of-the-art methods, particularly in the background. For moving objects, the main body is maintained because of our effective motion segmentation scheme, although the denoising performance is not as good as that in the background. When the motion vector is small, the motion area is also small, and the denoising performance improves. A small motion vector is obtained when objects moving at a normal velocity are captured with a high-frame-rate camera. Our algorithm is therefore suitable for denoising high-frame-rate videos in ultra-low illumination.
BIO
Xin Tan received his B.S. degree in Electronics Information from Sichuan University in 2008 and his M.S. degree in Systems Engineering from the National University of Defense Technology (NUDT) in 2010.
He is currently working toward the Ph.D. degree at the College of Information System and Management, NUDT. His research interests focus on image/video processing and multimedia information systems.
Yu Liu received his B.S. from Northwestern Polytechnical University, Xi’an, China in 2005. He received his MSc in image processing and PhD in computer graphics from the University of East Anglia, Norwich, United Kingdom, in 2007 and 2011, respectively.
He is currently a lecturer in the College of Information System and Management, National University of Defense Technology. His research interests include image/video processing, computer graphics, and visual-haptic technology.
Zheng Zhang was born in Hunan, China, in 1983. He received his M.E. degree from the National University of Defense Technology, China, in 2007, and his PhD in Computer Engineering from Nanyang Technological University (Singapore) in 2012. He is currently working at the National University of Defense Technology. His research interests include wide-area video surveillance and analysis, computational photography, and human motion analysis.
Maojun Zhang received his B.S. and Ph.D. degrees in Systems Engineering from the National University of Defense Technology (NUDT) in 1992 and 1997, respectively.
From 1997 to 2003, he was an Assistant Professor with the Department of Systems Engineering, NUDT. Since 2003, he has been a Professor at the College of Information System and Management, NUDT, Changsha, China. His major interests include computer vision, image/video processing, multimedia information systems, and virtual reality technology.
References
Dabov K., Foi A., Katkovnik V., Egiazarian K. 2007 “Image denoising by sparse 3D transform-domain collaborative filtering,” IEEE Transactions on Image Processing 16 (8) 2080-2095 DOI: 10.1109/TIP.2007.901238
Dabov K., Foi A., Egiazarian K. “Video denoising by sparse 3-D transform-domain collaborative filtering,” in Proc. of 15th European Signal Processing Conf. September 3-7, 2007 145-149
Maggioni M., Boracchi G., Foi A., Egiazarian K. 2012 “Video denoising, deblocking and enhancement through separable 4-D nonlocal spatiotemporal transforms,” IEEE Transactions on Image Processing 21 (9) 3952-3966 DOI: 10.1109/TIP.2012.2199324
Luisier F., Blu T., Unser M. 2010 “SURE-LET for orthonormal wavelet-domain video denoising,” IEEE Transactions on Circuits and Systems for Video Technology 20 (6) 913-919 DOI: 10.1109/TCSVT.2010.2045819
Negi P. S., Labate D. 2012 “3-D discrete shearlet transform and video processing,” IEEE Transactions on Image Processing 21 (6) 2944-2954 DOI: 10.1109/TIP.2012.2183883
Protter M., Elad M. 2009 “Image sequence denoising via sparse and redundant representations,” IEEE Transactions on Image Processing 18 (1) 27-35 DOI: 10.1109/TIP.2008.2008065
Varghese G., Wang Z. 2010 “Video denoising based on a spatiotemporal Gaussian scale mixture model,” IEEE Transactions on Circuits and Systems for Video Technology 20 (7) 1032-1040 DOI: 10.1109/TCSVT.2010.2051366
Kalman R. E. 1960 “A new approach to linear filtering and prediction problems,” Journal of Basic Engineering 82 (1) 35-45 DOI: 10.1115/1.3662552
Kim M., Park D., Han D. K., Ko H. “A novel framework for extremely low-light video enhancement,” in Proc. of 2014 IEEE Int. Conf. Consumer Electronics January 10-13, 2014 91-92
Conte F., Germani A., Iannello G. 2013 “A Kalman filter approach for denoising and deblurring 3-D microscopy images,” IEEE Transactions on Image Processing 22 (12) 5306-5321 DOI: 10.1109/TIP.2013.2284873
Biloslavo M., Ramponi G., Olivieri S., Albani L. “Joint Kalman-based noise filtering and motion compensated video coding for low bit rate videoconferencing,” in Proc. of 2000 Int. Conf. Image Processing September 10-13, 2000 vol. 1 992-995
Dugad R., Ahuja N. “Video denoising by combining Kalman and Wiener estimates,” in Proc. of 1999 Int. Conf. Image Processing October 24-28, 1999 vol. 4 152-156
Mahmoudi M., Sapiro G. 2005 “Fast image and video denoising via non-local means of similar neighborhoods,” IEEE Signal Processing Letters 12 (12) 839-842 DOI: 10.1109/LSP.2005.859509
Buades A., Coll B., Morel J. 2008 “Nonlocal image and movie denoising,” International Journal of Computer Vision 76 (2) 123-139 DOI: 10.1007/s11263-007-0052-1
Han Y., Chen R. 2012 “Efficient video denoising based on dynamic nonlocal means,” Image and Vision Computing 30 (2) 78-85 DOI: 10.1016/j.imavis.2012.01.002
Kuang Y., Zhang L., Yi Z. 2014 “An adaptive rank-sparsity K-SVD algorithm for image sequence denoising,” Pattern Recognition Letters 45 46-54 DOI: 10.1016/j.patrec.2014.03.003
Li X., Zheng Y. 2009 “Patch-based video processing: a variational Bayesian approach,” IEEE Transactions on Circuits and Systems for Video Technology 19 (1) 27-40 DOI: 10.1109/TCSVT.2008.2005805
Selesnick I. W., Li K. Y. “Video denoising using 2D and 3D dual-tree complex wavelet transforms,” in Proc. of SPIE 5207, Wavelets: Applications in Signal and Image Processing November 14, 2003 607-618
Rabbani H., Gazor S. 2012 “Video denoising in three-dimensional complex wavelet domain using a doubly stochastic modelling,” IET Image Processing 6 (9) 1262-1274 DOI: 10.1049/iet-ipr.2012.0017
Labate D., Negi P. S. “3D discrete shearlet transform and video denoising,” in Proc. of SPIE 8138, Wavelets and Sparsity XIV September, 2011 81381Y - 81381Y-11
Easley G., Labate D., Lim W. 2008 “Sparse directional image representations using the discrete shearlet transform,” Applied and Computational Harmonic Analysis 25 (1) 25-46 DOI: 10.1016/j.acha.2007.09.003
Tan H., Tian F., Qiu Y., Wang S., Zhang J. 2010 “Multihypothesis recursive video denoising based on separation of motion state,” IET Image Processing 4 (4) 261-268 DOI: 10.1049/iet-ipr.2009.0279
Liu C., Freeman W. T. “A high-quality video denoising algorithm based on reliable motion estimation,” in Proc. of 2010 European Conf. Computer Vision September 5-11, 2010 vol. 6313 706-719
Portz T., Zhang L., Jiang H. “High-quality video denoising for motion-based exposure control,” in Proc. of 2011 IEEE Int. Conf. Computer Vision Workshops November 6-13, 2011 9-16
Cong Z., Gao Z., Zhang X. “A practical video denoising method based on hierarchical motion estimation,” in Proc. of 2013 IEEE Int. Symposium on Broadband Multimedia Systems and Broadcasting June 5-7, 2013 1-5
Zlokolica V., Pizurica A., Philips W. 2006 “Wavelet-domain video denoising based on reliability measures,” IEEE Transactions on Circuits and Systems for Video Technology 16 (8) 993-1007 DOI: 10.1109/TCSVT.2006.879994
Tomasi C., Manduchi R. “Bilateral filtering for gray and color images,” in Proc. of 1998 6th Int. Conf. on Computer Vision January 4-7, 1998 839-846
Gonzalez R. C., Woods R. E. 2002 Digital Image Processing 2nd Edition Prentice Hall New Jersey
VBM3D denoising toolbox [Online] http://www.cs.tut.fi/~foi/GCF-BM3D/
ST-GSM denoising toolbox [Online] https://ece.uwaterloo.ca/~z70wang/research/stgsm/
Shearlet denoising toolbox [Online] http://www.math.uh.edu/~dlabate/software.html