European Conference on Computer Vision 2010
A High-Quality Video Denoising Algorithm based on Reliable Motion Estimation
|Although recent advances in the sparse representation of images have achieved outstanding denoising results, removing real, structured noise in digital videos remains a challenging problem. We show the utility of reliable motion estimation to establish temporal correspondence across frames in order to achieve high-quality video denoising. In this paper, we propose an adaptive video denoising framework that integrates robust optical flow into a non-local means (NLM) framework with noise level estimation. The spatial regularization in optical flow is the key to ensuring temporal coherence in removing structured noise. Furthermore, we introduce approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods. Experimental results show that our system is comparable with the state of the art in removing AWGN, and significantly outperforms the state of the art in removing real, structured noise.|
Temporal coherence is key to video denoising, as illustrated in Figures 1 and 2. The same amount of noise can be perceived differently depending on its temporal consistency. Therefore, a high-quality video denoising algorithm should also enforce temporal consistency in the results.
|Figure 1. Which of these two videos contains more noise (please click the pictures to play the videos)? You may feel that the video on the right contains more noise, but in fact, the same amount of noise has been applied to both videos. The noise is temporally consistent for the one on the left, while the noise is independent for the one on the right.|
|Figure 2. The same as Figure 1, with higher noise level.|
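The distinction drawn in Figures 1 and 2 can be reproduced numerically. The following is a minimal sketch, not the procedure used to make the videos: it assumes a static scene, so "temporally consistent" simply means that the same noise pattern is reused in every frame, and the names `noisy_video` and `mean_temporal_change` are hypothetical.

```python
import random


def noisy_video(clean_frame, num_frames, sigma, consistent, seed=0):
    """Return num_frames noisy copies of clean_frame (a flat list of pixels).

    Assumption: the scene is static. If consistent is True, one Gaussian noise
    pattern is reused for every frame; otherwise a new pattern is drawn per frame.
    Both cases use the same per-frame noise standard deviation sigma.
    """
    rng = random.Random(seed)

    def sample_noise():
        return [rng.gauss(0.0, sigma) for _ in clean_frame]

    fixed_noise = sample_noise()
    frames = []
    for _ in range(num_frames):
        noise = fixed_noise if consistent else sample_noise()
        frames.append([p + n for p, n in zip(clean_frame, noise)])
    return frames


def mean_temporal_change(frames):
    """Average absolute pixel change between consecutive frames."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        total += sum(abs(a - b) for a, b in zip(prev, curr))
        count += len(prev)
    return total / count


clean = [100.0] * 64
consistent = noisy_video(clean, num_frames=5, sigma=10.0, consistent=True)
independent = noisy_video(clean, num_frames=5, sigma=10.0, consistent=False)
print(mean_temporal_change(consistent), mean_temporal_change(independent))
```

Each frame carries noise of the same strength in both cases, but only the independent version changes from frame to frame, which is what makes it appear noisier.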
Although state-of-the-art video denoising algorithms often satisfy the temporal coherence criterion when removing additive white Gaussian noise (AWGN), many real videos contain structured noise that makes it challenging to ensure temporal coherence. As shown in Figure 3, the blue channel of the image contains structured noise that many denoising algorithms can misinterpret as signal. Confused by the jittering blocky noise, block matching techniques (e.g. in ) may fail to track the true motion of the objects.
|RGB noisy image||R channel||G channel||B channel|
|Figure 3. In real video denoising scenarios, images contain structured noise. In this example, the blue channel is heavily contaminated with structured noise that can be confused with signal.|
Therefore, in contrast with , we argue that high-quality video denoising, especially when structured noise is taken into account, indeed needs reliable motion estimation. In theory, estimating motion and noise suffers from a chicken-and-egg problem, since motion should be estimated from the underlying signals after denoising, and denoising relies on the temporal correspondence from motion estimation. In practice, however, we use our robust optical flow (MATLAB/C++ code is available for download) with spatial regularization to establish reliable temporal correspondence despite noise. Because of its power, we use non-local means (NLM) as the backbone of our system. Due to the inherent search complexity of NLM, the search for similar patches is often constrained to a small neighborhood. We introduce approximate K-nearest neighbor patch matching, which has much lower complexity, to allow searching over the entire image for similar patches. In addition, we estimate the noise level of each frame for noise-adaptive denoising.
|Figure 4. A set of similar patches is collected from adjacent frames and from the current frame to estimate the underlying pixel values in a non-local means (NLM) manner. A key contribution of our system is to make temporally adjacent pixels share an overlapping set of similar patches so that the estimate is consistent along the motion trajectory.|
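To make the NLM averaging step concrete, here is a minimal sketch of how one pixel could be estimated from a set of already collected patches. It uses the classical NLM weighting, in which each candidate patch is weighted by its similarity to the reference patch. The exact weights used in our system are not given on this page, so the function names and the filtering parameter `h` are assumptions made for illustration. How the candidate patches are found (optical flow plus approximate K-nearest neighbor search) is outside the scope of the sketch.

```python
import math


def patch_distance(p, q):
    """Mean squared difference between two equally sized, flattened patches."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) / len(p)


def nlm_estimate(reference, candidates, h):
    """Estimate the center pixel of `reference` as a weighted average of the
    center pixels of `candidates`.

    reference:  flattened patch around the pixel being denoised
    candidates: non-empty list of flattened patches of the same size, e.g.
                gathered from the current frame and from adjacent frames
    h:          filtering parameter (assumption); larger values average more
                aggressively and would typically grow with the noise level
    """
    center = len(reference) // 2
    weights = [math.exp(-patch_distance(reference, c) / (h * h))
               for c in candidates]
    total = sum(weights)
    return sum(w * c[center] for w, c in zip(weights, candidates)) / total


reference = [10.0, 12.0, 11.0]
candidates = [
    [10.0, 12.0, 11.0],   # the reference patch itself
    [11.0, 13.0, 10.0],   # a similar patch from an adjacent frame
    [90.0, 95.0, 92.0],   # a dissimilar patch, receives almost no weight
]
print(nlm_estimate(reference, candidates, h=2.0))
```

Including the reference patch among the candidates keeps the weight sum strictly positive. Sharing largely the same candidate set between temporally adjacent pixels is what keeps the estimate consistent along a motion trajectory.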
We first examine the importance of regularization in motion estimation by comparing block matching to the optical flow algorithm with spatial regularization. The motion estimated for one frame is shown in Figure 5, where motion vectors are visualized by the color scheme proposed in . Clearly, spatially independent block matching in (b) is strongly affected by the presence of structured noise. In contrast, the optical flow with spatial regularization in (c) produces a smooth, discontinuity-preserving motion field that corresponds to the human perception of motion, and to the known smooth character of the optical flow induced by a camera moving through this piecewise-smooth, planar, static scene.
|(a) Two adjacent frames from the noisy input||(b) Motion obtained by block matching||(c) Motion obtained by optical flow||(d) Flow code|
|Figure 5. Block matching (independent matching for every pixel) and optical flow (taking into account spatial smoothness) result in very different motion when structured noise is present.|
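For reference, the spatially independent block matching in (b) can be sketched as an exhaustive search that minimizes the sum of squared differences (SSD) for each block separately. This is a generic illustration with hypothetical function names, not the exact baseline used for the figure. Because each block is matched on its own, nothing prevents neighboring blocks from locking onto structured noise and producing incoherent motion, which is the failure mode that spatial regularization in optical flow avoids.

```python
def ssd(frame_a, ya, xa, frame_b, yb, xb, size):
    """Sum of squared differences between two size x size blocks."""
    total = 0.0
    for i in range(size):
        for j in range(size):
            d = frame_a[ya + i][xa + j] - frame_b[yb + i][xb + j]
            total += d * d
    return total


def block_match(curr, nxt, y, x, size, radius):
    """Return the displacement (dy, dx) within +/- radius that best aligns the
    block at (y, x) in `curr` with a block in `nxt`.

    Frames are lists of rows. Each block is matched independently, with no
    smoothness constraint across neighboring blocks.
    """
    height, width = len(nxt), len(nxt[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            # Skip candidate blocks that fall outside the frame.
            if yy < 0 or xx < 0 or yy + size > height or xx + size > width:
                continue
            cost = ssd(curr, y, x, nxt, yy, xx, size)
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best


# Toy example: the second frame is the first shifted down by 1 and right by 2.
curr = [[i * 8 + j for j in range(8)] for i in range(8)]
nxt = [[curr[i - 1][j - 2] if i >= 1 and j >= 2 else 0 for j in range(8)]
       for i in range(8)]
print(block_match(curr, nxt, y=2, x=2, size=3, radius=2))
```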
The quality of our motion estimation determines the quality of our video denoising. Because the VBM3D code we downloaded does not accept per-frame noise intensities as input, we try two parameter settings, σ = 20 and σ = 40, to denoise the room sequence and compare the results with our denoising algorithm, as shown in the video below. Although there is no ground truth for this video, it is clear that our system outperforms VBM3D in both smoothing regions and preserving boundaries.
The following shows the denoising result for the whole sequence. Notice that our algorithm adaptively removes the noise according to the estimated noise level.