European Conference on Computer Vision 2010

A High-Quality Video Denoising Algorithm based on Reliable Motion Estimation

Ce Liu1   William T. Freeman1,2

1Microsoft Research New England     2Massachusetts Institute of Technology

Download the PDF


Although recent advances in sparse representations of images have achieved outstanding denoising results, removing real, structured noise in digital videos remains a challenging problem. We show the utility of reliable motion estimation for establishing temporal correspondence across frames in order to achieve high-quality video denoising. In this paper, we propose an adaptive video denoising framework that integrates robust optical flow into a non-local means (NLM) framework with noise level estimation. The spatial regularization in optical flow is the key to ensuring temporal coherence when removing structured noise. Furthermore, we introduce approximate K-nearest neighbor matching to significantly reduce the complexity of classical NLM methods. Experimental results show that our system is comparable with the state of the art in removing AWGN, and significantly outperforms the state of the art in removing real, structured noise.


Temporal coherence is key to video denoising, as illustrated in Figures 1 and 2. The same amount of noise can be perceived differently depending on its temporal consistency. Therefore, a high-quality video denoising algorithm should also enforce temporal consistency in its results.

Figure 1. Which of these two videos contains more noise? You may feel that the video on the right contains more noise, but in fact the same amount of noise has been applied to both videos. The noise is temporally consistent in the video on the left, while it is independent across frames in the video on the right.
Figure 2. The same as Figure 1, with a higher noise level.
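
The effect in Figures 1 and 2 can be reproduced in a small experiment. The following is a minimal sketch, assuming a static grayscale scene stored as nested Python lists; the function names and the mean absolute frame difference used as a flicker measure are illustrative choices, not part of the paper.

```python
import random

def noisy_video(clean_frames, sigma, consistent, seed=0):
    """Add Gaussian noise with standard deviation `sigma` to every frame.

    consistent=True  : one noise field is drawn once and reused for all frames.
    consistent=False : a fresh noise field is drawn for each frame.
    Frames are lists of rows of floats (hypothetical layout for this sketch).
    """
    rng = random.Random(seed)
    height, width = len(clean_frames[0]), len(clean_frames[0][0])

    def draw_field():
        return [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(height)]

    fixed = draw_field() if consistent else None
    out = []
    for frame in clean_frames:
        field = fixed if consistent else draw_field()
        out.append([[frame[y][x] + field[y][x] for x in range(width)]
                    for y in range(height)])
    return out

def mean_abs_temporal_diff(frames):
    """Average |I(t+1) - I(t)| over all pixels and consecutive frame pairs."""
    total, count = 0.0, 0
    for a, b in zip(frames, frames[1:]):
        for row_a, row_b in zip(a, b):
            for pa, pb in zip(row_a, row_b):
                total += abs(pb - pa)
                count += 1
    return total / count
```

Both variants have the same per-frame noise level, but only the independent variant changes from frame to frame, which is what the eye perceives as flicker.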

Although state-of-the-art video denoising algorithms often satisfy the temporal coherence criterion when removing additive white Gaussian noise (AWGN), many real videos contain structured noise that makes it challenging to ensure temporal coherence. As shown in Figure 3, the blue channel of the image contains structured noise that many denoising algorithms can misinterpret as signal. Confused by the jittering, blocky noise, block matching techniques (e.g., in [4]) may fail to track the true motion of the objects.

Panels: RGB noisy image, R channel, G channel, B channel.
Figure 3. In real video denoising scenarios, images contain structured noise. In this example, the blue channel is heavily contaminated with structured noise that can be mistaken for signal.

Our system

Therefore, in contrast with [3], we argue that high-quality video denoising does need reliable motion estimation, especially when structured noise is taken into account. In theory, estimating motion and noise is a chicken-and-egg problem: motion should be estimated from the underlying signal after denoising, while denoising relies on the temporal correspondence provided by motion estimation. In practice, however, we use our robust optical flow (MATLAB/C++ code is available for download) with spatial regularization to establish reliable temporal correspondence despite the noise. We use non-local means (NLM) as the backbone of our system because of its effectiveness. Because of the inherent search complexity of NLM, the search for similar patches is usually constrained to a small neighborhood. We introduce approximate K-nearest neighbor patch matching, which has much lower complexity and allows us to search the entire image for similar patches. In addition, we estimate the noise level of each frame for noise-adaptive denoising.

Figure 4. A set of similar patches is collected from adjacent frames and from the current frame to estimate the underlying pixel values in a non-local means (NLM) manner. A key contribution of our system is to make temporally adjacent pixels share an overlapping set of similar patches, so that the estimate is consistent along the motion trajectory.
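
The NLM idea in Figure 4 can be illustrated with a minimal sketch. It makes several simplifying assumptions that differ from our system: the K nearest patches are found by exhaustive search rather than approximate matching, no optical flow is used to align adjacent frames, and the filtering parameter `h` is fixed by hand instead of being derived from an estimated noise level. All names are hypothetical.

```python
import math

def patch(frame, y, x, r):
    """Flattened (2r+1) x (2r+1) patch centred at (y, x); must lie inside the frame."""
    return [frame[yy][xx]
            for yy in range(y - r, y + r + 1)
            for xx in range(x - r, x + r + 1)]

def nlm_pixel(frames, t, y, x, r=1, k=5, h=10.0):
    """Estimate pixel (t, y, x) as a weighted mean over its K most similar patches.

    Candidate patches come from every frame in `frames` (the current frame and
    its temporal neighbours). Distance is the mean squared patch difference and
    the weight is exp(-distance / h^2), a common NLM weighting.
    """
    ref = patch(frames[t], y, x, r)
    candidates = []
    for frame in frames:
        height, width = len(frame), len(frame[0])
        for cy in range(r, height - r):
            for cx in range(r, width - r):
                cand = patch(frame, cy, cx, r)
                dist = sum((a - b) ** 2 for a, b in zip(ref, cand)) / len(ref)
                candidates.append((dist, frame[cy][cx]))
    # Exhaustive K-nearest-neighbour selection (the costly step that
    # approximate matching is meant to avoid).
    candidates.sort(key=lambda c: c[0])
    nearest = candidates[:k]
    weights = [math.exp(-d / (h * h)) for d, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

The exhaustive search visits every patch in every frame for each pixel, which shows why classical NLM restricts the search window and why a cheaper approximate K-nearest neighbor search is attractive.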

Experimental results

We first examine the importance of regularization in motion estimation by comparing block matching with the optical flow algorithm with spatial regularization. The motion estimated for one frame is shown in Figure 5, where motion vectors are visualized with the color scheme proposed in [1]. Clearly, spatially independent block matching in (b) is strongly affected by the presence of structured noise. In contrast, the optical flow with spatial regularization in (c) produces a smooth, discontinuity-preserving motion field that corresponds to the human perception of motion, and to the known smooth character of the optical flow induced by a camera moving through this piecewise-smooth, planar, static scene.

(a) Two adjacent frames from the noisy input; (b) motion obtained by block matching; (c) motion obtained by optical flow; (d) flow code.
Figure 5. Block matching (independent matching for every pixel) and optical flow (taking spatial smoothness into account) result in very different motion fields when structured noise is present.
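
For reference, spatially independent block matching can be sketched as follows. This is a generic sum-of-squared-differences (SSD) search written for illustration, not the implementation used in [4]; the function name, block size, and search radius are assumptions.

```python
def block_match(prev, curr, y, x, block=3, search=3):
    """Return the displacement (dy, dx) that best matches one block.

    The block with top-left corner (y, x) in `prev` is compared, by SSD,
    against every candidate block in `curr` within +/- `search` pixels.
    Each block is matched on its own: nothing ties its displacement to
    those of neighbouring blocks, so structured noise can pull it away
    from the true motion.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ty, tx = y + dy, x + dx
            # Skip candidates that fall partly outside the frame.
            if ty < 0 or tx < 0 or ty + block > height or tx + block > width:
                continue
            cost = sum((prev[y + i][x + j] - curr[ty + i][tx + j]) ** 2
                       for i in range(block) for j in range(block))
            if cost < best_cost:
                best_cost, best = cost, (dy, dx)
    return best
```

Optical flow with spatial regularization differs in that it adds a smoothness term coupling neighboring displacements, which is what keeps the field in (c) coherent.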

The quality of our motion estimation determines the quality of our video denoising. Because the VBM3D code we downloaded does not accept per-frame noise intensities, we try two settings, σ = 20 and σ = 40, to denoise the room sequence and compare the results with those of our denoising algorithm, as shown in the video below. Although there is no ground truth for this video, it is clear that our system outperforms VBM3D both in smoothing regions and in preserving boundaries.


The following video shows the denoising result for the whole sequence. Note that our algorithm adaptively removes noise according to the estimated noise level.


Average PSNR over the video sequence has been used to measure video denoising quality, but temporal coherence has not been included in the quality assessment. We feel that temporal coherence is vital for evaluating video denoising algorithms. For this purpose, we use our motion annotation toolbox [5] to obtain the ground-truth motion of the "room" sequence. Using the annotated motion, we can analyze how pixel intensities change over time for different denoising algorithms. Two exemplar motion paths are plotted in Figure 6. Clearly, our system has the least temporal fluctuation overall, which we feel is crucial for visual quality.

Figure 6. Temporal smoothness of different denoising algorithms. We measure pixel intensities along motion paths over frames. Two motion paths are shown here. Our system (red curve) has the least temporal fluctuation.
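
The two kinds of measurement discussed here can be sketched as follows. PSNR is computed in the standard way; the fluctuation score (mean absolute intensity change between consecutive samples on a motion path) is an assumption made for illustration, since Figure 6 plots the intensities directly rather than reporting a single number. Frames are nested lists and a path is a list of one (y, x) position per frame; both layouts are hypothetical.

```python
import math

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames (lists of rows)."""
    sq_err, count = 0.0, 0
    for row_r, row_e in zip(reference, estimate):
        for a, b in zip(row_r, row_e):
            sq_err += (a - b) ** 2
            count += 1
    mse = sq_err / count
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)

def intensities_along_path(frames, path):
    """Sample frame t at path[t] = (y, x), following one motion trajectory."""
    return [frames[t][y][x] for t, (y, x) in enumerate(path)]

def temporal_fluctuation(values):
    """Mean absolute change between consecutive samples along a path."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)
```

A per-frame PSNR average is blind to flicker, because it never compares a frame with its neighbors, whereas the path-based score is zero only when the intensity stays constant along the trajectory.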

Removing realistic video noise has broad applications. For example, we can turn a noisy HD home video into a high-quality, noise-free video (below) that plays pleasantly on an HDTV.


We also compare our algorithm with VBM3D on this baby sequence, as shown in the video below.


[1] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black and R. Szeliski. A database and evaluation methodology for optical flow. In Proc. ICCV, 2007.
[2] A. Bruhn, J. Weickert and C. Schnörr. Lucas/Kanade meets Horn/Schunck: combining local and global optical flow methods. International Journal of Computer Vision (IJCV), 61(3):211–231, 2005.
[3] A. Buades, B. Coll and J. M. Morel. Nonlocal image and movie denoising. International Journal of Computer Vision (IJCV), 76:123–139, 2008.
[4] K. Dabov, A. Foi and K. Egiazarian. Video denoising by sparse 3D transform-domain collaborative filtering. In European Signal Processing Conference (EUSIPCO), 2007.
[5] C. Liu, W. T. Freeman, Y. Weiss and E. H. Adelson. Human-assisted motion annotation. In Proc. CVPR, 2008.

Last update: Sep, 2010