Difference between Microsoft's and Instagram's Hyperlapse

Many people have asked us about the relationship between our Hyperlapse tech demo presented at SIGGRAPH and Instagram's recent Hyperlapse app for the iPhone. The two technologies were developed concurrently; we did not know anything about their work until a few days ago. We like the beautifully simple user interface in their app and the clever idea of using the device's gyroscope, which makes real-time stabilization on a phone possible.

There are some fundamental differences between their technology and ours, though, which we want to explain here.

Instagram's Hyperlapse is similar to existing video stabilization algorithms in that it warps each video frame to remove slight camera shake. Unlike Adobe After Effects or the YouTube video stabilizer, it does not rely on image analysis but rather on the camera's built-in gyroscope to determine the necessary amount of rotation for each frame. To avoid visible out-of-frame regions it zooms into the video, leaving some buffer area for cropping.
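To make this idea concrete, here is a minimal Python/OpenCV sketch of rotation-only stabilization. It is not Instagram's actual implementation; the intrinsics matrix `K`, the gyro-integrated rotation `R`, and the zoom factor are all assumed inputs for illustration.

```python
# Minimal sketch of gyroscope-based rotation stabilization, assuming a
# pinhole camera with known intrinsics K and a rotation matrix R
# integrated from gyroscope readings (both hypothetical inputs here).
import numpy as np
import cv2

def stabilize_frame(frame, K, R, zoom=1.2):
    """Warp one frame to undo camera rotation, then zoom in to hide borders."""
    h, w = frame.shape[:2]
    # Pure-rotation homography: re-render pixels as if the camera had not rotated.
    H = K @ R.T @ np.linalg.inv(K)
    warped = cv2.warpPerspective(frame, H, (w, h))
    # Zoom into the center so the out-of-frame borders are cropped away.
    cw, ch = int(w / zoom), int(h / zoom)
    x0, y0 = (w - cw) // 2, (h - ch) // 2
    return cv2.resize(warped[y0:y0 + ch, x0:x0 + cw], (w, h))
```

Note that the zoom factor fixes an upper bound on how much rotation can be compensated: any correction that pushes the frame border past the crop window becomes visible.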

This works well for sequences with only a little bit of motion, such as walking carefully around an object or filming out of a plane window. However, in less controlled situations, for example with a wearable camera, it breaks down. To see why, consider this hiking video:


Every frame in the left video was generated by warping just a single input frame. As you can see, there are lots of out-of-frame pixels visible. So, existing methods would have to either stabilize less to follow the camera motion, or crop down to a tiny common area in the center.
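The following sketch illustrates why cropping breaks down: given one stabilizing homography per frame, it intersects the warped frame rectangles to find the area that stays valid in every frame. The function and its inputs are hypothetical, not part of any shipped stabilizer.

```python
# Hypothetical sketch: given the stabilizing homography of each frame,
# estimate the largest axis-aligned region that stays inside every warped
# frame. On shaky footage this region shrinks toward nothing, which is
# why crop-based stabilization fails there.
import numpy as np
import cv2

def common_crop(homographies, w, h):
    """Intersect the warped frame rectangles across all frames."""
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    left, top, right, bottom = 0.0, 0.0, float(w), float(h)
    for H in homographies:
        warped = cv2.perspectiveTransform(corners, H).reshape(-1, 2)
        # Inscribed bounding box of the warped quad (a conservative approximation).
        left = max(left, warped[[0, 3], 0].max())
        right = min(right, warped[[1, 2], 0].min())
        top = max(top, warped[[0, 1], 1].max())
        bottom = min(bottom, warped[[2, 3], 1].min())
    return left, top, right, bottom  # degenerates quickly under large motion
```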

Our method is fundamentally different from previous approaches: it reconstructs a full 3D camera path and world model. This enables smoothing the camera path in space-time and generating an output video with a constant-speed camera that skips over 'slow' parts of the input video, such as waiting at red lights. Just as importantly, our method can fill in the missing regions in the video above by stitching together pixels from multiple input frames. Thanks to these two innovations we can handle much 'wilder' input videos, such as climbing or riding.
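As a toy illustration of the constant-speed idea only (the full method optimizes a smooth path and renders novel views by stitching multiple frames, which this sketch does not attempt), one can resample output frames at uniform spacing along the reconstructed camera path. Here `positions`, one 3D point per input frame, stands in for the output of the 3D reconstruction.

```python
# Simplified sketch of constant-speed frame selection: pick input frames
# spaced evenly by distance traveled along the reconstructed camera path.
import numpy as np

def constant_speed_frames(positions, num_output):
    """Return input frame indices spaced evenly by arc length."""
    positions = np.asarray(positions, dtype=float)
    # Cumulative arc length along the camera path.
    steps = np.linalg.norm(np.diff(positions, axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(steps)])
    # Equal-distance stations along the path; slow sections (little distance
    # covered per frame, e.g. waiting at a red light) are naturally skipped.
    targets = np.linspace(0.0, arc[-1], num_output)
    return np.searchsorted(arc, targets).clip(0, len(positions) - 1)
```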

For more details about our algorithm, check out this technical explanation video: