The 7-Scenes dataset is a collection of tracked RGB-D camera frames. It can be used to evaluate methods for applications such as dense tracking and mapping, and camera relocalization.
All scenes were recorded with a handheld Kinect RGB-D camera at 640x480 resolution. We used an implementation of the KinectFusion system to obtain the 'ground truth' camera tracks and a dense 3D model. Several sequences were recorded per scene by different users and split into distinct training and testing sets. Details on how this data can be used, for example to evaluate relocalization methods, can be found in our papers listed under publications.
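As a minimal sketch of such an evaluation, assuming NumPy and the 4x4 camera-to-world pose files described in this document, the helpers below measure the translational and angular error between an estimated and a ground-truth pose. The 5 cm / 5° thresholds in the hypothetical `is_relocalized` helper follow the criterion used in the CVPR 2013 paper listed under publications; treating 0.05 as 5 cm assumes pose translations are in meters, which is worth checking against the data.

```python
import numpy as np


def pose_error(T_est: np.ndarray, T_gt: np.ndarray) -> tuple[float, float]:
    """Return (translation error, rotation error in degrees) between two
    4x4 camera-to-world poses. Translation error is in the poses' own units."""
    t_err = float(np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3]))
    # The angle of the relative rotation R_est^T R_gt is the rotation error.
    R_rel = T_est[:3, :3].T @ T_gt[:3, :3]
    cos_angle = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    r_err = float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))
    return t_err, r_err


def is_relocalized(T_est, T_gt, max_t=0.05, max_deg=5.0) -> bool:
    """Hypothetical helper: True if the estimate is within both thresholds.
    max_t=0.05 assumes pose translations are in meters (i.e. 5 cm)."""
    t_err, r_err = pose_error(T_est, T_gt)
    return t_err <= max_t and r_err <= max_deg
```

Over a test split, the fraction of correctly relocalized frames would then be something like `np.mean([is_relocalized(e, g) for e, g in pose_pairs])`, where `pose_pairs` is a hypothetical list of (estimate, ground truth) matrices.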
For each scene, we provide one zip file containing several sequences. Each sequence is a continuous stream of tracked RGB-D camera frames. Tracking was performed using ICP with frame-to-model alignment against a dense reconstruction represented as a truncated signed distance function (TSDF) volume.
Each sequence (seq-XX.zip) consists of 500-1000 frames. Each frame consists of three files:
Color: frame-XXXXXX.color.png (RGB, 24-bit PNG).
Depth: frame-XXXXXX.depth.png (16-bit PNG, depth in millimeters; invalid depth is set to 65535).
Pose: frame-XXXXXX.pose.txt (camera-to-world transform as a 4x4 matrix in homogeneous coordinates).
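A minimal sketch of reading one frame, assuming NumPy, six-digit zero-padded frame indices (as the `frame-XXXXXX` pattern suggests), and Pillow as the PNG decoder; `frame_paths`, `depth_to_meters`, `load_pose`, and `load_frame` are hypothetical helper names, not part of the dataset.

```python
from pathlib import Path

import numpy as np

INVALID_DEPTH = 65535  # sentinel for missing depth, per the file description


def frame_paths(seq_dir: Path, idx: int) -> dict[str, Path]:
    """Build the color/depth/pose file paths for frame `idx` of a sequence folder."""
    stem = f"frame-{idx:06d}"
    kinds = (("color", "png"), ("depth", "png"), ("pose", "txt"))
    return {kind: Path(seq_dir) / f"{stem}.{kind}.{ext}" for kind, ext in kinds}


def depth_to_meters(raw: np.ndarray) -> np.ndarray:
    """Convert raw 16-bit depth in millimeters to float meters; invalid pixels become NaN."""
    depth = raw.astype(np.float32) / 1000.0
    depth[raw == INVALID_DEPTH] = np.nan
    return depth


def load_pose(source) -> np.ndarray:
    """Read a 4x4 camera-to-world matrix stored as whitespace-separated text."""
    pose = np.loadtxt(source)
    if pose.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got shape {pose.shape}")
    return pose


def load_frame(seq_dir: Path, idx: int):
    """Return (color HxWx3 uint8, depth HxW meters, 4x4 pose) for one frame."""
    from PIL import Image  # assumed dependency; any 16-bit-capable PNG reader works

    paths = frame_paths(seq_dir, idx)
    color = np.asarray(Image.open(paths["color"]).convert("RGB"))
    depth = depth_to_meters(np.asarray(Image.open(paths["depth"])))
    return color, depth, load_pose(paths["pose"])
```

Converting depth to meters here is a design choice for convenience; keeping millimeters works equally well as long as it matches the units of the poses.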
For each scene, we further provide:
For Evaluation: TrainSplit.txt / TestSplit.txt (the splits used in the papers listed under publications).
TSDF Volume: The dense reconstruction used for frame-to-model alignment, stored in MetaImage format (two files: a text header and the binary data). The binary data stores the signed distance of each voxel as a 16-bit signed integer (short). All volumes are 512x512x512 voxels, with varying element spacings and origin offsets; the spacings and offsets, in millimeters, are given in the header file.
Screenshot: a raycast rendering of the dense reconstruction.
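A minimal sketch of loading a TSDF volume with NumPy, assuming standard MetaImage header keys (`DimSize`, `ElementSpacing`, `Offset`, `ElementType = MET_SHORT`, `ElementDataFile`), x varying fastest in the binary file, little-endian data unless `BinaryDataByteOrderMSB` is true, and no rotation in the header. How the stored short values scale to physical distances is not stated here, so the sketch returns them raw. All function names are hypothetical.

```python
from pathlib import Path

import numpy as np


def read_mhd_header(path) -> dict[str, str]:
    """Parse a MetaImage text header of 'Key = Value' lines into a dict."""
    header = {}
    for line in Path(path).read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            header[key.strip()] = value.strip()
    return header


def load_tsdf(header_path):
    """Return (volume indexed [z, y, x] as int16, spacing_mm, offset_mm)."""
    header = read_mhd_header(header_path)
    if header.get("ElementType") != "MET_SHORT":
        raise ValueError(f"expected MET_SHORT, got {header.get('ElementType')}")
    dims = [int(v) for v in header["DimSize"].split()]  # x, y, z sizes
    spacing = np.array([float(v) for v in header["ElementSpacing"].split()])
    # MetaImage allows Offset, Origin, or Position for the volume origin.
    origin = header.get("Offset") or header.get("Origin") or header.get("Position") or "0 0 0"
    offset = np.array([float(v) for v in origin.split()])
    big_endian = header.get("BinaryDataByteOrderMSB", "False").lower() == "true"
    dtype = np.dtype(">i2" if big_endian else "<i2")
    # The data file name is taken relative to the header's directory.
    data_path = Path(header_path).parent / header["ElementDataFile"]
    volume = np.fromfile(data_path, dtype=dtype).reshape(dims[::-1])
    return volume, spacing, offset


def voxel_to_world_mm(xyz_index, spacing, offset) -> np.ndarray:
    """Map a voxel index (x, y, z) to millimeters as offset + index * spacing."""
    return offset + np.asarray(xyz_index, dtype=float) * spacing
```

Reshaping to `(z, y, x)` lets `volume[k, j, i]` address voxel `(i, j, k)` without copying; transpose if x-first indexing is preferred.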
Please note: the RGB and depth cameras have not been calibrated, and we cannot currently provide calibration parameters. The recorded frames are the raw, uncalibrated camera images. In the KinectFusion pipeline we used the following default intrinsics for the depth camera: principal point (320, 240), focal length (585, 585).
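With these default depth intrinsics, a depth image can be back-projected into a point cloud and moved into world coordinates with the camera-to-world pose. This is a minimal NumPy sketch assuming a standard pinhole model (x right, y down, z forward), depth already in meters with invalid pixels set to NaN, and pose translations in the same units; because the cameras are uncalibrated, it applies to depth pixels only and says nothing about color alignment. `backproject` and `camera_to_world` are hypothetical names.

```python
import numpy as np

# Default depth intrinsics quoted above, in pixels.
FX, FY = 585.0, 585.0
CX, CY = 320.0, 240.0


def backproject(depth_m: np.ndarray) -> np.ndarray:
    """Depth image (H, W) in meters -> (N, 3) camera-space points, row-major order.
    NaN or non-positive depths are treated as invalid and dropped."""
    h, w = depth_m.shape
    v, u = np.mgrid[0:h, 0:w]  # v = row index, u = column index
    with np.errstate(invalid="ignore"):  # NaN comparisons are expected here
        valid = np.isfinite(depth_m) & (depth_m > 0)
    z = depth_m[valid]
    x = (u[valid] - CX) * z / FX
    y = (v[valid] - CY) * z / FY
    return np.stack([x, y, z], axis=1)


def camera_to_world(points: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Apply a 4x4 camera-to-world pose to (N, 3) points: R @ p + t for each row."""
    return points @ pose[:3, :3].T + pose[:3, 3]
```

Fusing several frames this way gives a rough world-space point cloud that can be compared against the TSDF reconstruction.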
If you report results based on the 7-Scenes dataset, please cite at least one of the papers listed under publications, choosing whichever is more relevant to your own work.
- Ben Glocker, Shahram Izadi, Jamie Shotton, and Antonio Criminisi, Real-Time RGB-D Camera Relocalization, in International Symposium on Mixed and Augmented Reality (ISMAR), IEEE, October 2013
- Jamie Shotton, Ben Glocker, Christopher Zach, Shahram Izadi, Antonio Criminisi, and Andrew Fitzgibbon, Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images, in Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, June 2013