Image Analysis Lab :: Change Detection Dataset

Summary

This dataset contains image sequences of city streets captured by a vehicle-mounted camera at two different points in time. We make them publicly available for researchers interested in the image-based detection of temporal changes in 3D scene structure. Although we hold the copyright, you may freely use the dataset for research purposes. We request that you cite the following paper if you publish research results that use these data:

Ken Sakurada, Takayuki Okatani, Koichiro Deguchi, Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-mounted Camera, Proc. Computer Vision and Pattern Recognition, 2013. [pdf] [web]

Download

Click here to download the dataset (about 500MBytes).

Click here to download additional data (the data which we fed to PMVS2 to obtain some of the results shown in our CVPR paper, about 900MBytes).

Click here to download the point cloud data (the point clouds which were reconstructed using PMVS2 to obtain some of the results shown in our CVPR paper, about 70MBytes).

Kamaishi

	Panoramic (5000x2500 pixels)	Perspective, left (640x480 pixels)	Perspective, right (640x480 pixels)
April 2011 56x3=168 images
July 2011 55x3=165 images

Takata

	Panoramic (5000x2500 pixels)	Perspective, left (640x480 pixels)	Perspective, right (640x480 pixels)
April 2011 199x3=597 images
July 2011 199x3=597 images

Description

The dataset currently contains the data of two city streets, Kamaishi and Takata. (These are the names of the cities.) Each street dataset consists of two image sequences, t0 and t1, which were captured at two different times (about three months apart).

Each image sequence contains cylindrical panoramic images (5000 x 2500 pixels) along with their camera poses. The panoramic images are named 'panorama/*.jpg' in the corresponding directory. These images were rendered with the Ladybug SDK (Ladybug SDK 1.5 Release 7 - Windows (64-bit), ladybugRenderOffScreenImage(..., LADYBUG_PANORAMIC, ...)) using the equirectangular projection. Please refer to [1] for the details of the transformation between the image coordinates and the (ladybug) camera coordinates. The camera poses were obtained from these panoramic images by our SfM code. They are stored in the text file 'panorama/cam_detail.txt' in the following format:

r11^1 r12^1 r13^1 t1^1
r21^1 r22^1 r23^1 t2^1
r31^1 r32^1 r33^1 t3^1
0 0 0 1
r11^2 r12^2 r13^2 t1^2
r21^2 r22^2 r23^2 t2^2
r31^2 r32^2 r33^2 t3^2
0 0 0 1
...

where rij is the (i,j) component of the rotation matrix, ti is the i-th entry of the translation vector, and ^k indicates that they are the parameters of the k-th viewpoint. A point X in the global coordinates is transformed to the local coordinates of the k-th viewpoint by X^k = R^k X + t^k.
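As an illustration of the format above, the following is a minimal sketch of how such a pose file could be read and applied. It assumes the file holds only whitespace-separated numbers, four per line; the function names and the example matrix are hypothetical and are not taken from the dataset. The camera centre formula C = -R^T t follows from setting X^k = 0 in the equation above.

```python
def parse_poses(text):
    """Parse stacked 4x4 matrices (four numbers per line) into a list of 4x4 lists."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) % 4 != 0 or any(len(r) != 4 for r in rows):
        raise ValueError("expected a multiple of four rows with four numbers each")
    return [rows[i:i + 4] for i in range(0, len(rows), 4)]


def world_to_camera(pose, X):
    """Apply X^k = R^k X + t^k for one 4x4 pose and a 3-vector X."""
    return [sum(pose[i][j] * X[j] for j in range(3)) + pose[i][3]
            for i in range(3)]


def camera_centre(pose):
    """Camera centre in global coordinates, C = -R^T t (the point mapped to the origin)."""
    return [-sum(pose[i][j] * pose[i][3] for i in range(3)) for j in range(3)]


# Made-up example (not dataset values): identity rotation, translation (1, 2, 3).
example = """1 0 0 1
0 1 0 2
0 0 1 3
0 0 0 1
"""
poses = parse_poses(example)
print(world_to_camera(poses[0], [0.0, 0.0, 0.0]))  # [1.0, 2.0, 3.0]
print(camera_centre(poses[0]))                      # [-1.0, -2.0, -3.0]
```

With a real file, the text would be read with something like open('panorama/cam_detail.txt').read() before being passed to parse_poses.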

These camera poses are computed independently for each of t0 and t1. In order to compare t0 and t1, we need camera poses registered in a single coordinate system. This registration is performed by an additional bundle adjustment over t0 and t1; the results are stored in 'panoramic/T0.txt' for t0 and 'panoramic/T1.txt' for t1. Their format is the same as that of 'cam_detail.txt'.
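Once the poses of t0 and t1 share a coordinate system, their camera centres can be compared directly. The sketch below shows one possible way to pair each t0 viewpoint with the closest t1 viewpoint. This pairing strategy is an assumption made for illustration, not a procedure described for the dataset; the helper pose_at and all numbers are hypothetical.

```python
import math


def camera_centre(pose):
    """Camera centre in global coordinates, C = -R^T t."""
    return [-sum(pose[i][j] * pose[i][3] for i in range(3)) for j in range(3)]


def nearest_viewpoints(poses_t0, poses_t1):
    """For each t0 pose, return the index of the t1 pose whose centre is closest."""
    centres_t1 = [camera_centre(p) for p in poses_t1]
    pairs = []
    for pose in poses_t0:
        c0 = camera_centre(pose)
        dists = [math.dist(c0, c1) for c1 in centres_t1]
        pairs.append(dists.index(min(dists)))
    return pairs


def pose_at(cx, cy, cz):
    """Hypothetical helper: identity-rotation pose whose camera centre is (cx, cy, cz)."""
    return [[1.0, 0.0, 0.0, -cx],
            [0.0, 1.0, 0.0, -cy],
            [0.0, 0.0, 1.0, -cz],
            [0.0, 0.0, 0.0, 1.0]]


# Made-up viewpoints standing in for the contents of T0.txt and T1.txt.
t0 = [pose_at(0, 0, 0), pose_at(5, 0, 0)]
t1 = [pose_at(4.8, 0.1, 0), pose_at(0.2, 0, 0), pose_at(10, 0, 0)]
print(nearest_viewpoints(t0, t1))  # [1, 0]
```

This only works on the registered poses; the poses in 'cam_detail.txt' live in separate coordinate systems for t0 and t1, so distances between them are not meaningful.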

The dataset also contains perspective images (640 x 480 pixels) cropped from these panoramic images. (The results shown in our CVPR paper were obtained by using some of them.) There are two image sets for each street at each time; one is a set of images looking at the left side of the street and the other is a set of images looking at the right side. Thus, there are four image sets in total for each street, i.e., t0-left, t0-right, t1-left, and t1-right; you are to compare t0-* and t1-*.

The internal camera parameters are identical for all of these perspective images and are given in 'intrinsic_param.txt' in the following format:

f 0 cx
0 f cy
0 0 1
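A matrix of this form corresponds to the standard pinhole model without skew or lens distortion. The following sketch, under that assumption, reads f, cx and cy from text in the above layout and projects a point given in perspective camera coordinates to pixel coordinates. The numeric values are placeholders, not the contents of 'intrinsic_param.txt', and the function names are hypothetical.

```python
def parse_intrinsics(text):
    """Read f, cx, cy from a 3x3 matrix laid out as 'f 0 cx / 0 f cy / 0 0 1'."""
    rows = [[float(v) for v in line.split()]
            for line in text.splitlines() if line.strip()]
    if len(rows) != 3 or any(len(r) != 3 for r in rows):
        raise ValueError("expected a 3x3 matrix")
    return rows[0][0], rows[0][2], rows[1][2]


def project(f, cx, cy, Xc):
    """Pinhole projection u = f*x/z + cx, v = f*y/z + cy; None if the point is not in front."""
    x, y, z = Xc
    if z <= 0:
        return None
    return (f * x / z + cx, f * y / z + cy)


# Placeholder intrinsics for illustration only.
example = """320 0 320
0 320 240
0 0 1
"""
f, cx, cy = parse_intrinsics(example)
print(project(f, cx, cy, (1.0, 0.5, 2.0)))   # (480.0, 320.0)
print(project(f, cx, cy, (1.0, 0.5, -2.0)))  # None
```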

The external camera parameters for the four image sets (t0-left, t0-right, t1-left, t1-right) are stored in 't0/perspective_left/T0_left.txt', 't0/perspective_right/T0_right.txt', 't1/perspective_left/T1_left.txt', and 't1/perspective_right/T1_right.txt' in the same format as 'cam_detail.txt'. They were computed from the camera poses T0 and T1 for the panoramic images in the following way:

T0_l^k = T_l T0^k
T1_l^k = T_l T1^k
T0_r^k = T_r T0^k
T1_r^k = T_r T1^k

where T_l and T_r are the transformation matrices from the ladybug camera coordinates to the left and right perspective camera coordinates, respectively, and are given by

T_l =
1 0 0 0
0 0 -1 0
0 1 0 0
0 0 0 1

and

T_r =
-1 0 0 0
0 0 -1 0
0 -1 0 0
0 0 0 1
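The composition above can be sketched as a plain 4x4 matrix product. The example pose is made up and the function names are hypothetical; only T_L and T_R are copied from the matrices given here. If the perspective camera looks along its +z axis (the usual pinhole convention, assumed here), the third row of each rotation gives the viewing direction in ladybug coordinates: (0, 1, 0) for the left camera and (0, -1, 0) for the right camera, i.e. opposite directions.

```python
# Transformations from ladybug camera coordinates to perspective camera coordinates.
T_L = [[1, 0, 0, 0],
       [0, 0, -1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 1]]
T_R = [[-1, 0, 0, 0],
       [0, 0, -1, 0],
       [0, -1, 0, 0],
       [0, 0, 0, 1]]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def perspective_poses(panoramic_poses, T_side):
    """Compute T_side T^k for every panoramic pose T^k."""
    return [matmul4(T_side, T) for T in panoramic_poses]


# Made-up panoramic pose: identity rotation, translation (1, 2, 3).
T0_k = [[1, 0, 0, 1],
        [0, 1, 0, 2],
        [0, 0, 1, 3],
        [0, 0, 0, 1]]
print(perspective_poses([T0_k], T_L)[0])
# [[1, 0, 0, 1], [0, 0, -1, -3], [0, 1, 0, 2], [0, 0, 0, 1]]
print(perspective_poses([T0_k], T_R)[0])
# [[-1, 0, 0, -1], [0, 0, -1, -3], [0, -1, 0, -2], [0, 0, 0, 1]]
```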

Ground truth

Some of the perspective images have ground-truth masks of temporal changes, which we annotated manually. They are stored as 'gt_mask_*.jpg'.

Directory structure

Change_detection_dataset

|-README.txt

We welcome your questions, comments and suggestions. Please send them to sakurada@vision.is.tohoku.ac.jp or okatani@vision.is.tohoku.ac.jp

Ken Sakurada and Takayuki Okatani
Tohoku University, Japan
June 2013

Reference
[1] Akihiko Torii, Michal Havlena, and Tomas Pajdla, From Google Street View to 3D City Models, Proc. ICCV Workshops, 2009.

Okatani Laboratory, Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University
