Change Detection Dataset


This dataset contains image sequences of city streets captured by a vehicle-mounted camera at two different points in time. We make it publicly available for researchers interested in the image-based detection of temporal changes in 3D scene structure. Although we retain the copyright, you may use it freely for research purposes. If you publish research results that use these data, please cite the following paper:

Ken Sakurada, Takayuki Okatani, Koichiro Deguchi, Detecting Changes in 3D Structure of a Scene from Multi-view Images Captured by a Vehicle-mounted Camera, Proc. Computer Vision and Pattern Recognition, 2013. [pdf] [web]


Click here to download the dataset (about 500MBytes).

Click here to download additional data (the data which we fed to PMVS2 to obtain some of the results shown in our CVPR paper, about 900MBytes).

Click here to download the point cloud data (the point clouds which were reconstructed using PMVS2 to obtain some of the results shown in our CVPR paper, about 70MBytes).



                              Panoramic (5000x2500 pixels)    Perspective image (640x480 pixels)
                                                              Left                                   Right
April 2011  56x3=168 images   panoramic_kamaishi_t0_003.jpg   perspective_kamaishi_t0_left_003.jpg   perspective_kamaishi_t0_right_003.jpg
July 2011   55x3=165 images   panoramic_kamaishi_t1_002.jpg   perspective_kamaishi_t1_left_002.jpg   perspective_kamaishi_t1_right_002.jpg


                              Panoramic (5000x2500 pixels)    Perspective image (640x480 pixels)
                                                              Left                                   Right
April 2011  199x3=597 images  panoramic_takata_t0_043.jpg     perspective_takata_t0_left_043.jpg     perspective_takata_t0_right_043.jpg
July 2011   199x3=597 images  panoramic_takata_t1_043.jpg     perspective_takata_t1_left_043.jpg     perspective_takata_t1_right_043.jpg


The dataset currently contains data for two city streets, Kamaishi and Takata. (These are the names of the cities.) Each street dataset consists of two image sequences, t0 and t1, captured at two different times (about three months apart).

Each image sequence contains cylindrical panoramic images (5000 x 2500 pixels) along with their camera poses. The panoramic images are stored as 'panorama/*.jpg' in the corresponding directory. They were rendered with the Ladybug SDK (Ladybug SDK 1.5 Release 7 - Windows (64-bit), ladybugRenderOffScreenImage(..., LADYBUG_PANORAMIC, ...)) using the equirectangular projection. Please refer to [1] for details of the transformation between the image coordinates and the (Ladybug) camera coordinates. The camera poses were obtained from these panoramic images by our SfM code. They are stored in the text file 'panorama/cam_detail.txt' in the following format:

r11^1 r12^1 r13^1 t1^1
r21^1 r22^1 r23^1 t2^1
r31^1 r32^1 r33^1 t3^1
0 0 0 1
r11^2 r12^2 r13^2 t1^2
r21^2 r22^2 r23^2 t2^2
r31^2 r32^2 r33^2 t3^2
0 0 0 1

where rij is the (i,j) component of the rotation matrix, ti is the i-th entry of the translation vector, and ^k indicates that they are the parameters of the k-th viewpoint. A point X in the global coordinates is transformed to the local coordinates of the k-th viewpoint by X^k = R^k X + t^k.
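The format above can be read with a few lines of Python. The following is a minimal sketch, not our own code: the function names (parse_poses, global_to_local, camera_center) are hypothetical, and the sample matrix is made up for illustration. It assumes the file holds only whitespace-separated numbers, 16 per viewpoint.

```python
def parse_poses(text):
    """Parse stacked 4x4 pose matrices (the 'cam_detail.txt' layout) into a list."""
    values = [float(tok) for tok in text.split()]
    if len(values) % 16 != 0:
        raise ValueError("expected a multiple of 16 numbers (one 4x4 matrix per viewpoint)")
    poses = []
    for start in range(0, len(values), 16):
        block = values[start:start + 16]
        poses.append([block[r * 4:r * 4 + 4] for r in range(4)])
    return poses


def global_to_local(pose, X):
    """Apply X^k = R^k X + t^k to a 3D point X given in global coordinates."""
    return [sum(pose[i][j] * X[j] for j in range(3)) + pose[i][3] for i in range(3)]


def camera_center(pose):
    """Camera position in global coordinates, C = -R^T t (from R C + t = 0)."""
    return [-sum(pose[j][i] * pose[j][3] for j in range(3)) for i in range(3)]


# Made-up single-viewpoint sample; a real file would be read with open(...).read().
sample = """
0 -1 0 1
1  0 0 2
0  0 1 3
0  0 0 1
"""
poses = parse_poses(sample)
local_point = global_to_local(poses[0], [1.0, 0.0, 0.0])
center = camera_center(poses[0])
print(local_point)
print(center)
```

The camera center is included because it is often the quantity needed when plotting the vehicle trajectory; it is derived from the stated convention rather than stored in the file.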

These camera poses are computed independently for each of t0 and t1. To compare t0 and t1, the camera poses must be registered in a single coordinate system. We did this by an additional bundle adjustment over t0 and t1; the results are stored in 'panorama/T0.txt' for t0 and 'panorama/T1.txt' for t1. Their format is the same as that of 'cam_detail.txt'.
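Because T0 and T1 share one coordinate system, the relative pose between a t0 viewpoint i and a t1 viewpoint j follows directly from X^i = T0^i X and X^j = T1^j X, which gives X^j = T1^j (T0^i)^-1 X^i. The sketch below illustrates this with hypothetical helper names and made-up matrices; it assumes each pose is a rigid transform [R t; 0 1], so that the inverse is [R^T, -R^T t; 0 1].

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    t = [T[i][3] for i in range(3)]
    new_t = [-sum(R_t[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [R_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def relative_pose(T0_i, T1_j):
    """Transform from the local frame of t0 view i to the local frame of t1 view j."""
    return matmul4(T1_j, invert_rigid(T0_i))


# Made-up poses: same orientation, different translations.
T0_i = [[1.0, 0.0, 0.0, 1.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
T1_j = [[1.0, 0.0, 0.0, 4.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0]]
rel = relative_pose(T0_i, T1_j)
print(rel)
```

This only works with the registered poses (T0.txt, T1.txt); the poses in the two cam_detail.txt files live in unrelated coordinate systems.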

The dataset also contains perspective images (640 x 480 pixels) cropped from these panoramic images. (The results shown in our CVPR paper were obtained by using some of them.) There are two image sets for each street at each time: one looking at the left side of the street and the other looking at the right side. Thus, there are four image sets in total for each street, i.e., t0-left, t0-right, t1-left, and t1-right; the intended comparison is between t0-* and t1-*.

The internal camera parameters are identical for all of these perspective images and are given in 'intrinsic_param.txt' in the following format:

 f  0  cx
 0  f  cy
 0  0   1
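As an illustration of how this matrix is typically used, the sketch below parses the 3x3 layout and projects a point given in perspective camera coordinates to pixel coordinates. It assumes the standard pinhole model with the optical axis along +z, i.e., u = f x / z + cx and v = f y / z + cy; the function names and the numeric values of f, cx, and cy are hypothetical and are not taken from the dataset.

```python
def parse_intrinsics(text):
    """Parse the 3x3 layout of 'intrinsic_param.txt' into nested lists."""
    values = [float(tok) for tok in text.split()]
    if len(values) != 9:
        raise ValueError("expected exactly 9 numbers for a 3x3 matrix")
    return [values[0:3], values[3:6], values[6:9]]


def project(K, X_cam):
    """Project a 3D point in camera coordinates to pixel coordinates (u, v)."""
    x, y, z = X_cam
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    f, cx, cy = K[0][0], K[0][2], K[1][2]
    return (f * x / z + cx, f * y / z + cy)


# Made-up values for illustration only; use the numbers from the actual file.
sample = """
320 0   320
0   320 240
0   0   1
"""
K = parse_intrinsics(sample)
pixel = project(K, [1.0, 0.5, 2.0])
print(pixel)
```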

The external camera parameters for the four image sets (t0-left, t0-right, t1-left, t1-right) are stored in 't0/perspective_left/T0_left.txt', 't0/perspective_right/T0_right.txt', 't1/perspective_left/T1_left.txt', and 't1/perspective_right/T1_right.txt' in the same format as 'cam_detail.txt'. They were computed from the camera poses T0 and T1 for the panoramic images in the following way:

T0_l^k = T_l T0^k
T1_l^k = T_l T1^k
T0_r^k = T_r T0^k
T1_r^k = T_r T1^k

where T_l and T_r are the transformation matrices from the ladybug camera coordinates to the left and right perspective camera coordinates, respectively, and are given by

T_l =
 1  0  0  0
 0  0 -1  0
 0  1  0  0
 0  0  0  1


T_r =
-1  0  0  0
 0  0 -1  0
 0 -1  0  0
 0  0  0  1
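The composition above can be reproduced as follows. This is a minimal sketch with hypothetical helper names and a made-up panoramic pose; only T_l and T_r are taken from this page. The example point lies on the +y axis of the Ladybug frame, which T_l maps to the +z axis of the left camera and T_r maps to the -z axis of the right camera.

```python
T_L = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]
T_R = [[-1.0, 0.0, 0.0, 0.0],
       [0.0, 0.0, -1.0, 0.0],
       [0.0, -1.0, 0.0, 0.0],
       [0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transform(T, X):
    """Apply a 4x4 rigid transform to a 3D point."""
    return [sum(T[i][j] * X[j] for j in range(3)) + T[i][3] for i in range(3)]


def perspective_pose(T_side, T_pano):
    """Pose of a perspective view, e.g. T0_l^k = T_l T0^k."""
    return matmul4(T_side, T_pano)


# Made-up panoramic pose (identity rotation, translation (0, -2, 0)).
T_pano = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, -2.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
X_global = [0.0, 7.0, 0.0]  # maps to (0, 5, 0) in the Ladybug frame

in_left = transform(perspective_pose(T_L, T_pano), X_global)
in_right = transform(perspective_pose(T_R, T_pano), X_global)
print(in_left)   # positive z: in front of the left camera
print(in_right)  # negative z: behind the right camera
```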

Ground truth

Some of the perspective images have ground truth for the temporal changes, which we annotated manually. The ground truth masks are stored in "gt_mask_*.jpg".

Directory structure



| |--t0
| | |--panorama // *.jpg, cam_detail.txt, T0.txt
| | |--perspective_left // *.jpg, T0_left.txt
| | --perspective_right // *.jpg, T0_right.txt
| |--t1
| | |--panorama // *.jpg, cam_detail.txt, T1.txt
| | |--perspective_left // *.jpg, T1_left.txt
| | --perspective_right // *.jpg, T1_right.txt
| |
| --ground_truth // gt_*.jpg, gt_mask_*.jpg
  |--t0
  | |--panorama // *.jpg, cam_detail.txt, T0.txt
  | |--perspective_left // *.jpg, T0_left.txt
  | --perspective_right // *.jpg, T0_right.txt
  |--t1
  | |--panorama // *.jpg, cam_detail.txt, T1.txt
  | |--perspective_left // *.jpg, T1_left.txt
  | --perspective_right // *.jpg, T1_right.txt
  |--ground_truth_left // gt_*.jpg, gt_mask_*.jpg
  --ground_truth_right // gt_*.jpg, gt_mask_*.jpg

We welcome your questions, comments, and suggestions. Please send them to the authors.

Ken Sakurada and Takayuki Okatani
Tohoku University, Japan
June 2013

[1] Akihiko Torii, Michal Havlena, and Tomas Pajdla, From Google Street View to 3D City Models, Proc. ICCV Workshops, 2009.