Building Rome with Convex Optimization

School of Engineering and Applied Sciences, Harvard University

XM is a scalable optimization engine designed for large-scale Structure-from-Motion (SfM) tasks. The video shows it solving a problem with 10,155 frames to a global minimum within one hour. (The video may take some time to load.)

Abstract

Global bundle adjustment is made easy by depth prediction and convex optimization. We (i) propose a scaled bundle adjustment (SBA) formulation that lifts 2D keypoint measurements to 3D with learned depth, (ii) design an empirically tight convex semidefinite program (SDP) relaxation that solves SBA to certifiable global optimality, (iii) solve the SDP relaxations at extreme scale with Burer-Monteiro factorization and a CUDA-based trust-region Riemannian optimizer (dubbed XM), (iv) build a structure from motion (SfM) pipeline with XM as the optimization engine and show that XM-SFM dominates or compares favorably with existing SfM pipelines in terms of reconstruction quality while being faster, more scalable, and initialization-free.
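As a rough illustration of the "lifting" step in (i), the sketch below back-projects a 2D keypoint to a 3D point in the camera frame using a depth value and a standard pinhole camera model. It is not the SBA formulation itself, which this page does not spell out; the function name `backproject` and all intrinsics and measurements are hypothetical values chosen for the example.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with a depth value to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy); lens distortion is ignored.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


# Hypothetical keypoint, predicted depth (in meters), and intrinsics.
point = backproject(u=400.0, v=300.0, depth=2.0,
                    fx=500.0, fy=500.0, cx=320.0, cy=240.0)
print(point)
```

Because learned depth is typically only accurate up to an unknown scale, a point lifted this way would still need a scale correction, which is presumably why the formulation is called "scaled" bundle adjustment.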

Reconstruction

The visualizations below show the reconstructed 3D points (colored points) and camera poses (red frames). Some datasets also include dense reconstructions generated directly from depth maps. (The videos may take some time to load.)

BAL datasets

Replica datasets

TUM datasets

IMC datasets




Rendered video for SLAM

We present the reconstruction results (left) alongside the input image (right) from the Replica dataset. The reconstruction is rendered along the red camera trajectory shown in the "Reconstruction" section.





3DGS Rendering

For the Mip-NeRF datasets, we feed the camera poses estimated by our solver into a 3D Gaussian Splatting (3DGS) renderer. The rendered video is shown below, and the link beneath it opens a web-based renderer for interactive exploration. Tips: drag with the mouse to rotate the camera, and use the W, A, S, and D keys to move it.