Building Rome with Convex Optimization

Robotics: Science and Systems (RSS) 2025

Haoyu Han¹, Heng Yang¹

¹School of Engineering and Applied Sciences, Harvard University

XM is a scalable optimization engine for large-scale Structure-from-Motion (SfM). The video shows XM solving a 10,155-frame problem to a global minimum within one hour. (The video may take some time to load.)

Abstract

Global bundle adjustment is made easy by depth prediction and convex optimization. We (i) propose a scaled bundle adjustment (SBA) formulation that lifts 2D keypoint measurements to 3D with learned depth; (ii) design an empirically tight convex semidefinite program (SDP) relaxation that solves SBA to certifiable global optimality; (iii) solve the SDP relaxations at extreme scale with Burer-Monteiro factorization and a CUDA-based trust-region Riemannian optimizer (dubbed XM); and (iv) build a structure-from-motion (SfM) pipeline with XM as the optimization engine, showing that XM-SfM compares favorably with existing SfM pipelines in reconstruction quality while being faster, more scalable, and initialization-free.
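As a rough illustration of what "lifting 2D keypoint measurements to 3D with learned depth" can look like, the sketch below back-projects a pixel through a standard pinhole camera model. It is not taken from the XM code base: the function name `lift_keypoint`, the intrinsics `fx, fy, cx, cy`, and the optional per-frame `scale` factor are all assumptions made for this example, and the depth value stands in for whatever a depth-prediction network would return.

```python
from typing import Tuple


def lift_keypoint(
    u: float,
    v: float,
    depth: float,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    scale: float = 1.0,
) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) to a 3D point in the camera frame.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy), and that `depth` is measured along the optical axis.
    `scale` is a hypothetical per-frame factor that rescales the
    predicted depth; it is an assumption of this sketch.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    z = scale * depth
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z
    return (x, y, z)


# A keypoint at the principal point lifts to a point on the optical axis.
center = lift_keypoint(320.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A keypoint 100 pixels to the right of the principal point, at depth 2.
offset = lift_keypoint(420.0, 240.0, 2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Once every keypoint is lifted this way, each 2D observation becomes a 3D point in its camera's frame, which is the kind of 3D measurement the abstract says the SBA formulation works with.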

Reconstruction

The visualizations below show the reconstructed 3D points (colored points) and camera poses (red frames). Some datasets also include dense reconstructions generated directly from depth maps. (The videos may take some time to load.)

BAL datasets

BAL-93

BAL-392

BAL-1934

BAL-10155

Replica datasets

Room0

Room1

Office0

Office1

TUM datasets

fr1/rpy

fr1/xyz

fr1/desk

fr1/room

IMC datasets

Temple Nara Japan

Colosseum Exterior

Notre Dame Front Facade

Brandenburg Gate

Rendered video for SLAM

We show the reconstruction results (left) alongside the input images (right) from the Replica dataset. Each reconstruction is rendered along the red camera trajectory shown in the "Reconstruction" section.

Start Frame

End Frame

Start Frame

End Frame

3DGS Rendering

For the Mip-NeRF datasets, we feed the camera poses produced by our solver into a 3D Gaussian Splatting (3DGS) renderer. The rendered video is shown below, and the link beneath each video opens a web-based renderer for interactive exploration. Tip: drag with the mouse to rotate the camera, and use the W, A, S, and D keys to move it.

Kitchen

open renderer for kitchen

Garden

open renderer for garden