Recent advances in 3D scene reconstruction enable real-time viewing in virtual and augmented reality. To support interactive operations such as moving or editing objects for greater immersion, 3D scene inpainting methods have been proposed to repair or complete the altered geometry. However, current approaches rely on lengthy and computationally intensive optimization, making them impractical for real-time or online applications.
We propose InstaInpaint, a reference-based feed-forward framework that produces a 3D scene inpainting from a 2D inpainting proposal within 0.4 seconds. We develop a self-supervised masked-finetuning strategy that enables training our customized large reconstruction model (LRM) on large-scale datasets. Through extensive experiments, we analyze and identify several key designs that improve generalization, textural consistency, and geometric correctness. InstaInpaint achieves a 1000× speed-up over prior methods while maintaining state-of-the-art performance across two standard benchmarks. Moreover, InstaInpaint generalizes well to flexible downstream applications such as object insertion and multi-region inpainting.
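To make the masked-finetuning idea concrete, the sketch below illustrates one plausible reading of it: erase a random region from a reference view, run the feed-forward LRM on the corrupted inputs, and supervise the re-rendered views against the original clean images, which serve as free ground truth since the mask is synthetic. This is a minimal PyTorch sketch under stated assumptions, not the paper's actual code; `lrm`, its call signature, and `scene.render` are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F


def masked_finetune_step(lrm, views, intrinsics, extrinsics, optimizer):
    """One self-supervised masked-finetuning step (illustrative only).

    views:      (B, V, 3, H, W) multi-view images of a scene
    intrinsics: (B, V, 3, 3) camera intrinsics
    extrinsics: (B, V, 4, 4) camera-to-world poses
    """
    B, V, _, H, W = views.shape

    # Sample a random rectangular mask for the reference (first) view.
    mask = torch.zeros(B, 1, H, W, device=views.device)
    mh, mw = H // 3, W // 3  # mask covers roughly 1/9 of the image
    for b in range(B):
        top = torch.randint(0, H - mh, (1,)).item()
        left = torch.randint(0, W - mw, (1,)).item()
        mask[b, :, top:top + mh, left:left + mw] = 1.0

    # Erase the masked region from the reference view only.
    masked_views = views.clone()
    masked_views[:, 0] = views[:, 0] * (1.0 - mask)

    # Feed-forward reconstruction from the corrupted inputs
    # (hypothetical LRM interface).
    scene = lrm(masked_views, intrinsics, extrinsics, mask=mask)

    # Re-render every input viewpoint and supervise with the clean images,
    # so the model learns to fill the hole consistently in 3D.
    renders = scene.render(intrinsics, extrinsics)  # (B, V, 3, H, W)
    loss = F.mse_loss(renders, views)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervision comes from the unmasked originals, no inpainting annotations are needed, which is what allows finetuning on large-scale reconstruction datasets.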
[Interactive viewer: Original Scene vs. Edited Scene, with Depth visualization]
@misc{you2025instainpaint,
  title={InstaInpaint: Instant 3D-Scene Inpainting with Masked Large Reconstruction Model},
  author={Junqi You and Chieh Hubert Lin and Weijie Lyu and Zhengbo Zhang and Ming-Hsuan Yang},
  year={2025},
  eprint={2506.10980},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2506.10980},
}