Figure 1: The goal of our work is to segment 3D building instances in a large urban scene. The left side shows the input 3D model and (optional) UAV images, and the right side shows the output 3D building instances. The pyramids above the 3D scene model indicate the positions and orientations of the cameras.
Abstract: We present a novel framework for instance segmentation of Multi-view Stereo (MVS) mesh models of buildings. Unlike existing works focusing on extracting semantically meaningful objects, the emphasis of this work lies in detecting and segmenting building instances even if they are adjacent and encoded in a large and imprecise 3D surface model. Multi-view RGB images are first lifted to RGBH images by adding a heightmap and are then segmented by a fine-tuned 2D instance segmentation neural network to obtain all roof instances. Roof instance masks from different multi-view images are then clustered into global masks. Our mask clustering accounts for spatial occlusion and overlap, which can eliminate ambiguities among multi-view images. Finally, 3D roof instances are segmented out by mask back-projection and extended to entire building instances through a Markov random field (MRF) optimization. Quantitative evaluations and ablation studies show the effectiveness of all major steps of the method. We also provide a dataset for the evaluation of instance segmentation of 3D building models. To the best of our knowledge, it is the first dataset for 3D instance segmentation of MVS buildings.
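The lifting of RGB images to RGBH images mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lift_rgb_to_rgbh`, the nested-list image layout, and the min-max scaling of heights to the 0 to 255 range are all assumptions made for illustration.

```python
def lift_rgb_to_rgbh(rgb, height):
    """Append a height channel to every pixel of an RGB image.

    rgb:    H x W nested list of (r, g, b) tuples.
    height: H x W nested list of per-pixel heights (floats).
    The min-max scaling to 0..255 is an assumption, not taken from the paper.
    """
    flat = [h for row in height for h in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat heightmap
    return [
        [
            (r, g, b, round(255 * (h - lo) / span))
            for (r, g, b), h in zip(rgb_row, h_row)
        ]
        for rgb_row, h_row in zip(rgb, height)
    ]


# Hypothetical 1 x 2 image: ground pixel at height 0, roof pixel at height 10.
rgbh = lift_rgb_to_rgbh([[(10, 20, 30), (40, 50, 60)]], [[0.0, 10.0]])
```

The resulting four-channel pixels could then be fed to a 2D instance segmentation network whose input layer accepts four channels.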
Figure 2: An overview of the proposed method. Our method takes a 3D urban scene and optionally multi-view UAV images as input and segments all 3D building instances as results. It contains three major steps: 2D roof instance segmentation, instance mask clustering, and 3D building instance segmentation. The multi-view images are not obligatory, as they can also be generated by rendering the textured input 3D scene (noted by the dotted arrow in the figure). The red rectangles highlight a few global masks selected by our clustering method. The projection and back-projection operations noted by the red arrows in the figure contribute to both instance mask clustering and the occlusion-aware 3D roof segmentation.
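The instance mask clustering step can be pictured with a simplified sketch. It assumes each per-view roof mask has already been back-projected to a set of mesh face indices, and it merges masks whose face sets overlap strongly. The intersection-over-union criterion, the `threshold` value, and the names `iou` and `cluster_masks` are illustrative assumptions; the occlusion handling of the actual method is not modeled here.

```python
def iou(a, b):
    """Intersection over union of two sets of mesh face indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_masks(masks, threshold=0.5):
    """Group per-view masks (sets of face indices) into global masks.

    Masks whose IoU reaches the threshold are merged with union-find.
    The criterion and threshold are assumptions, not the paper's method.
    """
    parent = list(range(len(masks)))

    def find(i):
        # Follow parent links to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if iou(masks[i], masks[j]) >= threshold:
                parent[find(j)] = find(i)

    # A global mask is the union of all faces in one cluster.
    groups = {}
    for i, mask in enumerate(masks):
        groups.setdefault(find(i), set()).update(mask)
    return list(groups.values())


# Hypothetical masks of two roofs, each seen from two views.
views = [{1, 2, 3, 4}, {2, 3, 4, 5}, {10, 11}, {10, 11, 12}]
global_masks = cluster_masks(views)
```

Working on face indices rather than pixels makes masks from different viewpoints directly comparable, which is why the sketch assumes back-projection has happened first.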
To evaluate our 3D instance segmentation method, we have created a benchmark dataset, InstanceBuilding, that contains annotations for both UAV images and 3D urban scenes. More details and download links can be found on the Datasets page.
Recommended citation: Jiazhou Chen, Yanghui Xu, Shufang Lu, Ronghua Liang*, Liangliang Nan. "3D Instance Segmentation of MVS Buildings." IEEE Transactions on Geoscience and Remote Sensing. 2022, doi: 10.1109/TGRS.2022.3183567.