Fig. 1: Overview of HiSplat. HiSplat constructs hierarchical 3D Gaussians that better represent both large-scale structures (more accurate locations and fewer cracks) and texture details (fewer artefacts and less blurriness).
Fig. 2: Framework of HiSplat. For simplicity, the case with two input images is illustrated. HiSplat uses a shared U-Net backbone to extract features at different scales. From these features, three processing stages predict pixel-aligned Gaussian parameters at their respective scales. The error-aware module and the modulating fusion module perceive errors from earlier stages and guide the Gaussians of later stages to compensate for and repair them. Finally, the fused hierarchical Gaussians reconstruct both large-scale structures and texture details.
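The following is a minimal sketch of the coarse-to-fine flow described in the caption above: a shared backbone yields multi-scale features, each stage predicts pixel-aligned Gaussian parameters, and an error-aware signal modulates the next stage's features. All module names (SharedUNet, GaussianHead, ErrorAwareModule, ModulatingFusion), parameter layouts, and shapes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a hierarchical, error-guided Gaussian prediction pipeline.
# Module names and shapes are hypothetical; this is not the official HiSplat code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedUNet(nn.Module):
    """Toy stand-in for the shared backbone: returns features at three scales."""
    def __init__(self, in_ch=3, ch=32):
        super().__init__()
        self.enc1 = nn.Conv2d(in_ch, ch, 3, stride=2, padding=1)        # 1/2 scale
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)       # 1/4 scale
        self.enc3 = nn.Conv2d(ch * 2, ch * 4, 3, stride=2, padding=1)   # 1/8 scale

    def forward(self, x):
        f1 = torch.relu(self.enc1(x))
        f2 = torch.relu(self.enc2(f1))
        f3 = torch.relu(self.enc3(f2))
        return [f3, f2, f1]  # coarse-to-fine order: stage 1 sees the coarsest features


class GaussianHead(nn.Module):
    """Predicts per-pixel Gaussian parameters (assumed layout: mean 3 + scale 3 + rotation 4 + opacity 1)."""
    def __init__(self, in_ch, param_dim=11):
        super().__init__()
        self.head = nn.Conv2d(in_ch, param_dim, 1)

    def forward(self, feat):
        return self.head(feat)


class ErrorAwareModule(nn.Module):
    """Estimates a per-pixel error map from the previous stage's Gaussian parameters."""
    def __init__(self, param_dim=11):
        super().__init__()
        self.net = nn.Conv2d(param_dim, 1, 3, padding=1)

    def forward(self, prev_gaussians):
        return torch.sigmoid(self.net(prev_gaussians))


class ModulatingFusion(nn.Module):
    """Modulates the current stage's features with the upsampled error map."""
    def forward(self, feat, error_map):
        err = F.interpolate(error_map, size=feat.shape[-2:],
                            mode="bilinear", align_corners=False)
        return feat * (1.0 + err)  # emphasise regions the earlier stage got wrong


class HiSplatSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = SharedUNet()
        self.heads = nn.ModuleList([GaussianHead(c) for c in (128, 64, 32)])
        self.error_modules = nn.ModuleList([ErrorAwareModule(), ErrorAwareModule()])
        self.fusion = ModulatingFusion()

    def forward(self, images):
        feats = self.backbone(images)              # coarse -> fine features
        gaussians, prev = [], None
        for i, (feat, head) in enumerate(zip(feats, self.heads)):
            if prev is not None:                   # later stages compensate for earlier errors
                err = self.error_modules[i - 1](prev)
                feat = self.fusion(feat, err)
            prev = head(feat)
            gaussians.append(prev)
        return gaussians                           # hierarchical Gaussians, fused downstream


if __name__ == "__main__":
    model = HiSplatSketch()
    out = model(torch.randn(2, 3, 256, 256))       # two input views, as in Fig. 2
    print([g.shape for g in out])
```

The sketch keeps only the data flow implied by the caption; in a real system the per-pixel parameters would be unprojected into 3D Gaussians and rendered with a splatting rasterizer.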
Tab. 1: Evaluation on RealEstate10K and ACID. Compared with previous NeRF-based and generalizable Gaussian-Splatting-based methods, HiSplat consistently achieves higher rendering quality on unseen views.
Tab. 2: Evaluation on cross-dataset generalization. We train models on RealEstate10K and test them on the object-centric dataset DTU, the outdoor dataset ACID, and the indoor dataset Replica in a zero-shot setting. Compared with previous methods, HiSplat better handles scenes with diverse distributions and scales.