TEGLO
High Fidelity Canonical Texture Mapping from Single-View Images

1University of California, San Diego   2Google Research
*Indicates Equal Contribution

TEGLO results: Multi-view consistent novel view synthesis from a single view.
Left - Target image from CelebA-HQ. Center - Textured orbit. Right - Textured orbit with edits.

Abstract

Recent work on Neural Fields (NFs) learns 3D representations from class-specific single-view image collections. However, these methods cannot reconstruct the input data while preserving high-frequency details. Further, they do not disentangle appearance from geometry and hence are not suitable for tasks such as texture transfer and editing. In this work, we propose TEGLO (Textured EG3D-GLO) for learning 3D representations from single-view in-the-wild image collections for a given class of objects. We accomplish this by training a conditional Neural Radiance Field (NeRF) without any explicit 3D supervision. We equip our method with editing capabilities by creating a dense correspondence mapping to a 2D canonical space. We demonstrate that such a mapping enables texture transfer and texture editing without requiring meshes with shared topology. Our key insight is that by mapping the input image pixels onto the texture space we can achieve near-perfect reconstruction (≥ 74 dB PSNR at 1024×1024 resolution). Our formulation allows for high-quality, 3D-consistent novel view synthesis with high-frequency details at megapixel image resolution.

Method

TEGLO takes a single-view image and its approximate camera pose and maps the pixels onto a texture. To render the object from a different view, we extract the 3D surface points from the trained NeRF and use the dense correspondences to look up the color of each pixel from the mapped canonical texture. Optionally, TEGLO can apply texture edits and transfer textures across objects.
Our key insight is that by disentangling texture from geometry, using the 3D surface points of objects to learn a dense correspondence mapping via a 2D canonical coordinate space, we can extract a texture for each object. Using the learned correspondences to map the pixels of the input image onto that texture preserves high-frequency details: copying the input pixels onto the texture yields near-perfect reconstruction while maintaining a high-fidelity, multi-view consistent representation. In this work, we present TEGLO, consisting of a tri-plane and GLO-based conditional NeRF and a method to learn dense correspondences, enabling challenging tasks such as texture transfer, texture editing, and high-fidelity 3D reconstruction even at megapixel resolutions. We also show that TEGLO enables single-view 3D reconstruction with no constraints on resolution by inverting the image into the latent table, without requiring PTI or model fine-tuning.
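
The rendering step described above reduces to a texture lookup. Below is a minimal sketch (not the released implementation), assuming a trained NeRF that exposes a hypothetical expected-depth query, a hypothetical canonical_mlp that maps 3D surface points to 2D canonical (u, v) coordinates, and a canonical texture image holding the mapped input pixels.

import torch
import torch.nn.functional as F

def render_with_texture(nerf, canonical_mlp, texture, rays_o, rays_d):
    """Color each pixel of a novel view by sampling the canonical texture.

    texture: (1, 3, H, W) image holding the mapped input pixels.
    rays_o, rays_d: (N, 3) ray origins and directions for the novel view.
    """
    # 1. Extract the 3D surface point hit by each ray, e.g. via the NeRF's
    #    expected depth along the ray (hypothetical API).
    with torch.no_grad():
        depth = nerf.expected_depth(rays_o, rays_d)          # (N,)
    surface_pts = rays_o + depth.unsqueeze(-1) * rays_d      # (N, 3)

    # 2. Dense correspondence: map surface points to 2D canonical coordinates.
    uv = canonical_mlp(surface_pts)                          # (N, 2) in [-1, 1]

    # 3. Bilinearly sample the canonical texture at those coordinates.
    grid = uv.view(1, -1, 1, 2)                              # grid_sample layout
    colors = F.grid_sample(texture, grid, align_corners=True)
    return colors.view(3, -1).t()                            # (N, 3) RGB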

TEGLO Stage-1 (left) uses a tri-plane and GLO-based conditional NeRF to learn a per-object table of latents that reconstructs the single-view image collection. TEGLO Stage-2 (right) learns dense correspondences via a 2D canonical coordinate space.
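
The following is a hedged Stage-1 sketch of the per-object latent table conditioning a tri-plane NeRF, written in PyTorch for illustration; the LatentTableNeRF class, layer sizes, and toy tri-plane generator are assumptions, not the paper's architecture. Each training image gets one learnable latent (GLO-style, no encoder) that is optimized jointly with the NeRF.

import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentTableNeRF(nn.Module):
    def __init__(self, num_objects, latent_dim=512, plane_res=64, plane_ch=32):
        super().__init__()
        # One learnable latent per training image (GLO-style latent table).
        self.latents = nn.Embedding(num_objects, latent_dim)
        # Toy tri-plane generator: latent -> three 2D feature planes.
        self.to_planes = nn.Linear(latent_dim, 3 * plane_ch * plane_res * plane_res)
        self.plane_ch, self.plane_res = plane_ch, plane_res
        # Small MLP decoding aggregated plane features to density + RGB.
        self.decoder = nn.Sequential(nn.Linear(plane_ch, 64), nn.ReLU(), nn.Linear(64, 4))

    def forward(self, obj_idx, xyz):
        """obj_idx: (B,) object indices; xyz: (B, N, 3) sample points in [-1, 1]."""
        z = self.latents(obj_idx)                                     # (B, D)
        planes = self.to_planes(z).view(
            -1, 3, self.plane_ch, self.plane_res, self.plane_res)     # (B, 3, C, R, R)
        feats = 0
        for i, dims in enumerate([[0, 1], [0, 2], [1, 2]]):           # xy, xz, yz projections
            grid = xyz[..., dims].unsqueeze(2)                        # (B, N, 1, 2)
            sampled = F.grid_sample(planes[:, i], grid, align_corners=True)
            feats = feats + sampled.squeeze(-1).permute(0, 2, 1)      # (B, N, C)
        out = self.decoder(feats / 3.0)                               # (B, N, 4)
        return out[..., :1], torch.sigmoid(out[..., 1:])              # density, RGB

During Stage-1 training, both the latent table and the NeRF weights receive gradients from the single-view reconstruction loss, so no image encoder is needed.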

Results


TEGLO demonstrated for high-fidelity 3D reconstruction, multi-view consistent texture representation, and texture editing from single-view image collections of objects.

Reconstruction of Train Data


Qualitative comparison with relevant 3D-aware baselines at 256×256 resolution on CelebA-HQ.

Single View 3D Reconstruction


Results for TEGLO trained on FFHQ data and evaluated on CelebA-HQ image targets.
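
Single-view reconstruction here amounts to latent inversion against the frozen Stage-1 model. Below is a small sketch under the same assumptions as the previous snippet; render_fn is a hypothetical volume renderer, and only the new latent code is optimized, with no PTI or model fine-tuning.

import torch

def invert_image(model, render_fn, target_rgb, target_pose, steps=500, lr=1e-2):
    """Optimize a fresh latent so the frozen conditional NeRF reproduces target_rgb.

    render_fn(latent, pose) -> (H, W, 3) is a hypothetical renderer that
    volume-renders the conditional NeRF given a latent code and camera pose.
    """
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)                        # keep the NeRF frozen

    latent = torch.zeros(1, model.latents.embedding_dim, requires_grad=True)
    opt = torch.optim.Adam([latent], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        pred = render_fn(latent, target_pose)          # (H, W, 3)
        loss = torch.nn.functional.mse_loss(pred, target_rgb)
        loss.backward()
        opt.step()
    return latent.detach()

Because the model weights stay fixed, the inversion runs at any target resolution and preserves the geometry prior learned in Stage-1.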

Texture Transfer


Texture transfer results with CelebA-HQ. (Top row shows CelebA-HQ image targets).
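
Because all objects share one 2D canonical coordinate space, texture transfer is a plain substitution of the texture used by the lookup in the earlier rendering sketch. The snippet below is illustrative only and reuses the hypothetical render_with_texture from the Method section.

def transfer_texture(nerf_a, canonical_mlp, texture_b, rays_o, rays_d):
    # Geometry (surface points and canonical coordinates) comes from object A's
    # conditional NeRF; colors are sampled from object B's canonical texture.
    return render_with_texture(nerf_a, canonical_mlp, texture_b, rays_o, rays_d)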

Texture Editing


Textured Representation of Complex Objects


BibTeX

@article{vinod2023teglo,
  title   = {TEGLO: High Fidelity Canonical Texture Mapping from Single-View Images},
  author  = {Vishal Vinod and Tanmay Shah and Dmitry Lagun},
  year    = {2023},
  journal = {arXiv preprint arXiv:2303.13743}
}