Radiance Surfaces: Optimizing Surface Representations with a 5D
Radiance Field Loss
ZIYI ZHANG, École Polytechnique Fédérale de Lausanne (EPFL) and NVIDIA, Switzerland
NICOLAS ROUSSEL, École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
THOMAS MÜLLER, NVIDIA, Switzerland
TIZIAN ZELTNER, NVIDIA, Switzerland
MERLIN NIMIER-DAVID, NVIDIA, Switzerland
FABRICE ROUSSELLE, NVIDIA, Switzerland
WENZEL JAKOB, École Polytechnique Fédérale de Lausanne (EPFL) and NVIDIA, Switzerland
[Figure 1 panels. Left: (a) NeRF (color accumulation), (b) Ours (loss computation). Right: surface rendering and surface normal at 10 and 30 seconds of training.]
Fig. 1. Our method reconstructs surfaces with the speed and robustness of NeRF-style methods. Left: In contrast to volume-based methods that minimize 2D
image losses, as shown in (a), we adopt a spatio-directional radiance field loss formulation, as shown in (b). At each step, our method considers a distribution
of optically independent surfaces, increasing the confidence of candidates that agree with the reference imagery. Right: A meaningful surface can be extracted
at any iteration during optimization.
We present a fast and simple technique to convert images into a radiance
surface-based scene representation. Building on existing radiance volume
reconstruction algorithms, we introduce a subtle yet impactful modification
of the loss function requiring changes to only a few lines of code: instead
of integrating the radiance field along rays and supervising the resulting
images, we project the training images into the scene to directly supervise
the spatio-directional radiance field.
The primary outcome of this change is the complete removal of alpha
blending and ray marching from the image formation model, instead moving
these steps into the loss computation. In addition to promoting convergence
to surfaces, this formulation assigns explicit semantic meaning to 2D subsets
of the radiance field, turning them into well-defined radiance surfaces. We
finally extract a level set from this representation, which results in a
high-quality radiance surface model.
Our method retains much of the speed and quality of the baseline algorithm.
For instance, a suitably modified variant of Instant NGP maintains
comparable computational efficiency, while achieving an average PSNR that
is only 0.1 dB lower. Most importantly, our method generates explicit surfaces
in place of an exponential volume, doing so with a level of simplicity
not seen in prior work.
Authors’ Contact Information: Ziyi Zhang, École Polytechnique Fédérale de
Lausanne (EPFL) and NVIDIA, Switzerland, ziyi.zhang@epfl.ch; Nicolas Roussel, École
Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland,
nicolas.roussel@epfl.ch; Thomas Müller, NVIDIA, Switzerland, tmueller@nvidia.com; Tizian Zeltner,
NVIDIA, Switzerland, tzeltner@nvidia.com; Merlin Nimier-David, NVIDIA, Switzerland,
mnimierdavid@nvidia.com; Fabrice Rousselle, NVIDIA, Switzerland,
frousselle@nvidia.com; Wenzel Jakob, École Polytechnique Fédérale de Lausanne (EPFL) and
NVIDIA, Switzerland, wenzel.jakob@epfl.ch.
SIGGRAPH Conference Papers ’25, Vancouver, BC, Canada
© 2026 Copyright held by the owner/author(s). Publication rights licensed to ACM.
This is the author’s version of the work. It is posted here for your personal use. Not
for redistribution. The definitive Version of Record was published in Special Interest
Group on Computer Graphics and Interactive Techniques Conference Conference Papers
(SIGGRAPH Conference Papers ’25), August 10–14, 2025, Vancouver, BC, Canada,
https://doi.org/10.1145/3721238.3730713.
ACM Reference Format:
Ziyi Zhang, Nicolas Roussel, Thomas Müller, Tizian Zeltner, Merlin Nimier-
David, Fabrice Rousselle, and Wenzel Jakob. 2026. Radiance Surfaces: Op-
timizing Surface Representations with a 5D Radiance Field Loss. In Spe-
cial Interest Group on Computer Graphics and Interactive Techniques Con-
ference Conference Papers (SIGGRAPH Conference Papers ’25), August 10–14,
2025, Vancouver, BC, Canada. ACM, New York, NY, USA, 17 pages. https:
//doi.org/10.1145/3721238.3730713
1 Introduction
The task of reconstructing surfaces from a set of photographs has
been a long-standing challenge [Moons et al. 2010]. The appeal of
surface representations, aside from their natural alignment with the
physical reality of objects, lies in their suitability for editing, animation,
and efficient rendering, which explains their near-ubiquitous
use in 3D graphics applications. Unfortunately, the optimization
landscape of a differentiably rendered surface tends to be non-convex
and riddled with local minima. Consequently, the resulting
methods are often too fragile to handle complex, real-world scenes.
[Figure 2 panels. NeRF: minimize the loss of blended colors. Ours: minimize the blend of local losses.]
Fig. 2. Comparison of the loss in volumetric optimization and our
radiance field loss. We denote alpha blending by $\int$ and the color difference
metric as $\ell(L) := \ell(L, L_{\mathrm{target}})$, dropping its dependency on the target color
for simplicity. Traditional volumetric reconstruction minimizes the image-space
loss of blended colors. In contrast, our method minimizes a blended
radiance field loss that yields a distribution of surfaces out of which a surface
representation can be trivially extracted, e.g., via marching cubes.
This problem can be cleverly sidestepped [Mildenhall et al. 2020;
Kerbl et al. 2023] by switching to a volumetric formulation of light
transport. The derivative of a continuous volumetric representation
is not only easier to evaluate, but it also leads to a smoother loss
landscape that brings enhanced robustness and scalability. However,
these improvements come at the cost of a more involved surface
extraction process requiring additional heuristics, such as surface-promoting
regularizers [Wang et al. 2021] or multi-stage optimization
[Guédon and Lepetit 2024].
In this work, we seek a simple and direct approach to optimize
surfaces that retains the robustness and convergence speed of vol-
umetric methods. Our proposed method builds on a simple yet
powerful idea: optimizing a distribution over surfaces. Concretely,
we propose projecting the training photographs into the scene
and minimizing the attenuated difference between the resulting
light field and the spatial-directional emission originating from the
surface distribution.
The resulting radiance field loss considers each point along a ray
as a surface candidate, individually optimized to match that ray’s
pixel color, leading to the desired distribution over surfaces. One
benefit is that points along a ray receive independent gradients,
allowing the color or density to simultaneously increase at one
point and decrease at another. This is notably different from the
volumetric approach, which integrates the color along the ray prior
to the loss computation (see Figure 1, left). That is, with volume
reconstruction, all points along a ray receive gradients with the
same sign if their integrated color is too dark or bright, leading to
correlated adjustments.
Interestingly, our proposed radiance field loss gives rise to equations
remarkably similar to those of volumetric reconstruction methods
(see Figure 2). In practical terms, this means that our method
is simple to integrate into existing volumetric frameworks. It also
means that we inherit many advantages of these prior works without
having to resort to additional heuristics to extract a surface.
While we have not focused on competing with existing methods in
terms of metrics, our proof-of-concept implementation in Instant
NGP [Müller et al. 2022] consists of only a few modified lines of
code in the core algorithm and runs at roughly the same speed (in
terms of PSNR vs. time) while producing surfaces whose PSNR is,
on average, only 0.1 dB lower than that of the volumetric baseline.
2 Related work
This section reviews related work in the field of 3D surface reconstruction
for novel view synthesis and tasks centered on geometric
representations. Because this is such an active field, we highlight
particularly salient prior works rather than attempting an exhaustive
survey. As such, we only cover differentiable rendering and
omit classical techniques like silhouette carving [Laurentini 1994].
Evolving a surface. The first works on differentiable rendering
embraced the high-level approach of optimizing an initial guess
of a shape via gradient descent [Loper and Black 2014], variously
representing the surface using SDF level sets [Zhang et al. 2021;
Vicini et al. 2022; Wang et al. 2024], triangle meshes [Nicolet et al.
2021], points [Chen et al. 2024b], or hybrids [Munkberg et al. 2022].
Regardless of the underlying representation, it remains challenging
to achieve satisfactory results in this way: this is partly due to
the complex loss landscape of an evolving surface, and partly due
to the numerical difficulties of computing visibility-induced gradients
[Loubet et al. 2019; Zhang et al. 2020, 2023]. Without intricate
special-case handling, the optimization often fails when topological
changes are required [Mehta et al. 2023], or when the surface
does not overlap with the target shape [Xing et al. 2023]. Our work
sidesteps these limitations by replacing the surface boundary with
a distribution over surfaces.
Extracting geometry from a volume. After the advent of radiance
volume reconstruction (NeRF) for novel view synthesis [Mildenhall
et al. 2020], researchers developed various regularizers and
parameterizations of radiance volumes to ensure that their level sets
yield plausible geometry [Wang et al. 2021; Yariv et al. 2021, 2023].
Surfaces can then be extracted using established algorithms like
marching cubes. However, while efficient NeRF implementations
reconstruct in seconds to minutes [Müller et al. 2022], methods in
the aforementioned line of work require hours of computation [Li
et al. 2023] or result in substantially reduced quality [Wang et al.
2023]. In contrast, our method largely preserves the reconstruction
speed and quality of the baseline NeRF method.
In real-world reconstruction tasks, it is often ambiguous whether
fine details should be attributed to local color variation or geometric
features. The optimal choice depends on whether the intended
application emphasizes novel view synthesis performance or reconstruction
of smooth surface geometry. In the former case, our
method is a drop-in replacement, e.g., for MobileNeRF [Chen et al.
2023]. For applications requiring smoother geometry, we propose a
lightweight Laplacian regularizer that maintains the efficiency of our
method, while delivering results comparable to significantly more
complex algorithms [Huang et al. 2024; Guédon and Lepetit 2024].
Optimizing a distribution over surfaces. Several prior works conceptualized
volumetric reconstruction as optimizing a distribution
over surfaces [Seyb et al. 2024; Miller et al. 2024; Wang et al. 2021].
These methods, however, represent objects as the union of multiple
interacting surfaces (whose contributions are integrated along the
ray), which conflicts with our end goal of extracting a single surface
to model an object's geometry. Instead, we build upon the “many
worlds” concept proposed by Zhang et al. [2024], which considers a
distribution of non-interacting surfaces, and apply it to the problem
of radiance surface reconstruction. We show how, in this context,
the many worlds concept gives rise to a simple equation dual to the
one used in NeRF frameworks; see Figure 2.
3 Method
In this section, we derive our radiance field loss (Figure 2) by progressively
transforming the optimization of a single evolving surface.
While the final result resembles volumetric reconstruction, this progression
demonstrates that the method’s origins are surface-based.
3.1 Non-local surface perturbation
Dierentiating a rendering with respect to geometry reveals how
small geometric perturbations aect the resulting image. However,
because these derivatives are only nonzero on the surfaces them-
selves, they tend to cause convergence issues when used in opti-
mizations.
To overcome this limitation, consider the eect of introducing
a small surface patch at some distance above an existing visible
surface. This modication also impacts the rendered image and
can be interpreted as a perturbation of a more general non-local
derivative. A similar concept was previously used by Mehta et al
.
[2023] to nucleate new shapes in 2D vector graphics, and by Zhang
et al. [2024] in the context of physically based rendering.
Optimizing surfaces on this extended domain mitigates two key
issues discussed previously. First, because updates are no longer constrained
to the surface, the algorithm can achieve faster and more
robust convergence within a higher-dimensional loss landscape, as
illustrated below:
[Inline figure: optimization states between an initial surface and a target surface, shown for local surface perturbation and for non-local perturbation.]
Second, the need for complex, specialized methods to estimate
boundary derivatives is eliminated, which simplifies the implementation
and further improves performance. Before making these abstract
notions concrete, we describe the geometric representation that we use.
Geometric representation. Non-local perturbations require a representation
that spans the entire space. To this end, we use an
occupancy field [Mescheder et al. 2019; Niemeyer et al. 2020] that
encodes the discrete probability of a position $\mathbf{x}$ being occupied:
$\alpha(\mathbf{x}) = \Pr\{\mathbf{x} \text{ lies within an object}\} \in [0, 1].$
[Figure 3 panels: (a) Alpha-blending, (b) Binary choice. Labels: semi-transparent, opaque.]
Fig. 3. Non-local perturbations. We consider a single candidate surface
patch (with color $L_{\mathbf{p}}$) along the ray as a perturbation of a background
surface (with color $L_{\mathrm{b}}$). (a) Blending colors violates the surface assumption
and leads to volumetric results. (b) We instead treat the perturbation as a
random binary choice and optimize the associated discrete probability. The
final reconstruction is non-random and will never blend the contribution of
multiple surfaces.
After convergence, the field is expected to have occupancy values
approaching 1 on the surface, and 0 in the exterior. We note that the
choice of an occupancy field is somewhat arbitrary. The primary
focus of this work is on optimizing geometry irrespective of the
specific details of the representation.
3.2 Radiance field loss
Single candidate. To explain the concept of a non-local perturbation,
we first focus on the case of a single candidate surface patch
along a ray. Figure 3 depicts this setup, in which a candidate at
position $\mathbf{p}$ with color $L_{\mathbf{p}}$ and occupancy $\alpha_{\mathbf{p}}$ precedes a background¹
with color $L_{\mathrm{b}}$. How this geometric configuration arises will be covered
later; for now, we assume that it is given, and that the color
values $L_{\mathbf{p}}$ and $L_{\mathrm{b}}$ are furthermore fixed.
In this case, the optimal reconstruction is straightforward: the
candidate should be created if it improves the match with respect
to a specified target color $L_{\mathrm{target}}$; otherwise, it should be discarded.
The occupancy parameter $\alpha_{\mathbf{p}}$ provides the means to achieve this
outcome. However, there are different ways to integrate it. The standard
volumetric approach (Figure 3a) interprets $\alpha_{\mathbf{p}}$ as an opacity for
alpha-compositing, minimizing a color difference $\ell(\hat{L}, L)$ of the form
$\ell\big(\alpha_{\mathbf{p}} L_{\mathbf{p}} + (1 - \alpha_{\mathbf{p}})\, L_{\mathrm{b}},\; L_{\mathrm{target}}\big).$ (1)
The fundamental limitation of this approach is its inability to promote
binary occupancy values. When the best match is given by a
blend of $L_{\mathbf{p}}$ and $L_{\mathrm{b}}$, the loss will reach zero without forming a distinct
surface. A common remedy involves introducing additional loss terms
to penalize such behavior, but this lacks a principled theoretical
foundation and adds complexity in the form of hyperparameters.
We instead interpret the non-local perturbation as a binary choice:
the candidate surface either exists, or it does not. Thus, the final color
value associated with the ray is either that of the candidate $L_{\mathbf{p}}$ or the
background $L_{\mathrm{b}}$ (Figure 3b). We quantify the quality of each possibility
via $\ell$ and seek the occupancy value $\alpha_{\mathbf{p}} \in [0, 1]$ that minimizes:
$\mathcal{L}(\mathbf{p}) = \alpha_{\mathbf{p}}\, \ell(L_{\mathbf{p}}, L_{\mathrm{target}}) + (1 - \alpha_{\mathbf{p}})\, \ell(L_{\mathrm{b}}, L_{\mathrm{target}}).$ (2)
¹ For now, the term background could refer to a surface, an environment map, etc. Later
sections will provide a concrete definition.
[Figure 4 labels: candidates, background, Subproblem 1, Subproblem 2.]
Fig. 4. Surface candidates as independent subproblems. With multiple
candidates along a ray, each perturbation is treated as an independent
subproblem, resulting in local losses distributed spatially over the scene.
By blending the losses of the two surfaces instead of their colors,
this approach selects the surface that best explains the target color.
The simplified example shown here assumes that the candidate
color $L_{\mathbf{p}}$ is static. In practice, $L_{\mathbf{p}}$ (but not $L_{\mathrm{b}}$) is also subject to optimization,
which requires multiple viewpoints to resolve ambiguity;
more on this later.
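The difference between Equations (1) and (2) can be checked numerically. The following is a minimal sketch, assuming scalar colors and a squared color difference as a stand-in for the metric; the function names and the example values are hypothetical and not taken from the authors' code. Because Equation (2) is linear in the occupancy, its minimum over [0, 1] always lies at 0 or 1, whereas Equation (1) can reach zero at an intermediate value.

```python
def color_diff(a, b):
    """Squared difference between two scalar colors (assumed stand-in metric)."""
    return (a - b) ** 2


def blended_color_loss(alpha, L_p, L_b, L_target):
    """Equation (1): blend the colors first, then compare with the target."""
    return color_diff(alpha * L_p + (1.0 - alpha) * L_b, L_target)


def blended_loss(alpha, L_p, L_b, L_target):
    """Equation (2): compare each color with the target, then blend the losses."""
    return alpha * color_diff(L_p, L_target) + (1.0 - alpha) * color_diff(L_b, L_target)


def argmin_on_grid(loss, steps=100):
    """Return the occupancy in {0, 1/steps, ..., 1} with the smallest loss."""
    alphas = [i / steps for i in range(steps + 1)]
    return min(alphas, key=loss)


# Hypothetical colors: the target lies between the candidate and the background.
L_p, L_b, L_target = 1.0, 0.0, 0.3

# Color blending settles on a semi-transparent candidate.
alpha_color = argmin_on_grid(lambda a: blended_color_loss(a, L_p, L_b, L_target))

# Loss blending makes a binary decision: the background is the closer match,
# so the candidate is discarded.
alpha_loss = argmin_on_grid(lambda a: blended_loss(a, L_p, L_b, L_target))

print(alpha_color, alpha_loss)
```

With these values, the color-blending loss is minimized at an occupancy of 0.3, while the blended loss is minimized at 0. Moving the target closer to the candidate color flips the decision to 1.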
Multiple candidates. We now extend the loss formulation to consider
multiple candidates. This is advantageous because it will allow
our method to simultaneously evaluate the effect of several perturbations,
which in turn accelerates convergence.
The key property of the single-candidate loss formulation is that
it isolates the candidate from the background surface (i.e., observing
one or the other). The generalization to multiple candidates preserves
this property by treating each candidate as an independent
subproblem (Figure 4), minimizing the sum of respective losses:
$\mathcal{L}_{\mathrm{ray}}(\mathbf{r}) = \sum_{i=1}^{m} \mathcal{L}(\mathbf{p}_i),$ (3)
where $\mathcal{L}(\mathbf{p}_i)$ (following Equation 2) represents the loss of the $i$-th
of $m$ candidates sampled along the ray $\mathbf{r}$.
Spatio-directional loss. Reconstruction tasks evaluate the loss (3)
along a large set of rays $\mathbf{r}_k$ ($k = 1, \ldots, n$), where $n$ denotes the total
number of pixels across all reference images. This further expands
the set of independently considered candidate surfaces and leads to
the combined loss
$\mathcal{L}_{\mathrm{total}} = \sum_{k=1}^{n} \mathcal{L}_{\mathrm{ray}}(\mathbf{r}_k).$ (4)
Whereas conventional surface optimization only propagates gradients
to the surface itself, the use of $\alpha_{\mathbf{p}}$ and $L_{\mathbf{p}}$ in Equation (2)
covers the entirety of the observed 3D space. For positions viewed
from multiple directions, the loss generally also varies with respect
to direction:
[Inline figure labels: spatial, directional, background, surface, radiance field loss.]
In other words: by moving the evaluation of $\ell$ from image space into
the scene, we have created a spatio-directional radiance field loss.
Fig. 5. Stochastic background. Selecting the background surface at random
from a distribution $f_{\mathrm{b}}$ enables visibility through high-occupancy regions.
Each sampled background surface defines a new perturbation problem solvable
with the radiance field loss. Taking an expectation of this process leads
to a simple deterministic expression that we implement in practice.
3.3 Stochastic background surface
To complete our derivation of the loss function, what remains is
the definition of the background surface. Rather than a deterministic
surface (e.g., a level set of the occupancy field), we draw the
background from a per-ray distribution $f_{\mathrm{b}}$. This enables occasional
“visibility” through high-occupancy regions, allowing occluded objects
to be considered as the background (Figure 5). Crucially, we
thereby support complex topological changes in our optimization
without having to explicitly account for them [Mehta et al. 2023];
see Appendix C for additional details.
The design of the distribution $f_{\mathrm{b}}$ is flexible. One straightforward
approach is to prioritize sampling in high-occupancy regions, as
these areas are more likely to correspond to surfaces. During ray
traversal, we stochastically decide whether to use a position as the
background surface based on its occupancy value. This sequential
decision process reflects the concept of free-flight distance [Novák
et al. 2018], forming the free-flight background distribution.
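The sequential decision process described above can be sketched as a simple random walk over the samples of one ray. This is an illustrative sketch only: it assumes that the occupancy of each sample is used directly as the probability of stopping there, and all function names are hypothetical. It also shows the analytic counterpart, in which the probability of stopping at a sample is its occupancy multiplied by the probability of having passed all earlier samples.

```python
import random


def sample_background_index(alphas, rng):
    """Walk along the ray and stop at sample j with probability alphas[j].

    Returns the index of the chosen background sample, or None if the walk
    passes every sample (e.g., the ray leaves the scene).
    """
    for j, alpha in enumerate(alphas):
        if rng.random() < alpha:
            return j
    return None


def free_flight_probabilities(alphas):
    """Analytic probability that the walk stops at each sample."""
    probs, transmittance = [], 1.0
    for alpha in alphas:
        probs.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return probs


def estimate_frequencies(alphas, num_walks, seed=0):
    """Monte Carlo estimate of the stopping frequencies, for comparison."""
    rng = random.Random(seed)
    counts = [0] * len(alphas)
    for _ in range(num_walks):
        j = sample_background_index(alphas, rng)
        if j is not None:
            counts[j] += 1
    return [c / num_walks for c in counts]


# Hypothetical occupancies along one ray; the last sample is fully opaque.
alphas = [0.5, 0.5, 1.0]
print(free_flight_probabilities(alphas))
print(estimate_frequencies(alphas, 20000))
```

The sampled frequencies approach the analytic probabilities as the number of walks grows, which is what makes it possible to replace the random choice by its expectation.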
We can formulate the expectation of sampling the background
surface from such a free-flight distribution analytically and derive a
corresponding aggregated local loss analogous to classical volumetric
light transport:
$\mathcal{L}(\mathbf{p}_i) = \Big[\prod_{j=1}^{i-1} (1 - \alpha_{\mathbf{p}_j})\Big]\, \alpha_{\mathbf{p}_i}\, \ell(L_{\mathbf{p}_i}),$ (5)
which, when plugged into Equation (4), yields the radiance field
loss (Figure 2). See Appendix A for the complete derivation.
Implementation. An implementation of our loss function can be
arranged to resemble the color blending structure of standard volume
reconstruction methods like NeRF [Mildenhall et al. 2020]. As
such, it is exceedingly simple to implement in existing codebases,
as illustrated in the following comparison of pseudocode.
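A minimal sketch of such a comparison is given below. It is not the authors' listing: it assumes scalar colors, a squared color difference, and per-sample occupancies that are already available as plain lists, and the function names are hypothetical. The two functions share the same accumulation loop and differ only in whether the color or the local loss is accumulated.

```python
def color_diff(a, b):
    """Squared difference between two scalar colors (assumed stand-in metric)."""
    return (a - b) ** 2


def nerf_ray_loss(alphas, colors, target):
    """Volumetric loss: accumulate colors along the ray, then compare once."""
    transmittance, accumulated_color = 1.0, 0.0
    for alpha, color in zip(alphas, colors):
        accumulated_color += transmittance * alpha * color
        transmittance *= 1.0 - alpha
    return color_diff(accumulated_color, target)


def radiance_field_ray_loss(alphas, colors, target):
    """Equations (3) and (5): compare every sample, then accumulate the losses."""
    transmittance, accumulated_loss = 1.0, 0.0
    for alpha, color in zip(alphas, colors):
        accumulated_loss += transmittance * alpha * color_diff(color, target)
        transmittance *= 1.0 - alpha
    return accumulated_loss


# Hypothetical ray: a half-transparent black sample in front of an opaque
# white one, with a mid-gray target.
alphas, colors, target = [0.5, 1.0], [0.0, 1.0], 0.5
print(nerf_ray_loss(alphas, colors, target))            # blend matches the target
print(radiance_field_ray_loss(alphas, colors, target))  # no single sample matches
```

In this example the volumetric loss is already zero because the blend of the two samples reproduces the target, while the radiance field loss stays positive until some individual sample matches the target on its own.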
This resemblance also suggests that the optimization landscape of
our method is similar to that of NeRF, inheriting its robustness.
However, while NeRF’s loss supervises all samples along the ray to
collectively match the target color, our loss aims for each sample to
match the target color independently or become transparent when
the background is a better match. This distinction fundamentally
defines our approach as a surface reconstruction algorithm.
3.4 Volume relaxation
We also propose a heuristic-based generalization of our method. It
is orthogonal to the above algorithm and optional during training.
While a surface representation offers many advantages, the opaque
surface assumption has inherent limitations in certain scenarios.
For example, sub-pixel structures are challenging to model with
geometry, and a single surface may fail to accurately represent
the appearance of directionally varying materials. In these regions, a
volumetric representation is more suitable.
Our goal is to relax our method to reconstruct most of the scene as
surfaces (regions where low loss can be reached) and use volumetric
representations only in the remaining challenging regions. To this
end, we rst train with our algorithm for 20k iterations to obtain an
initial surface representation. We then identify challenging regions
by evaluating where local losses remain high. In subsequent training
steps, we relax the surface assumption, allowing volumetric alpha
blending in these regions.
After training, rather than extracting a surface, we render the
scene volumetrically, with surface regions treated as fully opaque
“volumes”. Compared to a volume scene optimized with NeRF, our
method still benefits from the compact representation of surface
regions. When accumulating colors along a ray, very few samples
are required to saturate the transmittance, leading to faster inference
and reduced computational resources during training.
Where not explicitly stated, all results in this paper (marked as
“ours”) are trained without volume relaxation.
4 Results
4.1 Novel view synthesis
Visual quality. Despite the inherently fewer degrees of freedom
of surfaces, Figure 9 shows that our method achieves results that are
qualitatively comparable to NeRF. We also visualize the surface renderings
at occupancy level sets {0.01, 0.1, 0.5, 0.9, 0.99}. Renderings
of the scene optimized by our algorithm barely change, indicating a
near-Heaviside step function in the occupancy field. In contrast, the
inherently volumetric nature of NeRF does not produce meaningful
visualizations for these level sets. Figure 10 highlights the reconstruction
of another scene where our method with volume relaxation
addresses challenges in modeling a semi-transparent object.
Table 1. Visual quality comparison. We integrate our loss into Instant
NGP and train on the MipNeRF360 dataset using default hyperparameters.
                        Indoor mean                  Outdoor mean
                  PSNR (dB)   SSIM    LPIPS    PSNR (dB)   SSIM    LPIPS
Ours                29.02     0.888   0.275      22.41     0.679   0.563
Ours (relaxed)      29.41     0.897   0.284      22.62     0.690   0.626
NeRF                29.19     0.893   0.303      22.47     0.683   0.638
Table 1 shows that our method achieves visual quality comparable
to exponential volume reconstruction (NeRF) when trained on the
MipNeRF360 dataset, using default Instant NGP hyperparameters,
despite using a surface-based representation. A small PSNR gap is
expected, as volume representations offer inherently more degrees
of freedom that can be repurposed to model pixel-wise colors. A
similar trend is observed when implementing our method in the
ZipNeRF codebase, where we measured a mean PSNR of 29.73 dB for
our method and 31.45 dB for NeRF on indoor scenes, and 24.06 dB
and 25.24 dB on outdoor scenes, respectively.
When evaluating our relaxed variant—which switches to volu-
metric rendering in hard regions—the visual quality slightly exceeds
the NeRF baseline. This improvement arises because our method en-
courages surface-like, sparse distributions, resulting in more empty
space that the renderer can efficiently skip. Consequently, at equal
batch size, Instant NGP automatically spawns more rays when us-
ing our method, thereby covering more reference pixels per batch,
in turn leading to a better reconstruction. When the ray count is
restricted to match NeRF, the relaxed variant delivers results that
are approximately equal.
These trends are consistent across other metrics as well. For
instance, both our method and NeRF achieve SSIM scores of 0.89
(indoor) and 0.68 (outdoor). The full set of evaluation results is
provided in the appendix.
Rendering performance. Our implementation builds on the Instant
NGP codebase, which ray-marches fields ($\alpha_{\mathbf{p}}$, $L_{\mathbf{p}}$) represented
using an interpolated hash grid lookup combined with a lightweight
MLP. We repurpose this ray-marching code for surface rendering
by returning the color of the first sample with an occupancy value
exceeding 0.5. This straightforward modification results in a 2.4×
average speedup in frames per second (FPS) across MipNeRF360
scenes compared to the baseline. An average speedup of 2.0× is
achieved for the relaxed version of our method, as most of the scene
remains surface-like.
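The first-hit rule described above can be sketched in a few lines. This is an illustrative sketch rather than the Instant NGP code: it assumes that the occupancies and colors of the samples along one ray are already available as lists ordered front to back, and the function name and the `background` fallback are hypothetical.

```python
def first_hit_color(alphas, colors, background, threshold=0.5):
    """Return the color of the first sample whose occupancy exceeds the threshold.

    Samples are assumed to be ordered front to back. If no sample qualifies,
    the (hypothetical) background color is returned instead.
    """
    for alpha, color in zip(alphas, colors):
        if alpha > threshold:
            return color  # early exit: later samples are never evaluated
    return background


# Hypothetical ray with RGB colors: the second sample is the first opaque one.
alphas = [0.1, 0.7, 0.9]
colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
print(first_hit_color(alphas, colors, background=(0.0, 0.0, 0.0)))
```

The early exit is one plausible source of the reported speedup, since samples behind the first hit never need to be shaded.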
An additional 2× speedup can be achieved by replacing ray-marching
with rasterization of a meshed isosurface. In this case, the
color network is only used for mesh shading, maintaining the same
visual quality as before. Various strategies exist to further boost rendering
performance, e.g., by storing precomputed hash grid lookups
alongside mesh vertices [Chen et al. 2023], or by projecting the directional
MLP dependence into spherical harmonics [Reiser et al. 2024].
[Figure 6 panel label: Laplacian.]
Fig. 6. Regularization. For simple scenes with enough observations, the
reconstructed surface closely matches the ground truth geometry without
requiring additional constraints. Adding Laplacian refinement helps smooth
out unnecessary small kinks, resulting in a more accurate final geometry.
Table 2. Average Chamfer distance (CD) comparison on the DTU dataset with
NeuS [Wang et al. 2021] and NeuS2 [Wang et al. 2023].
       Ours (1 min)   NeuS (8 hr)   NeuS2 (5 min)
CD         0.80           0.77           0.68
[Figure 7 panels: Ours and NeuS2 at 10 seconds, 50 seconds, and 1 minute of training.]
Fig. 7. Straightforward extraction. Since our algorithm does not use an
intermediate volume representation, efficient surface extraction is possible at
any point. At equal time, a fast NeuS2 baseline [Wang et al. 2023] still models
the scene as a fuzzy volume, and a surface cannot be confidently extracted.
4.2 Geometry reconstruction
Our method is also applicable to geometry reconstruction tasks, in
which achieving high-quality meshes matching the ground truth
geometry is of interest. For simple multi-view input (Figure 6), our
method produces highly detailed geometry with the speed of Instant
NGP (seconds). A mesh can be extracted at any point during the op-
timization (Figure 7). However, complex real-world reconstruction
tasks are often under-constrained. For example, a reflective object
seen only from a narrow cone of directions does not provide sufficient
information for accurate shape recovery. Even with a larger
set of viewpoints, it can be challenging to disambiguate whether
surface detail is due to local color variation or small-scale geometry.
As a consequence, the reconstructed geometry often exhibits unde-
sirable bump-like artifacts representing such misattributed detail.
While our algorithm still excels at novel view synthesis under these
conditions, the reconstructed geometry can significantly deviate
from the ground truth.
To mitigate this issue, we incorporate an exponentially decaying
Laplacian regularizer during training. This regularizer initially
enforces flat surfaces and progressively provides more degrees of
freedom as its influence decays. Figure 11 examines the influence of
the final Laplacian weight on reconstruction quality. Figure 12 showcases
geometry reconstructions of scenes from the DTU [Jensen
et al. 2014] and BlendedMVS [Yao et al. 2020] datasets, all made
with a consistent Laplacian weight of $2 \times 10^{-5}$.
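The text does not spell out the exact form of the regularizer, so the following is only a sketch of one plausible reading. It assumes a weight that decays geometrically from an initial to a final value over training, and a penalty on squared second differences of occupancy values sampled on a regular 1D grid. The function names, the 1D simplification, and all numeric values are hypothetical.

```python
import math


def laplacian_weight(step, total_steps, w_init, w_final):
    """Exponentially interpolate the regularizer weight from w_init to w_final."""
    decay = math.log(w_final / w_init)
    return w_init * math.exp(decay * step / total_steps)


def laplacian_penalty(occupancy):
    """Sum of squared second differences of a 1D occupancy profile.

    This vanishes for linear profiles and grows with small-scale kinks.
    """
    return sum(
        (occupancy[i - 1] - 2.0 * occupancy[i] + occupancy[i + 1]) ** 2
        for i in range(1, len(occupancy) - 1)
    )


def regularized_loss(data_loss, occupancy, step, total_steps,
                     w_init=1e-2, w_final=1e-4):
    """Data term plus the decaying Laplacian term (weights are placeholders)."""
    weight = laplacian_weight(step, total_steps, w_init, w_final)
    return data_loss + weight * laplacian_penalty(occupancy)


# Hypothetical occupancy profile with a kink at the second sample.
profile = [0.0, 1.0, 0.0]
print(regularized_loss(0.1, profile, step=0, total_steps=1000))
print(regularized_loss(0.1, profile, step=1000, total_steps=1000))
```

Early in training the kink is penalized strongly, and the same kink costs much less once the weight has decayed, which mirrors the described behavior of enforcing smoothness first and releasing degrees of freedom later.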
Table 2 shows that using only minimal Laplacian regularization,
our method achieves an average Chamfer distance on the DTU
dataset that is just 0.12 higher than NeuS2 [Wang et al. 2023], while
reducing runtime to only 1 minute thanks to our algorithmic simplicity.
The complete evaluation results are provided in the appendix. In
this work, we do not intend to compete on geometric reconstruction
metrics and have not incorporated other regularization extensions,
which would detract from the simplicity of the presented idea. Such
extensions include multi-view consistency losses [Fu et al. 2022;
Chen et al. 2024a] to reduce ambiguities in regions with limited
observations, or the TSDF algorithm [Izadi et al. 2011] that helps
extract smooth meshes while removing unnecessary geometry.
5 Discussion
5.1 Choice of background distribution
In Section 3.3, we used the free-flight background distribution to
derive a loss form dual to the NeRF loss. This choice is somewhat
arbitrary, and other distributions could be used with different trade-offs.
In Appendix B, we discuss how alternative designs can enable
new optimization strategies that are not possible in image-space
methods, with one such example provided.
5.2 Relation to many-worlds inverse rendering
Our method builds on the core idea of Zhang et al. [2024] (we refer
to their method as PBR-MW), namely that surface distributions can
be optimized more directly without involving exponential volumes.
We reconstruct purely emissive objects², while PBR-MW handles
differentiable shadowing and interreflection to reconstruct reflecting
objects in scenes with global illumination. Viewed superficially, our
method could be mistaken for a stripped-down version of PBR-MW.
Our contribution lies in leveraging this simplicity to develop
a specialized method. We identify and implement optimizations
unique to radiance surfaces to fully realize the potential of the
many-worlds idea.
In radiance surface rendering, image formation is a direct 1:1
mapping between a ray and the nearest intersected surface, while
PBR-MW requires a complex nested integration over materials,
lighting, and geometry. Our approach of projecting training images
into the scene to establish a radiance field loss depends on this 1:1
mapping and does not efficiently translate to the nested integral
structure of a global illumination renderer.
Another important contribution is the introduction of a stochastic
background distribution, which enables topological changes and
substantially improves reconstruction quality. We show how to
cheaply evaluate this strategy in expectation, which is needed to
maintain algorithmic parity with NeRF. The associated derivations
and simplications (Appendix A) are specic to radiance surfaces
and do not transfer to PBR-MW.
² In the equations of physically based rendering, radiance fields manifest in the emission
term [Nimier-David et al. 2022].
[Figure 8 panels: (a) Laplacian strength; (b) Smooth conductor. Image labels: surface rendering, ground truth, surface normal.]
Fig. 8. Limitations. (a) Our Laplacian smoothing strategy fails to reconstruct
the flat can surface due to its view-dependent appearance. A larger
Laplacian weight can help, but this also suppresses geometric detail seen in
Figure 12. (b) High-frequency color variation is more challenging to accurately
represent on a surface compared to a volumetric representation.
5.3 Limitations and future work
Moving the evaluation of the color loss from image space into the
radiance field makes our method incompatible with loss functions
that depend on image-space neighborhoods (e.g., style losses).
As shown in Figure 8, our lightweight Laplacian regularization
fails when there are insufficient observations to constrain the geometry.
Using alternative regularization techniques from state-of-the-art
geometry reconstruction methods could help mitigate this issue.
Our method also struggles to accurately capture the appearance of
conductive materials, which could be addressed by incorporating
solutions from prior work [Verbin et al. 2022].
An interesting extension of our work could involve implementing
a particle-based storage approach, such as Gaussian splatting [Kerbl
et al. 2023]. However, 3D Gaussians are inherently semi-transparent,
which conflicts with our assumption of opacity. Future work could
explore the use of opaque primitives, such as 2D disks, to replace
semi-transparent particles.
6 Conclusion
The "many worlds" paradigm, i.e., optimizing a distribution over
non-interacting primitives, is relatively new in the field of differentiable
rendering. In this paper, we apply it to radiance surface
reconstruction, which yields a fast and simple alternative to prior
works. Particularly notable is that the derivation began with an
evolving surface, yet resulted in remarkably similar equations to
volumetric scene reconstructions: ones where losses rather than
colors are integrated along rays.
As reconstruction tasks increase in diculty, a key challenge
lies in deciding whether a region of space is best represented by a
surface or a volume. While the relaxed variant of our method oers
an eective heuristic, it also underscores the need for a principled
answer to this important question.
Much engineering has gone into the design of optimized algo-
rithms, regularizers, and heuristics for NeRF-based 3D reconstruc-
tion. Our hope is that a large portion of this eort will translate to
the radiance eld loss and yield state-of-the-art results in the future.
Acknowledgments
The authors would like to thank Aaron Lefohn and Alexander Keller
for their support. This project has received funding from the Euro-
pean Research Council (ERC) under the European Union’s Horizon
2020 research and innovation program (grant agreement No 948846).
References
Danpeng Chen, Hai Li, Weicai Ye, Yifan Wang, Weijian Xie, Shangjin Zhai, Nan Wang,
Haomin Liu, Hujun Bao, and Guofeng Zhang. 2024a. PGSR: Planar-based Gaussian
Splatting for Ecient and High-Fidelity Surface Reconstruction. arXiv preprint
arXiv:2406.06521 (2024).
Hanyu Chen, Bailey Miller, and Ioannis Gkioulekas. 2024b. 3D Reconstruction with
Fast Dipole Sums. ACM Trans. Graph. 43, 6, Article 192 (Nov. 2024), 19 pages.
doi:10.1145/3687914
Zhiqin Chen, Thomas Funkhouser, Peter Hedman, and Andrea Tagliasacchi. 2023.
Mobilenerf: Exploiting the polygon rasterization pipeline for efficient neural field
rendering on mobile architectures. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 16569–16578.
Qiancheng Fu, Qingshan Xu, Yew Soon Ong, and Wenbing Tao. 2022. Geo-neus:
Geometry-consistent neural implicit surfaces learning for multi-view reconstruction.
Advances in Neural Information Processing Systems 35 (2022), 3403–3416.
Antoine Guédon and Vincent Lepetit. 2024. Sugar: Surface-aligned gaussian splatting
for ecient 3d mesh reconstruction and high-quality mesh rendering. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 5354–5363.
Binbin Huang, Zehao Yu, Anpei Chen, Andreas Geiger, and Shenghua Gao. 2024. 2D
Gaussian Splatting for Geometrically Accurate Radiance Fields. In SIGGRAPH 2024
Conference Papers. Association for Computing Machinery. doi:10.1145/3641519.3657428
Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe,
Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison,
et al. 2011. Kinectfusion: real-time 3d reconstruction and interaction using a moving
depth camera. In Proceedings of the 24th annual ACM symposium on User interface
software and technology. 559–568.
Rasmus Jensen, Anders Dahl, George Vogiatzis, Engin Tola, and Henrik Aanæs. 2014.
Large scale multi-view stereopsis evaluation. In Proceedings of the IEEE conference
on computer vision and pattern recognition. 406–413.
Bernhard Kerbl, Georgios Kopanas, Thomas Leimkühler, and George Drettakis. 2023.
3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Trans. Graph.
42, 4 (2023), 139–1.
A. Laurentini. 1994. The Visual Hull Concept for Silhouette-Based Image Understanding.
IEEE Trans. Pattern Anal. Mach. Intell. 16, 2 (Feb. 1994), 150–162. doi:10.1109/34.273735
Zhaoshuo Li, Thomas Müller, Alex Evans, Russell H Taylor, Mathias Unberath, Ming-
Yu Liu, and Chen-Hsuan Lin. 2023. Neuralangelo: High-fidelity neural surface
reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition. 8456–8465.
Matthew M. Loper and Michael J. Black. 2014. OpenDR: An Approximate Differentiable
Renderer. In Computer Vision – ECCV 2014, David Fleet, Tomas Pajdla, Bernt Schiele,
and Tinne Tuytelaars (Eds.). Springer International Publishing, Cham, 154–169.
Guillaume Loubet, Nicolas Holzschuch, and Wenzel Jakob. 2019. Reparameterizing
Discontinuous Integrands for Differentiable Rendering. Transactions on Graphics
(Proceedings of SIGGRAPH Asia) 38, 6 (Dec. 2019). doi:10.1145/3355089.3356510
Ishit Mehta, Manmohan Chandraker, and Ravi Ramamoorthi. 2023. A Theory of
Topological Derivatives for Inverse Rendering of Geometry. In Proceedings of the
IEEE/CVF International Conference on Computer Vision. 419–429.
Lars Mescheder, Michael Oechsle, Michael Niemeyer, Sebastian Nowozin, and Andreas
Geiger. 2019. Occupancy networks: Learning 3d reconstruction in function space.
In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition.
4460–4470.
Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ra-
mamoorthi, and Ren Ng. 2020. NeRF: Representing Scenes as Neural Radiance Fields
for View Synthesis. In ECCV.
Bailey Miller, Hanyu Chen, Alice Lai, and Ioannis Gkioulekas. 2024. Objects as Volumes:
A Stochastic Geometry View of Opaque Solids. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR). 87–97.
Theo Moons, Luc Van Gool, and Maarten Vergauwen. 2010. 3D Reconstruction from
Multiple Images Part 1: Principles. Foundations and Trends® in Computer Graphics
and Vision 4, 4 (2010), 287–404. doi:10.1561/0600000007
Thomas Müller, Alex Evans, Christoph Schied, and Alexander Keller. 2022. Instant
Neural Graphics Primitives with a Multiresolution Hash Encoding. ACM Trans.
Graph. 41, 4, Article 102 (July 2022), 15 pages. doi:10.1145/3528223.3530127
Jacob Munkberg, Jon Hasselgren, Tianchang Shen, Jun Gao, Wenzheng Chen, Alex
Evans, Thomas Müller, and Sanja Fidler. 2022. Extracting triangular 3d models,
materials, and lighting from images. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition. 8280–8290.
Baptiste Nicolet, Alec Jacobson, and Wenzel Jakob. 2021. Large Steps in Inverse Render-
ing of Geometry. ACM Trans. Graph. 40, 6 (Dec. 2021). doi:10.1145/3478513.3480501
Michael Niemeyer, Lars Mescheder, Michael Oechsle, and Andreas Geiger. 2020. Dif-
ferentiable volumetric rendering: Learning implicit 3d representations without 3d
supervision. In Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition. 3504–3515.
Merlin Nimier-David, Thomas Müller, Alexander Keller, and Wenzel Jakob. 2022. Unbi-
ased Inverse Volume Rendering with Differential Trackers. ACM Trans. Graph. 41,
4, Article 44 (July 2022), 20 pages. doi:10.1145/3528223.3530073
Jan Novák, Iliyan Georgiev, Johannes Hanika, and Wojciech Jarosz. 2018. Monte
Carlo methods for volumetric light transport simulation. Computer Graphics Forum
(Proceedings of Eurographics - State of the Art Reports) 37, 2 (May 2018). doi:10/gd2jqq
Christian Reiser, Stephan Garbin, Pratul Srinivasan, Dor Verbin, Richard Szeliski, Ben
Mildenhall, Jonathan Barron, Peter Hedman, and Andreas Geiger. 2024. Binary
opacity grids: Capturing fine geometric detail for mesh-based view synthesis. ACM
Transactions on Graphics (TOG) 43, 4 (2024), 1–14.
Dario Seyb, Eugene D’Eon, Benedikt Bitterli, and Wojciech Jarosz. 2024. From micro-
facets to participating media: A unified theory of light transport with stochastic
geometry. ACM Transactions on Graphics (Proceedings of SIGGRAPH) 43, 4 (July
2024). doi:10.1145/3658121
Dor Verbin, Peter Hedman, Ben Mildenhall, Todd Zickler, Jonathan T. Barron, and
Pratul P. Srinivasan. 2022. Ref-NeRF: Structured View-Dependent Appearance for
Neural Radiance Fields. CVPR (2022).
Delio Vicini, Sébastien Speierer, and Wenzel Jakob. 2022. Differentiable signed distance
function rendering. ACM Transactions on Graphics (TOG) 41, 4 (2022), 1–18.
Peng Wang, Lingjie Liu, Yuan Liu, Christian Theobalt, Taku Komura, and Wenping
Wang. 2021. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for
Multi-view Reconstruction. NeurIPS (2021).
Yiming Wang, Qin Han, Marc Habermann, Kostas Daniilidis, Christian Theobalt, and
Lingjie Liu. 2023. Neus2: Fast learning of neural implicit surfaces for multi-view
reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer
Vision. 3295–3306.
Zichen Wang, Xi Deng, Ziyi Zhang, Wenzel Jakob, and Steve Marschner. 2024. A
Simple Approach to Differentiable Rendering of SDFs. In ACM SIGGRAPH Asia 2024
Conference Proceedings.
Jiankai Xing, Xuejun Hu, Fujun Luan, Ling-Qi Yan, and Kun Xu. 2023. Extended Path
Space Manifolds for Physically Based Differentiable Rendering. In SIGGRAPH Asia
2023 Conference Papers. 1–11.
Yao Yao, Zixin Luo, Shiwei Li, Jingyang Zhang, Yufan Ren, Lei Zhou, Tian Fang, and
Long Quan. 2020. Blendedmvs: A large-scale dataset for generalized multi-view
stereo networks. In Proceedings of the IEEE/CVF conference on computer vision and
pattern recognition. 1790–1799.
Lior Yariv, Jiatao Gu, Yoni Kasten, and Yaron Lipman. 2021. Volume rendering of
neural implicit surfaces. In Thirty-Fifth Conference on Neural Information Processing
Systems.
Lior Yariv, Peter Hedman, Christian Reiser, Dor Verbin, Pratul P. Srinivasan, Richard
Szeliski, Jonathan T. Barron, and Ben Mildenhall. 2023. BakedSDF: Meshing Neural
SDFs for Real-Time View Synthesis. In ACM SIGGRAPH 2023 Conference Proceedings
(Los Angeles, CA, USA) (SIGGRAPH ’23). Association for Computing Machinery,
New York, NY, USA, Article 46, 9 pages. doi:10.1145/3588432.3591536
Cheng Zhang, Bailey Miller, Kai Yan, Ioannis Gkioulekas, and Shuang Zhao. 2020.
Path-Space Differentiable Rendering. ACM Trans. Graph. 39, 4 (2020), 143:1–143:19.
Kai Zhang, Fujun Luan, Qianqian Wang, Kavita Bala, and Noah Snavely. 2021. Physg:
Inverse rendering with spherical gaussians for physics-based material editing and
relighting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. 5453–5462.
Ziyi Zhang, Nicolas Roussel, and Wenzel Jakob. 2023. Projective Sampling for
Differentiable Rendering of Geometry. Transactions on Graphics (Proceedings of SIGGRAPH
Asia) 42, 6 (Dec. 2023). doi:10.1145/3618385
Ziyi Zhang, Nicolas Roussel, and Wenzel Jakob. 2024. Many-Worlds Inverse Rendering.
arXiv preprint arXiv:2408.16005 (2024).
[Figure 9 panels: (a) Surface rendering of ours; (b) Volume rendering of NeRF. Surface rendering at varying occupancy levels.]
Fig. 9. Nature of the reconstructed occupancy. Surface rendering at varying level sets of a scene reconstructed by our method and NeRF, both implemented
in Instant NGP using the same hyperparameters. (a) For our method, the surface rendering shows minimal changes across different level set thresholds,
indicating that the occupancy field has converged to a near-Heaviside step function on the surface, allowing for extraction of a surface-based representation.
(b) NeRF reconstructs the scene volumetrically, and any surface extracted using a level set is a poor approximation of the true color.
[Figure 10 panel labels: Ours; Ours (relaxed); NeRF. Rendering modes: surface rendering; volume rendering.]
Fig. 10. Volumetric relaxation. We compare reconstructions of our method without and with volume relaxation to NeRF, all implemented in Instant NGP
using the same hyperparameters. While our method achieves comparable visual quality using a surface-based representation, we highlight a region (white
arrow) where it fails to model a semi-transparent object due to the opaque surface assumption. The relaxed variant of our algorithm can recover by adopting
volume rendering in such regions. Rendering the reconstructions using the same ray marching implementation leads to significant performance differences:
our surface-only reconstruction is 2.6× faster than NeRF. The relaxed variant benefits from the surface representation in most regions, and is 1.7× faster.
[Figure 11 label: Surface rendering.]
Fig. 11. Effect of Laplacian weight on geometry reconstruction. The above results demonstrate the trade-off between geometric detail and surface
smoothness. For simple scenes lacking intricate features (bottom row), the reconstruction is insensitive to this hyperparameter.
Fig. 12. Reconstruction showcase. Surface rendering and normals of various scenes from the DTU and BlendedMVS datasets, reconstructed with our
algorithm and a decaying Laplacian. All results were generated with the same hyperparameters and a training time of 1 minute per scene.
A Derivation of our loss in NeRF form
Fig. 13. We derive an analytic loss expectation over all possible background
surfaces for a specific candidate position at $t_{\mathbf{p}}$.
Following the introduction of the radiance field loss and the
stochastic background, a naive implementation of our method would
first sample a surface $\mathcal{M}_b$ from a distribution as the perturbation
background, and for each sampled $\mathcal{M}_b$, we would need to sample multiple
candidates to solve the non-local perturbation problem. Naively
applying this strategy would result in an inefficient implementation. In
the following, we derive an expectation of losses over all potential
background surfaces (Figure 13) for a specific candidate.
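Let $f_b$ be the probability distribution of the background surface
along a ray, where $\int_0^{t_{\max}} f_b(t)\,\mathrm{d}t = 1$. Without loss of generality, we
focus on one candidate position $\mathbf{p}$ at distance $t_{\mathbf{p}}$ along the ray. To
avoid clutter, we denote the color error metric $\ell(L, L_{\mathrm{target}})$ as $\ell(L)$.
The expectation of all losses in the form of Equation 2 of the main
text, local at $\mathbf{p}$, is given by:
\begin{align}
\mathbb{E}[\mathcal{L}(\mathbf{p})]
&= \int_{t_{\mathbf{p}}}^{t_{\max}} \mathcal{L}(\mathbf{p})\, f_b(t)\,\mathrm{d}t \notag\\
&= \int_{t_{\mathbf{p}}}^{t_{\max}} \Big[ \alpha_{\mathbf{p}}\,\ell(L_{\mathbf{p}}) + (1-\alpha_{\mathbf{p}})\,\ell(L_t) \Big] f_b(t)\,\mathrm{d}t \notag\\
&= \Big( 1 - \int_0^{t_{\mathbf{p}}} f_b(t)\,\mathrm{d}t \Big)\, \alpha_{\mathbf{p}}\,\ell(L_{\mathbf{p}})
 + (1-\alpha_{\mathbf{p}}) \int_{t_{\mathbf{p}}}^{t_{\max}} \ell(L_t)\, f_b(t)\,\mathrm{d}t. \tag{6}
\end{align}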
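Let $\mathbb{E}_{t>t_{\mathbf{p}}}[\ell(L_t)]$ be the expectation of error metrics for $t > t_{\mathbf{p}}$. We
can rewrite the result as:
\begin{equation}
\mathbb{E}[\mathcal{L}(\mathbf{p})] =
\underbrace{\Big( 1 - \int_0^{t_{\mathbf{p}}} f_b(t)\,\mathrm{d}t \Big)}_{\text{weight}}
\Big[ \underbrace{\alpha_{\mathbf{p}}\,\ell(L_{\mathbf{p}})}_{\text{candidate}}
 + (1-\alpha_{\mathbf{p}})\, \underbrace{\mathbb{E}_{t>t_{\mathbf{p}}}[\ell(L_t)]}_{\text{background}} \Big].
\tag{7}
\end{equation}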
Equation 7 reects an aggregated form of non-local perturbation,
where the background color is treated as an expectation over all
possible background surfaces rather than a xed value. The weight
term captures the probability of selecting a background surface
located behind the perturbation position. The expectation compu-
tation should not change the loss landscape, so the weight term
should not be dierentiated during optimization.
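In the following, we analyze the discrete case of the loss expectation
along a ray with $m$ sampled points, assuming the free-flight
background distribution $f_b$. For simplicity, we denote the occupancy
and color at position $\mathbf{p}_i$ as $\alpha_i$ and $L_i$, respectively. The summation
of all losses is given by:
\begin{align}
\mathcal{L}_{\mathrm{ray}}
&= \sum_{i=1}^{m} \hat{\mathbb{E}}[\mathcal{L}(\mathbf{p}_i)] \tag{8}\\
&= \sum_{i=1}^{m} \underbrace{\prod_{k=1}^{i-1} (1-\alpha_k)}_{\text{weight}}
 \Big[ \underbrace{\alpha_i\,\ell(L_i)}_{\text{candidate}}
 + (1-\alpha_i)\, \underbrace{\hat{\mathbb{E}}_{t>t_i}[\ell(L_t)]}_{\text{background}} \Big], \notag
\end{align}
where
\begin{equation}
\hat{\mathbb{E}}_{t>t_i}[\ell(L_t)] = \sum_{j=i+1}^{m} \prod_{t=i+1}^{j-1} (1-\alpha_t)\, \alpha_j\, \ell(L_j). \tag{9}
\end{equation}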
Not all variables in this loss function are meant to be differentiated.
Specifically, the weight term and the background are treated
as constants in the optimization process and are excluded from
differentiation (i.e., detached). To indicate which terms should be
differentiated, we underline them in the derivation as $\underline{(\cdot)}$.
Below, we reformulate the loss function into a structure similar
to the NeRF loss function. We take some notational liberty of using
$\operatorname{argmin}$ to transform the equation in such a way that its derivatives
and the location of minima are preserved, but the loss value may
differ by a constant value.
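Equation 8 then becomes equivalent to minimizing:
\begin{align}
&\operatorname{argmin} \sum_{i=1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)
 \Big[ \underline{\alpha_i\,\ell(L_i)} + \underline{(1-\alpha_i)}\, \hat{\mathbb{E}}_{t>t_i}[\ell(L_t)] \Big] \notag\\
={}& \operatorname{argmin}\;
 \underbrace{\sum_{i=1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)\, \alpha_i\, \underline{\ell(L_i)}}_{\text{(a)}}
 + \underbrace{\sum_{i=1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)\, \underline{\alpha_i}\, \ell(L_i)}_{\text{(b)}} \notag\\
&\qquad\quad + \underbrace{\sum_{i=1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)\, \underline{(1-\alpha_i)}\, \hat{\mathbb{E}}_{t>t_i}[\ell(L_t)]}_{\text{(c)}},
\tag{10}
\end{align}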
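where the terms (a) and (b) arise from an application of the product
rule. Reordering the double summation in term (c) yields:
\begin{align}
\text{(c)} &\overset{(9)}{=} \sum_{i=1}^{m} \sum_{j=i+1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)\,
 \underline{(1-\alpha_i)} \prod_{t=i+1}^{j-1} (1-\alpha_t)\, \alpha_j\, \ell(L_j) \notag\\
&= \sum_{j=1}^{m} \sum_{i=1}^{j-1} \prod_{k=1}^{i-1} (1-\alpha_k)\,
 \underline{(1-\alpha_i)} \prod_{t=i+1}^{j-1} (1-\alpha_t)\, \alpha_j\, \ell(L_j).
\tag{11}
\end{align}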
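We rename the indices $i \leftrightarrow j$ and subsequently simplify the expression to:
\begin{align}
(11) &= \sum_{i=1}^{m} \sum_{j=1}^{i-1} \prod_{k=1}^{j-1} (1-\alpha_k)\,
 \underline{(1-\alpha_j)} \prod_{t=j+1}^{i-1} (1-\alpha_t)\, \alpha_i\, \ell(L_i) \notag\\
&= \sum_{i=1}^{m} \sum_{j=1}^{i-1} \prod_{\substack{k=1 \\ k \neq j}}^{i-1} (1-\alpha_k)\,
 \underline{(1-\alpha_j)}\, \alpha_i\, \ell(L_i). \tag{12}
\end{align}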
[Figure 14 panel labels: NeRF; Free-flight; Color-dependent; Optimization states; Dense initialization.]
Fig. 14. Benefits of the color-dependent background distribution.
(Top) After 2000 iterations, the color-dependent variant explores further
along the ray and clears over-aggressively optimized regions faster. (Bottom)
In a contrived experiment where the scene is densely initialized, the optimizer
first attempts to bake the images onto the cube. The color-dependent
variant can penetrate high-occupancy regions, while others get stuck. All
experiments are conducted with the same hyperparameters in the Instant
NGP [Müller et al. 2022] codebase.
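Combining the term (c) in the form of Equation 12 with the term
(b), we obtain:
\begin{align}
\text{(b)} + \text{(c)}
&= \sum_{i=1}^{m} \Bigg[ \prod_{k=1}^{i-1} (1-\alpha_k)\, \underline{\alpha_i}
 + \sum_{j=1}^{i-1} \prod_{\substack{k=1 \\ k \neq j}}^{i-1} (1-\alpha_k)\, \underline{(1-\alpha_j)}\, \alpha_i \Bigg] \ell(L_i) \notag\\
&= \sum_{i=1}^{m} \underline{\prod_{k=1}^{i-1} (1-\alpha_k)\, \alpha_i}\; \ell(L_i) + c_1, \tag{13}
\end{align}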
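where $c_1$ is a constant value. Finally, we can insert this result back in
Equation 10 to obtain a loss where all variables can be differentiated:
\begin{align}
&\operatorname{argmin}\; \text{(a)} + \text{(b)} + \text{(c)} \notag\\
\overset{(13)}{=}{}& \operatorname{argmin} \sum_{i=1}^{m} \Bigg[ \prod_{k=1}^{i-1} (1-\alpha_k)\, \alpha_i\, \underline{\ell(L_i)}
 + \underline{\prod_{k=1}^{i-1} (1-\alpha_k)\, \alpha_i}\; \ell(L_i) \Bigg] \notag\\
={}& \operatorname{argmin} \sum_{i=1}^{m} \underline{\prod_{k=1}^{i-1} (1-\alpha_k)\, \alpha_i\, \ell(L_i)}. \tag{14}
\end{align}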
This nal result is equivalent to the one shown in Figure 2 of the
main document. It is now also apparent that we do not need to
produce extra samples to evaluate E
𝑡 >𝑡
p
[ (𝐿
𝑡
)].
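The equivalence between the detached loss (Equation 8) and the fully differentiated NeRF-form loss (Equation 14) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses NumPy, treats the per-sample errors `err` as fixed numbers, and compares a hand-derived gradient of Equation 8 with respect to the occupancies (weight and background held constant, giving weight times `err - bg`) against central finite differences of Equation 14. All function names are hypothetical.

```python
import numpy as np

def transmittance(alpha):
    """weight[i] = prod_{k<i} (1 - alpha[k])."""
    return np.concatenate(([1.0], np.cumprod(1.0 - alpha)[:-1]))

def detached_grad_alpha(alpha, err):
    """Gradient of Eq. 8 w.r.t. alpha, with weight and background detached."""
    m = len(alpha)
    bg = np.zeros(m)  # Eq. 9 via a backward recursion
    for i in range(m - 2, -1, -1):
        bg[i] = alpha[i + 1] * err[i + 1] + (1.0 - alpha[i + 1]) * bg[i + 1]
    # d/d(alpha_i) of alpha_i*err_i + (1 - alpha_i)*bg_i, scaled by the weight
    return transmittance(alpha) * (err - bg)

def nerf_form_loss(alpha, err):
    """Eq. 14: sum_i prod_{k<i}(1 - alpha_k) * alpha_i * err_i."""
    return float(np.sum(transmittance(alpha) * alpha * err))

def finite_difference_grad(alpha, err, h=1e-6):
    """Central differences of the Eq. 14 loss w.r.t. each alpha_i."""
    grad = np.zeros_like(alpha)
    for i in range(len(alpha)):
        step = np.zeros_like(alpha)
        step[i] = h
        grad[i] = (nerf_form_loss(alpha + step, err)
                   - nerf_form_loss(alpha - step, err)) / (2.0 * h)
    return grad

rng = np.random.default_rng(0)
alpha = rng.uniform(0.05, 0.95, size=8)
err = rng.uniform(0.0, 1.0, size=8)
max_gap = float(np.max(np.abs(detached_grad_alpha(alpha, err)
                              - finite_difference_grad(alpha, err))))
```

The gradients with respect to the error terms agree trivially, since both forms multiply each error by the same weight times occupancy. Here `max_gap` should be at the level of floating-point round-off, because Equation 14 is linear in each individual occupancy.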
B Design space of the background distribution
Section 3.3 of the main document proposes the stochastic background
surface. Designing the background surface distribution $f_b$
(Figure 13) involves a trade-off between exploitation and exploration,
offering a wide design space.
On one hand, choosing a background surface close to the model’s
current best guess (i.e., in high occupancy regions) ensures that
only perturbations with an improvement will be accepted. An example
is the deterministic strategy to always use the 0.5 level set. It
aggressively selects the first potential surface with more than 50%
confidence along the ray as the background, ignoring any further
possibilities.
On the other hand, exploring more possibilities enhances the
algorithm’s robustness in scenes with complex occlusions. The free-flight
background distribution is a softer version of the deterministic
strategy. Instead of using a threshold to binarize the occupancy field,
it stochastically decides whether to use a surface as the perturbation
background during ray traversal.
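The two strategies above can be contrasted in a short sketch. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, occupancies are given per sample along one ray, and a ray on which no sample is selected is reported as `None` (how such rays are treated is not specified in this appendix).

```python
import numpy as np

def level_set_background(alpha, threshold=0.5):
    """Deterministic strategy: first sample whose occupancy exceeds the threshold."""
    hits = np.flatnonzero(np.asarray(alpha, dtype=float) > threshold)
    return int(hits[0]) if hits.size else None

def free_flight_pmf(alpha):
    """Probability that each sample is chosen as background under free-flight
    sampling, plus the leftover probability that no sample is chosen."""
    alpha = np.asarray(alpha, dtype=float)
    # survive[i] = probability of passing all samples before index i
    survive = np.concatenate(([1.0], np.cumprod(1.0 - alpha)))
    return survive[:-1] * alpha, float(survive[-1])

def sample_free_flight_background(alpha, rng):
    """Walk along the ray and accept sample i with probability alpha[i]."""
    for i, a in enumerate(alpha):
        if rng.random() < a:
            return i
    return None
```

With occupancies `[0.4, 0.4, 0.9]`, the level-set strategy always picks the third sample, whereas free-flight sampling picks the first sample 40% of the time, which is what allows it to test candidates against several possible backgrounds.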
Many other designs are possible. One unique to our method is the
color-dependent background distribution. Unlike the free-flight
distribution, which relies only on the occupancy value, this approach
also considers how well each potential surface aligns with the target
color, measured by $\ell(L_{\mathbf{p}}, L_{\mathrm{target}})$. The additional information enables
us to discard high-occupancy surfaces that poorly match the target
color, which may result from overly aggressive optimization. Specifically,
we compute an effective occupancy $\alpha'$ as a modification of
the original occupancy $\alpha$:
\[
\alpha' = \frac{\alpha}{1 + c\,\ell(L_{\mathbf{p}}, L_{\mathrm{target}})}.
\]
When the color matches well, the transformation is neutral, but for
misaligned colors, it reduces the effective occupancy, lowering the
likelihood of selecting such surfaces as a background. In our experiments,
we used $c = 16$. As shown in Figure 14, this color-dependent
distribution can be more efficient when penetrating incorrect surfaces.
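The effective occupancy transformation is small enough to state directly in code. The following is a minimal sketch assuming NumPy and a precomputed color error per sample; the function name is hypothetical, and only the constant 16 comes from the text.

```python
import numpy as np

def effective_occupancy(alpha, color_error, c=16.0):
    """Scale occupancy down where the surface color disagrees with the target.

    alpha:       occupancy value(s) in [0, 1]
    color_error: non-negative error between the sample color and target color
    c:           strength of the penalty (the text reports c = 16)
    """
    alpha = np.asarray(alpha, dtype=float)
    color_error = np.asarray(color_error, dtype=float)
    return alpha / (1.0 + c * color_error)
```

With zero error the occupancy is unchanged, while an error of 0.25 divides it by 5, so a confident but badly colored surface is much less likely to be picked as the background.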
This expanded design space is particularly compelling. Traditional
reconstruction methods mostly optimize in image space, interacting
with the 3D scene only through a rendering algorithm (surface-
based or volume-based). As a result, the design space in image space
is quite limited, with little to do beyond computing a loss.
In contrast, our method operates directly in the scene space. Back-
ground distributions can be tailored to focus on regions of inter-
est. This exibility enables the development of new optimization
strategies that are unattainable in image-space methods. The color-
dependent background distribution is an example to actively guide
the optimization to skip regions that are believed to be wrong re-
gardless of the occupancy value.
In this paper, we focus on the free-ight distribution to highlight
the dual-loss relationship with NeRF, leaving the exploration of
other distributions for future work. Only the result in Figure 14 uses
the color-dependent distribution.
C Additional experiments and results
Interior topology changes. Methods based on local surface evolu-
tion struggle with interior topological changes, like transforming a
sphere into a torus. Indeed, they primarily rely on deforming visi-
bility silhouettes to change the overall shape, but these silhouettes
often do not exist in regions away from the outer contour.
Correctly handling such topological changes requires making a
significant modification, such as cutting a cone through the entire
object to expose the occluded background. This type of change is
beyond the reach of common derivative-based methods, which can
only account for infinitesimal perturbations.
Mehta et al. [2023] propose a cone-shaped perturbation strategy
to test whether exposing the background improves the match to
the target color in the application of physically based rendering.
This approach significantly improves convergence in scenes that require
hole penetration, compared to conventional surface evolution
methods.
However, this strategy can also aect scenes where the topology
is already correct. In such cases, only local renement is needed,
and the cone perturbation may bias the derivative in the wrong
direction. Additionally, the cone perturbation strategy can only
penetrate a single obstacle, limiting its ability to handle complex
real-world scenes that require penetration through multiple layers of
geometry (Figure 15). Our stochastic background strategy addresses
these challenges by considering additional background possibilities,
enabling more robust optimization for complex scenes.
Rendering. Once the occupancy eld trained with our algorithm
has converged, it should have value 0 in empty space and 1 on the
surface. Since our eld storage is continuous in practice, we aim
for a near Heaviside step function on the surface. In Figure 20, we
show an additional level set rendering result for an outdoor scene
to demonstrate that our method can achieve this, with any level set
being usable. In this paper we use 0.5 as the threshold.
We propose two methods for rendering the level set. The first
method involves ray marching with a small step size. In this approach,
we immediately return the color of the first sample point
that hits the surface (when occupancy exceeds 0.5), without any
weighting or color blending. The second method involves extracting
a triangle mesh using marching cubes or TSDF fusion, then rasterizing
the mesh to obtain the hit point location and querying the color
network for the final color.
Both methods produce nearly identical visual results, as shown
in Figure 16.
Codebase. Our work primarily focuses on the theoretical development
of a surface-based scene reconstruction algorithm, while
the specifics of the model implementation are largely independent
of our core algorithm. For example, the Instant NGP codebase is
optimized for speed and designed for object-centric scenes, resulting
in suboptimal details in the far-field background (Figure 17). Our
results inherit these advantages and limitations.
Decaying Laplacian. For simple scenes with sufficient observations,
Laplacian smoothing as a post-processing step can effectively
refine surface geometry. However, this approach has limitations
in more challenging scenarios. As shown in Figure 18, we analyze
a highly underconstrained scene with shiny surfaces that exhibit
rapid color changes with viewing angle, captured only from the
front. Here, training without Laplacian smoothing achieves good
novel view synthesis but results in geometry errors, particularly at
the can’s bottom.
[Figure 15 panel labels: Initial state; Optimization states; Cone; Ours. Color scale: low occupancy to high occupancy.]
Fig. 15. Left: We test interior topological changes in a scene where orange
better aligns with the target background color than indigo. Right: We show
optimization states by visualizing a 2D slice of the occupancy field. The
cone perturbation strategy [Mehta et al. 2023] gets stuck after penetrating
the torus once, as it can only see through a single obstacle.
[Figure 16 panels: Ray marching; Rasterization.]
Fig. 16. Visual comparison of the same surface scene trained with our
algorithm using two rendering methods: ray marching (left) and mesh
rasterization (right). Both methods give nearly identical results.
Applying a Laplacian as a post-processing step requires numerous
iterations to address these issues and may degrade geometry
in other regions. In contrast, training our algorithm with an exponentially
decaying Laplacian is more efficient. Consequently, the
results in Figure 12 of the main document are obtained by training
our algorithm with an exponentially decaying Laplacian.
Training time. Figure 19 shows the loss convergence plot in the
Instant NGP codebase, demonstrating that our method converges at
a rate comparable to NeRF despite its surface reconstruction nature.
Like NeRF, our method computes the loss in linear time only using
per-sample occupancy and color values along a ray. Theoretically,
this ensures it is at least as fast as NeRF. However, the observed
increase in training time arises from INGP’s training strategy, which
targets a fixed sample size per iteration (we use the default value $2^{18}$)
by spawning as many rays as needed. Since our method reconstructs
surfaces, it typically requires fewer samples along rays in near-
converged regions, allowing more rays to be processed within the
same sample budget. In practice, the increase in ray count causes
INGP to become slower.
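The effect of a fixed sample budget on the ray count is simple arithmetic, sketched below. Only the budget of 2^18 samples comes from the text; the function name and the per-ray sample counts in the usage note are hypothetical values chosen for illustration, not measurements.

```python
SAMPLE_BUDGET = 2 ** 18  # fixed number of samples per training iteration

def rays_per_iteration(avg_samples_per_ray, budget=SAMPLE_BUDGET):
    """Number of rays that fit in the sample budget for a given average
    number of samples consumed by each ray."""
    return int(budget // avg_samples_per_ray)
```

If a volumetric reconstruction needed 64 samples per ray on average, the budget would cover 4096 rays; if a near-converged surface needed only 16, the same budget would cover 16384 rays, four times as many rays to generate and process per iteration.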
This slowdown is a consequence of INGP’s implementation rather
than a limitation of our method. In fact, our method’s efficiency in
using fewer resources per ray is advantageous. This also explains
why our relaxed variant achieves a higher PSNR than NeRF: it
utilizes the same sample budget to visit more reference pixels per
iteration.
[Figure 17 panel labels: ZipNeRF codebase (hours); Instant NGP codebase (3 minutes); NeRF; Ours; Zoom in.]
Fig. 17. Qualitative comparison of NeRF and our method in two codebases.
Miscellaneous. Figure 20 shows additional level set rendering
results for an outdoor scene. Table 3, Table 4 and Table 5 show the
complete PSNR, SSIM and LPIPS results in the Instant NGP codebase.
Table 6 shows the complete PSNR results in the ZipNeRF codebase.
Table 7 shows the complete Chamfer distance results on the DTU
dataset.
D Volume relaxation
This section details a heuristic-based volume relaxation of our
method. While we do not claim this to be the only way to relax our
method, it provides a straightforward and effective way to retain
the surface-like properties of the scene while enabling volumetric
blending in regions where the surface representation is insufficient.
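We propose the following loss function as a relaxed volumetric
version of our loss in the form of Equation 8. The notation $\underline{(\cdot)}$ is
consistent with Appendix A, denoting terms that are differentiated
during optimization:
\begin{equation}
\mathcal{L}^{\mathrm{vol}}_{\mathrm{ray}} = \sum_{i=1}^{m} \prod_{k=1}^{i-1} (1-\alpha_k)\;
\ell\Big( \underline{\alpha_i L_i} + \underline{(1-\alpha_i)}\, \mathbb{E}_{t>t_i}[L_t],\; L_{\mathrm{goal}} \Big),
\tag{15}
\end{equation}
where the error metric $\ell$ now compares against a modified target
color $L_{\mathrm{goal}}$:
\begin{equation}
L_{\mathrm{goal}} = \frac{L_{\mathrm{target}} - L_{\mathrm{prev}}}{T_{\mathrm{prev}}}
= \frac{L_{\mathrm{target}} - \sum_{j=1}^{i-1} \prod_{k=1}^{j-1} (1-\alpha_k)\, \alpha_j L_j}{\prod_{j=1}^{i-1} (1-\alpha_j)}.
\tag{16}
\end{equation}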
[Figure 18 panels: (a) No Laplacian; (b) Post-process Laplacian (low weight, high weight); (c) Decaying Laplacian.]
Fig. 18. For highly underconstrained scenes with shiny surfaces and limited
viewing angles, training our algorithm with an exponentially decaying
Laplacian is more effective than applying Laplacian as a post-processing
step.
[Figure 19 plots: Loss (Huber) against Time (s) for four scenes; curves: Ours, NeRF.]
Fig. 19. Equal time convergence plot. Our method converges at a rate
comparable to NeRF in the Instant NGP codebase. We have a longer tail in
the loss curve since our method spawns more rays per iteration than NeRF.
Equation 15 is derived from two key modifications to the radiance
field loss (Equation 8):
• We now blend colors instead of error metrics to allow for
  volumetric blending for the $i$-th sample.
• The $i$-th sample no longer needs to match the target color
  $L_{\mathrm{target}}$ directly. Instead, its goal adjusts for the color contribution
  of prior samples $L_{\mathrm{prev}}$ and the transmittance from the
  camera to the $i$-th sample $T_{\mathrm{prev}}$.
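The modified target color of Equation 16 can be sketched as follows. This is an illustrative NumPy sketch with hypothetical names; it uses 0-based sample indices and assumes the transmittance up to sample `i` is non-zero (a real implementation would need to guard that division).

```python
import numpy as np

def relaxed_goal(alpha, colors, target, i):
    """Equation 16: target color for sample i after removing the contribution
    of the samples in front of it and dividing by the remaining transmittance.

    alpha:  (m,) occupancies, ordered front to back
    colors: (m, 3) sample colors
    target: (3,) reference pixel color
    i:      0-based index of the current sample
    """
    alpha = np.asarray(alpha, dtype=float)
    colors = np.asarray(colors, dtype=float)
    target = np.asarray(target, dtype=float)
    # trans[j] = prod_{k<j} (1 - alpha[k]), with trans[0] = 1
    trans = np.concatenate(([1.0], np.cumprod(1.0 - alpha)))
    # Color already accumulated by samples in front of sample i
    l_prev = np.sum((trans[:i] * alpha[:i])[:, None] * colors[:i], axis=0)
    t_prev = trans[i]  # assumed non-zero
    return (target - l_prev) / t_prev
```

For the first sample nothing has been accumulated, so the goal equals the target color. For a half-transparent red sample in front of a blue one and a purple target `[0.5, 0, 0.5]`, the goal of the second sample is pure blue.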
Empirically, this relaxed loss performs well as a volume reconstruction
algorithm. However, when used to refine a converged surface
scene, this loss often converts the entire scene into a volumetric
representation, even in regions where the surface representation is
already visually adequate. This happens because a surface representation
is essentially a special case of a volume with fewer degrees of
freedom, and fitting colors in a volume generally reduces the loss
more easily than fitting colors on a surface.
To prevent over-relaxation, we propose a heuristic to detect loca-
tions where volume relaxation is unnecessary. Specically, when
the local loss without blending at a specic position is no worse
than the local loss with blending:
(𝐿
𝑖
, 𝐿
goal
)
𝛼
𝑖
𝐿
𝑖
+ (1 𝛼
𝑖
) E
𝑡 >𝑡
𝑖
[𝐿
𝑡
], 𝐿
goal
, (17)
we use the local loss without blending in Equation 15. This comparison
does not introduce any overhead, as all necessary values are
already available. Our experimental results show that this heuristic
is effective in preserving Heaviside-like occupancy values in most
areas while allowing for volumetric blending in challenging regions
(Figure 20).
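The comparison in Equation 17 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names `huber` and `local_loss`, the Huber threshold `delta`, and the argument `L_behind` (standing in for the expectation of $L_t$ over samples behind the $i$-th one) are all assumptions made for this example.

```python
def huber(pred, target, delta=1.0):
    """Huber loss summed over color channels (delta is an assumed value)."""
    total = 0.0
    for p, t in zip(pred, target):
        r = abs(p - t)
        total += 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)
    return total


def local_loss(L_i, alpha_i, L_behind, L_goal, loss=huber):
    """Choose the local loss for sample i using the test in Equation 17.

    L_i      : color at sample i (list of channels)
    alpha_i  : occupancy of sample i, in [0, 1]
    L_behind : expected color of the samples behind i (hypothetical input)
    L_goal   : goal color for sample i
    Returns (loss value, True if the blended loss was used).
    """
    # Color of sample i blended with what lies behind it.
    blended = [alpha_i * c + (1.0 - alpha_i) * b for c, b in zip(L_i, L_behind)]
    loss_surface = loss(L_i, L_goal)
    loss_blended = loss(blended, L_goal)
    # Keep the unblended (surface) loss whenever it is no worse.
    if loss_surface <= loss_blended:
        return loss_surface, False
    return loss_blended, True
```

Both candidate losses are computed from values that are already needed by the relaxed loss, which matches the remark that the comparison adds no overhead.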
We emphasize again that the volume relaxation step is a heuristic
and not a fundamental part of our method. Unless explicitly stated
otherwise, all results in this paper were obtained without this relaxation.
E Implementation details
All results were generated and measured on a Linux workstation
with an AMD Ryzen 7950X processor and an NVIDIA RTX 4090
graphics card.
Instant NGP codebase. We used the default hyperparameter configuration
file (base.json) provided by the authors and retained the
original sampling strategy. However, we made two key modifications
to the codebase to accommodate our method:
• We reduced the ray marching step size from 1/1024 to 1/2048
  to achieve a finer surface resolution.
• The maximum buffer size for storing temporary samples was
  increased from 16 × target batch size to 128 × target batch size
  to accommodate the increased number of rays spawned in
  each iteration.
Since INGP does not natively support automatic differentiation,
we manually implemented the derivative propagation of our method
into the codebase, similar to how the framework trains NeRF.
For the geometry reconstruction experiments shown in Figure 12
(main document), we used an $L_1$ loss to improve convergence in dark
regions. Models were trained for 10000 iterations (reduced from the
default 35000), with the Laplacian weight decaying exponentially to
$2 \times 10^{-5}$. The Laplacian was estimated via finite differences using
six neighboring samples with an epsilon of 1/1024 (approximately
1 mm for a unit cube).
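As a rough sketch of the regularizer described above, the following shows a six-neighbor finite-difference Laplacian and an exponential weight schedule. It is illustrative only: the field `f`, the function names, and the starting weight `w_start` are hypothetical (the text gives only the final weight, read here as 2e-5, the epsilon of 1/1024, and the 10000 iterations).

```python
def laplacian_fd(f, x, eps=1.0 / 1024.0):
    """Estimate the Laplacian of a scalar field f at 3D point x.

    Uses the six axis-aligned neighbors x +/- eps * e_k (central differences).
    """
    center = f(x)
    total = 0.0
    for axis in range(3):
        for sign in (-1.0, 1.0):
            p = list(x)
            p[axis] += sign * eps
            total += f(p)
    return (total - 6.0 * center) / (eps * eps)


def laplacian_weight(iteration, n_iters=10000, w_start=1e-2, w_end=2e-5):
    """Exponentially decay the weight from w_start (assumed) to w_end."""
    t = min(max(iteration / n_iters, 0.0), 1.0)
    return w_start * (w_end / w_start) ** t
```

For a quadratic field such as f(x, y, z) = x² + y² + z², the estimate recovers the exact Laplacian of 6, which makes a convenient sanity check.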
Rendering times were measured without DLSS.
ZipNeRF codebase. We used the default hyperparameter configuration
file (360.gin) along with the original adaptive sampling strategy.
As ZipNeRF’s adaptive sampling is tailored for volume reconstruction,
it may not be optimal for our method. However, we deliberately
avoided modifying these components to minimize intrusive changes
and focus on proof-of-concept validation.
Warm start. During training, our algorithm can sometimes push
occupancy values in certain regions (e.g., peripheral or camera-adjacent
areas) too high in early stages, resulting in floaters in the
final reconstruction. This occurs because the background is insufficiently
explored at the beginning, leading to overly aggressive
optimization of temporarily superior candidates. While NeRF encounters
similar issues, recovery is particularly challenging in our
case since occupancy values of these floaters could approach 1.
For INGP training, we can mitigate this issue by adjusting the
learning rate schedule at the cost of slower convergence. Empirically,
we found it also effective to impose a moving upper bound on
occupancy values, gradually relaxing this constraint during training.
Specifically, at iteration $i$, we bound the occupancy value by
$\alpha_{\max} = 0.1 + 0.9 \times i/1000$. This constraint is active only during
the first 1000 iterations, corresponding to the first few seconds of training.
Additionally, we observed that our relaxed training strategy is less
prone to floaters. For novel view synthesis tasks, we trained the
relaxed variant for 5000 iterations as a warm start.
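The moving upper bound on occupancy can be written as a small schedule. The sketch below is an assumption-laden illustration: the function names are hypothetical, and enforcing the bound by clamping is one possible choice that the text does not specify.

```python
def occupancy_upper_bound(iteration, warmup_iters=1000):
    """Upper bound on occupancy: 0.1 + 0.9 * i / 1000 during the warm-up."""
    if iteration >= warmup_iters:
        return 1.0  # constraint inactive after the first 1000 iterations
    return 0.1 + 0.9 * iteration / warmup_iters


def bound_occupancy(alpha, iteration):
    """Clamp an occupancy value to the current bound (clamping is assumed)."""
    return min(alpha, occupancy_upper_bound(iteration))
```

The bound starts at 0.1, grows linearly, and reaches 1 at iteration 1000, after which occupancy values are unconstrained.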
The floater issue also arises in the ZipNeRF codebase. For simplicity,
we adopted a NeRF training warm start during the first 5%
of training iterations and did not bound occupancy values.
[Figure 20 panel labels: Ours, Ours (relaxed), NeRF; Surface rendering at varying occupancy levels; Surface / volume rendering.]
Fig. 20. Surface rendering at varying level sets of a scene reconstructed by our method and NeRF, using the same hyperparameters. Only the two images
with orange borders are rendered volumetrically. Left: For our method, the surface rendering shows minimal changes across different level set thresholds,
indicating that the occupancy field has converged to a near-Heaviside step function on the surface. Middle: The relaxed variant of our algorithm uses
volume representation in challenging regions, such as sub-pixel details (yellow arrow). The overall scene remains surface-like, leading to better ray marching
performance than NeRF. Right: The NeRF reconstruction is inherently volumetric, thus renderings of level sets do not produce meaningful visualizations.
Table 3. PSNR comparison using the Instant NGP codebase. Ours uses surface rendering, while the relaxed variant and NeRF use volume rendering.
Method          Bicycle  Bonsai  Counter  Garden  Kitchen  Room   Stump  Flowers  Treehill
Ours            22.53    31.22   26.67    23.81   28.58    29.59  24.16  19.76    21.79
Ours (relaxed)  22.66    31.81   26.95    24.04   29.14    29.75  24.43  19.98    21.97
NeRF            22.66    31.45   26.79    23.97   29.33    29.17  23.96  19.95    21.82
Table 4. SSIM comparison using the Instant NGP codebase.
Method          Bicycle  Bonsai  Counter  Garden  Kitchen  Room   Stump  Flowers  Treehill
Ours            0.673    0.918   0.872    0.686   0.866    0.896  0.769  0.577    0.692
Ours (relaxed)  0.682    0.927   0.882    0.695   0.878    0.902  0.784  0.590    0.698
NeRF            0.675    0.924   0.877    0.687   0.877    0.893  0.776  0.586    0.692
Table 5. LPIPS comparison using the Instant NGP codebase.
Method          Bicycle  Bonsai  Counter  Garden  Kitchen  Room   Stump  Flowers  Treehill
Ours            0.578    0.241   0.315    0.547   0.236    0.306  0.475  0.618    0.599
Ours (relaxed)  0.642    0.244   0.335    0.672   0.234    0.324  0.497  0.676    0.645
NeRF            0.658    0.256   0.354    0.625   0.239    0.362  0.514  0.699    0.692
Table 6. PSNR comparison using the ZipNeRF codebase.
Method  Bicycle  Bonsai  Counter  Garden  Kitchen  Room   Stump  Flowers  Treehill
Ours    24.10    31.24   26.38    26.14   30.22    31.07  25.96  20.99    23.12
NeRF    25.50    33.20   28.16    27.62   32.01    32.44  27.11  22.11    23.85
Table 7. Chamfer Distance comparison on the DTU dataset with NeuS [Wang et al. 2021] and NeuS2 [Wang et al. 2023].
Method             Scan24  Scan37  Scan40  Scan55  Scan63  Scan65  Scan69  Scan83
Ours (1 minute)    0.81    0.77    0.66    0.40    1.08    0.90    0.88    1.42
NeuS (8 hours)     0.83    0.98    0.56    0.37    1.13    0.59    0.60    1.45
NeuS2 (5 minutes)  0.56    0.76    0.49    0.37    0.92    0.71    0.76    1.22

Method             Scan97  Scan105  Scan106  Scan110  Scan114  Scan118  Scan122
Ours               1.20    0.75     0.68     1.07     0.61     0.55     0.63
NeuS               0.95    0.78     0.52     1.43     0.36     0.45     0.45
NeuS2              1.08    0.63     0.59     0.89     0.40     0.48     0.55