SHaRC algorithm integration doesn't require substantial modifications to the existing path tracer code. The core algorithm consists of two passes. The first pass uses sparse tracing to fill the world-space radiance cache using existing path tracer code, second pass samples cached data on ray hit to speed up tracing.
Image 1. Path traced output at 1 path per pixel left and with SHaRC cache usage rightAn implementation of SHaRC using the RTXGI SDK needs to perform the following steps:
At Load-Time
Create main resources:
Hash entries
buffer - structured buffer with 64-bits entries to store the hashesVoxel data
buffer - structured buffer with 128-bit entries which stores accumulated radiance and sample count. Two instances are used to store current and previous frame dataCopy offset
buffer - structured buffer with 32-bits per entry used for data compaction
The number of entries in each buffer should be the same, it represents the number of scene voxels used for radiance caching. A solid baseline for most scenes can be the usage of
⚠️ All buffers should be initially cleared with '0'
At Render-Time
- Populate cache data using sparse tracing against the scene
- Combine old and new cache data, perform data compaction
- Perform tracing with early path termination using cached data
Hash grid
visualization itself doesn’t require any GPU resources to be used. The simplest debug visualization uses world space position derived from the primary ray hit intersection.
GridParameters gridParameters;
gridParameters.cameraPosition = g_Constants.cameraPosition;
gridParameters.logarithmBase = SHARC_GRID_LOGARITHM_BASE;
gridParameters.sceneScale = g_Constants.sharcSceneScale;
float3 color = HashGridDebugColoredHash(positionWorld, gridParameters);
Logarithm base controls levels of detail distribution and voxel size ratio change between neighboring levels, it doesn’t make voxel sizes bigger or smaller on average. To control voxel size use sceneScale
parameter instead. HASH_GRID_LEVEL_BIAS should be used to control at which level near the camera the voxel level get's clamped to avoid getting detailed levels if it is not required.
Instead of the original trace call, we should have the following four passes with SHaRC:
- SHaRC Update - RT call which updates the cache with the new data on each frame. Requires
SHARC_UPDATE 1
shader define - SHaRC Resolve - Compute call which combines new cache data with data obtained on the previous frame
- SHaRC Compaction - Compute call to perform data compaction after previous resolve call
- SHaRC Render/Query - RT call which traces scene paths and performs early termination using cached data. Requires
SHARC_QUERY 1
shader define
The SDK provides shader-side headers and code snippets that implement most of the steps above. Shader code should include SharcCommon.h which already includes HashGridCommon.h
Render Pass | Hash Entries | Voxel Data | Voxel Data Previous | Copy Offset |
---|---|---|---|---|
SHaRC Update | RW | RW | Read | RW* |
SHaRC Resolve | Read | RW | Read | Write |
SHaRC Compaction | RW | RW | ||
SHaRC Render | Read | Read |
Read - resource can be read-only
Write - resource can be write-only
*Buffer is used if SHARC_ENABLE_64_BIT_ATOMICS is set to 0
Each pass requires appropriate transition/UAV barries to wait for the previous stage completion.
⚠️ RequiresSHARC_UPDATE 1
shader define.Voxel Data
buffer should be cleared with0
ifResolve
pass is active
This pass runs a full path tracer loop for a subset of screen pixels with some modifications applied. We recommend starting with random pixel selection for each 5x5 block to process only 4% of the original paths per frame. This typically should result in a good data set for the cache update and have a small performance overhead at the same time. Positions should be different between frames, producing whole-screen coverage over time. Each path segment during the update step is treated individually, this way we should reset path throughput to 1.0 and accumulated radiance to 0.0 on each bounce. For each new sample(path) we should first call SharcInit()
. On a miss event SharcUpdateMiss()
is called and the path gets terminated, for hit we should evaluate radiance at the hit point and then call SharcUpdateHit()
. If SharcUpdateHit()
call returns false, we can immediately terminate the path. Once a new ray has been selected we should update the path throughput and call SharcSetThroughput()
, after that path throughput can be safely reset back to 1.0.
Resolve
pass is performed using compute shader which runs SharcResolveEntry()
for each element. Compaction
pass uses SharcCopyHashEntry()
call.
:tip: Check Resource Binding section for details on the required resources and their usage for each pass
SharcResolveEntry()
takes maximum number of accumulated frames as an input parameter to control the quality and responsivness of the cached data. Larger values can increase the quality at increase response times. staleFrameNumMax
parameter is used to control the lifetime of cached elements, it is used to control cache occupancy
⚠️ SmallstaleFrameNumMax
values can negatively impact performance,SHARC_STALE_FRAME_NUM_MIN
constant is used to prevent such behaviour
⚠️ RequiresSHARC_QUERY 1
shader define
During rendering with SHaRC cache usage we should try obtaining cached data using SharcGetCachedRadiance()
on each hit except the primary hit if any. Upon success, the path tracing loop should be immediately terminated.
To avoid potential rendering artifacts certain aspects should be taken into account. If the path segment length is less than a voxel size(checked using GetVoxelSize()
) we should continue tracing until the path segment is long enough to be safely usable. Unlike diffuse lobes, specular ones should be treated with care. For the glossy specular lobe, we can estimate its "effective" cone spread and if it exceeds the spatial resolution of the voxel grid then the cache can be used. Cone spread can be estimated as:
a
is material roughness squared.
For the rendering step adding debug heatmap for the bounce count can help with understanding cache usage efficiency.
Image 3. Tracing depth heatmap, left - SHaRC off, right - SHaRC on (green - 1 indirect bounce, red - 2+ indirect bounces)Sample count uses SHARC_SAMPLE_NUM_BIT_NUM(18) bits to store accumulated sample number.
:note:
SHARC_SAMPLE_NUM_MULTIPLIER
is used internally to improve precision of math operations for elements with low sample number, every new sample will increase the internal counter by 'SHARC_SAMPLE_NUM_MULTIPLIER'.
SHaRC radiance values are internally premultiplied with SHARC_RADIANCE_SCALE
and accumulated using 32-bit integer representation per component.
:note: SharcCommon.h provides several methods to verify potential overflow in internal data structures.
SharcDebugBitsOccupancySampleNum()
andSharcDebugBitsOccupancyRadiance()
can be used to verify consistency in the sample count and corresponding radiance values representation.
HashGridDebugOccupancy()
should be used to validate cache occupancy. With a static camera around 10-20% of elements should be used on average, on fast camera movement the occupancy will go up. Increased occupancy can negatively impact performance, to control that we can increase the element count as well as decrease the threshold for the stale frames to evict outdated elements more agressivly.
Hash entries
buffer, two Voxel data
and Copy offset
buffers totally require 352 (64 + 128 * 2 + 32) bits per voxel. For