Improve support for cluster simplification #704
This is an important building block for Nanite, as it requires a repeated application of buildMeshlets => simplify (with additional algorithms that meshoptimizer currently doesn't support, like grouping meshlets based on proximity). This can be used to refine future improvements to simplification, like sparsity, as well as serve as a basic example. For simplicity we use the LockBorder flag instead of locking boundary vertices manually via meshopt_simplifyWithAttributes; from a production perspective, both options are viable.
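For illustration, here is a minimal sketch of one buildMeshlets => simplify pass (not the actual demo code; it assumes tightly packed float3 positions, and the max_vertices/max_triangles limits and target error are arbitrary; the proximity-based grouping of meshlets mentioned above is omitted):

```cpp
#include "meshoptimizer.h"
#include <vector>

// Sketch: split the mesh into clusters, then simplify each cluster independently.
void simplifyPerCluster(const std::vector<unsigned int>& indices, const std::vector<float>& positions, size_t vertex_count)
{
    const size_t max_vertices = 64, max_triangles = 124;

    size_t max_meshlets = meshopt_buildMeshletsBound(indices.size(), max_vertices, max_triangles);
    std::vector<meshopt_Meshlet> meshlets(max_meshlets);
    std::vector<unsigned int> meshlet_vertices(max_meshlets * max_vertices);
    std::vector<unsigned char> meshlet_triangles(max_meshlets * max_triangles * 3);

    size_t meshlet_count = meshopt_buildMeshlets(&meshlets[0], &meshlet_vertices[0], &meshlet_triangles[0],
        &indices[0], indices.size(), &positions[0], vertex_count, sizeof(float) * 3,
        max_vertices, max_triangles, 0.f);

    for (size_t i = 0; i < meshlet_count; ++i)
    {
        const meshopt_Meshlet& m = meshlets[i];

        // Convert the meshlet-local triangle list back into indices into the original vertex buffer
        std::vector<unsigned int> cluster(m.triangle_count * 3);
        for (size_t j = 0; j < cluster.size(); ++j)
            cluster[j] = meshlet_vertices[m.vertex_offset + meshlet_triangles[m.triangle_offset + j]];

        // LockBorder keeps cluster boundaries intact so neighboring clusters still stitch together
        std::vector<unsigned int> lod(cluster.size());
        float error = 0.f;
        size_t lod_size = meshopt_simplify(&lod[0], &cluster[0], cluster.size(),
            &positions[0], vertex_count, sizeof(float) * 3,
            cluster.size() / 2, 1e-2f, meshopt_SimplifyLockBorder, &error);
        lod.resize(lod_size);
        // ... feed `lod` into the next level of the hierarchy
    }
}
```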
When simplifying a small subset of the larger mesh, all computations that go over the entire vertex buffer become expensive; this adds up even when done once, and especially when done for every pass. This change introduces a sparse simplification mode that instructs the simplifier to optimize based on the assumption that the subset of the mesh being simplified is small. In that case it's worth spending extra time to convert indices into a small 0..U subrange, do all internal processing assuming we are working with a small vertex/index buffer, and remap the indices at the end. While this processing could be done externally, that is less efficient as it requires constant copying of position/attribute data; in contrast, we can do it internally fairly cheaply.

We need to take sparse_remap into account for any code that indexes input position/attribute data; buildPositionRemap needs this as well, since it currently works off of the original (unscaled) data. buildSparseRemap does need to perform O(dense) work if we want to avoid using hash maps for deduplication/filtering; this change uses a small zeroed array and a large sparsely-initialized array to reduce the cost. This is subject to future improvements (although it is fairly performant as it is when simplifying 500-triangle subsets of an 800K triangle mesh).
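The core idea can be sketched as follows (an illustration of the approach, not the library's internal buildSparseRemap; the buildSubsetRemap name and the dense_to_compact scratch array are invented here, and the full-size scratch array is exactly the dense-sized cost that the following notes work to reduce):

```cpp
#include <vector>

// Sketch: rewrite a subset's indices (which reference a huge vertex buffer) into a compact
// 0..U range, so all per-vertex work is proportional to the subset size.
// sparse_remap[new_index] stores the original index; it is used to read positions/attributes
// during simplification and to restore the original indices at the end.
void buildSubsetRemap(std::vector<unsigned int>& indices, std::vector<unsigned int>& sparse_remap,
    size_t vertex_count /* vertex count of the full (dense) mesh */)
{
    std::vector<unsigned int> dense_to_compact(vertex_count, ~0u);

    for (size_t i = 0; i < indices.size(); ++i)
    {
        unsigned int v = indices[i];

        if (dense_to_compact[v] == ~0u)
        {
            dense_to_compact[v] = unsigned(sparse_remap.size());
            sparse_remap.push_back(v);
        }

        // rewrite indices into the compact 0..U range; a position is later read as
        // positions[sparse_remap[new_index] * 3]
        indices[i] = dense_to_compact[v];
    }
}
```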
This reduces the high watermark and may allow reusing the deallocated space for the actual simplifier state. Ideally we would reduce the space consumed here further, but that requires a hash map, so it may not be worth the complexity.
Two parts of buildSparseRemap still have cost proportional to the total vertex count: filter[] clearing and revremap allocation. We could replace revremap with a hash map if the large allocation proves to be problematic, but it might also work fine in practice; filter[] clearing, on the other hand, is a cost we must pay, and for small subsets of large meshes it can be 20% of the entire simplification. To fix this we now use a bit set, which is 8x cheaper to clear (the actual addressing gets more expensive, but that is a fraction of the cost due to the sparsity assumption).
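The bit set change amounts to something like this (illustrative only; the testAndSet helper is not the exact internal code):

```cpp
// A byte-per-vertex filter costs vertex_count bytes to clear on every call; packing it into
// bits makes the clear 8x smaller, at the price of slightly more expensive addressing.
// Clearing becomes memset(filter, 0, (vertex_count + 7) / 8) instead of memset(filter, 0, vertex_count).
inline bool testAndSet(unsigned char* filter, unsigned int index)
{
    unsigned char bit = (unsigned char)(1 << (index & 7));
    bool was_set = (filter[index >> 3] & bit) != 0;
    filter[index >> 3] |= bit;
    return was_set;
}
```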
classifyVertices references the input vertex_lock[], which is indexed using dense vertex indices, so it needs to remap the access index on the fly as well.
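In code terms the pattern looks roughly like this (a sketch; the lockFlag helper is hypothetical):

```cpp
#include <stddef.h>

// With sparse mode active, user-provided per-vertex arrays such as vertex_lock are indexed
// with original (dense) vertex indices, so a compact internal index is translated through
// sparse_remap before each access.
inline unsigned char lockFlag(const unsigned char* vertex_lock, const unsigned int* sparse_remap, size_t index)
{
    unsigned int ri = sparse_remap ? sparse_remap[index] : (unsigned int)index;
    return vertex_lock ? vertex_lock[ri] : 0;
}
```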
When using sparse simplification, the error is treated as relative to the mesh subset. This is a performance requirement, as computing the full mesh extents is too expensive when the subset is small relative to the mesh, but it means that it can be difficult to rely on exact error metrics. There are also cases in general, when not using sparse simplification, where an absolute error is more convenient. These can be handled today via meshopt_simplifyScale, but that is an extra step that is not always necessary. With this change, when the meshopt_SimplifyErrorAbsolute flag is used, we treat the error limit and the output error as an absolute distance (in mesh coordinates) and convert it to/from relative using the internal scale factor.
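In API terms the difference looks roughly like this (a sketch assuming tightly packed float3 positions; the simplifyToAbsoluteError helper name is invented for illustration):

```cpp
#include "meshoptimizer.h"
#include <vector>

// absolute_error is a distance in mesh coordinates.
std::vector<unsigned int> simplifyToAbsoluteError(const std::vector<unsigned int>& indices,
    const std::vector<float>& positions, size_t vertex_count, size_t target_index_count,
    float absolute_error, float* out_error_abs)
{
    std::vector<unsigned int> lod(indices.size());

    // Previously: divide the absolute bound by meshopt_simplifyScale() up front and multiply
    // the reported relative error back by the same scale afterwards.

    // With this change: meshopt_SimplifyErrorAbsolute makes both the limit and the reported
    // error absolute distances, so no external conversion is needed.
    size_t lod_size = meshopt_simplify(&lod[0], &indices[0], indices.size(),
        &positions[0], vertex_count, sizeof(float) * 3,
        target_index_count, absolute_error, meshopt_SimplifyErrorAbsolute, out_error_abs);

    lod.resize(lod_size);
    return lod;
}
```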
Previously we couldn't really guarantee a sensible error bound or display the resulting deviation, but meshopt_SimplifyErrorAbsolute together with meshopt_simplifyScale makes it easy, so we incorporate that into simplifyClusters.
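The conversion this enables is small (a sketch of the kind of helper involved, not necessarily the exact demo code; relativeDeviation is a hypothetical name):

```cpp
#include "meshoptimizer.h"
#include <stddef.h>

// Convert the absolute error reported with meshopt_SimplifyErrorAbsolute back into a relative
// deviation (fraction of mesh extent), which is convenient to display and compare across meshes.
float relativeDeviation(float absolute_error, const float* positions, size_t vertex_count)
{
    float scale = meshopt_simplifyScale(positions, vertex_count, sizeof(float) * 3);
    return scale > 0.f ? absolute_error / scale : 0.f;
}
```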
We collapse a center vertex, which introduces an error, and check that the error is close to what we would expect. Note that the distance the vertex travels here is 1.0f, not 0.85f, but errors are evaluated as distances to triangles, which makes the measured value smaller.
Note that this change doesn't rebuild the Wasm bundle; the options should just work once that is done.
The test needs to check that positions, attributes and lock flags are all addressed using proper indexing. We do assume that collapses go in a specific direction (the input data technically permits two collapse directions for each test); if this becomes a problem due to floating-point instabilities, we can tweak the input data then.
I've tested this in Bevy using bevyengine/bevy#13431 and the following diff (probably could have left the 0.5 factor in, but I am not sure it's correct to apply it!), and after that change simplification is barely visible in the profile; the overall process of data preparation is still not very fast because Bevy's meshlet connectivity analysis (…).

```diff
diff --git a/crates/bevy_pbr/src/meshlet/from_mesh.rs b/crates/bevy_pbr/src/meshlet/from_mesh.rs
index a5ff00fad..9d95978ee 100644
--- a/crates/bevy_pbr/src/meshlet/from_mesh.rs
+++ b/crates/bevy_pbr/src/meshlet/from_mesh.rs
@@ -58,6 +58,8 @@ impl MeshletMesh {
             .map(|m| m.triangle_count as u64)
             .sum();
+        let scale = simplify_scale(&vertices);
+
         // Build further LODs
         let mut simplification_queue = 0..meshlets.len();
         let mut lod_level = 1;
@@ -82,7 +84,7 @@ impl MeshletMesh {
         for group_meshlets in groups.values().filter(|group| group.len() > 1) {
             // Simplify the group to ~50% triangle count
             let Some((simplified_group_indices, mut group_error)) =
-                simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level)
+                simplify_meshlet_groups(group_meshlets, &meshlets, &vertices, lod_level, scale)
             else {
                 continue;
             };
@@ -287,6 +289,7 @@ fn simplify_meshlet_groups(
     meshlets: &Meshlets,
     vertices: &VertexDataAdapter<'_>,
     lod_level: u32,
+    scale: f32,
 ) -> Option<(Vec<u32>, f32)> {
     // Build a new index buffer into the mesh vertex data by combining all meshlet data in the group
     let mut group_indices = Vec::new();
@@ -299,7 +302,8 @@ fn simplify_meshlet_groups(
     // Allow more deformation for high LOD levels (1% at LOD 1, 10% at LOD 20+)
     let t = (lod_level - 1) as f32 / 19.0;
-    let target_error = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error_rel = 0.1 * t + 0.01 * (1.0 - t);
+    let target_error = target_error_rel * scale;
     // Simplify the group to ~50% triangle count
     // TODO: Use simplify_with_locks()
@@ -309,7 +313,7 @@ fn simplify_meshlet_groups(
         vertices,
         group_indices.len() / 2,
         target_error,
-        SimplifyOptions::LockBorder,
+        SimplifyOptions::LockBorder | SimplifyOptions::Sparse | SimplifyOptions::ErrorAbsolute,
         Some(&mut error),
     );
@@ -318,9 +322,6 @@ fn simplify_meshlet_groups(
         return None;
     }
-    // Convert error to object-space and convert from diameter to radius
-    error *= simplify_scale(vertices) * 0.5;
-
     Some((simplified_group_indices, error))
 }
```

For the above to work, meshopt-rs needs to get two extra enum entries and that's it.
Going to mark this as ready to merge, although I want to look into using a hash map for the second part of buildSparseRemap, as on Windows the large allocation is not as fast as I'd like it to be (buildSparseRemap accounts for ~30% of cluster simplification there, compared to ~2% on Linux).
This helps test sparse simplification on a large variety of meshes. Drive-by: fix simplifyPoints under address sanitizer; when the number of points was below the threshold, &indices[0] would be out of bounds.
Instead of using a large uninitialized allocation, we now use a hash table. Under the assumption that a mesh subset is much smaller, we were relying on the efficiency of repeatedly reallocating a large uninitialized segment of memory, which worked well on some systems (Linux) but not as well on others (Windows). Instead, we now minimize the allocation size. This ends up a little slower when the sparsity assumption does not hold, as we no longer use direct indexing. To minimize the impact, we use a hash function that reduces the amount of avalanche so that sequential indices have fewer hash conflicts; it might also make sense to use an identity function here, or a small multiplicative constant. To minimize the impact further, we don't store pairs in the map and just store the new index; when doing a lookup, due to the unique construction of the hasher, we can look up the dense index despite the fact that the map only stores sparse indices.
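The lookup trick can be sketched like this (illustrative; the SubsetIndexTable name, the hash constant, and the growth policy are all invented here and differ from the real container):

```cpp
#include <vector>

// The open-addressed table stores only new (compact) indices; the original index each slot
// corresponds to is recovered through remap[], so no key/value pairs need to be stored.
// Assumes a power-of-two capacity and that the table is sized so it never fills up completely.
struct SubsetIndexTable
{
    std::vector<unsigned int> slots;         // ~0u marks an empty slot
    const std::vector<unsigned int>& remap;  // remap[new_index] == original index

    SubsetIndexTable(size_t capacity, const std::vector<unsigned int>& remap_)
        : slots(capacity, ~0u), remap(remap_) {}

    // A deliberately weak hash: sequential original indices land in distinct nearby buckets,
    // which keeps collisions low for the common mostly-sequential case.
    size_t bucket(unsigned int key) const { return (key * 7u) & (slots.size() - 1); }

    // Returns the slot for `original`: either the existing entry or the empty slot where the
    // caller can store the new index (after appending `original` to remap).
    unsigned int* find(unsigned int original)
    {
        for (size_t i = bucket(original); ; i = (i + 1) & (slots.size() - 1))
        {
            unsigned int& slot = slots[i];
            if (slot == ~0u || remap[slot] == original)
                return &slot;
        }
    }
};
```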
Fast simplification of small subsets of a larger mesh is a critical part of some workflows that combine clusterization and simplification, notably Nanite-style virtual geometry renderers.
The new features can be accessed by adding the meshopt_SimplifySparse and meshopt_SimplifyErrorAbsolute bit flags to the simplification options. As an example of the performance delta, the newly added simplifyClusters demo call takes 17.7 seconds to simplify an 870K triangle mesh one cluster at a time; with the new sparse mode it takes ~150 msec (100x faster).
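For reference, a hedged sketch of how a per-cluster call would enable both new options together (the simplifyClusterSparse name and parameters are illustrative, not the demo's actual code):

```cpp
#include "meshoptimizer.h"

// Simplify a single cluster out of a much larger mesh: Sparse keeps the per-call cost
// proportional to the cluster size, ErrorAbsolute lets the same distance bound (in mesh
// coordinates) apply uniformly across clusters, and LockBorder preserves cluster boundaries.
size_t simplifyClusterSparse(unsigned int* destination,
    const unsigned int* cluster_indices, size_t cluster_index_count,
    const float* positions, size_t vertex_count,
    float max_deviation, float* out_error)
{
    unsigned int options = meshopt_SimplifyLockBorder | meshopt_SimplifySparse | meshopt_SimplifyErrorAbsolute;

    return meshopt_simplify(destination, cluster_indices, cluster_index_count,
        positions, vertex_count, sizeof(float) * 3,
        cluster_index_count / 2, max_deviation, options, out_error);
}
```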