-
-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace FixedBitSet in Access and FilteredAccess with sorted vectors #16784
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The O(1)'s in the description should be O(n). Adding more elements will always add more to iterate through.
|
||
/// Stores a sorted list of indices with quick implementation for union, difference, intersection. | ||
#[derive(Debug, Clone, Eq, PartialEq)] | ||
pub struct SortedSmallVec<const N: usize>(SmallVec<[usize; N]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this pub
? Why not pub(crate)
?
This extends to SortedSmallVec::new
and SortedSmallVec::from_vec
sorted_vec | ||
} | ||
|
||
pub(crate) fn ones(&self) -> impl Iterator<Item = usize> + '_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ones
is kinda confusing here. It makes sense in the context of "replacing FixedBitSet", but it doesn't really make sense for this struct.
Can we just say iter
? That's pretty much all this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could even impl IntoIterator for &SortedSmallVec
to allow for x in &ssv
, although that would make you name the iterator type.
|
||
/// Stores a sorted list of indices with quick implementation for union, difference, intersection. | ||
#[derive(Debug, Clone, Eq, PartialEq)] | ||
pub struct SortedSmallVec<const N: usize>(SmallVec<[usize; N]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like the name here isn't clear. Maybe just SmallVecSet
? SortedSmallVec
doesn't really make it clear that it's a set, and the fact that it's sorted seems like an implementation detail (although this could be made as a guarantee on ones
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the Set
suffix. That better matches names like HashSet
and BTreeSet
. I'd vote for keeping Sorted
in the name and making it part of the contract, since Hash
and BTree
are also implementation details. But SortedSmallVecSet
would be awfully verbose.
} | ||
} | ||
|
||
impl<const N: usize> Extend<usize> for SortedSmallVec<N> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is wrong if other
is unsorted. So we should either make this just a loop of inserts or add a method instead of a trait impl that makes it clear the behavior is undefined if you don't call it with a sorted input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can handle unsorted input and still be reasonably efficient with just:
self.0.extend(other);
self.0.sort();
self.0.dedup();
type Item = usize; | ||
|
||
fn next(&mut self) -> Option<Self::Item> { | ||
let mut res = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need this variable. Just put the var inside the if and return None at the bottom.
fn next(&mut self) -> Option<Self::Item> { | ||
let mut res = None; | ||
while self.i < self.this.len() && self.j < self.other.len() { | ||
if (self.i == 0 || self.this.0[self.i] != self.this.0[self.i - 1]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this expression. How can [self.i] == [self.i - 1]
? We should never have the same value twice.
(If we change anything here, we should mirror it with Difference
)
impl<'a, const N: usize> From<Intersection<'a, N>> for SortedSmallVec<N> { | ||
fn from(intersection: Intersection<'a, N>) -> Self { | ||
let mut sorted_vec = SortedSmallVec::new_const(); | ||
for value in intersection.into_iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can use a cheaper impl here. Since we know Intersection returns elements in sorted order, we can just SmallVec::from_iter(intersection)
I think.
impl<'a, const N: usize> From<Difference<'a, N>> for SortedSmallVec<N> { | ||
fn from(difference: Difference<'a, N>) -> Self { | ||
let mut sorted_vec = SortedSmallVec::new_const(); | ||
for value in difference.into_iter() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here, we don't need to insert, we can just move elements in order.
.intersect_with(&other.component_read_and_writes); | ||
self.component_read_and_writes = self | ||
.component_read_and_writes | ||
.intersection(&other.component_read_and_writes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why implement union_with
, but not intersect_with
or difference_with
?
} | ||
} | ||
|
||
impl<T: SparseSetIndex> From<Vec<T>> for AccessConflicts { | ||
fn from(value: Vec<T>) -> Self { | ||
Self::Individual(value.iter().map(T::sparse_set_index).collect()) | ||
let mut conflicts = SortedSmallVec::new_const(); | ||
for index in value.iter().map(|a| T::sparse_set_index(a)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to impl FromIterator
so you can continue to use collect()
? I think that could also replace the various From
implementations. And the implementation could call sort()
and dedup()
rather than doing an insertion sort like this, since it wouldn't need the intermediate states to be sorted.
/// Construct an empty vector | ||
/// | ||
/// This is a `const` version of [`SortedSmallVec::new()`] | ||
pub(crate) const fn new_const() -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make a single pub const fun new()
?
/// Duplicates are removed. | ||
pub fn from_vec(vec: Vec<usize>) -> Self { | ||
let mut sorted_vec = Self(SmallVec::with_capacity(vec.len())); | ||
for value in vec { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be more efficient to sort()
and dedup()
the vec
so that you aren't doing an O(N^2) insertion sort.
... Ah, this is only used in some unit tests with three-element Vec
s, so maybe it doesn't matter. Or, if you impl FromIterator
, you can remove this in favor of collect()
.
match self.0[i].cmp(&other.0[j]) { | ||
Ordering::Less => i += 1, | ||
Ordering::Greater => { | ||
self.0.insert(i, other.0[j]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
insert
is O(N), since it shifts all the later elements, so doing it in a loop like this is O(N^2).
If you want to do this in-place, then I think you want to first extend self
to be long enough to hold both sets, then copy in place from one side of the buffer to the other, then shrink anything extra. ... That might be easier to explain with code:
// Make space at the start of the buffer big enough to hold `other`.
// The values don't matter because we're going to overwrite them,
// so we do this by copying the values from `other`
self.0.insert_from_slice(0, &other.0);
let mut i = other.len(); // Index to read from `self`
let mut j = 0; // Index to read from `other`
let mut w = 0; // Index to write to `self`
// Note that we're both reading and writing from `self.0`,
// but `i` is always `>= w`, so we never overwrite the original value.
while i < self.len() && j < other.len() {
match self.0[i].cmp(&other.0[j]) {
Equal => { self.0[w] = self.0[i]; w += 1; i += 1; j += 1; }
Less => { self.0[w] = self.0[i]; w += 1; i += 1 }
Greater => { self.0[w] = other.0[j]; w += 1; j += 1 }
}
}
if j < self.len() { /* something with copy_from_slice (make sure to update w) */ }
if i < self.len() { /* something with split_at_mut and copy_from_slice (make sure to update w) */ }
// Remove any extra space caused by dropping duplicates
self.0.truncate(w);
Another option is simply self.extend(other.ones())
, if you change extend()
to use sort()
. The stdlib sort says it is "very fast in cases where the slice is nearly sorted, or consists of two or more sorted sequences concatenated one after another", so I would expect it to still be O(N), but with a higher constant factor.
|
||
/// Stores a sorted list of indices with quick implementation for union, difference, intersection. | ||
#[derive(Debug, Clone, Eq, PartialEq)] | ||
pub struct SortedSmallVec<const N: usize>(SmallVec<[usize; N]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth generalizing the usize
to a type parameter? I think SortedSmallVec<A: Array<Item: Copy + PartialEq>>(SmallVec<A>)
might work without needing to change any method bodies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it and decided against it. It isn't needed now and I don't think it'll be needed in the near future. If anyone were to need a generalized version later, it would be easy to implement.
Right now it seems like overengineering.
sorted_vec | ||
} | ||
|
||
pub(crate) fn ones(&self) -> impl Iterator<Item = usize> + '_ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could even impl IntoIterator for &SortedSmallVec
to allow for x in &ssv
, although that would make you name the iterator type.
|
||
/// Stores a sorted list of indices with quick implementation for union, difference, intersection. | ||
#[derive(Debug, Clone, Eq, PartialEq)] | ||
pub struct SortedSmallVec<const N: usize>(SmallVec<[usize; N]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for the Set
suffix. That better matches names like HashSet
and BTreeSet
. I'd vote for keeping Sorted
in the name and making it part of the contract, since Hash
and BTree
are also implementation details. But SortedSmallVecSet
would be awfully verbose.
} | ||
} | ||
|
||
impl<const N: usize> Extend<usize> for SortedSmallVec<N> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can handle unsorted input and still be reasonably efficient with just:
self.0.extend(other);
self.0.sort();
self.0.dedup();
|
||
/// Stores a sorted list of indices with quick implementation for union, difference, intersection. | ||
#[derive(Debug, Clone, Eq, PartialEq)] | ||
pub struct SortedSmallVec<const N: usize>(SmallVec<[usize; N]>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to add some unit tests for this type.
This ci failure seems real. |
Adopted from #14385.
Objective
As described in the relations RFC.
The access bitsets are currently efficient because the ComponentIds are dense: they are incremented from 0 and should remain small.
With relations, a ComponentId will have some upper bits 1, so the amount of memory allocated in the FixedBitSet to represent the value would be non-trivial.
Solution
We replace
FixedBitSets
with sorted vectors.The vectors should remain relatively small since queries usually don't involve hundreds of components.
These allow us to do union, difference, and intersection operations in O(1).
Inserting a value, however, is O(1).
Testing
TODO (I can't figure out the bencher)
Migration Guide
TODO (is it really necessary?)