Convert cmap shards to an rwmutex #2944
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Was noodling around with the mutex profiler post #2942 (which it pinpointed pretty well, at least when invoked in the right way) and found it pointing surprisingly hard at cmap, which I'd less expected given we have a fair number of shards. Turns out that it hotspots quite a bit - it seems a bit less evenly distributed than I would have thought, but think the main issue is that some targets are just accessed way more (because very many things depend on them). Also turns out our access pattern is very read-heavy; there are ~2 orders of magnitude more reads than writes. It's a bit fiddly since we also technically write on the read path to add the wait channel, but there are far more read accesses that access an existing element than ones that don't.
This converts the mutex to an RWMutex; it slightly trades off the 'get a non existing item' path (which now needs an extra map lookup) for optimising the 'get an existing item' path to be able to happen in parallel. This drops it right out of the contention profile, and the new benchmark is more than twice as good:
vs.
Theoretically we could do slightly better if we had an upgradeable RWMutex (i.e. a path where, we can go from holding the read lock to holding the write lock without unlocking) but I don't think Go's can support that as is (it seems very hard; Boost has something that seems to do it but I'm not really across how it actually works without deadlocking).
I also looked at using
sync.Map
but it dies inSet
when not overwriting and it closes the channel twice (because two things simultaneously retrieve the previous value). I haven't figured out a good way around that but it does seem faster.