memmap refactoring #1265
Conversation
Hey Sam, just as an FYI, and to support the need for this refactor: I was doing some EMG processing (outside of the spike-sorting world) and saw the same thing. I import neo in a function, create a reader, and then return the analog EMG data for post-processing, and it seems that as long as my IPython kernel is active the memmap and header information aren't released (i.e., the RAM eventually saturates). Even though the reader is "cleaned up" after the function, if I call the same function on the binary file it runs significantly faster (it seems the header and memmap have been cached). Closing the session finally releases the RAM. I'm on Windows for this work currently (which annoyingly holds onto files way longer than Mac and Linux do).
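A minimal sketch of the pattern described above, assuming neo's RawBinarySignalRawIO; the file path, dtype, channel count, and sampling rate are placeholders:

```python
import numpy as np
from neo.rawio import RawBinarySignalRawIO

def load_emg(path):
    # The reader goes out of scope when this function returns, but the
    # underlying memmap / header cache may not be released while the
    # IPython session stays alive.
    reader = RawBinarySignalRawIO(filename=path, dtype="int16",
                                  nb_channel=8, sampling_rate=2000.0)
    reader.parse_header()
    raw = reader.get_analogsignal_chunk(seg_index=0, stream_index=0)
    # Copy so the returned array is not a view keeping the memmap alive.
    return np.array(raw, copy=True)

emg = load_emg("recording.bin")  # hypothetical file
```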
@zm711 thanks for the feedback. So do you think this patch will help your case?
@samuelgarcia
I tried the […]. I do have some […]. The memory getting locked up is a bit inconsistent, so it's hard for me to know for sure what is causing it, and it is rare enough that hunting it down isn't hugely important to us. So I'm happy to try new implementations to see if anything helps, but I don't want to bog down this PR with memory-leak stuff, so if we want to continue this we could chat on #684?
Hello, […]
Hi @samuelgarcia The tests in the current master are passing again. You might need to rebase / merge for proper CI testing.
Hi @samuelgarcia I did not check the memory usage (which I didn't specifically have issues with), but relative to the parent commit (2d63e18) this fixes a […]. For instance, this would raise the following error: […]
And this is fixed by this PR. I did not have to check out PR 1602 on spikeinterface for this behavior. Let me know if I can help somehow! Looking forward to seeing it merged.
Hi Tom.
Hi @samuelgarcia
- memmap = self._memmaps[seg_index, stream_id]
+ #~ memmap = self._memmaps[seg_index, stream_id]
+ key = (seg_index, stream_id)
+ memmap = create_memmap_buffer(*self._memmap_args[key])
Can't you use get_memmap_shape to avoid creating a buffer here?
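For reference, a helper along those lines could compute the mapped array's shape from the file size alone, without creating any buffer. A sketch under assumed semantics; the actual signature in this branch may differ:

```python
import os
import numpy as np

def get_memmap_shape(filename, dtype, num_channels=None, offset=0):
    # Infer the shape from the file size; nothing is mapped or allocated.
    dtype = np.dtype(dtype)
    size = os.path.getsize(filename) - offset
    if num_channels is None:
        return (size // dtype.itemsize,)
    row_bytes = dtype.itemsize * num_channels
    return (size // row_bytes, num_channels)
```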
This is a proposal to change the current memmap behavior in the rawio layer.
spikeinterface uses neo.rawio heavily, with extensive IO demand and parallel reading/computing.
Many users experience big memory saturation, even on big machines, when the files are enormous.
This is quite OS dependent (mainly Windows, but even some Linux systems have this problem).
Somehow, the memmap does not release pages after a buffer has been used, and so the memory grows forever.
The current strategy, in almost all binary-based IOs, is to create one persistent np.memmap per (segment, stream) at header-parsing time and to slice it for every chunk read.
After consuming the traces chunk, the memory should be released and memory usage should stay low, but in some cases it keeps increasing until saturation.
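The problematic pattern looks roughly like this (a simplified sketch, not the exact neo code; `_iter_binary_files` is a hypothetical helper):

```python
import numpy as np

class BinaryRawIOSketch:
    def _parse_header(self):
        # One persistent memmap per (segment, stream), kept alive for the
        # whole lifetime of the reader.
        self._memmaps = {}
        for seg_index, stream_id, filename in self._iter_binary_files():
            self._memmaps[seg_index, stream_id] = np.memmap(
                filename, dtype="int16", mode="r")

    def _get_analogsignal_chunk(self, seg_index, stream_id, i_start, i_stop):
        # Every read slices the same long-lived memmap; on some OSes the
        # touched pages are never given back, so RSS grows read after read.
        memmap = self._memmaps[seg_index, stream_id]
        return memmap[i_start:i_stop]
```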
@h-mayorquin and @samuelgarcia made some benchmarks to find a better strategy.
Here are some code examples:
https://gist.github.com/h-mayorquin/8b8d1899c724690ce7c18fb49f56af82
The best strategy we found to try to fix this is:
1. create the buffer on demand for every read, instead of keeping persistent memmaps;
2. replace numpy.memmap with a small custom memmap-like buffer.
For the 2., we mimic numpy.memmap but simplify it a bit to make it faster.
The reading speed is almost the same, and the memory problem seems to be fixed.
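A minimal sketch of what such a simplified buffer might look like; `create_memmap_buffer` here is illustrative, based on the idea above, not necessarily the exact implementation in this branch:

```python
import mmap
import numpy as np

def create_memmap_buffer(filename, dtype, shape, offset=0):
    # Map the needed byte range and wrap it in a plain ndarray, skipping
    # the ndarray-subclass machinery of numpy.memmap.
    dtype = np.dtype(dtype)
    count = int(np.prod(shape))
    # mmap offsets must be multiples of the allocation granularity.
    aligned = (offset // mmap.ALLOCATIONGRANULARITY) * mmap.ALLOCATIONGRANULARITY
    delta = offset - aligned
    with open(filename, "rb") as f:
        buf = mmap.mmap(f.fileno(), count * dtype.itemsize + delta,
                        access=mmap.ACCESS_READ, offset=aligned)
    # The array keeps a reference to the mapping; once the array is
    # garbage collected, the mapping can be closed and its pages released.
    return np.frombuffer(buf, dtype=dtype, count=count,
                         offset=delta).reshape(shape)
```

Creating such a buffer per read means the mapping, and its pages, can be dropped as soon as the caller releases the returned array.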
@maxjuv @TomBugnon @vncntprvst @juliencarponcy: we need more tests!
I started to change spikeglx and openephysrawbinary.
Could you try to combine this branch with this PR on spikeinterface: […]
And then: […]
And monitor the memory usage.
Do you still see an infinite memory increase, or not?
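One possible way to monitor it, assuming psutil is installed; the `reader`, chunk size, and iteration count below are placeholders to adapt to your data:

```python
import numpy as np
import psutil

proc = psutil.Process()
for i in range(1000):
    # Placeholder read: adapt seg_index / stream_index / chunk size.
    chunk = reader.get_analogsignal_chunk(seg_index=0, stream_index=0,
                                          i_start=i * 30000,
                                          i_stop=(i + 1) * 30000)
    traces = np.asarray(chunk, dtype="float32")  # force an actual read
    if i % 100 == 0:
        rss_mb = proc.memory_info().rss / 1024**2
        print(f"iteration {i}: RSS = {rss_mb:.1f} MiB")
```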