findsoln leaks memory when compiled with MPI #12
You can try to compile the code in
OK, I now have a huge output file (200 MB!) with a load of stacktraces. The output is somewhat dodgy due to running on 28 processors, but essentially every section looks something like this:
So I suppose something odd is happening with fftw_initialize trying to overwrite existing Fourier transform data? I will try to look into this tomorrow, though I am probably not the best person to debug this.
Oh, by the way, for future reference, I had to compile with
Thanks for spotting and narrowing down this issue. I'm surprised that a FlowField constructor is being called (#20) within DNS::advance. My original design avoided allocation during time stepping; maybe not completely, as there might have been a few temporary ChebyCoeffs per call to DNS::advance. Storage for FlowFields was allocated once during construction of a DNSAlgorithm and then just reused in calls to DNSAlgorithm::advance. This code has been refactored somewhat since. Is this a similar allocate-once-before-repeated ::advance, or is it allocating a new FlowField at each time step?
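For illustration only, here is a minimal sketch of the two allocation patterns being contrasted, using hypothetical class and member names (AllocateOnceAlgorithm, AllocatePerStepAlgorithm, Scratch) rather than the real DNSAlgorithm interface. The distinction is whether scratch storage is allocated in the constructor and reused, or re-allocated inside every call to advance.

```cpp
// Illustrative sketch only: hypothetical names, not the actual channelflow API.
#include <cstddef>
#include <vector>

struct Scratch {                                    // stand-in for a FlowField-sized buffer
    std::vector<double> data;
    explicit Scratch(std::size_t n) : data(n) {}    // the only allocation
};

class AllocateOnceAlgorithm {
  public:
    explicit AllocateOnceAlgorithm(std::size_t n) : tmp_(n) {}  // allocate at construction
    void advance() {
        // reuse tmp_ on every call; no allocation during time stepping
    }
  private:
    Scratch tmp_;
};

class AllocatePerStepAlgorithm {
  public:
    void advance(std::size_t n) {
        Scratch tmp(n);  // a fresh allocation on every time step; any leak
                         // on this path grows with the number of steps taken
        (void)tmp;
    }
};
```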
Yes. Just looking at the code, it isn't obvious why this couldn't be done behind the scenes during construction of a [...]. That might be a quick fix (and a slight optimisation, presumably). On the other hand, we should be able to allocate as many [...]
Here is something of a minimal example that attempts to isolate the problem:

```cpp
#include "channelflow/cfmpi.h"
#include "channelflow/flowfield.h"

using namespace chflow;
using namespace std;

// Returns a vector containing a copy of the first input field.
vector<FlowField> createFlowField(const vector<FlowField>& fields) { return {fields[0]}; }
// FlowField createFlowField(const FlowField& field) { return FlowField(field); }

int main(int argc, char* argv[]) {
#ifdef HAVE_MPI
    cfMPI_Init(&argc, &argv);
    {
#endif
        FlowField u(30, 30, 30, 1.0, 1.0, 1.0, -1.0, 1.0);
        int numLoops = 2000;
        // Repeatedly copy-construct FlowFields; memory should stay flat, but it grows.
        for (int i = 1; i <= numLoops; i++) {
            vector<FlowField> f(createFlowField({u}));
            // FlowField f(createFlowField(u));
            if (i % 100 == 0) {
                cout << "created " << i << " FlowFields" << endl;
            }
        }
#ifdef HAVE_MPI
    }
    cfMPI_Finalize();
#endif
}
```

It leaks a few hundred megabytes, so it's safe to run on a local machine. I think the commented lines can probably replace the lines above them to make things clearer without missing the issue.
I doubt this has anything to do with it, but I'd change the grid size in those FlowField constructions to 32, 33, 32. Ny must be odd, and there's no good reason to set Nx, Nz to 30 = 2 x 3 x 5 when 32 = 2^5 is right next door.
This is triggered in findsoln but not simulateflow. I suspect that's because findsoln builds a new DNSAlgorithm for each evaluation of f^T(u), i.e. for each GMRES iteration (suboptimal, but a relatively small overhead). That would point to createRHS allocating FlowFields on the first call to DNSAlgorithm::advance, when it replaces an array of empty (0x0x0) FlowFields with FlowFields allocated to the right size. Just guessing from experience; I will have to look at the code in detail to be sure.
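A hedged sketch of the suspected lazy-allocation pattern, with a hypothetical Field struct and Algorithm class standing in for the real createRHS/DNSAlgorithm code: the right-hand-side fields start out empty (0x0x0) and are replaced by full-size fields on the first call to advance, so constructing a new algorithm object per GMRES iteration would mean a fresh round of allocation each time.

```cpp
// Hypothetical illustration of the suspected pattern; not the real
// createRHS/DNSAlgorithm implementation.
#include <cstddef>
#include <vector>

struct Field {
    int Nx = 0, Ny = 0, Nz = 0;                    // an empty field is 0 x 0 x 0
    std::vector<double> data;
    bool empty() const { return data.empty(); }
    void resize(int nx, int ny, int nz) {
        Nx = nx; Ny = ny; Nz = nz;
        data.assign(static_cast<std::size_t>(nx) * ny * nz, 0.0);
    }
};

class Algorithm {
  public:
    Algorithm() : rhs_(4) {}                       // constructed with empty fields
    void advance(const Field& u) {
        if (rhs_[0].empty()) {                     // first call: allocate to the right size
            for (auto& f : rhs_) f.resize(u.Nx, u.Ny, u.Nz);
        }
        // ... subsequent calls reuse rhs_ without further allocation ...
    }
  private:
    std::vector<Field> rhs_;
};
```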
In that case I would expect the memory to be leaked in one big chunk at the start of each GMRES iteration. But in fact the memory footprint accumulates steadily throughout time integration. I don't think I understand yet the difference between the [...]

The fftw memory in the revamped code is managed with unique pointers, which I don't really know much about, but they seem to be responsible for tidying up after themselves when they go out of scope by calling things like [...]
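For context, here is the general idiom being described: managing an FFTW resource with a std::unique_ptr and a custom deleter, so that cleanup such as fftw_destroy_plan runs automatically when the pointer goes out of scope. This is only a sketch of the idiom, not the channelflow implementation.

```cpp
// Illustration of the unique_ptr-with-custom-deleter idiom for FFTW resources;
// not the channelflow code itself.
#include <fftw3.h>
#include <memory>
#include <type_traits>

// Deleter that destroys the plan when the owning unique_ptr is destroyed.
struct PlanDeleter {
    void operator()(fftw_plan p) const {
        if (p) fftw_destroy_plan(p);
    }
};
using PlanPtr = std::unique_ptr<std::remove_pointer<fftw_plan>::type, PlanDeleter>;

int main() {
    const int n = 64;
    double* in = static_cast<double*>(fftw_malloc(sizeof(double) * n));
    auto* out = static_cast<fftw_complex*>(fftw_malloc(sizeof(fftw_complex) * (n / 2 + 1)));
    for (int i = 0; i < n; ++i) in[i] = 0.0;

    {
        PlanPtr plan(fftw_plan_dft_r2c_1d(n, in, out, FFTW_ESTIMATE));
        fftw_execute(plan.get());
    }  // fftw_destroy_plan is called here automatically as plan leaves scope

    fftw_free(out);
    fftw_free(in);
    return 0;
}
```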
I'm attaching the stacktrace from the above minimal example program for future reference.
Quick update: rather miraculously, I don't have memory leaks any more. (Or at least if I do, they are substantially less severe.) Best guess is that the fftw library got updated on our local HPC and that is what has changed. I don't have time to do any detective work (until next week maybe) but wanted to point this out. It might mean the problem can be "solved" by placing constraints on fftw...
Describe the bug
Memory usage for parallel findsoln runs increases over time. This is becoming a problem for me during high resolution solves. Can anyone reproduce this?
Expected Result
It should behave as for findsoln compiled in serial: memory constant during GMRES steps and increasing during the hookstep procedure, after which any memory allocated for the hookstep is deleted.
Actual Result
The footprint increases rapidly for the duration of the solve, including during individual time-integrations (calls to f).
Steps to reproduce the issue
It's not necessary to take any GMRES steps to see this happening. Memory starts to leak on the first evaluation of the residual. To reproduce, run findsoln on any state, e.g. if I set the period very high to prolong the computation of f...
and run top, I can see the memory increasing during the time integration steps, which does not happen for serial builds.
It's worth pointing out that simulateflow does not seem to have this problem. I can run that in parallel and memory usage is constant throughout. This is where I am stuck for the moment. While integrating, findsoln is looping through what is currently lines 749-768 of cfdsi.cpp, which seems to me to be essentially the same thing as lines 117-145 of simulateflow.cpp, and that works fine. I suppose this must be a red herring and the real leak is somewhere else; my MPI is probably too rusty to debug this.
Information on your system
I am testing this on an HPC node with OpenMPI 2.0.1.