findsoln leaks memory when compiled with MPI #12

Open
jakelangham opened this issue Dec 20, 2018 · 11 comments

Comments

@jakelangham

Describe the bug

Memory usage for parallel findsoln runs increases over time. This is becoming a problem for me during high-resolution solves. Can anyone reproduce this?

Expected Result

It should behave as findsoln does when compiled in serial: memory usage stays constant during GMRES steps and grows during the hookstep procedure, after which any memory allocated for the hookstep is freed.

Actual Result

The footprint increases rapidly for the duration of the solve, including during individual time-integrations (calls to f).

Steps to reproduce the issue

It's not necessary to take any GMRES steps to see this happening. Memory starts to leak on the first evaluation of the residual. To reproduce, run findsoln on any state, e.g. if I set the period very high to prolong the computation of f...

$ mpirun -np 28 findsoln -eqb -R 410 -T 5000 eq1.nc
61506 unknowns
Computing G(x)
f^T: ....10...etc

and run top, I can see the memory increasing during the time integration steps, which does not happen for serial builds.

It's worth pointing out that simulateflow does not seem to have this problem: I can run it in parallel and memory usage is constant throughout. This is where I am stuck for the moment. While integrating, findsoln loops through what is currently lines 749-768 of cfdsi.cpp, which looks to me essentially the same as lines 117-145 of simulateflow.cpp, and the latter works fine. I suppose this must be a red herring and the real leak is somewhere else - my MPI is probably too rusty to debug this.

Information on your system

I am testing this on an HPC node with OpenMPI 2.0.1.

@mirkofarano
Collaborator

You can try compiling the code in debug mode and adding the following arguments to the cmake command:
-DCMAKE_CXX_FLAGS="-fsanitize=address -fsanitize=undefined -fsanitize=leak"
Then run the same simulation. If there is a memory leak, the sanitizer should tell you where it is.
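
For example, assuming an out-of-source build directory (the exact invocation will depend on your setup):

$ cmake -DCMAKE_BUILD_TYPE=Debug \
    -DCMAKE_CXX_FLAGS="-fsanitize=address -fsanitize=undefined -fsanitize=leak" \
    path/to/channelflow
$ make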

@jakelangham
Author

Ok, I now have a huge output file (200 MB!) with a load of stacktraces. The output is somewhat mangled due to running on 28 processes, but essentially every section looks something like this:

Direct leak of 2525600 byte(s) in 2255 object(s) allocated from:
    #0 0x4e2d40 in __interceptor_malloc ../../../../libsanitizer/asan/asan_malloc_linux.cc:62
    #1 0x2aca4fa43156 in mca_coll_sync_comm_query (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0xeb156)
    #2 0x2aca4fa09d82 in mca_coll_base_comm_select (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0xb1d82)
    #3 0x2aca4f9ac30d in ompi_comm_activate_nb_complete (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0x5430d)
    #4 0x2aca4f9ae4ab in ompi_comm_request_progress (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0x564ab)
    #5 0x2aca54c38ceb in opal_progress (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libopen-pal.so.20+0x38ceb)
    #6 0x2aca4f9ae0d4 in ompi_comm_activate (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0x560d4)
    #7 0x2aca4f9a96bf in ompi_comm_dup_with_info (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0x516bf)
    #8 0x2aca4f9d9b0e in MPI_Comm_dup (/mnt/storage/easybuild/software/OpenMPI/2.0.2-GCC-6.3.0-2.27/lib/libmpi.so.20+0x81b0e)
    #9 0x2aca4d47fe03 in mkplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x996e03)
    #10 0x2aca4d48798b in mkplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x99e98b)
    #11 0x2aca4d487dbe in fftw_mkplan_d (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x99edbe)
    #12 0x2aca4d485635 in mkplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x99c635)
    #13 0x2aca4d48798b in mkplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x99e98b)
    #14 0x2aca4d48a0a7 in mkplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x9a10a7)
    #15 0x2aca4d48a2b2 in fftw_mkapiplan (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x9a12b2)
    #16 0x2aca4d47ccef in plan_guru_rdft2 (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x993cef)
    #17 0x2aca4d47df0d in fftw_mpi_plan_many_dft_c2r (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x994f0d)
    #18 0x2aca4d3017c9 in chflow::FlowField::fftw_initialize(unsigned int) /mnt/storage/home/jl17781/git/channelflow/channelflow/flowfield.cpp:598
    #19 0x2aca4d300d33 in chflow::FlowField::resize(int, int, int, int, double, double, double, double, chflow::CfMPI*, unsigned int) /mnt/storage/home/jl17781/git/channelflow/channelflow/flowfield.cpp:575
    #20 0x2aca4d2f2f67 in chflow::FlowField::FlowField(chflow::FlowField const&) /mnt/storage/home/jl17781/git/channelflow/channelflow/flowfield.cpp:99
    #21 0x2aca4d283f34 in void std::_Construct<chflow::FlowField, chflow::FlowField const&>(chflow::FlowField*, chflow::FlowField const&) /mnt/storage/easybuild/software/GCCcore/7.2.0/include/c++/7.2.0/bits/stl_construct.h:75
    #22 0x2aca4d2c9033 in chflow::FlowField* std::__uninitialized_copy<false>::__uninit_copy<chflow::FlowField const*, chflow::FlowField*>(chflow::FlowField const*, chflow::FlowField const*, chflow::FlowField*) /mnt/storage/easybuild/software/GCCcore/7.2.0/include/c++/7.2.0/bits/stl_uninitialized.h:83
    #23 0x2aca4d2c8b65 in chflow::FlowField* std::uninitialized_copy<chflow::FlowField const*, chflow::FlowField*>(chflow::FlowField const*, chflow::FlowField const*, chflow::FlowField*) /mnt/storage/easybuild/software/GCCcore/7.2.0/include/c++/7.2.0/bits/stl_uninitialized.h:134
    #24 0x2aca4d2c8308 in chflow::FlowField* std::__uninitialized_copy_a<chflow::FlowField const*, chflow::FlowField*, chflow::FlowField>(chflow::FlowField const*, chflow::FlowField const*, chflow::FlowField*, std::allocator<chflow::FlowField>&) /mnt/storage/easybuild/software/GCCcore/7.2.0/include/c++/7.2.0/bits/stl_uninitialized.h:289
    #25 0x2aca4d2c7120 in void std::vector<chflow::FlowField, std::allocator<chflow::FlowField> >::_M_range_initialize<chflow::FlowField const*>(chflow::FlowField const*, chflow::FlowField const*, std::forward_iterator_tag) (/mnt/storage/home/jl17781/git/channelflow/build/channelflow/libchflow.so+0x7de120)
    #26 0x2aca4d2c48e8 in std::vector<chflow::FlowField, std::allocator<chflow::FlowField> >::vector(std::initializer_list<chflow::FlowField>, std::allocator<chflow::FlowField> const&) /mnt/storage/easybuild/software/GCCcore/7.2.0/include/c++/7.2.0/bits/stl_vector.h:387
    #27 0x2aca4d2aebe8 in chflow::NSE::createRHS(std::vector<chflow::FlowField, std::allocator<chflow::FlowField> > const&) const /mnt/storage/home/jl17781/git/channelflow/channelflow/nse.cpp:577
    #28 0x2aca4d2589f4 in chflow::MultistepDNS::advance(std::vector<chflow::FlowField, std::allocator<chflow::FlowField> >&, int) /mnt/storage/home/jl17781/git/channelflow/channelflow/dnsalgo.cpp:207
    #29 0x2aca4d2d5da0 in chflow::DNS::advance(std::vector<chflow::FlowField, std::allocator<chflow::FlowField> >&, int) /mnt/storage/home/jl17781/git/channelflow/channelflow/dns.cpp:164

So I suppose something odd is happening with fftw_initialize trying to overwrite existing Fourier transform data? I will try to look into this tomorrow, though I am probably not the best person to debug this.

@jakelangham
Author

Oh, by the way, for future reference: I had to compile with -static-libstdc++ -static-libasan too to get a detailed stacktrace. (Without them, it just lists the MPI calls.)
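
So the full flags argument ended up as something like this (exact form may differ with other toolchains):

-DCMAKE_CXX_FLAGS="-fsanitize=address -fsanitize=undefined -fsanitize=leak -static-libstdc++ -static-libasan"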

@johnfgibson
Collaborator

Thanks for spotting and narrowing down this issue. I'm surprised that a FlowField constructor is being called (#20) within DNS::advance. My original design avoided allocation during time stepping (maybe not completely; there might have been a few temporary ChebyCoeffs per call to DNS::advance). Storage for FlowFields was allocated once during construction of a DNSAlgorithm and then just reused in calls to DNSAlgorithm::advance.

This code has been refactored somewhat since then. Is it still a similar allocate-once-before-repeated-::advance design, or is it allocating a new FlowField at each time step?

@jakelangham
Author

Yes, FlowFields that store the full rhs are allocated at each call to DNSAlgorithm::advance (whereas previously the rhs would have been constructed separately for each Fourier mode). The relevant line for the multistep algorithm is 205 in dnsalgo.cpp (not 207 as the stacktrace says). It ends up calling a copy constructor that shapes the rhs to match the field being time stepped, and that's where the FFT allocations come in.

Just looking at the code, it isn't obvious why this couldn't be done behind the scenes during construction of a DNSAlgorithm - i.e. they all need a rhs, so why not let it persist? Maybe the refactored code allows you to hit a DNSAlgorithm with any field you like, regardless of resolution, geometry, etc., though I'm not sure about that.
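
To sketch what I mean by letting the rhs persist - roughly the following, with made-up names rather than the actual DNSAlgorithm interface:

#include <vector>

// Stand-in for chflow::FlowField; only copy construction matters for this sketch.
struct Field {};

class Algorithm {
  public:
    // Current behaviour: a fresh rhs (and, via FlowField, fresh FFTW plans)
    // is copy-constructed on every call to advance.
    void advancePerStep(const Field& u) {
        std::vector<Field> rhs{u};  // allocated and destroyed every step
        // ... use rhs for this step
    }

    // Alternative: shape the rhs once, then reuse it on later calls.
    void advancePersistent(const Field& u) {
        if (rhs_.empty())
            rhs_.assign(1, u);  // allocated on the first call only
        // ... overwrite rhs_[0] in place on subsequent steps
    }

  private:
    std::vector<Field> rhs_;
};

int main() {
    Algorithm alg;
    Field u;
    for (int i = 0; i < 1000; ++i) alg.advancePersistent(u);
    return 0;
}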

Letting the rhs persist might be a quick fix (and presumably a slight optimisation). On the other hand, we should be able to allocate as many FlowFields as we like without leaking. Perhaps I can create a minimal example now that just copies a load of fields around.

@jakelangham
Author

Here is something of a minimal example that attempts to isolate the problem

#include "channelflow/cfmpi.h"
#include "channelflow/flowfield.h"

using namespace chflow;
using namespace std;

vector<FlowField> createFlowField(const vector<FlowField>& fields) { return {fields[0]}; }
//FlowField createFlowField(const FlowField& field) { return FlowField(field); }

int main(int argc, char *argv[]) {

#ifdef HAVE_MPI
    cfMPI_Init(&argc, &argv);
    {
#endif

    FlowField u(30, 30, 30, 1.0, 1.0, 1.0, -1.0, 1.0);

    int numLoops = 2000;
    for (int i = 1; i <= numLoops; i++) {
        vector<FlowField> f(createFlowField({u}));
        //FlowField f(createFlowField(u));

        if (i % 100 == 0) {
            cout << "created " << i << " FlowFields" << endl;
        }
    }

#ifdef HAVE_MPI
    }
    cfMPI_Finalize();
#endif
}

It leaks a few hundred megabytes, so it's safe to run on a local machine. I think the commented-out lines can probably replace the lines above them to simplify things while still exhibiting the issue.

@johnfgibson
Collaborator

I doubt this has anything to do with it, but I'd change the grid size in those FlowField constructions to 32, 33, 32. Ny must be odd, and there's no good reason to set Nx, Nz to 30 = 2 x 3 x 5 when 32 = 2^5 is right next door.
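
That is, in the minimal example above:

    FlowField u(32, 33, 32, 1.0, 1.0, 1.0, -1.0, 1.0);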

@johnfgibson
Collaborator

This is triggered in findsoln but not simulateflow. I suspect that's because findsoln builds a new DNSAlgorithm for each evaluation of f^T(u), i.e. for each GMRES iteration (suboptimal, but a relatively small overhead). That would point to createRHS allocating FlowFields on the first call to DNSAlgorithm::advance, when it replaces an array of empty (0x0x0) FlowFields with FlowFields allocated to the right size. Just guessing from experience; I'll have to look at the code in detail to be sure.

@jakelangham
Author

In that case I would expect the memory to be leaked in one big chunk at the start of each GMRES iteration. But in fact the memory footprint accumulates steadily throughout time integration.

I don't yet understand the difference between the findsoln and simulateflow cases. The rhs FlowField is initialised on every call to DNSAlgorithm::advance, and moreover it doesn't replace an empty FlowField - it's initialised by a copy constructor.

The FFTW memory in the revamped code is managed with unique pointers, which I don't know much about, but they are supposed to tidy up after themselves when they go out of scope by calling things like fftw_destroy_plan - precisely to ensure this sort of thing doesn't happen. I will read up on them a bit to see if I can spot any bugs in the behind-the-scenes implementation of FlowField.
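
For reference, the general unique_ptr idiom looks something like this - just a generic sketch of the pattern, not channelflow's actual wrapper code:

#include <fftw3.h>
#include <memory>
#include <type_traits>

// A unique_ptr with a custom deleter calls fftw_destroy_plan automatically
// when the plan goes out of scope.
struct PlanDeleter {
    void operator()(fftw_plan p) const {
        if (p) fftw_destroy_plan(p);
    }
};
using PlanPtr = std::unique_ptr<std::remove_pointer<fftw_plan>::type, PlanDeleter>;

int main() {
    const int N = 64;
    double* in = fftw_alloc_real(N);
    fftw_complex* out = fftw_alloc_complex(N / 2 + 1);
    for (int i = 0; i < N; ++i) in[i] = 0.0;

    {
        PlanPtr plan(fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE));
        fftw_execute(plan.get());
    }  // plan destroyed here; without the deleter this would leak

    fftw_free(out);
    fftw_free(in);
    return 0;
}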

@jakelangham
Author

I'm attaching the stacktrace from the above minimal example program for future reference.

stacktrace.txt

@jakelangham
Author

Quick update: rather miraculously, I don't have memory leaks any more. (Or at least, if I do, they are substantially less severe.)

My best guess is that the fftw library got updated on our local HPC system and that is what has changed.

I don't have time to do any detective work (until next week, maybe), but I wanted to point this out. It might mean the problem can be "solved" by placing version constraints on fftw...
