Network size exceeds the DRAM capacity and program gets killed when exporting the network with nrnbbcore_write #943
I am trying to export a large network with nrnbbcore_write, but the program gets killed because it requires more memory than the machine's DRAM can provide. If the network grows so large that it cannot be generated on a single machine, what should I do to support such a large network with NEURON?

In the simulation phase, I can use CoreNEURON to distribute the simulation across a number of machines. But in the network export phase (with nrnbbcore_write), is it possible to distribute the export procedure across different machines? How could I realize that?

Comments
This is the original use case for CoreNEURON (i.e. the model is too large for NEURON to build all at once). CoreNEURON requires 7-fold less memory than NEURON for large models; at least that was the case a few years ago, and since then most of the effort has gone into performance improvements. @pramodk can speak to the most current memory usage results. Anyway, the strategy is to have NEURON build a sequence of model subsets, generate the files for each subset, destroy the subset, and go on to the next subset in the sequence. It is up to you how many subsets to divide the model into. On a parallel machine, setup efficiency is best if the model is divided into at least nhost subsets, and load balance may be best served if the number of subsets is a multiple of nhost. This is a fairly straightforward NEURON programming problem, since most parallel models are already cell-gid based in terms of distribution on the machine, and a process generally creates a model subset based only on its list of gids. Whether you create subsets the size of a single cell or a million cells is up to you and your memory resources. The only step that is a little out of the ordinary is the destruction of the model after writing its files. The key is to first release all the gids with pc.gid_clear(), then destroy the NetCons, then the cells, as sketched below.
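A minimal sketch of that teardown order, assuming the subset's cells and NetCons are held in plain Python lists whose names are illustrative:

```python
from neuron import h

pc = h.ParallelContext()

def destroy_subset(cells, netcons):
    """Tear down one model subset in the order described above.
    A sketch; assumes `cells` and `netcons` are Python lists holding
    the only remaining references to the subset's objects."""
    pc.gid_clear()       # 1. release every gid registered with this ParallelContext
    netcons.clear()      # 2. drop the NetCons so they are destroyed
    cells.clear()        # 3. finally drop the cells themselves
```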
@HolyLow: before going into details, just a quick question: if you have multiple machines, could you run NEURON on multiple machines to generate the model, and then run CoreNEURON on the same number of machines? Is this how you are running now? From the wording I got the impression that you run NEURON on a single machine and then run CoreNEURON on a single machine or multiple machines. If you could clarify this, that would be helpful.
@pramodk Yes, currently I am running NEURON on a single machine and CoreNEURON on multiple machines. So are you suggesting that the NEURON export procedure could also be carried out on multiple machines, and that if I ran it on multiple machines, the memory problem would be solved?
Yes. Are you running NEURON with MPI already, or just with threads? Like CoreNEURON, NEURON can run on multiple compute nodes / machines, which makes more memory available for the model-building step. A sketch of an MPI-aware script layout follows.
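A minimal sketch of what such an MPI-aware NEURON script looks like; the launch command, script name, ncell, and the round-robin gid distribution are illustrative assumptions:

```python
# Launch with something like:
#   mpiexec -n <nranks> nrniv -mpi -python build_and_export.py
# so that MPI is initialized before the script runs.
from neuron import h

pc = h.ParallelContext()
rank = int(pc.id())      # this process's MPI rank
nhost = int(pc.nhost())  # total number of MPI ranks

ncell = 1_000_000        # total cells in the network (example value)
# Each rank owns a disjoint set of gids; round-robin is one common scheme.
my_gids = list(range(rank, ncell, nhost))
```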
Is it the case that your model setup on an MPI cluster does not need global collective communication? I.e., can one even envision building each subset of the model, writing the files, and destroying the subset, without requiring that the entire model exist at once? Anyway, one strategy is sketched below.
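A minimal sketch of such a loop, assuming a user-defined mkcells(gids) that builds one subset and returns its cells and NetCons; the helper name, the subset partitioning, and the output directory are illustrative, while nrnbbcore_write() and gid_clear() are the actual NEURON calls:

```python
from neuron import h

pc = h.ParallelContext()
h.CVode().cache_efficient(1)  # nrnbbcore_write historically requires cache-efficient mode

def export_in_subsets(subsets, outdir="coredat"):
    for gids in subsets:
        cells, netcons = mkcells(gids)  # hypothetical builder for one subset
        h.finitialize()                 # set up the subset before writing
        pc.nrnbbcore_write(outdir)      # write CoreNEURON input files for this subset
        pc.gid_clear()                  # release the gids before destruction
        netcons.clear()                 # then destroy the NetCons...
        cells.clear()                   # ...then the cells
        # Combining successive subsets into one consistent data set may
        # depend on the NEURON changes referenced as #964 in this thread.
```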
I did not execute it, so there may be syntax errors, but the idea is sound. I need to follow through with a complete example for the ringtest or some other standard example model to be sure I got it right.
@nrnhines do we still need to merge neuronsimulator/ringtest#18? Looks like this issue can be closed following #964.
I believe we do. https://github.com/neuronsimulator/ringtest/pull/18/files has test_submodel.py, whose line 39 makes use of #964.