[BBPBGLIB-1069] Added nodes suggestions and more improvements #64

Merged · 4 commits · Oct 30, 2023
10 changes: 10 additions & 0 deletions docs/architecture.rst
@@ -329,6 +329,16 @@ will get a summary of the estimated memory used for cells and synapses, including
memory necessary to load libraries and neurodamus data structures.
A grand total is provided to the user as well as a per-cell type and per-synapse type breakdown.

At the end of the execution the user is also given a suggestion on how many nodes
to use in order to run the simulation with the given circuit on the given machine.
Keep in mind that this is just a suggestion and users are free to pick a different
number of nodes if they wish to do so. The suggestion assumes that the simulation
will run on the same kind of machine used for the dry run, and that all of the
memory available on each node can be dedicated to the simulation.
The node estimate takes into account the memory usage of the cells and synapses
as well as the memory "overhead", which is fixed per rank and therefore grows in
total with the number of ranks used.
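The iterative estimate described above can be sketched as follows. This is a minimal standalone sketch, not the actual neurodamus implementation: the parameter names (``base_memory_per_rank``, ``workload_memory``, ``node_memory``) are illustrative placeholders, and all values are in the same memory unit (e.g. MB).

```python
import math

def estimate_nodes(base_memory_per_rank, workload_memory, node_memory,
                   ranks_per_node, margin=0.3, max_iter=5):
    """Fixed-point iteration for the node count.

    The total per-rank overhead grows with the number of nodes (more
    nodes means more ranks), which in turn changes the node estimate,
    so iterate until the estimate stabilises or max_iter is reached.
    """
    est_nodes, prev = 0, None
    iterations = 0
    while (prev is None or est_nodes != prev) and iterations < max_iter:
        prev = est_nodes
        # Overhead scales with the total rank count; assume at least one node.
        overhead = base_memory_per_rank * ranks_per_node * max(est_nodes, 1)
        total = (overhead + workload_memory) * (1 + margin)
        est_nodes = math.ceil(total / node_memory)
        iterations += 1
    return est_nodes
```

For example, a hypothetical 500000 MB workload with 100 MB overhead per rank, 64 ranks per node, and 128000 MB nodes converges after two iterations.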

The following paragraphs describe in more detail how this estimation is done.

Below you can see the workflow of the dry run mode:
2 changes: 1 addition & 1 deletion neurodamus/cell_distributor.py
@@ -313,7 +313,7 @@ def store_metype_stats(metype, n_cells):
memory_allocated = end_memory - prev_memory
log_all(logging.DEBUG, " * METype %s: %.1f KiB averaged over %d cells",
metype, memory_allocated/n_cells, n_cells)
memory_dict[metype] = memory_allocated / n_cells
memory_dict[metype] = max(0, memory_allocated / n_cells)
prev_memory = end_memory

for gid, cell_info in gid_info_items:
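The `max(0, ...)` clamp introduced in this hunk guards against negative memory deltas, which can occur when the allocator returns pages to the OS between two measurements. A minimal sketch of the same idea, with a hypothetical helper name and units:

```python
def per_cell_memory_kib(end_memory_kib, prev_memory_kib, n_cells):
    """Average per-cell memory from two RSS samples, in KiB.

    The delta between samples can come out negative (e.g. the
    allocator released pages in between), so clamp at zero rather
    than record a nonsensical negative per-cell estimate.
    """
    delta = end_memory_kib - prev_memory_kib
    return max(0, delta / n_cells)
```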
1 change: 1 addition & 0 deletions neurodamus/node.py
@@ -1958,6 +1958,7 @@ def run(self):
if SimConfig.dry_run:
log_stage("============= DRY RUN (SKIP SIMULATION) =============")
self._dry_run_stats.display_total()
self._dry_run_stats.display_node_suggestions()
return
if not SimConfig.simulate_model:
self.sim_init()
66 changes: 66 additions & 0 deletions neurodamus/utils/memory.py
@@ -8,6 +8,8 @@
import math
import os
import json
import psutil
import multiprocessing

from ..core import MPI, NeurodamusCore as Nd, run_only_rank0

@@ -271,3 +271,67 @@ def display_total(self):
grand_total = pretty_printing_memory_mb(grand_total)
logging.info("| {:<40s} | {:>12s} |".format("GRAND TOTAL", grand_total))
logging.info("+{:-^57}+".format(""))

def total_memory_available():
"""
Returns the total memory available in the system in MB
"""
try:
virtual_memory = psutil.virtual_memory()
return virtual_memory.total / (1024 * 1024)  # total physical memory in MB
except Exception as e:
logging.error(f"Error: {e}")
return None

@run_only_rank0
def suggest_nodes(self, margin):
"""
Calculate the suggested number of nodes needed to run the simulation.
The function takes into account the fact that the memory overhead
varies with the number of ranks the simulation is run with.
One can also specify a custom margin to add to the memory usage.
"""

try:
ranks_per_node = os.cpu_count()
except AttributeError:
ranks_per_node = multiprocessing.cpu_count()

full_overhead = self.base_memory * ranks_per_node

# initialize variables for the fixed-point iteration
est_nodes = 0
prev_est_nodes = None
max_iter = 5
iter_count = 0

while (prev_est_nodes is None or est_nodes != prev_est_nodes) and iter_count < max_iter:
prev_est_nodes = est_nodes
mem_usage_per_node = full_overhead + self.cell_memory_total + self.synapse_memory_total
mem_usage_with_margin = mem_usage_per_node * (1 + margin)
est_nodes = math.ceil(mem_usage_with_margin / DryRunStats.total_memory_available())
full_overhead = self.base_memory * ranks_per_node * est_nodes
iter_count += 1

return est_nodes

@run_only_rank0
def display_node_suggestions(self):
"""
Display suggestions for how many nodes are approximately
necessary to run the simulation based on the memory available
on the current node.
"""
node_total_memory = DryRunStats.total_memory_available()
if node_total_memory is None:
logging.warning("Unable to get the total memory available on the current node.")
return
suggested_nodes = self.suggest_nodes(0.3)
logging.info(f"Based on the memory available on the current node, "
f"it is suggested to use at least {suggested_nodes} node(s).")
logging.info("This is just a suggestion and the actual number of nodes "
"needed to run the simulation may be different.")
logging.info(f"The calculation was based on a total memory available of "
f"{pretty_printing_memory_mb(node_total_memory)} on the current node.")
logging.info("Please remember that it is suggested to use the same class of nodes "
"for both the dry run and the actual simulation.")
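For environments where `psutil` is not installed, the per-node total-memory query can be approximated with the standard library alone. This is a sketch under stated assumptions: the `SC_PAGE_SIZE` and `SC_PHYS_PAGES` sysconf names are POSIX-optional and typically only present on Linux/glibc, hence the defensive fallback.

```python
import os

def total_memory_mb():
    """Total physical memory of the node in MB via POSIX sysconf,
    or None where these sysconf names are unavailable."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")   # bytes per page
        n_pages = os.sysconf("SC_PHYS_PAGES")    # total physical pages
    except (AttributeError, OSError, ValueError):
        return None
    return page_size * n_pages / (1024 * 1024)
```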