FEAT-#308: Add monitor to each host in the cluster #309

Closed
Retribution98 wants to merge 5 commits

Retribution98
Collaborator

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 .
  • passes black .
  • signed commit with git commit -s
  • Resolves Add a service monitor process to each host in the cluster #308
  • tests added and passing
  • module layout described at docs/developer/architecture.rst is up-to-date

hosts = MpiHosts.get()
info = MPI.Info.Create()
if hosts:
host_list = hosts.split(",") if hosts is not None else ["localhost"]
Collaborator

Suggested change
host_list = hosts.split(",") if hosts is not None else ["localhost"]
hosts = hosts.split(",") if hosts is not None else ["localhost"]

Also, users should already pass hosts like a.b.c.d,a1.b1.c1.d1[,...]. Why do we need to call split here?

Collaborator Author

"hosts" is a different variable, defined on line 160.
"host_list" is needed further down to configure the special "--host" option for Open MPI.
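
For context, the fallback in the line under discussion could be sketched as a small standalone helper. The name `parse_hosts` is hypothetical and not part of unidist, and the whitespace stripping is an added assumption; the line in the PR only calls `split(",")`.

```python
def parse_hosts(hosts):
    """Turn a comma-separated host string into a list, defaulting to localhost."""
    if hosts is None:
        return ["localhost"]
    # Assumption: strip whitespace and drop empty entries; the PR only splits on ",".
    return [h.strip() for h in hosts.split(",") if h.strip()]


host_list = parse_hosts("a.b.c.d,a1.b1.c1.d1")
# Re-joined form, e.g. for a comma-separated "--host" value.
host_arg = ",".join(host_list)
```

Keeping the parsed list under its own name leaves the original `hosts` string untouched for any later use.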

unidist/core/backends/mpi/core/controller/api.py Outdated
@@ -78,6 +78,22 @@ def log_operation(op_type, status):
)


def is_internal_host_communication_supported():
Collaborator

Suggested change
def is_internal_host_communication_supported():
def is_shared_memory_supported():

@@ -78,6 +78,22 @@ def log_operation(op_type, status):
)


def is_internal_host_communication_supported():
"""
Check if the Unidist on MPI support internal host communication
Collaborator

Suggested change
Check if the Unidist on MPI support internal host communication
Check if MPI supports shared memory.
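
As a rough illustration of what a check with the suggested name might test: shared-memory windows (`MPI_Win_allocate_shared`) were introduced in MPI-3.0, so one plausible criterion is the MPI standard version. This is a sketch under that assumption, not the implementation from this PR. The `mpi_version` parameter is introduced here only so the function runs without an MPI installation.

```python
def is_shared_memory_supported(mpi_version):
    """
    Check if MPI supports shared memory.

    Assumption: support is judged only by the MPI standard version,
    given as a (major, minor) tuple.
    """
    major, _minor = mpi_version
    # Shared-memory windows are part of the standard from MPI-3.0 onwards.
    return major >= 3


supported_on_mpi3 = is_shared_memory_supported((3, 1))
supported_on_mpi2 = is_shared_memory_supported((2, 2))
```

With mpi4py, the tuple could come from `MPI.Get_version()`. A real check may also need to account for library-specific limitations, which this sketch ignores.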

unidist/core/backends/mpi/core/communication.py Outdated
Comment on lines +145 to +142
self.__host_rank_by_rank = defaultdict(None)
self.__host_by_rank = defaultdict(None)
Collaborator

Suggested change
self.__host_rank_by_rank = defaultdict(None)
self.__host_by_rank = defaultdict(None)
self._host_rank_by_rank = defaultdict(None)
self._host_by_rank = defaultdict(None)

Add inline comments explaining what these attributes are for.
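
The reviewer does not state a reason for the rename, but one practical difference is Python's name mangling of double-underscore attributes. The sketch below shows that, along with the kind of inline comments being requested. The `Communicator` class is a hypothetical stand-in, and the meanings given in the comments are guesses from the attribute names, not taken from the PR.

```python
from collections import defaultdict


class Communicator:  # hypothetical stand-in, not the class from the PR
    def __init__(self):
        # Assumed meaning: global MPI rank -> rank of that process within its host.
        self._host_rank_by_rank = defaultdict(None)
        # Assumed meaning: global MPI rank -> host the process runs on.
        self._host_by_rank = defaultdict(None)
        # Double underscore, for contrast: stored as _Communicator__mangled.
        self.__mangled = {}


comm = Communicator()
has_single = hasattr(comm, "_host_by_rank")
has_double = hasattr(comm, "__mangled")
has_mangled = hasattr(comm, "_Communicator__mangled")

# defaultdict(None) has no default factory, so a missing key raises KeyError,
# exactly like a plain dict.
try:
    comm._host_by_rank[0]
    missing_key_raises = False
except KeyError:
    missing_key_raises = True
```

Since `defaultdict(None)` behaves like a plain `dict` on lookup, a regular `{}` would express the same thing more directly.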

@YarShev
Collaborator

YarShev commented Jul 11, 2023

This PR is an integral part of #286. It doesn't make sense to merge these changes without #286. So closing this PR in favor of #286.

@YarShev YarShev closed this Jul 11, 2023
Development

Successfully merging this pull request may close these issues.

Add a service monitor process to each host in the cluster
2 participants