Interoperability with fork() #223
Comments
We should make a statement, IMO. There are also other scenarios where clarification would be helpful. For example, what if the process launched by the job launcher does not call shmem_init, but a process forked from the launched process does call shmem_init? Should that be valid? Nick, would you like to discuss this in the next threads meeting?
Yes, I generally think this should be a valid scenario. I have used/relied on this many times to write wrapper applications that create and monitor (e.g., hardware counters) the SHMEM application as a child process. I know of other applications for which others have used a similar pattern. That said, I'm not sure the OpenSHMEM Specification can or should address this. The situation you describe spans the parallel process launcher, the parent (or primary) process, and child process(es). The systems that I use generally all run Slurm, so I'm relying on an assumption that Slurm informs |
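The wrapper pattern discussed above (a parent that never calls into OpenSHMEM and only creates and monitors a child, which is the process that would call shmem_init) might be sketched roughly as follows. This is a minimal sketch, not taken from the thread: the helper name `run_and_monitor` and the `128 + signal` return convention are assumptions made for illustration, and whether the child's shmem_init succeeds still depends on what the launcher passes through to it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the thread): launch child_argv[0] as a child
 * process and wait for it. The wrapper itself makes no OpenSHMEM calls; only
 * the exec'ed child would. Returns the child's exit status, 128 + signal
 * number if it was killed by a signal (an assumed, shell-like convention),
 * or -1 if fork() or waitpid() failed. */
static int run_and_monitor(char *const child_argv[])
{
    assert(child_argv != NULL && child_argv[0] != NULL);

    fflush(NULL); /* flush stdio so buffered output is not duplicated */

    pid_t pid = fork();
    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: the environment set up by the launcher is inherited as-is. */
        execvp(child_argv[0], child_argv);
        perror("execvp");
        _exit(127); /* only reached if execvp failed */
    }

    /* Parent: a real wrapper would do its monitoring here
     * (e.g., reading hardware counters) before reaping the child. */
    int status = 0;
    pid_t w;
    do {
        w = waitpid(pid, &status, 0);
    } while (w == -1 && errno == EINTR);
    if (w == -1) {
        perror("waitpid");
        return -1;
    }

    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

A wrapper's main would typically call `run_and_monitor(&argv[1])` and return the result, so that the launcher sees the child's exit status.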
Here is the test program that I promised...
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <shmem.h>

extern char **environ;

int main(int argc, char *argv[])
{
    /* With no arguments, act as the child: no OpenSHMEM calls are made. */
    if (argc == 1)
    {
        printf("Hello, from the child process\n");
        return 0;
    }

    shmem_init();

    const int spawn_mode = atoi(argv[1]);
    /* Spawn the optional second argument, or this same program by default. */
    char * spawn_prog = argv[argc >= 3 ? 2 : 0];
    int rc, errnum;
    pid_t pid = 0;
    char *child_argv[2] = { spawn_prog, NULL };

    switch (spawn_mode)
    {
    case 1:
    case 2:
        errno = 0;
        pid = (spawn_mode == 1 ? fork() : vfork());
        errnum = errno;
        if (pid == -1)
        {
            fprintf(stderr, "ERROR: %s returned -1 (%d: %s)\n",
                    (spawn_mode == 1 ? "fork()" : "vfork()"),
                    errnum, strerror(errnum));
            exit(1);
        }
        else if (pid == 0)
        {
            errno = 0;
            rc = execvp(spawn_prog, child_argv);
            errnum = errno;
            if (rc == -1)
            {
                fprintf(stderr, "ERROR: execvp returned -1 (%d: %s)\n",
                        errnum, strerror(errnum));
                _Exit(1); // _Exit is needed for vfork()
            }
        }
        break;
    case 3:
        rc = posix_spawn(&pid, spawn_prog, NULL, NULL, child_argv, environ);
        if (rc != 0)
        {
            fprintf(stderr, "ERROR: posix_spawn returned non-zero (%d: %s)\n",
                    rc, strerror(rc));
            exit(1);
        }
        break;
    default:
        fprintf(stderr, "ERROR: invalid spawn mode\n");
        exit(1);
    }

    int status;
    waitpid(pid, &status, 0);
    if (WIFEXITED(status))
    {
        rc = WEXITSTATUS(status);
        /* Arguments follow the format string: PE number, PE count, status. */
        printf("%d/%d: child exited with status %d\n",
               shmem_my_pe(), shmem_n_pes(), rc);
    }
    else if (WIFSIGNALED(status))
    {
        rc = WTERMSIG(status);
        printf("%d/%d: child terminated by signal %d\n",
               shmem_my_pe(), shmem_n_pes(), rc);
    }

    shmem_finalize();
    return 0;
}
Where running it (successfully) should show something like:
$ srun -n 2 ./fork-test 1 # 1 -> fork, 2 -> vfork, 3 -> posix_spawn
Hello, from the child process
Hello, from the child process
0/2: child exited with status 0
1/2: child exited with status 0
Recently, I've worked with some other users who have needed to spawn child subprocesses from their OpenSHMEM applications. This process took us through three different implementation paths:
1. fork() → exec(). Unfortunately, this does not seem to work portably [*], as some SHMEM libraries seg-fault at the fork() call.
2. vfork() → exec() worked portably [*], but vfork() was marked obsolete in POSIX-2001, removed from POSIX-2008, and is not without its critics (ref: 1, 2).
3. posix_spawn() seems to work portably [*]. (Somewhat ironically, it uses vfork() under the hood on Linux.)
[*] where "portably" means "across all the OpenSHMEM implementations to which I have access, all of which are running on Linux".
To be clear, the child processes -- either after fork() or after exec() -- will not make any calls into the OpenSHMEM library. The application is not calling fork() without an exec(). (Of secondary interest may be the case in which the application calls fork() but does not call exec(). I haven't seen applications that do this, since I would think threads would be preferred.)
Should the OpenSHMEM specification take a position on interoperability with fork()?
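For the posix_spawn() path described above, a small reusable helper might look like the sketch below. It is only an illustration under stated assumptions: the name `spawn_and_wait` is hypothetical, it uses posix_spawnp() (which searches PATH) rather than posix_spawn(), and it makes no OpenSHMEM calls itself, so it says nothing about how any particular SHMEM library behaves around the spawn.

```c
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper (not from the issue): spawn argv[0] with posix_spawnp()
 * and block until it terminates. On success returns 0 and stores the raw
 * wait status in *status (decode it with WIFEXITED/WEXITSTATUS and friends).
 * On failure returns an errno-style error number. */
static int spawn_and_wait(char *const argv[], int *status)
{
    assert(argv != NULL && argv[0] != NULL && status != NULL);

    pid_t pid;
    /* posix_spawnp() reports errors through its return value, not errno. */
    int rc = posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ);
    if (rc != 0)
        return rc;

    /* Retry waitpid() if it is interrupted by a signal. */
    while (waitpid(pid, status, 0) == -1) {
        if (errno != EINTR)
            return errno;
    }
    return 0;
}
```

In an OpenSHMEM application this would be called between shmem_init() and shmem_finalize(), in place of the fork() → exec() sequence, with the child making no calls into the OpenSHMEM library.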