Interoperability with fork() #223

nspark · 2018-06-11T20:08:36Z

Recently, I've worked with some other users who have needed to spawn child subprocesses from their OpenSHMEM applications. This process took us through three different implementation paths:

1. Typically, one would call fork()→exec(). Unfortunately, this does not seem to work portably [*], as some SHMEM libraries seg-fault at the fork() call.
2. In the course of experimentation, users found that calling vfork()→exec() worked portably [*], but vfork() was marked obsolescent in POSIX.1-2001, removed from POSIX.1-2008, and is not without its critics (ref: 1, 2).
3. Using posix_spawn() seems to work portably [*]. (Somewhat ironically, it uses vfork() under the hood on Linux.)

[*] where "portably" means "across all the OpenSHMEM implementations to which I have access, all of which are running on Linux".

To be clear, the child processes -- either after fork() or after exec() -- will not make any calls into the OpenSHMEM library. The application is not calling fork() without an exec(). (Of secondary interest may be the case in which the application calls fork() but does not call exec(). I haven't seen applications that do this, since I would think threads would be preferred.)

Should the OpenSHMEM specification take a position on interoperability with fork()?

manjugv · 2018-06-25T14:03:56Z

We should make a statement, IMO. There are also other scenarios where clarification would be helpful. For example, what if a process launched by the job launcher does not call shmem_init, but a process forked from the launched process does call shmem_init? Should that be valid?

Nick, would you like to discuss this in the next threads meeting?

nspark · 2018-06-25T18:59:03Z

> For example, what if a process launched by the job launcher does not call shmem_init, but a process forked from the launched process does call shmem_init? Should that be valid?

Yes, I generally think this should be a valid scenario. I have relied on this many times to write wrapper applications that launch the SHMEM application as a child process and monitor it (e.g., with hardware counters). I know of other applications for which others have used a similar pattern. That said, I'm not sure the OpenSHMEM Specification can or should address this. The situation you describe spans the parallel process launcher, the parent (or primary) process, and the child process(es).

The systems that I use generally all run Slurm, so I'm relying on an assumption that Slurm informs shmem_init() (and thereby PMI_Init()) through environment variables. By clearing or modifying the environment of the child process, a child might not be able to properly initialize the SHMEM library. Since I think it means extending further into the nuances of job launchers and resource managers, I'm not comfortable saying that the OpenSHMEM Specification should make a claim that this should always be a valid scenario. At most, I think this should be "implementation defined" behavior.
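To make the environment dependence concrete, here is a minimal sketch of the wrapper pattern, not taken from any real application. The helper names `launcher_env_present` and `spawn_and_wait` are hypothetical, and the prefix list (`SLURM_`, `PMI_`, `PMIX_`) is only an assumption about what a launcher might export; which variables an implementation actually reads in shmem_init() is not something this sketch can confirm.

```c
#include <spawn.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: returns 1 if any entry of envp starts with one of the
 * ASSUMED launcher prefixes below, 0 otherwise. The prefix list is a guess,
 * not something the OpenSHMEM specification defines. */
int launcher_env_present(char *const envp[])
{
  static const char *const prefixes[] = { "SLURM_", "PMI_", "PMIX_" };
  const size_t nprefixes = sizeof(prefixes) / sizeof(prefixes[0]);

  for (size_t i = 0; envp != NULL && envp[i] != NULL; i++)
    for (size_t j = 0; j < nprefixes; j++)
      if (strncmp(envp[i], prefixes[j], strlen(prefixes[j])) == 0)
        return 1;
  return 0;
}

/* Hypothetical helper: spawn prog (searched on PATH) with the given argv and
 * envp, wait for it, and return its exit status. Returns -1 if the spawn or
 * wait fails, or if the child did not exit normally. */
int spawn_and_wait(const char *prog, char *const argv[], char *const envp[])
{
  pid_t pid;
  int status;

  if (posix_spawnp(&pid, prog, NULL, NULL, argv, envp) != 0)
    return -1;
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A wrapper that passes `environ` hands the child whatever the launcher exported, while one that passes a cleared or rebuilt `envp` may leave the child unable to initialize. Checking `launcher_env_present(envp)` before spawning is one way a wrapper could warn about the second case.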

nspark · 2018-06-26T20:32:51Z

Here is the test program that I promised...

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <shmem.h>

extern char **environ;

int main(int argc, char *argv[])
{
  if (argc == 1)
  {
    printf("Hello, from the child process\n");
    return 0;
  }

  shmem_init();

  const int spawn_mode = atoi(argv[1]);
  char * spawn_prog = argv[argc >= 3 ? 2 : 0];

  int rc, errnum;
  pid_t pid = 0;
  char *child_argv[2] = { spawn_prog, NULL };
  switch (spawn_mode)
  {
  case 1:
  case 2:
    errno = 0;
    pid = (spawn_mode == 1 ? fork() : vfork());
    errnum = errno;
    if (pid == -1)
    {
      fprintf(stderr, "ERROR: %s returned -1 (%d: %s)\n",
              (spawn_mode == 1 ? "fork()" : "vfork()"),
              errnum, strerror(errnum));
      exit(1);
    }
    else if (pid == 0)
    {
      errno = 0;
      rc = execvp(spawn_prog, child_argv);
      errnum = errno;
      if (rc == -1)
      {
        fprintf(stderr, "ERROR: execvp returned -1 (%d: %s)\n",
                errnum, strerror(errnum));
        _Exit(1); // _Exit is needed for vfork()
      }
    }
    break;
  case 3:
    rc = posix_spawn(&pid, spawn_prog, NULL, NULL, child_argv, environ);
    if (rc != 0)
    {
      fprintf(stderr, "ERROR: posix_spawn returned non-zero (%d: %s)\n",
              rc, strerror(rc));
      exit(1);
    }
    break;
  default:
    fprintf(stderr, "ERROR: invalid spawn mode\n");
    exit(1);
  }

  int status;
  waitpid(pid, &status, 0);
  if (WIFEXITED(status))
  {
    rc = WEXITSTATUS(status);
    // Arguments follow the format string: PE index, PE count, then status
    printf("%d/%d: child exited with status %d\n",
           shmem_my_pe(), shmem_n_pes(), rc);
  }
  else if (WIFSIGNALED(status))
  {
    rc = WTERMSIG(status);
    printf("%d/%d: child terminated by signal %d\n",
           shmem_my_pe(), shmem_n_pes(), rc);
  }

  shmem_finalize();

  return 0;
}

Where running it (successfully) should show something like:

$ srun -n 2 ./fork-test 1 # 1 -> fork, 2 -> vfork, 3 -> posix_spawn
Hello, from the child process
Hello, from the child process
0/2: child exited with status 0
1/2: child exited with status 0

jdinan mentioned this issue Jul 5, 2018: Fork/Spawn Unit Test Sandia-OpenSHMEM/SOS#720 (Open)

minsii mentioned this issue Sep 4, 2018: Interoperability with other programming models #243 (Closed)

manjugv assigned nspark Jun 28, 2021

nspark linked a pull request Jul 6, 2021 that will close this issue: Specify interoperability with fork and subprocess creation #474 (Open)
