Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

InputOutputLocalProcess limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

Open
ThomasBreuer opened this issue Mar 15, 2024 · 5 comments
Labels
kind: bug Issues describing general bugs, and PRs fixing them

Comments

@ThomasBreuer
Copy link
Contributor

When playing with the manual example for InputOutputLocalProcess (see #5673 for the background),
I noticed that the results are not reliable. The following happens in the current master branch.
First define a function that executes the lines of the manual example.

test:= function( len )
  local d, f, s, x, y;

  d := DirectoryCurrent();;
  f := Filename(DirectoriesSystemPrograms(), "rev");;
  s := InputOutputLocalProcess(d,f,[]);
# WriteLine(s,"The cat sat on the mat");
# ReadLine(s);
  x := ListWithIdenticalEntries(len,'x');;
  ConvertToStringRep(x);
  WriteLine(s,x);
  WriteByte(s,INT_CHAR('\n'));
  y := ReadAll(s);;
  CloseStream(s);
  if Length(y) <> len + 2 then # something went wrong
    Print( "length ", Length(y), ": ", y, "\n" );
  fi;
end;

Then call this function enough often.

gap> for i in [ 1 .. 1000 ] do test(45); od;
length 16: xxxxxxxxxxxxxxxx
length 46: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

When I remove the first two # characters from the function, I even get the following.

 gap> for i in [ 1 .. 1000 ] do test(45); od;
length 54: ac ehT
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

What is going on here?

@ThomasBreuer ThomasBreuer added the kind: bug Issues describing general bugs, and PRs fixing them label Mar 15, 2024
@ChrisJefferson
Copy link
Contributor

I think this is (a little poorly) explained in the documentation -- when reading from a program, ReadAll reads until there is no more immediately available output, but there might be more output to come later (in this case, that 'later' is probably nanoseconds, but we've already returned).

This isn't easily fixable without breaking backwards compatability. If we changed ReadAll to keep reading until "end of stream", but then your program would hang, as rev won't close the stream until we close it's input.

@ThomasBreuer
Copy link
Contributor Author

O. k., but then the manual example is at least misleading, it should better show how one can get reliable output.

@fingolfin
Copy link
Member

I think the current implementation of iostream is fundamentally limited making it difficult if not impossible to use safely in general. The first issue that immediately comes to mind is this:

InstallMethod(IsEndOfStream, "iostream",
        [IsInputOutputStreamByPtyRep and IsInputOutputStream],
        stream -> # stream![4] or
        IS_BLOCKED_IOSTREAM(stream![1]) );

plus that:

static Obj FuncIS_BLOCKED_IOSTREAM(Obj self, Obj stream)
{
    UInt pty = HashLockStreamIfAvailable(stream);

    int isBlocked = (PtyIOStreams[pty].blocked || PtyIOStreams[pty].changed ||
                     !PtyIOStreams[pty].alive);
    HashUnlock(PtyIOStreams);
    return isBlocked ? True : False;
}

So as far as I can tell (I may have missed something!) there is no way to distinguish between an actual "end of stream" (the child process terminated), versus "there is no data right now but there soon might be" versus some other event.

Of course we could (and probably should) improve this by adding additional IsBLA functions that allow querying the status of such a stream in a more fine grained way.

But even then there are limitations to what we can do... Sure, it should be sufficient for "simple" communications where all messages between GAP and the subprocess are short, everything is fine: you write your message out all at once; it nicely fits into the OS buffers and so the subprocess can read it, process it, and then writes its answer in one go, and we read it all at once.

But already for rev this is not quite enough: our input messages essentially are byte sized and the output messages are byte sized, and to deal with this correctly we either need to use specific knowledge of this process ("we sent it N bytes, so output will be complete after we received exactly N bytes output"), or fragile hacks ("wait for 100ms and then surely we got all the data").

What is missing is something akin to POSIX select or async/await or so that allows us to simultaneously feed data to the subprocess, and to handle the data we receive from it.

This is indeed more problematic for things like e.g. coset enumeration were a tiny input can lead to gigantic output. There we need to rely on e.g. markers in the output to know when it is complete. So something like while child still is running (not crashed) try to read more data, until the data contains the end marker)

But for arbitrary programs this doesn't work (there are no end markers). Dunno what to do there. But maybe I am worrying about this needlessly.

@ThomasBreuer do you have a concrete application in mind?

@ThomasBreuer
Copy link
Contributor Author

@fingolfin No, I do not need InputOutputLocalProcess urgently.
The starting point was that GAP advertises the function, that there is one example in the manual (using the revprogram), and that this example is apparently not testable. According to Chris' and your comments, it is not easy to make this example actually work as a user might expect. (We could improve the documentation in the sense that we describe what the problems are.)

@fingolfin
Copy link
Member

Here is another thing that seems broken or at least missing in InputOutputLocalProcess: the kernel function CLOSE_PTY_IOSTREAM very carefully gets the exit code of the subprocess -- and then discards it.

As a result, as far as I know, it is impossible to see if such a subprocess signaled an error or not. That makes it impossible to (safely) use for various applications. Right now I encountered this in the xgap test setup which uses InputOutputLocalProcess and is pretty much broken right now.

@fingolfin fingolfin changed the title strange behaviour of InputOutputLocalProcess InputOutputLocalProcess limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code Nov 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind: bug Issues describing general bugs, and PRs fixing them
Projects
None yet
Development

No branches or pull requests

3 participants