`InputOutputLocalProcess` limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

ThomasBreuer · 2024-03-15T10:00:17Z

When playing with the manual example for InputOutputLocalProcess (see #5673 for the background),
I noticed that the results are not reliable. The following happens in the current master branch.
First define a function that executes the lines of the manual example.

test:= function( len )
  local d, f, s, x, y;

  d := DirectoryCurrent();;
  f := Filename(DirectoriesSystemPrograms(), "rev");;
  s := InputOutputLocalProcess(d,f,[]);
# WriteLine(s,"The cat sat on the mat");
# ReadLine(s);
  x := ListWithIdenticalEntries(len,'x');;
  ConvertToStringRep(x);
  WriteLine(s,x);
  WriteByte(s,INT_CHAR('\n'));
  y := ReadAll(s);;
  CloseStream(s);
  if Length(y) <> len + 2 then # something went wrong
    Print( "length ", Length(y), ": ", y, "\n" );
  fi;
end;

Then call this function enough often.

gap> for i in [ 1 .. 1000 ] do test(45); od;
length 16: xxxxxxxxxxxxxxxx
length 46: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

When I remove the first two # characters from the function, I even get the following.

 gap> for i in [ 1 .. 1000 ] do test(45); od;
length 54: ac ehT
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

What is going on here?

The text was updated successfully, but these errors were encountered:

ChrisJefferson · 2024-03-15T10:22:21Z

I think this is (a little poorly) explained in the documentation -- when reading from a program, ReadAll reads until there is no more immediately available output, but there might be more output to come later (in this case, that 'later' is probably nanoseconds, but we've already returned).

This isn't easily fixable without breaking backwards compatability. If we changed ReadAll to keep reading until "end of stream", but then your program would hang, as rev won't close the stream until we close it's input.

ThomasBreuer · 2024-03-15T10:28:37Z

O. k., but then the manual example is at least misleading, it should better show how one can get reliable output.

fingolfin · 2024-03-18T10:21:41Z

I think the current implementation of iostream is fundamentally limited making it difficult if not impossible to use safely in general. The first issue that immediately comes to mind is this:

InstallMethod(IsEndOfStream, "iostream",
        [IsInputOutputStreamByPtyRep and IsInputOutputStream],
        stream -> # stream![4] or
        IS_BLOCKED_IOSTREAM(stream![1]) );

plus that:

static Obj FuncIS_BLOCKED_IOSTREAM(Obj self, Obj stream)
{
    UInt pty = HashLockStreamIfAvailable(stream);

    int isBlocked = (PtyIOStreams[pty].blocked || PtyIOStreams[pty].changed ||
                     !PtyIOStreams[pty].alive);
    HashUnlock(PtyIOStreams);
    return isBlocked ? True : False;
}

So as far as I can tell (I may have missed something!) there is no way to distinguish between an actual "end of stream" (the child process terminated), versus "there is no data right now but there soon might be" versus some other event.

Of course we could (and probably should) improve this by adding additional IsBLA functions that allow querying the status of such a stream in a more fine grained way.

But even then there are limitations to what we can do... Sure, it should be sufficient for "simple" communications where all messages between GAP and the subprocess are short, everything is fine: you write your message out all at once; it nicely fits into the OS buffers and so the subprocess can read it, process it, and then writes its answer in one go, and we read it all at once.

But already for rev this is not quite enough: our input messages essentially are byte sized and the output messages are byte sized, and to deal with this correctly we either need to use specific knowledge of this process ("we sent it N bytes, so output will be complete after we received exactly N bytes output"), or fragile hacks ("wait for 100ms and then surely we got all the data").

What is missing is something akin to POSIX select or async/await or so that allows us to simultaneously feed data to the subprocess, and to handle the data we receive from it.

This is indeed more problematic for things like e.g. coset enumeration were a tiny input can lead to gigantic output. There we need to rely on e.g. markers in the output to know when it is complete. So something like while child still is running (not crashed) try to read more data, until the data contains the end marker)

But for arbitrary programs this doesn't work (there are no end markers). Dunno what to do there. But maybe I am worrying about this needlessly.

@ThomasBreuer do you have a concrete application in mind?

ThomasBreuer · 2024-03-18T12:50:34Z

@fingolfin No, I do not need InputOutputLocalProcess urgently.
The starting point was that GAP advertises the function, that there is one example in the manual (using the revprogram), and that this example is apparently not testable. According to Chris' and your comments, it is not easy to make this example actually work as a user might expect. (We could improve the documentation in the sense that we describe what the problems are.)

fingolfin · 2024-11-06T22:06:51Z

Here is another thing that seems broken or at least missing in InputOutputLocalProcess: the kernel function CLOSE_PTY_IOSTREAM very carefully gets the exit code of the subprocess -- and then discards it.

As a result, as far as I know, it is impossible to see if such a subprocess signaled an error or not. That makes it impossible to (safely) use for various applications. Right now I encountered this in the xgap test setup which uses InputOutputLocalProcess and is pretty much broken right now.

ThomasBreuer added the kind: bug Issues describing general bugs, and PRs fixing them label Mar 15, 2024

fingolfin changed the title ~~strange behaviour of InputOutputLocalProcess~~ InputOutputLocalProcess limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code Nov 6, 2024

fingolfin mentioned this issue Nov 8, 2024

Change tst/testall.g to quit GAP with exit status indicating outcome gap-packages/xgap#31

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`InputOutputLocalProcess` limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

`InputOutputLocalProcess` limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

ThomasBreuer commented Mar 15, 2024

ChrisJefferson commented Mar 15, 2024

ThomasBreuer commented Mar 15, 2024

fingolfin commented Mar 18, 2024

ThomasBreuer commented Mar 18, 2024

fingolfin commented Nov 6, 2024

InputOutputLocalProcess limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

InputOutputLocalProcess limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

Comments

ThomasBreuer commented Mar 15, 2024

ChrisJefferson commented Mar 15, 2024

ThomasBreuer commented Mar 15, 2024

fingolfin commented Mar 18, 2024

ThomasBreuer commented Mar 18, 2024

fingolfin commented Nov 6, 2024

`InputOutputLocalProcess` limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676

`InputOutputLocalProcess` limitations: can't distinguish "no data available" and "subprocess terminated"; no way to query exit code #5676