Missing Unix bindings #42

Open
kit-ty-kate opened this issue Sep 21, 2024 · 5 comments

@kit-ty-kate
Contributor

While trying to use miou i encountered some roadblocks. These functions are missing from Miou_unix:

  • opendir
  • closedir
  • readdir
  • openfile
  • mkdir
  • stat
  • unlink
  • rmdir

I feel like having these functions in Miou_unix would be highly valuable.

@kit-ty-kate
Contributor Author

After looking around a little more i may start to understand that syscalls not associated with file descriptors may not be suitable to be part of Miou_unix, and that Lwt seems to use a trick based on a special job queue for those kinds of functions. (cc @raphael-proust in case you're interested in the discussion or if i'm saying total garbage, which is probably the case)

If my understanding is correct, i would argue that having these functions in Miou_unix could still be useful even if they are defined the following way:

let readdir dir =
  Miou.yield ();
  Unix.readdir dir

This way, even if those functions block a little bit, we're mostly assured that most jobs have been taken care of semi-recently.

@dinosaure
Contributor

The issue is in fact more subtle than that and the choices made can lead to unwanted Miou behaviour. The idea of Miou.file_descr remains centred on sockets (and particularly on the non_blocking flag) in which suspension may be necessary as soon as reading or writing is involved because these can block the process.

The same applies, for example, to file-descrs from Unix.pipe, where reading from one is only available (and will not block) after writing to the other.

What is certain here is that we are talking about a possible suspension due to the fact that the Unix.read and Unix.write (or Unix.connect and Unix.accept) functions can block.

This is where there may be a notable difference. As far as files are concerned (which are also represented via a file-descr), we can be generally sure that Unix.read/Unix.write on them will never block! So the suspend and resume mechanism via select() is largely useless (and may even degrade performance). Worse, Miou's task management could set up a sort of openfile barrier and try to open trillions of files at once, which would lead to an EMFILE error (this is currently the problem with eio with this type of code, where you have to open lots of files in order to calculate the Merkle tree of a folder).
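To make the EMFILE concern concrete, here is a minimal sketch, using only the standard library, that hashes the files of a folder one at a time so that at most one descriptor is open at any moment. The names `digest_file` and `digest_dir` are hypothetical, and this is not the eio or Miou code referred to above; it only shows one way to keep the number of open files bounded.

```ocaml
(* Hypothetical helper: digest one file and always release its descriptor,
   even if reading raises an exception. *)
let digest_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> Digest.channel ic (-1)) (* negative length: read until EOF *)

(* Hypothetical helper: digest the non-directory entries of [dir]
   sequentially. Each file is closed before the next one is opened, so the
   number of descriptors held by this function never exceeds one. *)
let digest_dir dir =
  Sys.readdir dir
  |> Array.to_list
  |> List.sort String.compare
  |> List.map (Filename.concat dir)
  |> List.filter (fun path -> not (Sys.is_directory path))
  |> List.map (fun path -> (path, Digest.to_hex (digest_file path)))
```

Launching one task per file instead would need some explicit limit on the number of tasks in flight to give the same guarantee.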

However, even if reading or writing is not blocked, these operations can take a long time. So the issue isn't one of suspending and resuming, but of cooperation when there are several tasks reading/writing files running cooperatively - it should be noted that this cooperation problem no longer exists when these tasks are running in parallel!

As a result:

  • using Miou.file_descr to represent the file_descr of a file does nothing more than allow you to use the same Miou_unix.{read,write} as for sockets
  • using Miou_unix.{read,write} on a file inevitably leads to the use of suspend and resume mechanisms which can degrade performance (compared to using Unix.{read,write} directly)
  • the real issue is cooperation, which in your code takes the form of using Miou.yield
  • cooperation (and Miou.yield) only becomes necessary in the case of Miou.async (with MIOU_DOMAINS=0) and does not concern tasks launched in parallel. In other words, systematically using Miou.yield for these functions could degrade performance when the user prefers to use parallelization (Miou.call) to launch tasks.

More generally, it might be a good idea (and could be documented!) to use Unix.file_descr, Unix.read and Unix.write (or Stdlib.open_{in,out}, Stdlib.input & Stdlib.output) directly when manipulating files. The question about Miou.yield is also thorny and this choice must be made taking into account the design of the application (use of Miou.async or Miou.call, or both...).
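As a rough illustration of that trade-off, the sketch below copies a file with the Stdlib channel functions and leaves the cooperation point to the caller. `copy_file` and its `yield` and `chunk` parameters are hypothetical and not part of Miou or Miou_unix; the only assumption about Miou is that `Miou.yield` has type `unit -> unit`, as in the snippet earlier in this thread.

```ocaml
(* Hypothetical helper, not part of Miou or Miou_unix.
   [yield] is called after each chunk: pass [Miou.yield] for tasks that run
   cooperatively, or keep the default no-op for tasks that run in parallel. *)
let copy_file ?(yield = fun () -> ()) ?(chunk = 0x10000) src dst =
  let ic = open_in_bin src in
  Fun.protect ~finally:(fun () -> close_in_noerr ic) @@ fun () ->
  let oc = open_out_bin dst in
  Fun.protect ~finally:(fun () -> close_out_noerr oc) @@ fun () ->
  let buf = Bytes.create chunk in
  let rec go () =
    match input ic buf 0 chunk with
    | 0 -> () (* end of file *)
    | len ->
        output oc buf 0 len;
        yield (); (* cooperation point chosen by the caller *)
        go ()
  in
  go ();
  (* Close explicitly so that a failed flush is reported to the caller. *)
  close_out oc
```

A task started with Miou.async could call `copy_file ~yield:Miou.yield src dst`, while a task started with Miou.call could omit `~yield` and avoid the extra scheduling cost.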

So, I would perhaps be more inclined to document all this properly in Miou_unix, explaining that this module is particularly concerned with sockets and that the other operations (such as mkdir, readdir, openfile) should rather be done directly with the Unix module (while explaining the interest of cooperation and the use of Miou.yield or the possible parallelization of these operations with Miou.call). WDYT?

@kit-ty-kate
Contributor Author

> will never block!

after reading this, i think having a job queue on another thread for IO purpose might be nice to have in miou.

IO functions can take time (e.g. hardware is busy, network file systems like NFS, low priority program, …) and i feel like for most applications of Miou (e.g. web servers, …) such delays can quickly become a huge problem.

Miou.yield would alleviate some of that issue but not for longer IO delays. Miou.call is a bad idea as it relies on the existence of more than 1 domain and would fail on single core processors, so i think using an IO queue using Thread (maybe one per domain) for non-blocking IO, is probably the best solution for this. However this adds complexity so maybe Miou.yield is a reasonable middle-ground for now. I've updated miou_io to reflect that.

In any case, having some sort of documentation explaining all this in Miou_unix would be highly appreciated. I don't think i'm going to be the only one with that sort of question when porting programs from lwt.

@dinosaure
Contributor

> i think using an IO queue using Thread (maybe one per domain) for non-blocking IO, is probably the best solution for this.

That's exactly the design I have in mind and, after discussions with other people, it's probably the most interesting one. At this stage, we could improve Miou_unix (Miou doesn't need changing) and, indeed, spawn one thread per domain in order to manage these syscalls, which can be long. But this would require quite a lot of work - moreover, I'd be more in favour of proposing a new module like Miou_thread rather than modifying Miou_unix, which has the advantage of being very simple.
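For readers unfamiliar with the idea, here is a very rough sketch of such a job queue served by one system thread. It assumes the threads library (threads.posix) is linked. Every name in it (`pool`, `promise`, `create`, `submit`, `wait`) is hypothetical and nothing here exists in Miou or Miou_unix. It also does not show the hard part, which is suspending and resuming a Miou task instead of blocking in `wait`.

```ocaml
(* Hypothetical sketch of a job queue served by one worker thread. *)
type pool = {
  jobs : (unit -> unit) Queue.t;
  mutex : Mutex.t;
  nonempty : Condition.t;
}

type 'a promise = {
  mutable result : ('a, exn) result option;
  m : Mutex.t;
  resolved : Condition.t;
}

(* The worker sleeps until a job is queued, runs it outside the lock, and
   loops forever. It ends when the main thread exits. *)
let worker pool =
  let rec loop () =
    Mutex.lock pool.mutex;
    while Queue.is_empty pool.jobs do
      Condition.wait pool.nonempty pool.mutex
    done;
    let job = Queue.pop pool.jobs in
    Mutex.unlock pool.mutex;
    job ();
    loop ()
  in
  loop ()

let create () =
  let pool =
    { jobs = Queue.create (); mutex = Mutex.create ();
      nonempty = Condition.create () }
  in
  let _ : Thread.t = Thread.create worker pool in
  pool

(* Queue [f] for the worker thread and return a promise for its result.
   Exceptions raised by [f] are captured and re-raised by [wait]. *)
let submit pool f =
  let p =
    { result = None; m = Mutex.create (); resolved = Condition.create () }
  in
  let job () =
    let r = try Ok (f ()) with exn -> Error exn in
    Mutex.lock p.m;
    p.result <- Some r;
    Condition.signal p.resolved;
    Mutex.unlock p.m
  in
  Mutex.lock pool.mutex;
  Queue.push job pool.jobs;
  Condition.signal pool.nonempty;
  Mutex.unlock pool.mutex;
  p

(* Blocking wait. A real integration would suspend the calling task and let
   the scheduler run other tasks; that part is not shown here. *)
let wait p =
  Mutex.lock p.m;
  while Option.is_none p.result do
    Condition.wait p.resolved p.m
  done;
  let r = Option.get p.result in
  Mutex.unlock p.m;
  match r with Ok v -> v | Error exn -> raise exn
```

With this shape, a long syscall would be wrapped as `submit pool (fun () -> Unix.readdir dir)`, and the open question is how the result gets handed back to the scheduler.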

> In any case, having some sort of documentation explaining all this in Miou_unix would be highly appreciated.

The documentation is available here: #43. Feel free to comment there if you have any remarks or improvements.

@raphael-proust

Thank you very much ❤️ for pinging me in this thread. I'm interested in this topic, I'm quite likely to use Miou for some personal projects, and I often receive questions regarding Lwt alternatives…
