Multiple hawk calls run into race conditions during configuration and compilation #45

Open
melrief opened this issue Aug 12, 2013 · 13 comments

@melrief
Collaborator

melrief commented Aug 12, 2013

I'm experiencing problems when many hawk instances run at the same time. You can reproduce it by pipelining several hawk calls:

> hawk -e '[1..100]' | hawk -m 'P.replicate 10'
hawk: /tmp/3574191538758045375: removeLink: does not exist (No such file or directory)

Note that the error is not raised every time, but I noticed that if I lock the part where we interpret the expression, the error disappears. Probably two hint instances end up working on the same file (which I think uses a timestamp as its filename) when the machine is fast enough. If this is a hint bug, we should consider opening a bug report against hint.
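
If hint does build its temporary file name from a timestamp (the comment above only guesses so), two processes started close enough together could compute the same name, and one of them could then delete a file the other still needs, which would be consistent with the `removeLink` error. The usual remedy is to let the operating system create the file atomically under a name guaranteed to be unique. A minimal sketch using `System.IO.openTempFile`; this is not hint's code, and `withUniqueTempFile` is a hypothetical helper:

```haskell
import Control.Exception (bracket)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (Handle, hClose, openTempFile)

-- | Hypothetical helper: create a uniquely named file in the system temp
-- directory, run an action on it, then close and delete it.
-- openTempFile creates the file exclusively, so two concurrent callers can
-- never be handed the same file, unlike names derived from a timestamp.
withUniqueTempFile :: String -> (FilePath -> Handle -> IO a) -> IO a
withUniqueTempFile template action = do
  dir <- getTemporaryDirectory
  bracket (openTempFile dir template) cleanup (uncurry action)
  where
    -- hClose is a no-op on an already-closed handle; removeFile throws if
    -- the action has already deleted the file itself.
    cleanup (path, h) = hClose h >> removeFile path
```

The template only supplies a prefix and extension; the actual name is chosen by `openTempFile` at creation time.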

I think there is a similar problem with the configuration: only one process should compile Prelude.hs, and the others should wait for the result.

The locking part is tricky. POSIX provides file locks, but relying on them is not a portable solution. We need a library that abstracts file locks.
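
For reference, GHC's `base` library has since gained a file-locking API, `GHC.IO.Handle.Lock` (available from base 4.10, i.e. GHC 8.2, so it did not exist when this issue was filed), with implementations for both POSIX systems and Windows. A minimal sketch of a lock around a critical section, assuming a hypothetical lock-file path supplied by the caller:

```haskell
import Control.Exception (bracket)
import GHC.IO.Handle.Lock (LockMode (ExclusiveLock), hLock)
import System.IO (IOMode (ReadWriteMode), hClose, openFile)

-- | Run an action while holding an exclusive (advisory) lock on lockPath.
-- The lock file is created if it does not exist. hLock blocks until no
-- other process holds the lock; closing the handle releases it, even if
-- the action throws an exception.
withFileLock :: FilePath -> IO a -> IO a
withFileLock lockPath action =
  bracket (openFile lockPath ReadWriteMode) hClose $ \h -> do
    hLock h ExclusiveLock
    action
```

Compiling the configuration could then be wrapped as `withFileLock lockFile compileConfig`, where both names are placeholders for hawk's own values. The lock is advisory: it only excludes processes that also take it.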

This is related to #43: with a centralized daemon we could implement a queue system to avoid race conditions. Still, I think we should use locks and avoid the daemon for now.

@gelisam
Collaborator

gelisam commented Aug 12, 2013

Hmm, that makes me realize a potential problem with the daemon strategy. What if we have:

> hawk -e [1..] | hawk -m 'id'

Then the daemon cannot serve one request to completion before starting the next: since the first command never finishes, it must serve both requests in parallel. That somewhat defeats the idea of having a daemon that is always ready to execute a piece of code in the pre-loaded environment.
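
A toy in-process model of that pipeline shows why the two stages have to run at the same time: the producer never finishes, so a server that ran it to completion before starting the consumer would wait forever. Here a one-slot `MVar` stands in for the pipe between the two hawk processes; `pipelineDemo` is a hypothetical name, and this is not a daemon design:

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (replicateM)

-- | Toy model of `hawk -e '[1..]' | hawk -m 'id'` (hypothetical name).
-- The producer streams an infinite list through a one-slot MVar "pipe";
-- putMVar blocks while the slot is full, like a writer on a full pipe.
-- The consumer can only make progress because both run concurrently.
pipelineDemo :: Int -> IO [Integer]
pipelineDemo n = do
  pipe <- newEmptyMVar
  producer <- forkIO (mapM_ (putMVar pipe) [1 ..])
  xs <- replicateM n (takeMVar pipe)
  killThread producer -- the producer never finishes on its own
  return xs
```

Running the producer to completion first, e.g. `mapM_ (putMVar pipe) [1 ..]` without `forkIO`, would block forever on the second `putMVar`.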

@melrief
Collaborator Author

melrief commented Aug 12, 2013

Currently there are three phases of a hawk run:

  • compile the configuration to create the cache
  • interpret the user expression using hint with the cache as context
  • run the interpreted expression on the input to produce the output

Of these three steps, only the first should suffer from race conditions. The other two should, in my opinion, be perfectly parallelizable, whether or not we have a daemon. I can't understand why hint can't interpret two expressions in parallel. We should investigate this issue further; to me it looks like a hint problem.

For now, a solution could be to lock the first step so that only one process compiles the configuration, and to lock the runInterpreter call (the second step) as well. The third step can still run in parallel.
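
A minimal sketch of that split, with the three phases passed in as hypothetical parameters (they are not hawk's actual functions): phases 1 and 2 run inside whatever lock wrapper is supplied, and phase 3 runs outside it. To keep the demo self-contained the lock here is an in-process `MVar`; across processes it would have to be a file or OS-level lock.

```haskell
import Control.Concurrent.MVar (MVar, newMVar, withMVar)

-- | Hypothetical shape of one hawk run. The lock covers only phase 1
-- (compiling the configuration) and phase 2 (interpreting the expression);
-- phase 3 (applying the interpreted function to the input) is unlocked,
-- so many runs can process their input concurrently.
runHawk
  :: (IO (i -> o) -> IO (i -> o)) -- ^ lock wrapper
  -> IO ()                        -- ^ phase 1: compile the configuration
  -> IO (i -> o)                  -- ^ phase 2: interpret the user expression
  -> i                            -- ^ phase 3 input
  -> IO o
runHawk withLock compileConfig interpret input = do
  f <- withLock (compileConfig >> interpret)
  return (f input)

-- | In-process stand-in for a real cross-process lock.
withMVarLock :: MVar () -> IO a -> IO a
withMVarLock lock action = withMVar lock (const action)
```

Usage: `newMVar () >>= \lock -> runHawk (withMVarLock lock) compile interpret input`, where `compile`, `interpret` and `input` are placeholders.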

@melrief
Collaborator Author

melrief commented Aug 13, 2013

I added this to the first milestone: we can't afford to release a program that runs into race conditions when many instances run together.

ghost assigned melrief Aug 13, 2013
@melrief
Collaborator Author

melrief commented Aug 13, 2013

I was able to isolate the problem: as I suspected, it is in hint. I opened a bug report. For now I will try to lock the compilation part so that no race condition can happen, but in the future I hope we will be able to run the interpreter in parallel.

@melrief
Collaborator Author

melrief commented Aug 15, 2013

I added a new branch named lock, which locks both the configuration compilation and the expression interpretation. I haven't merged it into the merge branch yet because I don't know how to test it well. @gelisam I have neither Windows nor Cygwin; could you please try the executable from that branch under Cygwin when you have time, and report whether many pipelined hawk instances can still cause problems? For example:

> hawk -e '[1 .. 10000]' | hawk -m 'head' | hawk -d 'P.filter (P.not . B.null)' | hawk -m '"The first letter is: " `append`'

with prelude.hs:

import qualified Data.ByteString.Lazy as B

@gelisam
Collaborator

gelisam commented Aug 15, 2013

I could reproduce the problem on Cygwin earlier, but not since I re-compiled. Which is strange, given that the code I re-compiled is not your lock branch!

I will try harder to reproduce the issue.

@gelisam
Collaborator

gelisam commented Aug 15, 2013

Ah! Of course, the issue only occurs if prelude.hs has just been changed, which is why I could only reproduce it on my very first run. The re-compilation had nothing to do with it!
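
That fits a cache that is rebuilt only when it is stale: as long as the compiled configuration is newer than prelude.hs, no process compiles anything, and the race window only opens on the first runs after an edit, when several processes can all see a stale cache and start compiling at once. A minimal sketch of such a check with `System.Directory.getModificationTime`; hawk's real staleness test may differ, and both paths are hypothetical parameters:

```haskell
import System.Directory (doesFileExist, getModificationTime)

-- | True when the compiled cache is missing or older than the prelude.
-- Both paths are hypothetical parameters. Equal timestamps count as fresh.
-- This check alone is not race-free: between a True result and the end of
-- compilation, other processes see the same stale cache, so the check and
-- the compile need to happen under one lock.
needsRecompile :: FilePath -> FilePath -> IO Bool
needsRecompile preludePath cachePath = do
  cached <- doesFileExist cachePath
  if not cached
    then return True
    else do
      srcTime <- getModificationTime preludePath
      cacheTime <- getModificationTime cachePath
      return (srcTime > cacheTime)
```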

@gelisam
Collaborator

gelisam commented Aug 15, 2013

Damn, I'm having trouble installing the new unix dependency because of this problem.

@gelisam
Collaborator

gelisam commented Aug 15, 2013

Okay, I tracked down the error, but I don't know how to fix it yet.

Cabal fetches unix-2.6.0.1.tar.gz, unpacks it, and runs its ./configure script. This configure script runs Cygwin's gcc, finds -lrt and -ldl, and tells Cabal that everything is fine: it just needs to link with -lrt and -ldl. Cabal then hands this information over to GHC's MinGW-based gcc, which cannot find either library.

I'm not sure what those rt and dl dependencies are, but apparently I need to install them as MinGW packages, even though I never explicitly installed MinGW myself.

@gelisam
Collaborator

gelisam commented Aug 22, 2013

Perhaps this solution is more cross-platform? I'll need to test it on my work machine, since I don't use Windows at home either.

@gelisam
Collaborator

gelisam commented Aug 22, 2013

Nah, that code has an obvious race condition.

@gelisam
Collaborator

gelisam commented Aug 22, 2013

A bit lower on this page, however, are examples in other languages in which a socket number is used as a flag. Would this work?

@gelisam
Collaborator

gelisam commented Aug 23, 2013

I have pushed a different solution to the lock branch, please check it out. So far I have only checked it on cygwin.

Since I use a socket as a lock indicator, I am concerned about the impact on performance: is the socket number released immediately after it is closed, or is there a delay?
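
The comments above don't include the code on the lock branch, so the following is only a sketch of the general socket-as-lock idea, assuming the `network` package; the port number and the helper names are hypothetical. Binding a fixed loopback TCP port succeeds for only one socket at a time, and the OS frees the port when the holder closes it or exits. Whether a port can be re-bound immediately after it is closed depends on the OS and on whether a connection was ever made on it, so the blocking variant simply polls.

```haskell
import Control.Concurrent (threadDelay)
import Control.Exception (IOException, try)
import Network.Socket

-- | Try to take a machine-wide lock by binding a fixed loopback TCP port.
-- Without SO_REUSEADDR/SO_REUSEPORT only one socket can bind the port, so a
-- second caller gets Nothing. The OS frees the port when the socket is
-- closed or the holding process exits, so a crashed holder cannot leave a
-- stale lock behind (unlike a lock file that is never deleted).
tryAcquirePortLock :: PortNumber -> IO (Maybe Socket)
tryAcquirePortLock port = do
  sock <- socket AF_INET Stream defaultProtocol
  let addr = SockAddrInet port (tupleToHostAddress (127, 0, 0, 1))
  result <- try (bind sock addr) :: IO (Either IOException ())
  case result of
    Right () -> return (Just sock)
    Left _ -> close sock >> return Nothing

-- | Block until the lock is free, polling every 10 ms.
acquirePortLock :: PortNumber -> IO Socket
acquirePortLock port = tryAcquirePortLock port >>= maybe retry return
  where
    retry = threadDelay 10000 >> acquirePortLock port

-- | Release the lock by closing the socket.
releasePortLock :: Socket -> IO ()
releasePortLock = close
```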
