Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output file race condition upgrading from RADICAL 1.60 to RADICAL 1.83 #3722

Closed
benclifford opened this issue Dec 14, 2024 · 3 comments
Closed
Labels

Comments

@benclifford
Copy link
Collaborator

Describe the bug

I see what I think is a notification-of-completion race condition in RADICAL pilot 1.83 (upgrading from 1.60, the current Parsl pinned version in setup.py)

Various file related tests fail after the upgrade. For example:

$ pip install radical.pilot==1.83 radical.utils==1.83 
$ pytest parsl/tests/test_docs/test_workflow1.py --config parsl/tests/configs/local_radical.py 

fails with

========================================================== FAILURES ===========================================================
_______________________________________________________ test_procedural _______________________________________________________

N = 2

    @pytest.mark.staging_required
    def test_procedural(N=2):
        """Procedural workflow example from docs on
        Composing a workflow
        """
    
        if os.path.exists('output.txt'):
            os.remove('output.txt')
    
        message = generate(N)
    
        saved = save(message, outputs=[File('output.txt')])
    
>       with open(saved.outputs[0].result().filepath, 'r') as f:
E       FileNotFoundError: [Errno 2] No such file or directory: 'output.txt'

parsl/tests/test_docs/test_workflow1.py:38: FileNotFoundError

I see output.txt on manual inspection, and if I add a long sleep right before that open, the test does not fail.

diff --git a/parsl/tests/test_docs/test_workflow1.py b/parsl/tests/test_docs/test_workflow1.py
index 271baab4d8..e80edf32f1 100644
--- a/parsl/tests/test_docs/test_workflow1.py
+++ b/parsl/tests/test_docs/test_workflow1.py
@@ -35,6 +35,8 @@ def test_procedural(N=2):
 
     saved = save(message, outputs=[File('output.txt')])
 
+    import time
+    time.sleep(10)
     with open(saved.outputs[0].result().filepath, 'r') as f:
         item = int(f.read().strip())
         assert item <= N, "Expected file to contain int <= N"

This file is created by the app in a working directory, not directly in place, and moved into its output resting location by radical. So this is very reminiscent of other problems I've seen in Parsl of claiming a task is finished when actually there is a little bit more to happen. See #1279 for example.

This blocks upgrading Parsl to 1.83, which I was just looking at doing as part of #3646 Python 3.13 support.

cc @AymenFJA @andre-merzky

To Reproduce
as above

Expected behavior
test should pass, as it does right now with parsl master dependencies

Environment
my laptop, and parsl CI.

@benclifford
Copy link
Collaborator Author

digging in slightly more, this test passes with radical.pilot==1.62 and fails with radical.pilot==1.81 - that's the closest two release versions I can bisect between, I think.

@andre-merzky
Copy link

Woah, that's a funny one! I can reproduce this in RP, so a fix should be upcoming shortly. Thanks for noting this!

@andre-merzky
Copy link

This is fixed in radical-cybertools/radical.pilot#3289, we'll release that ASAP.

github-merge-queue bot pushed a commit that referenced this issue Dec 17, 2024
This brings in fixes for a few issues that are fixed on the radical side
of things:

#3722 - a race condition on task completion
#3708 - cleaner shutdown handling as part of #3397
#3646 - Python 3.13 support

# Changed Behaviour

whatever has changed in radical-pilot

# Fixes

#3722

## Type of change

- Bug fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants