[CBRAIN] CARMIN has no API to 'prepare' a file for a pipeline #107
Comments
Hi @prioux, a few comments on this issue:
The paths in CARMIN are meant to be defined by the platform. There is no requirement that these paths actually correspond to real paths on any server, although a platform may decide to do so. In VIP, for instance, paths are logical names registered in a file catalog, where they are associated with one or more physical locations. The platform is free to define its paths so that they are convenient for it and its users. In the case of CBRAIN, it may make sense to have paths of the following form:
As @glatard said, the data module has been designed around the notion of path, but with a lot of liberty given to the implementing platform to use these paths as it wants. The idea is that each platform should choose them to mirror how it stores and identifies files internally. That's why CARMIN does not define whether paths are relative or absolute. In the end, what is necessary is that each path uniquely identifies a file. As Tristan said, this is easy in VIP, since internally we use a catalog as the database that maps each logical file to its physical location, and the key/identifier is a path.

Discussion on @prioux suggestions

@prioux There's something I don't get about what you did in CBRAIN:
How does the user do step 2? What information does he have to find the file he uploaded? Is the path used in step 1 useful, or is this path discarded after step 1?

About your first suggestion:
I don't get it. It would mean that you could identify the file with its path, which is the point of the current CARMIN data spec. Why couldn't you use this path as the execution input, then?

About your second suggestion:
As with my comment on the first suggestion: where does this path come from? I get the preparation part, but I can't see how it addresses the issue under discussion.

Proposition of a solution

I agree with @glatard that ID-based platforms should have ID-based paths.
The issue is that this is not currently possible with CARMIN, where it's the user who chooses the path, whereas the ID is generated by the platform. So we need a way for the user to provide the file content and the file name, and to let the platform decide the ID and the path that includes that ID, and return that path. I think it would be complicated to add this feature in the current … The solution I like most would be to add a …
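To make the "platform chooses the ID-based path" idea concrete, here is a minimal platform-side sketch, assuming an in-memory catalog. It does not reconstruct the truncated proposal above. The class `FileCatalog`, its methods, and the path form `/<id>/<filename>` are all hypothetical: the user supplies only a name and content, the platform assigns the ID, and the returned logical path is what the user would later pass as an execution input. As in VIP's catalog, the logical path is the key that maps to a platform-private physical location.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class FileCatalog:
    """Hypothetical catalog mapping logical CARMIN paths to platform file IDs
    and physical locations (all names here are illustrative)."""
    _ids: Iterator[int] = field(default_factory=lambda: itertools.count(1))
    entries: dict = field(default_factory=dict)

    def register_upload(self, filename: str, content: bytes) -> str:
        # The platform, not the user, chooses the ID...
        file_id = next(self._ids)
        # ...and builds a logical path embedding it (assumed form: /<id>/<name>).
        logical_path = f"/{file_id}/{filename}"
        self.entries[logical_path] = {
            "id": file_id,
            "size": len(content),
            # Physical location stays private to the platform (made-up layout).
            "physical": f"/storage/{file_id}/{filename}",
        }
        # Returned so the user can later use it as an execution input.
        return logical_path

    def resolve(self, logical_path: str) -> int:
        """Map a logical path back to the platform's internal file ID."""
        return self.entries[logical_path]["id"]


catalog = FileCatalog()
path = catalog.register_upload("file.txt", b"hello")
print(path)                   # /1/file.txt
print(catalog.resolve(path))  # 1
```

Because the ID is part of the path, the path alone is enough for the platform to find its internal record, which is the property both comments ask for.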
Note: this issue will be part of a series describing limitations and questions that arose while implementing CARMIN within CBRAIN.
CARMIN has a very simple data management model: everything is just files and directories that are stored server-side under an abstract 'root' chosen by the people who deployed the CARMIN server.
Launching pipelines on these files implies providing their paths as arguments to the pipeline's parameters. It is not clear whether the paths are expected to be relative to the server's data root (e.g. `some/stuff/file.txt`) or absolute (e.g. `/mnt/nfs/data1/carmin_data/bradley/some/stuff/file.txt`). In the latter case, how does the CARMIN API user even get that path? In the former case, how can the CARMIN API user be sure that `some/stuff/file.txt` will be an appropriate argument for any of the pipelines? Is it expected that all pipelines will run with their cwd set to the root directory of the CARMIN storage area?

In CBRAIN, data files are registered and given a unique numerical ID. I'm not going to go into the details of our framework, but the basic idea is that data files don't have a fixed path. The path is determined when the pipeline starts, because the pipeline can be executed on any number of remote servers that have distinct file system configurations (think supercomputer clusters). A CBRAIN pipeline (task) asks for a file to be 'synchronized' by ID, which brings a copy of the file to the remote server, and then its local path is provided to the pipeline.
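The ID-then-synchronize flow described above can be sketched as follows. This is an illustration under stated assumptions, not CBRAIN's actual code: `REGISTERED_FILES`, `synchronize`, `run_task`, the per-server cache layout `<cache>/<id>/<name>`, and the `my_pipeline` command line are all hypothetical. The point is that the task only knows an ID, and the concrete path exists only once the copy lands on the execution server.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for registered files: ID -> (name, content).
REGISTERED_FILES = {42: ("file.txt", b"some data")}


def synchronize(file_id: int, cache_dir: Path) -> Path:
    """Bring a local copy of a registered file onto this execution server
    and return its local path (illustrative only)."""
    name, content = REGISTERED_FILES[file_id]
    # Each server has its own cache location, so the path is only known now.
    local_path = cache_dir / str(file_id) / name
    if not local_path.exists():  # skip the copy if already synchronized
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(content)
    return local_path


def run_task(file_id: int, cache_dir: Path) -> list:
    # The task is given an ID; the pipeline is given a server-local path.
    local = synchronize(file_id, cache_dir)
    return ["my_pipeline", "--input", str(local)]  # hypothetical command line


with tempfile.TemporaryDirectory() as tmp:
    cmd = run_task(42, Path(tmp))
    print(Path(cmd[2]).parts[-2:])  # ('42', 'file.txt')
    print(Path(cmd[2]).read_bytes())  # b'some data'
```

Running the same task on a different server (a different `cache_dir`) yields a different local path for the same ID, which is why a fixed CARMIN path does not map directly onto this model.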
So right now to use a CARMIN pipeline in CBRAIN, one has to:
Steps 1 and 3 work in CARMIN; step 2 has no CARMIN API equivalent.
What we would need in CARMIN is an extension to an existing call:
`GET /PATH`: the JSON record should contain an entry for a platform-specific ID associated with the path.

A more generic solution that other implementers would probably like (but that CBRAIN doesn't need) would be:
`PUT /executions/{executionIdentifier}/preparePath/some/stuff/file.txt`

which would tell the server side to prepare the path `/some/stuff/file.txt` specifically for the execution (task) identified by `executionIdentifier`.
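A client-side sketch of the two proposals, assuming Python's standard library and a hypothetical server at `https://carmin.example.org`. Neither call exists in CARMIN today: the field name `platformId` in the `GET /PATH` record is an assumption, and the `preparePath` request is only built, not sent. Path segments are URL-quoted while keeping `/` as a separator, so the file path maps naturally onto the URL suffix.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://carmin.example.org"  # hypothetical server


def platform_id_from_record(record_json: str):
    """Proposal 1: read a platform-specific ID from the JSON record returned
    by GET /PATH. The field name 'platformId' is an assumption."""
    return json.loads(record_json).get("platformId")


def prepare_path_request(execution_id: str, path: str) -> urllib.request.Request:
    """Proposal 2: build (but do not send) the proposed
    PUT /executions/{executionIdentifier}/preparePath/{path} request."""
    quoted_id = urllib.parse.quote(execution_id, safe="")
    quoted_path = urllib.parse.quote(path.lstrip("/"))  # '/' stays unescaped
    url = f"{BASE_URL}/executions/{quoted_id}/preparePath/{quoted_path}"
    return urllib.request.Request(url, method="PUT")


record = '{"platformId": 1234}'  # made-up record; other fields omitted
print(platform_id_from_record(record))  # 1234

req = prepare_path_request("exec-7", "/some/stuff/file.txt")
print(req.get_method(), req.full_url)
# PUT https://carmin.example.org/executions/exec-7/preparePath/some/stuff/file.txt
```

The first proposal only adds a field to an existing response, which is why it is the smaller change; the second adds an endpoint but lets each platform do whatever staging it needs per execution.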