-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix many issues with examples #670
base: wdl-1.2
Are you sure you want to change the base?
Conversation
@stxue1 @adamnovak would also appreciate your reviews |
SPEC.md
Outdated
} | ||
``` | ||
|
||
An execution engine may support [other ways](#input-and-output-formats) to specify `File` and `Directory` inputs (e.g., as URIs), but prior to task execution it must [localize inputs](#task-input-localization) so that the runtime value of a `File`/`Directory` variable is a local path. | ||
Within a WDL file, literal values for files and directories may only be paths that are local to the execution environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So an engine is not allowed to support something like:
inputs {
File genome_reference = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz"
}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point - I've updated the spec to say that the execution engine is only required to support paths - the implication being it's also free to support URLs but that may not be portable across engines.
I'm open to expanding this in WDL 1.3 to require engines to support http(s) URLs as well. This would be consistent with the requirement to support http(s) URLs in imports.
SPEC.md
Outdated
|
||
If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task. | ||
If a mount point is specified, then it must be an absolute path to a location in the host environment (i.e., within the container). The specified path either must not already exist in the host environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've always seen "host environment" in the context of containerized applications used to mean the system outside the container; not the system within the container. If I am on a machine booted into Fedora and I run a WDL workflow that calls for an ubuntu:22.04
container to run, the "host" as I understand it would be the Fedora environment.
Also, I don't think that containers have a notion of some amount of space being available at a path in the container. I don't think I can make an image that has 20 GiB of free space at /mnt/freespace
and upload it to Docker Hub, as far as I know. So I don't think saying must be empty and have at least the requested amount of space available
is different from just saying must be empty
. You can mount at paths that don't exist in your container image, or you can mount at paths that exist as empty directories, but you can't mount at paths that are files or paths that are directories that have contents.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should be more consistent with usage. I've updated the spec such that all uses of "execution environment" refers to the context in which the task is executed, and "host" refers to the system that hosts the execution environment.
@@ -3386,7 +3419,7 @@ A document is imported using it's [URI](https://en.wikipedia.org/wiki/Uniform_Re | |||
* `https://` | |||
* 🗑 `file://` - Using the `file://` protocol for local imports can be problematic. Its use is deprecated and will be removed in WDL 2.0. | |||
|
|||
In the event that there is no protocol specified, the import is resolved **relative to the location of the current document**. In the primary WDL document, a protocol-less import is relative to the host file system. If a protocol-less import starts with `/` it is interpreted as relative to the root of the host in the resolved URI. | |||
In the event that there is no protocol specified, the import is resolved **relative to the location of the current document**. In the primary WDL document, a protocol-less import is relative to the folder that contains the primary WDL file. If a protocol-less import starts with `/` it is interpreted as relative to the root of the file system that contains the primary WDL file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The language here makes an assumption that path resolution happens on disk. This is not always true, as would be the case for resolving the imports of a Dockstore workflow. Its common practice to use relative import uris, but when resolving the workflow you would resolve it against the https://
context
@@ -5081,9 +5119,11 @@ Test config: | |||
* `Array[String]` - An array of disk specifications. | |||
* Default value: `1 GiB` | |||
|
|||
The `disks` attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location. When the `disks` attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command. | |||
The `disks` attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location with both read and write permissions. When the `disks` attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read and write permissions
I am worried this requirement constitutes a breaking change. I am not 1--% clear how cromwell works with data-disks and other existing disks but I would not be surprised if it treated some mounts as read-only
|
||
If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task. | ||
If a mount point is specified, then it must be an absolute path to a location in the execution environment (i.e., within the container). The specified path either must not already exist in the execution environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The specified path either must not already exist in the execution environment, or it must be empt
I think this is technically hard to achieve when using a container technology like Docker, which simply allows overriding paths.
@@ -7957,7 +7996,7 @@ File join_paths(File, Array[String]+) | |||
File join_paths(Array[String]+) | |||
``` | |||
|
|||
Joins together two or more paths into an absolute path in the host filesystem. | |||
Joins together two or more paths into an absolute path in the execution environment's filesystem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should define these somewhere. Execution environment can mean multiple different things in different contexts. IE in this case it could be
- the trusted compute environment ie a cloud project
- a VM
- The running docker container
|
||
It is up to the execution engine to resolve input files and directories and stage them into the execution environment. The execution engine is free to specify the values that are allowed for `File` and `Directory` parameters, but at a minimum it is required to support POSIX absolute file paths (e.g., `/path/to/file`). | ||
|
||
It is strongly recommended that input files and directories be specified as absolute paths to local files or as URLs. If relative paths are allowed, then it is suggested that they be resolved relative to the directory that contains the input JSON file (if a file is provided) or to the working directory in which the workflow is initially launched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should change the suggestion from URL to
URI` to capture use cases of cloud URIs
Checklist
Closes #672