Fix many issues with examples #670

jdidion · 2024-06-26T01:16:46Z

Checklist

Pull request details were added to CHANGELOG.md

Closes #672

jdidion · 2024-07-24T23:39:53Z

@stxue1 @adamnovak would also appreciate your reviews

adamnovak · 2024-07-25T20:37:08Z

SPEC.md

 }
 ```

-An execution engine may support [other ways](#input-and-output-formats) to specify `File` and `Directory` inputs (e.g., as URIs), but prior to task execution it must [localize inputs](#task-input-localization) so that the runtime value of a `File`/`Directory` variable is a local path.
+Within a WDL file, literal values for files and directories may only be paths that are local to the execution environment.


So an engine is not allowed to support something like:

inputs { File genome_reference = "https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/001/405/GCA_000001405.15_GRCh38/seqs_for_alignment_pipelines.ucsc_ids/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gz" }

?

Good point - I've updated the spec to say that the execution engine is only required to support paths - the implication being it's also free to support URLs but that may not be portable across engines.

I'm open to expanding this in WDL 1.3 to require engines to support http(s) URLs as well. This would be consistent with the requirement to support http(s) URLs in imports.

adamnovak · 2024-07-25T20:49:55Z

SPEC.md


-If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task.
+If a mount point is specified, then it must be an absolute path to a location in the host environment (i.e., within the container). The specified path either must not already exist in the host environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes.


I've always seen "host environment" in the context of containerized applications used to mean the system outside the container; not the system within the container. If I am on a machine booted into Fedora and I run a WDL workflow that calls for an ubuntu:22.04 container to run, the "host" as I understand it would be the Fedora environment.

Also, I don't think that containers have a notion of some amount of space being available at a path in the container. I don't think I can make an image that has 20 GiB of free space at /mnt/freespace and upload it to Docker Hub, as far as I know. So I don't think saying must be empty and have at least the requested amount of space available is different from just saying must be empty. You can mount at paths that don't exist in your container image, or you can mount at paths that exist as empty directories, but you can't mount at paths that are files or paths that are directories that have contents.

Yes, we should be more consistent with usage. I've updated the spec such that all uses of "execution environment" refers to the context in which the task is executed, and "host" refers to the system that hosts the execution environment.

SPEC.md

Fix typo

… formats

patmagee · 2024-08-23T01:57:51Z

SPEC.md

@@ -3386,7 +3419,7 @@ A document is imported using it's [URI](https://en.wikipedia.org/wiki/Uniform_Re
 * `https://`
 * 🗑 `file://` - Using the `file://` protocol for local imports can be problematic. Its use is deprecated and will be removed in WDL 2.0.

-In the event that there is no protocol specified, the import is resolved **relative to the location of the current document**. In the primary WDL document, a protocol-less import is relative to the host file system. If a protocol-less import starts with `/` it is interpreted as relative to the root of the host in the resolved URI.
+In the event that there is no protocol specified, the import is resolved **relative to the location of the current document**. In the primary WDL document, a protocol-less import is relative to the folder that contains the primary WDL file. If a protocol-less import starts with `/` it is interpreted as relative to the root of the file system that contains the primary WDL file.


The language here makes an assumption that path resolution happens on disk. This is not always true, as would be the case for resolving the imports of a Dockstore workflow. Its common practice to use relative import uris, but when resolving the workflow you would resolve it against the https:// context

patmagee · 2024-08-23T02:00:18Z

SPEC.md

@@ -5081,9 +5119,11 @@ Test config:
    * `Array[String]` - An array of disk specifications.
 * Default value: `1 GiB`

-The `disks` attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location. When the `disks` attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command.
+The `disks` attribute provides a way to request one or more persistent volumes, each of which has a minimum size and is mounted at a specific location with both read and write permissions. When the `disks` attribute is provided, the execution engine must guarantee the requested resources are available or immediately fail the task prior to instantiating the command.


read and write permissions

I am worried this requirement constitutes a breaking change. I am not 1--% clear how cromwell works with data-disks and other existing disks but I would not be surprised if it treated some mounts as read-only

patmagee · 2024-08-23T02:03:12Z

SPEC.md


-If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task.
+If a mount point is specified, then it must be an absolute path to a location in the execution environment (i.e., within the container). The specified path either must not already exist in the execution environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes.


The specified path either must not already exist in the execution environment, or it must be empt

I think this is technically hard to achieve when using a container technology like Docker, which simply allows overriding paths.

patmagee · 2024-08-23T02:04:44Z

SPEC.md

@@ -7957,7 +7996,7 @@ File join_paths(File, Array[String]+)
 File join_paths(Array[String]+)
 ```

-Joins together two or more paths into an absolute path in the host filesystem.
+Joins together two or more paths into an absolute path in the execution environment's filesystem.


We should define these somewhere. Execution environment can mean multiple different things in different contexts. IE in this case it could be

the trusted compute environment ie a cloud project

a VM

The running docker container

patmagee · 2024-08-23T02:07:08Z

SPEC.md

+
+It is up to the execution engine to resolve input files and directories and stage them into the execution environment. The execution engine is free to specify the values that are allowed for `File` and `Directory` parameters, but at a minimum it is required to support POSIX absolute file paths (e.g., `/path/to/file`).
+
+It is strongly recommended that input files and directories be specified as absolute paths to local files or as URLs. If relative paths are allowed, then it is suggested that they be resolved relative to the directory that contains the input JSON file (if a file is provided) or to the working directory in which the workflow is initially launched.


Should change the suggestion from URL to URI` to capture use cases of cloud URIs

jdidion added 18 commits June 24, 2024 18:01

update versions

1d2309b

fix #661

d7a7156

update changelog;

100e82f

update changelog, fix #662

edd1d9b

update changelog, fix #663

efd4165

update changelog, fix #663

4361edc

update changelog, fix #664

7b9e5fe

update changelog, fix #665

273e7cc

update changelog, fix #666

7a509ae

update changelog, fix #665

be15cf2

update changelog, fix #666

e2418eb

update changelog, fix #668

a2f6b57

fix #667

d470654

update changelog

3da8825

672: clarify that mount points must be empty, writable, and ephemeral

5bd0b0f

671: fix python json serialization of string list

ea1cd0a

migrate changes from #674

b927573

fix

91f2788

jdidion requested review from wleepang, vsmalladi, geoffjentry, patmagee, cjllanwarne, markjschreiber and rhpvorderman July 24, 2024 23:36

jdidion and others added 3 commits July 24, 2024 19:27

disallow relative path literals

830d074

update changelog

a38d161

Fix typo

fcd5751

adamnovak reviewed Jul 25, 2024

View reviewed changes

jdidion added 5 commits July 25, 2024 16:48

Merge pull request #677 from markwoon/patch-1

2cf8f64

Fix typo

use index_id rather than hard-coded path

5f650f4

clarify

d954a11

fix test, make use of "host" vs "execution environment" consistent

b994318

suggest values for file/directory inputs and outputs in stadnard JSON…

865b913

… formats

patmagee reviewed Aug 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix many issues with examples #670

Fix many issues with examples #670

jdidion commented Jun 26, 2024 •

edited

Loading

jdidion commented Jul 24, 2024

adamnovak Jul 25, 2024

jdidion Jul 26, 2024 •

edited

Loading

adamnovak Jul 25, 2024

jdidion Jul 26, 2024

patmagee Aug 23, 2024

patmagee Aug 23, 2024

patmagee Aug 23, 2024

patmagee Aug 23, 2024

patmagee Aug 23, 2024


		If a mount point is specified, then it must be an absolute path to a location in the host environment. If the mount point is omitted, it is assumed to be a persistent volume mounted at the root of the execution directory within a task.
		If a mount point is specified, then it must be an absolute path to a location in the host environment (i.e., within the container). The specified path either must not already exist in the host environment, or it must be empty and have at least the requested amount of space available. The mount point should be assumed to be ephemeral, i.e., it will be deleted after the task completes.


		It is up to the execution engine to resolve input files and directories and stage them into the execution environment. The execution engine is free to specify the values that are allowed for `File` and `Directory` parameters, but at a minimum it is required to support POSIX absolute file paths (e.g., `/path/to/file`).

		It is strongly recommended that input files and directories be specified as absolute paths to local files or as URLs. If relative paths are allowed, then it is suggested that they be resolved relative to the directory that contains the input JSON file (if a file is provided) or to the working directory in which the workflow is initially launched.

Fix many issues with examples #670

Are you sure you want to change the base?

Fix many issues with examples #670

Conversation

jdidion commented Jun 26, 2024 • edited Loading

Checklist

jdidion commented Jul 24, 2024

Choose a reason for hiding this comment

jdidion Jul 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jdidion commented Jun 26, 2024 •

edited

Loading

jdidion Jul 26, 2024 •

edited

Loading