automatically propagate task values to environment variables #458
Replies: 3 comments 5 replies
-
Interesting proposal, though one thing that immediately popped into my head was the potential for unintended or unknown side effects. Some programs look at environment variables to automatically set certain settings. If someone is unaware that a given environment variable is meaningful to the program they're using, they might end up running the program with different settings than intended, potentially without ever becoming aware of it. This might just be an edge case, though, and one could argue you should know how to run the tools you're putting in your workflow, so it may not be a big issue.
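The concern above can be sketched in a few lines of shell (the tool and the `THREADS` variable here are hypothetical, just to illustrate a program that silently honors an environment variable):

```shell
# Hypothetical tool that silently honors an environment variable.
mytool() {
    # assume the tool reads THREADS from the environment when it is set
    echo "running with ${THREADS:-1} threads"
}

mytool            # the user expects the default
THREADS=8 mytool  # a propagated task input could change behavior without the user noticing
```

If a task input named `threads` were auto-propagated into the environment, the second invocation's behavior could occur without the user ever opting in.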
-
@mlin this is a really interesting idea and would definitely simplify a lot of very simple tasks by not requiring any sort of interpolation. Using environment variables seems like a really elegant way of passing values to a task.

To @DavyCats' point, I think we could simply use a few conventions that a lot of people will be familiar with for environment variables. First, everything should probably be prefixed with a well-documented string. Secondly, we might want to keep environment variables in all caps; in bash the general convention seems to be full caps for globally available vars and lowercase for local vars.

We can also draw on examples of existing software for defining how complex types should work. I personally write a lot of applications in Spring, which allows you to define very complex configuration (normally written in YAML or JSON) as environment variables, and we could mimic that. For example, say we have the following variables.
Then environment values would be assigned like the following:
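As one sketch of such a convention (the `WDL_` prefix and the variable names below are hypothetical, loosely mimicking Spring's relaxed binding of nested config keys to environment variables):

```shell
# Hypothetical WDL declarations:
#   Int threads = 4
#   String sample_name = "NA12878"
#   Reference ref = { fasta: "hg38.fa" }   (a struct)
#
# Spring-style mapping: a documented prefix, upper case throughout,
# nested fields joined with underscores.
export WDL_THREADS=4
export WDL_SAMPLE_NAME=NA12878
export WDL_REF_FASTA=hg38.fa

echo "$WDL_THREADS $WDL_SAMPLE_NAME $WDL_REF_FASTA"
```

A documented prefix also answers the side-effect concern above: existing tools are unlikely to read variables under a reserved prefix.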
-
Our application libraries follow 12-factor best practices, and accept (and in some cases even require) environment variables as core configuration. As @mlin called out, most of the workarounds for the lack of first-class env var support in WDL are error-prone due to quoting issues or shell interpretation. I agree that the WDL spec should have first-class support for environment variables, but I have concerns about some of the suggestions in this thread:
-
WDL `~{}` interpolations have a couple of disadvantages. A potential solution to both problems is to prefer passing inputs to the task command script as environment variables, instead of textually interpolating them. If the WDL engine sets the task inputs in the container environment, then the command script can use standard bash rules for handling them safely (which has pitfalls of its own, but at least they're well-known ones that ShellCheck can point out), and this may also ease the learning curve for new WDLers.
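The quoting difference can be sketched in shell (simulated here with a plain shell variable; `bam` is a hypothetical input whose value contains a space):

```shell
# Hypothetical input value; under the env-var scheme the engine would set this.
bam='my sample.bam'

# Interpolation-style: the raw value ends up in the script text, so an
# unquoted occurrence word-splits into extra arguments.
set -- samtools index $bam
interpolated_argc=$#

# Env-var style: the value stays out of the script text; standard bash
# quoting keeps it a single argument, and ShellCheck flags the unquoted form.
set -- samtools index "$bam"
quoted_argc=$#

echo "$interpolated_argc $quoted_argc"
```

The unquoted form yields four arguments instead of three, which is exactly the class of bug ShellCheck (SC2086) warns about.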
In chanzuckerberg/miniwdl#503 I prototyped a task runtime setting `autoEnv: true`, which causes the task's WDL value declarations to implicitly propagate into the environment of the command script. For example, if the WDL task has an input `File bam`, then in the command script, `$bam` will refer to the localized filename. So too for `String`, `Int`, etc. values. This prototype is meant as a discussion starting point.

One open question is how to deal with WDL's compound types, like arrays and structs. In this prototype I've chosen to punt on this by making the environment propagation apply only to "atomic" value types. For compound types, one could introduce an auxiliary value like e.g.
```wdl
File filenames = write_lines(file_array)
```
and then consume `$filenames` in the command. This is open to discussion, of course -- a possible alternative is to load them into the environment as JSON text. (However, it's conceivable that a compound WDL data structure could be too large to fit in an environment variable, or into whatever API call sets up the container environment.)
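For illustration, here is a sketch of how a task command might consume such an auxiliary `$filenames` value (the setup and paths are hypothetical; in practice the engine would localize the files and set the variable):

```shell
# Hypothetical setup: simulate the engine materializing write_lines(file_array)
# to a file and exposing its path via the environment as $filenames.
filenames=$(mktemp)
printf '%s\n' /inputs/a.bam /inputs/b.bam /inputs/c.bam > "$filenames"
export filenames

# The task command then reads the paths back one per line.
count=0
while IFS= read -r path; do
    count=$((count + 1))
done < "$filenames"
echo "processed $count files"
```

This keeps arbitrarily large arrays out of the environment itself, sidestepping the size limits mentioned above.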