-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prototypes rev 1.1: inputs/outputs #103
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: Aidan Oldershaw <[email protected]>
Since prototypes can emit multiple response objects, this also means you can | ||
have *streams* of outputs sharing a name that are identified by some other | ||
field(s). e.g. the above artifact could be identified as | ||
`some_output{some_data: 123}` (or something along those lines). That gives you |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see use cases for emitting streams of outputs (across MessageResponses
) - for instance, you have a Go prototype that can compile multiple packages simultaneously, and you want each package to have its own artifact - or you have a code scanner prototype that can scan multiple repos simultaneously.
However, in these cases, it's probably not uncommon to want single standalone outputs as well. For the go build
prototype, for instance, you'd probably want to cache the module cache in the GOPATH
. Since all packages would be built using the same GOPATH
, there's really only a single modcache
output - not one per MessageResponse
. Forcing the outputs to be defined alongside streams of data make this a bit awkward - should the same artifact be emitted in each of the MessageResponses
? Just the first?
Granted, the use cases I mentioned are mainly around performance (not wanting to spin up many containers, and instead shoehorning some of the across
step's behaviour into prototypes). tbh don't have any strong use cases in mind for streams of outputs that can't be accomplished by having a stream of run
steps instead, at the cost of performance.
* Can't mount inputs at specific paths - if your prototype requires a specific | ||
filesystem layout, it needs to shuffle the inputs around | ||
* e.g. [oci-build-task] may depend on inputs being at certain paths relative to | ||
the `Dockerfile` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vito brought up an interesting idea for this - adding a special syntax for defining a directory layout within run.params
. For instance, in order to build an OCI image (like the oci-build-task does) with a dependency mounted under the main build context, you could do something like:
run: build
type: oci-image
params:
context:
tree: # this is a special syntax that says...
.: @concourse # ...remount the "concourse" artifact to . (within the tree)
./dumb-init: @dumb-init # ...remount the "dumb-init" artifact to ./dumb-init (within the tree)
(p.s. this is just an example syntax)
The prototype would then receive something like:
{
"object": {
"context": "/tmp/build/context"
}
}
...where /tmp/build/context
has the following directory structure:
/tmp/build/context
README.md
Dockerfile
atc/
... # other concourse files
dumb-init/
... # dumb-init binaries
For comparison, this is how you'd do something similar with the oci-build-task:
task: build
privileged: true
config:
platform: linux
image_resource:
type: registry-image
source:
repository: vito/oci-build-task
inputs:
- name: concourse
path: .
- name: dumb-init
outputs:
- name: image
run:
path: build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to this thought - the @
syntax is a way to say "mount this artifact as an input and give me the path to it". {tree: {...}}
(or whatever syntax we may come up with) is saying something very similar: "mount this tree of artifacts as an input and give me the path to it". What if we unified the two concepts, such that @artifact
is just a short-hand for {tree: {.: artifact}}
(or {tree: artifact}
)?
One difficulty here is that now you can't really append strings to the {tree: ...}
syntax, e.g. to represent globs. For instance, you might do something like: @binaries/*-linux
to refer to the linux binary, but you can't really do {tree: binaries}/*-linux
without some hackery (especially if the tree: {...}
spans multiple lines, like with the initial context:
example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe, what if we supported configuring run.inputs: [...]
- we just provided a short-hand syntax (e.g. @foo
) for avoiding needing to repeat artifact names in both params
and inputs
(where possible).
Consider the following run
step:
run: build
type: go
params:
module: @my-repo
package: ./cmd/my-cmd
This could be a short-hand for the following equivalent run
step:
run: build
type: go
inputs:
- name: my-repo
params:
module: my-repo
package: ./cmd/my-cmd
If you need a more complex setup, e.g. inputs mounted at specific paths, you could still do that explicitly:
run: build
type: oci-image
params:
context: @concourse
inputs:
- name: dumb-init
path: concourse/dumb-init
or:
run: build
type: oci-image
params:
context: concourse
inputs:
- name: concourse
- name: dumb-init
path: concourse/dumb-init
or even:
run: build
type: oci-image
params:
context: .
inputs:
- name: concourse
path: .
- name: dumb-init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question with the @
syntax is how we can differentiate it from normal uses of the "@" sign (e.g. to signify a slack handle/twitter username). There are a couple options I can see:
- Provide a syntax to escape the "@" sign (e.g.
handle: \@username_here
orhandle: @@username_here
). However, this adds a bit of overhead when you have to work with "@" (which admittedly is likely to be pretty rare, I would guess) - Disambiguate the
@
syntax. For instance, maybe you need to wrap the artifact name in{...}
, e.g.context: @{concourse}
. Given that this is the common case, though, it could be a bit annoying - but should have less mental overhead- I realize that if you wanted to use the literal string
@{something}
, you'd still need some escape mechanism - but that seems like a pretty unlikely case
- I realize that if you wanted to use the literal string
Wrapping the artifact name in {...}
also may allow us to extend the syntax. For instance, currently the @artifact
syntax only lets you mount artifacts at a path identical to their name, and if you want a different path you need to use inputs: [...]
. However, we could possibly introduce some syntax to configure the mount path as well, e.g.:
run: build
type: oci-image
params:
context: @{concourse:.} # resolves to ".", and mounts concourse at "."
...which would be equivalent to:
run: build
type: oci-image
inputs:
- name: concourse
path: .
params:
context: .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized that the proposed @
syntax isn't very practical, since @
is a reserved character in YAML. It's still possible to use, but requires quoting the value, which is pretty annoying.
Another possible syntax is using <
to mean input, since it points "inward" toward the params e.g.:
run: build
type: oci-image
params:
context: <{concourse} # feels like injecting `concourse` into the context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To build on the idea of having a short-hand syntax for specifying inputs, what if we did the same for specifying outputs? Currently, all the proposed approaches have the prototype reporting the outputs. What if instead we made it explicit in the build plan? For instance, we could use another character to denote outputs, e.g.:
run: build
type: oci-image
params:
context: <{concourse:.} # `concourse` is an input mounted to `.`
targets:
={builder-image}: builder # `builder-image` is an output
={final-image}: app # `final-image` is an output
...which is equivalent to:
run: build
type: oci-image
params:
context: .
targets:
builder-image: builder
final-image: app
inputs:
- name: concourse
path: .
outputs:
- name: builder-image
- name: final-image
Here, =
is used to denote "the name (and optionally path) following is an output of the step". The thinking is that =
denotes assignment to the build scope. I wanted to use something like >
to parallel <
for inputs, but unfortunately that's a control flow character, so it's not possible to use without quoting. Plus, it could be easy to mix <
and >
up.
The example in the motivation for alternative approaches could be written as:
run: build
type: go
params:
packages:
={cmd1-binary}: <{repo1}/cmd/cmd1
={cmd2-binary}: <{repo1}/cmd/cmd2
={cmd3-binary}: <{repo2}/cmd/cmd3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One other thought - we don't have to try to come up with symbols to denote inputs and outputs, and could make it more verbose, e.g.:
run: build
type: go
params:
packages:
output{cmd1-binary}: input{repo1}/cmd/cmd1
output{cmd2-binary}: input{repo1}/cmd/cmd2
output{cmd3-binary}: input{repo2}/cmd/cmd3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a bummer about the @
syntax. I guess one alternative to consider could be ${foo}
, although it kind of carries a 'regular old var syntax' connotation, not artifacts. TBH I'm not a huge fan of the proposed syntaxes in the last few comments; <
made me think of the >
YAML multiline syntax and the rest don't feel like they suggest inputs/outputs/interpolation to me. (TBH I'm not sure why @
did either - maybe from my background in Ruby. 😆)
I like the idea behind @{foo:mount-path}
- though one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params
. For example, if you're setting up a directory tree to docker build
you may be tempted to pass other artifacts as bogus values just to mount them beneath the context dir:
params:
context: @{concourse:context}
blah: @{dumb-init:context/dumb-init}
Which of course wouldn't work if the prototype validates its fields. In this case the more explicit inputs:
form would probably be favorable.
Just noting, not arguing in favor of it, but the tree:
approach from #103 (comment) avoids this issue by representing the root path and its subpaths as one nested value under params:
, with only the root path being ultimately passed to the prototype, and the nested values just used for setting up mounts beneath it.
Re: #103 (comment) - explicitly declaring outputs is closer to how the prototype RFC is now, just without special syntax, and it only has a list of output names. (ref.)
Here's a thought: what if instead of having params:
implicitly fill in inputs:
, we went the other way around and had inputs:
fill in params:
?
In this mockup, I've added artifact:
to the input config which is the artifact that will be passed along, and param
is a field to set under params:
. This way you can configure artifact mounts however you like, and you don't have to repeat yourself in params:
.
run: build
type: oci-image
inputs:
- param: context
artifact: concourse
path: .
- artifact: dumb-init
path: ./dumb-init
With this example the prototype would receive {"context": "."}
.
One downside of this approach is that it works best with simple named params, not arbitrary data structures such as arrays or maps of artifacts. I guess we could support path-style things like params: foo.bar
but it'd get pretty awkward with arrays; foo[0]
, foo[1]
would be annoying to maintain. But maybe we just don't care about either case and we ain't gonna need it.
Not sure how this approach could be extended to outputs. I'll leave this comment for now and see if anything springs to mind. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inputs
filling in params
is an interesting approach, but as you mention, it does feel a little awkward for certain cases. For instance, suppose you wanted to specify a list of globs, e.g. for creating a GitHub release. Using some syntax for embedding inputs in the params (I'll go with the more explicit syntax in #103 (comment)), it feels pretty natural:
put: gh-release # could also be a `run:` step
params:
globs: [input{concourse-binaries}/*, input{fly-binaries}/*, input{license}/*]
...whereas with inputs[].param
, it's not so easy - in addition to the array indexing issue, how do you add the /*
suffix? Would you ever want to add a prefix?
one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params
Yeah, that's true - the thinking was that you'd have to use run.inputs
for cases where it doesn't make sense to embed the inputs directly in params
. Something like the tree
syntax could solve that, but there are a few things I'm not sure about there:
- Would you ever need one of these "sibling" inputs to be up a directory from the root of the tree? e.g. you have a go module which has a
replace
directive to a sibling directory (e.g. guardian does this - https://github.com/cloudfoundry/guardian/blob/2f945c09a983e4/go.mod#L60-L63). If you want to compile guardian, you'd wantparams.module
to point to the mount path ofguardian
, but then you also want to mount "implicit" inputs (garden
, etc.) up a directory. Do we want to allow going up a directory intree
? e.g.:
run: build
type: go
params:
module:
tree:
.: guardian
../garden: garden
Maybe tree
isn't the best name in this case, given that it would be able to modify mounts outside of the directory tree that it returns?
- Is there a syntax we can use to add suffixes (e.g. globs) to the
tree
result? If it's just defined as a YAML object, it's not obvious how you could append a string
Rendered: https://github.com/aoldershaw/rfcs/blob/prototype-inputs-outputs/037-prototypes/proposal.md#inputsoutputs