Prototypes rev 1.1: inputs/outputs #103

aoldershaw · 2021-05-04T20:38:06Z

Rendered: https://github.com/aoldershaw/rfcs/blob/prototype-inputs-outputs/037-prototypes/proposal.md#inputsoutputs

Signed-off-by: Aidan Oldershaw <[email protected]>

aoldershaw · 2021-05-10T14:17:49Z

037-prototypes/proposal.md

+Since prototypes can emit multiple response objects, this also means you can
+have *streams* of outputs sharing a name that are identified by some other
+field(s). e.g. the above artifact could be identified as
+`some_output{some_data: 123}` (or something along those lines). That gives you


I can see use cases for emitting streams of outputs (across MessageResponses) - for instance, you have a Go prototype that can compile multiple packages simultaneously, and you want each package to have its own artifact - or you have a code scanner prototype that can scan multiple repos simultaneously.

However, in these cases, it's probably not uncommon to want single standalone outputs as well. For the go build prototype, for instance, you'd probably want to cache the module cache in the GOPATH. Since all packages would be built using the same GOPATH, there's really only a single modcache output - not one per MessageResponse. Forcing the outputs to be defined alongside streams of data make this a bit awkward - should the same artifact be emitted in each of the MessageResponses? Just the first?

Granted, the use cases I mentioned are mainly around performance (not wanting to spin up many containers, and instead shoehorning some of the across step's behaviour into prototypes). tbh don't have any strong use cases in mind for streams of outputs that can't be accomplished by having a stream of run steps instead, at the cost of performance.

aoldershaw · 2021-05-10T14:34:24Z

037-prototypes/proposal.md

+* Can't mount inputs at specific paths - if your prototype requires a specific
+  filesystem layout, it needs to shuffle the inputs around
+  * e.g. [oci-build-task] may depend on inputs being at certain paths relative to
+    the `Dockerfile`


@vito brought up an interesting idea for this - adding a special syntax for defining a directory layout within run.params. For instance, in order to build an OCI image (like the oci-build-task does) with a dependency mounted under the main build context, you could do something like:

run: build type: oci-image params: context: tree: # this is a special syntax that says... .: @concourse # ...remount the "concourse" artifact to . (within the tree) ./dumb-init: @dumb-init # ...remount the "dumb-init" artifact to ./dumb-init (within the tree)

(p.s. this is just an example syntax)

The prototype would then receive something like:

{ "object": { "context": "/tmp/build/context" } }

...where /tmp/build/context has the following directory structure:

/tmp/build/context README.md Dockerfile atc/ ... # other concourse files dumb-init/ ... # dumb-init binaries

For comparison, this is how you'd do something similar with the oci-build-task:

task: build privileged: true config: platform: linux image_resource: type: registry-image source: repository: vito/oci-build-task inputs: - name: concourse path: . - name: dumb-init outputs: - name: image run: path: build

Related to this thought - the @ syntax is a way to say "mount this artifact as an input and give me the path to it". {tree: {...}} (or whatever syntax we may come up with) is saying something very similar: "mount this tree of artifacts as an input and give me the path to it". What if we unified the two concepts, such that @artifact is just a short-hand for {tree: {.: artifact}} (or {tree: artifact})?

One difficulty here is that now you can't really append strings to the {tree: ...} syntax, e.g. to represent globs. For instance, you might do something like: @binaries/*-linux to refer to the linux binary, but you can't really do {tree: binaries}/*-linux without some hackery (especially if the tree: {...} spans multiple lines, like with the initial context: example)

Or maybe, what if we supported configuring run.inputs: [...] - we just provided a short-hand syntax (e.g. @foo) for avoiding needing to repeat artifact names in both params and inputs (where possible).

Consider the following run step:

run: build type: go params: module: @my-repo package: ./cmd/my-cmd

This could be a short-hand for the following equivalent run step:

run: build type: go inputs: - name: my-repo params: module: my-repo package: ./cmd/my-cmd

If you need a more complex setup, e.g. inputs mounted at specific paths, you could still do that explicitly:

run: build type: oci-image params: context: @concourse inputs: - name: dumb-init path: concourse/dumb-init

or:

run: build type: oci-image params: context: concourse inputs: - name: concourse - name: dumb-init path: concourse/dumb-init

or even:

run: build type: oci-image params: context: . inputs: - name: concourse path: . - name: dumb-init

One question with the @ syntax is how we can differentiate it from normal uses of the "@" sign (e.g. to signify a slack handle/twitter username). There are a couple options I can see:

Provide a syntax to escape the "@" sign (e.g. handle: \@username_here or handle: @@username_here). However, this adds a bit of overhead when you have to work with "@" (which admittedly is likely to be pretty rare, I would guess)

Disambiguate the @ syntax. For instance, maybe you need to wrap the artifact name in {...}, e.g. context: @{concourse}. Given that this is the common case, though, it could be a bit annoying - but should have less mental overhead

I realize that if you wanted to use the literal string @{something}, you'd still need some escape mechanism - but that seems like a pretty unlikely case

Wrapping the artifact name in {...} also may allow us to extend the syntax. For instance, currently the @artifact syntax only lets you mount artifacts at a path identical to their name, and if you want a different path you need to use inputs: [...]. However, we could possibly introduce some syntax to configure the mount path as well, e.g.:

run: build type: oci-image params: context: @{concourse:.} # resolves to ".", and mounts concourse at "."

...which would be equivalent to:

run: build type: oci-image inputs: - name: concourse path: . params: context: .

I just realized that the proposed @ syntax isn't very practical, since @ is a reserved character in YAML. It's still possible to use, but requires quoting the value, which is pretty annoying.

Another possible syntax is using < to mean input, since it points "inward" toward the params e.g.:

run: build type: oci-image params: context: <{concourse} # feels like injecting `concourse` into the context

To build on the idea of having a short-hand syntax for specifying inputs, what if we did the same for specifying outputs? Currently, all the proposed approaches have the prototype reporting the outputs. What if instead we made it explicit in the build plan? For instance, we could use another character to denote outputs, e.g.:

run: build type: oci-image params: context: <{concourse:.} # `concourse` is an input mounted to `.` targets: ={builder-image}: builder # `builder-image` is an output ={final-image}: app # `final-image` is an output

...which is equivalent to:

run: build type: oci-image params: context: . targets: builder-image: builder final-image: app inputs: - name: concourse path: . outputs: - name: builder-image - name: final-image

Here, = is used to denote "the name (and optionally path) following is an output of the step". The thinking is that = denotes assignment to the build scope. I wanted to use something like > to parallel < for inputs, but unfortunately that's a control flow character, so it's not possible to use without quoting. Plus, it could be easy to mix < and > up.

The example in the motivation for alternative approaches could be written as:

run: build type: go params: packages: ={cmd1-binary}: <{repo1}/cmd/cmd1 ={cmd2-binary}: <{repo1}/cmd/cmd2 ={cmd3-binary}: <{repo2}/cmd/cmd3

One other thought - we don't have to try to come up with symbols to denote inputs and outputs, and could make it more verbose, e.g.:

run: build type: go params: packages: output{cmd1-binary}: input{repo1}/cmd/cmd1 output{cmd2-binary}: input{repo1}/cmd/cmd2 output{cmd3-binary}: input{repo2}/cmd/cmd3

Ah, that's a bummer about the @ syntax. I guess one alternative to consider could be ${foo}, although it kind of carries a 'regular old var syntax' connotation, not artifacts. TBH I'm not a huge fan of the proposed syntaxes in the last few comments; < made me think of the > YAML multiline syntax and the rest don't feel like they suggest inputs/outputs/interpolation to me. (TBH I'm not sure why @ did either - maybe from my background in Ruby. 😆)

I like the idea behind @{foo:mount-path} - though one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params. For example, if you're setting up a directory tree to docker build you may be tempted to pass other artifacts as bogus values just to mount them beneath the context dir:

params: context: @{concourse:context} blah: @{dumb-init:context/dumb-init}

Which of course wouldn't work if the prototype validates its fields. In this case the more explicit inputs: form would probably be favorable.

Just noting, not arguing in favor of it, but the tree: approach from #103 (comment) avoids this issue by representing the root path and its subpaths as one nested value under params:, with only the root path being ultimately passed to the prototype, and the nested values just used for setting up mounts beneath it.

Re: #103 (comment) - explicitly declaring outputs is closer to how the prototype RFC is now, just without special syntax, and it only has a list of output names. (ref.)

Here's a thought: what if instead of having params: implicitly fill in inputs:, we went the other way around and had inputs: fill in params:?

In this mockup, I've added artifact: to the input config which is the artifact that will be passed along, and param is a field to set under params:. This way you can configure artifact mounts however you like, and you don't have to repeat yourself in params:.

run: build type: oci-image inputs: - param: context artifact: concourse path: . - artifact: dumb-init path: ./dumb-init

With this example the prototype would receive {"context": "."}.

One downside of this approach is that it works best with simple named params, not arbitrary data structures such as arrays or maps of artifacts. I guess we could support path-style things like params: foo.bar but it'd get pretty awkward with arrays; foo[0], foo[1] would be annoying to maintain. But maybe we just don't care about either case and we ain't gonna need it.

Not sure how this approach could be extended to outputs. I'll leave this comment for now and see if anything springs to mind. :)

inputs filling in params is an interesting approach, but as you mention, it does feel a little awkward for certain cases. For instance, suppose you wanted to specify a list of globs, e.g. for creating a GitHub release. Using some syntax for embedding inputs in the params (I'll go with the more explicit syntax in #103 (comment)), it feels pretty natural:

put: gh-release # could also be a `run:` step params: globs: [input{concourse-binaries}/*, input{fly-binaries}/*, input{license}/*]

...whereas with inputs[].param, it's not so easy - in addition to the array indexing issue, how do you add the /* suffix? Would you ever want to add a prefix?

one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params

Yeah, that's true - the thinking was that you'd have to use run.inputs for cases where it doesn't make sense to embed the inputs directly in params. Something like the tree syntax could solve that, but there are a few things I'm not sure about there:

Would you ever need one of these "sibling" inputs to be up a directory from the root of the tree? e.g. you have a go module which has a replace directive to a sibling directory (e.g. guardian does this - https://github.com/cloudfoundry/guardian/blob/2f945c09a983e4/go.mod#L60-L63). If you want to compile guardian, you'd want params.module to point to the mount path of guardian, but then you also want to mount "implicit" inputs (garden, etc.) up a directory. Do we want to allow going up a directory in tree? e.g.:

run: build type: go params: module: tree: .: guardian ../garden: garden

Maybe tree isn't the best name in this case, given that it would be able to modify mounts outside of the directory tree that it returns?

Is there a syntax we can use to add suffixes (e.g. globs) to the tree result? If it's just defined as a YAML object, it's not obvious how you could append a string

alternative input/output approaches for prototypes

82dc7e4

Signed-off-by: Aidan Oldershaw <[email protected]>

aoldershaw mentioned this pull request May 4, 2021

Figure out best approach for providing inputs/outputs to prototypes concourse/concourse#6980

Closed

taylorsilva assigned vito May 6, 2021

aoldershaw commented May 10, 2021

View reviewed changes

vito mentioned this pull request May 10, 2021

Prototypes concourse/concourse#6905

Open

aoldershaw changed the title ~~prototypes rev 1.1: propose alternative input/output approaches~~ Prototypes rev 1.1: inputs/outputs May 13, 2021

chenbh mentioned this pull request May 19, 2021

Building task with output named anything but "image" fails at exporting the built tarball. concourse/oci-build-task#56

Closed

This was referenced May 21, 2021

Implement the run: step concourse/concourse#7061

Open

Implement skeleton for Prototypes concourse/concourse#7065

Merged

aoldershaw mentioned this pull request Jul 6, 2021

Spatial Automation (across step) concourse/concourse#5656

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prototypes rev 1.1: inputs/outputs #103

Prototypes rev 1.1: inputs/outputs #103

aoldershaw commented May 4, 2021

aoldershaw May 10, 2021

aoldershaw May 10, 2021

aoldershaw May 11, 2021 •

edited

Loading

aoldershaw May 13, 2021 •

edited

Loading

aoldershaw May 18, 2021 •

edited

Loading

aoldershaw May 23, 2021

aoldershaw May 23, 2021 •

edited

Loading

aoldershaw May 23, 2021 •

edited

Loading

vito May 26, 2021

aoldershaw May 27, 2021 •

edited

Loading

Prototypes rev 1.1: inputs/outputs #103

Are you sure you want to change the base?

Prototypes rev 1.1: inputs/outputs #103

Conversation

aoldershaw commented May 4, 2021

aoldershaw May 10, 2021

Choose a reason for hiding this comment

aoldershaw May 10, 2021

Choose a reason for hiding this comment

aoldershaw May 11, 2021 • edited Loading

Choose a reason for hiding this comment

aoldershaw May 13, 2021 • edited Loading

Choose a reason for hiding this comment

aoldershaw May 18, 2021 • edited Loading

Choose a reason for hiding this comment

aoldershaw May 23, 2021

Choose a reason for hiding this comment

aoldershaw May 23, 2021 • edited Loading

Choose a reason for hiding this comment

aoldershaw May 23, 2021 • edited Loading

Choose a reason for hiding this comment

vito May 26, 2021

Choose a reason for hiding this comment

aoldershaw May 27, 2021 • edited Loading

Choose a reason for hiding this comment

aoldershaw May 11, 2021 •

edited

Loading

aoldershaw May 13, 2021 •

edited

Loading

aoldershaw May 18, 2021 •

edited

Loading

aoldershaw May 23, 2021 •

edited

Loading

aoldershaw May 23, 2021 •

edited

Loading

aoldershaw May 27, 2021 •

edited

Loading