Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototypes rev 1.1: inputs/outputs #103

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

aoldershaw
Copy link

Comment on lines +810 to +813
Since prototypes can emit multiple response objects, this also means you can
have *streams* of outputs sharing a name that are identified by some other
field(s). e.g. the above artifact could be identified as
`some_output{some_data: 123}` (or something along those lines). That gives you
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see use cases for emitting streams of outputs (across MessageResponses) - for instance, you have a Go prototype that can compile multiple packages simultaneously, and you want each package to have its own artifact - or you have a code scanner prototype that can scan multiple repos simultaneously.

However, in these cases, it's probably not uncommon to want single standalone outputs as well. For the go build prototype, for instance, you'd probably want to cache the module cache in the GOPATH. Since all packages would be built using the same GOPATH, there's really only a single modcache output - not one per MessageResponse. Forcing the outputs to be defined alongside streams of data make this a bit awkward - should the same artifact be emitted in each of the MessageResponses? Just the first?


Granted, the use cases I mentioned are mainly around performance (not wanting to spin up many containers, and instead shoehorning some of the across step's behaviour into prototypes). tbh don't have any strong use cases in mind for streams of outputs that can't be accomplished by having a stream of run steps instead, at the cost of performance.

Comment on lines +874 to +877
* Can't mount inputs at specific paths - if your prototype requires a specific
filesystem layout, it needs to shuffle the inputs around
* e.g. [oci-build-task] may depend on inputs being at certain paths relative to
the `Dockerfile`
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vito brought up an interesting idea for this - adding a special syntax for defining a directory layout within run.params. For instance, in order to build an OCI image (like the oci-build-task does) with a dependency mounted under the main build context, you could do something like:

run: build
type: oci-image
params:
  context:
    tree:                     # this is a special syntax that says...
      .: @concourse           # ...remount the "concourse" artifact to . (within the tree)
      ./dumb-init: @dumb-init # ...remount the "dumb-init" artifact to ./dumb-init (within the tree)

(p.s. this is just an example syntax)

The prototype would then receive something like:

{
  "object": {
    "context": "/tmp/build/context"
  }
}

...where /tmp/build/context has the following directory structure:

/tmp/build/context
    README.md
    Dockerfile
    atc/
    ... # other concourse files
    dumb-init/
        ... # dumb-init binaries

For comparison, this is how you'd do something similar with the oci-build-task:

task: build
privileged: true
config:
  platform: linux
  image_resource:
    type: registry-image
    source:
      repository: vito/oci-build-task
  inputs:
  - name: concourse
    path: .
  - name: dumb-init
  outputs:
  - name: image
  run:
    path: build

Copy link
Author

@aoldershaw aoldershaw May 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to this thought - the @ syntax is a way to say "mount this artifact as an input and give me the path to it". {tree: {...}} (or whatever syntax we may come up with) is saying something very similar: "mount this tree of artifacts as an input and give me the path to it". What if we unified the two concepts, such that @artifact is just a short-hand for {tree: {.: artifact}} (or {tree: artifact})?

One difficulty here is that now you can't really append strings to the {tree: ...} syntax, e.g. to represent globs. For instance, you might do something like: @binaries/*-linux to refer to the linux binary, but you can't really do {tree: binaries}/*-linux without some hackery (especially if the tree: {...} spans multiple lines, like with the initial context: example)

Copy link
Author

@aoldershaw aoldershaw May 13, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe, what if we supported configuring run.inputs: [...] - we just provided a short-hand syntax (e.g. @foo) for avoiding needing to repeat artifact names in both params and inputs (where possible).

Consider the following run step:

run: build
type: go
params:
  module: @my-repo
  package: ./cmd/my-cmd

This could be a short-hand for the following equivalent run step:

run: build
type: go
inputs:
- name: my-repo
params:
  module: my-repo
  package: ./cmd/my-cmd

If you need a more complex setup, e.g. inputs mounted at specific paths, you could still do that explicitly:

run: build
type: oci-image
params:
  context: @concourse
inputs:
- name: dumb-init
  path: concourse/dumb-init

or:

run: build
type: oci-image
params:
  context: concourse
inputs:
- name: concourse
- name: dumb-init
  path: concourse/dumb-init

or even:

run: build
type: oci-image
params:
  context: .
inputs:
- name: concourse
  path: .
- name: dumb-init

Copy link
Author

@aoldershaw aoldershaw May 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question with the @ syntax is how we can differentiate it from normal uses of the "@" sign (e.g. to signify a slack handle/twitter username). There are a couple options I can see:

  1. Provide a syntax to escape the "@" sign (e.g. handle: \@username_here or handle: @@username_here). However, this adds a bit of overhead when you have to work with "@" (which admittedly is likely to be pretty rare, I would guess)
  2. Disambiguate the @ syntax. For instance, maybe you need to wrap the artifact name in {...}, e.g. context: @{concourse}. Given that this is the common case, though, it could be a bit annoying - but should have less mental overhead
    • I realize that if you wanted to use the literal string @{something}, you'd still need some escape mechanism - but that seems like a pretty unlikely case

Wrapping the artifact name in {...} also may allow us to extend the syntax. For instance, currently the @artifact syntax only lets you mount artifacts at a path identical to their name, and if you want a different path you need to use inputs: [...]. However, we could possibly introduce some syntax to configure the mount path as well, e.g.:

run: build
type: oci-image
params:
  context: @{concourse:.} # resolves to ".", and mounts concourse at "."

...which would be equivalent to:

run: build
type: oci-image
inputs:
- name: concourse
  path: .
params:
  context: .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that the proposed @ syntax isn't very practical, since @ is a reserved character in YAML. It's still possible to use, but requires quoting the value, which is pretty annoying.

Another possible syntax is using < to mean input, since it points "inward" toward the params e.g.:

run: build
type: oci-image
params:
  context: <{concourse} # feels like injecting `concourse` into the context

Copy link
Author

@aoldershaw aoldershaw May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To build on the idea of having a short-hand syntax for specifying inputs, what if we did the same for specifying outputs? Currently, all the proposed approaches have the prototype reporting the outputs. What if instead we made it explicit in the build plan? For instance, we could use another character to denote outputs, e.g.:

run: build
type: oci-image
params:
  context: <{concourse:.} # `concourse` is an input mounted to `.`
  targets:
    ={builder-image}: builder # `builder-image` is an output
    ={final-image}: app # `final-image` is an output

...which is equivalent to:

run: build
type: oci-image
params:
  context: .
  targets:
    builder-image: builder
    final-image: app
inputs:
- name: concourse
  path: .
outputs:
- name: builder-image
- name: final-image

Here, = is used to denote "the name (and optionally path) following is an output of the step". The thinking is that = denotes assignment to the build scope. I wanted to use something like > to parallel < for inputs, but unfortunately that's a control flow character, so it's not possible to use without quoting. Plus, it could be easy to mix < and > up.


The example in the motivation for alternative approaches could be written as:

run: build
type: go
params:
  packages:
    ={cmd1-binary}: <{repo1}/cmd/cmd1
    ={cmd2-binary}: <{repo1}/cmd/cmd2
    ={cmd3-binary}: <{repo2}/cmd/cmd3

Copy link
Author

@aoldershaw aoldershaw May 23, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One other thought - we don't have to try to come up with symbols to denote inputs and outputs, and could make it more verbose, e.g.:

run: build
type: go
params:
  packages:
    output{cmd1-binary}: input{repo1}/cmd/cmd1
    output{cmd2-binary}: input{repo1}/cmd/cmd2
    output{cmd3-binary}: input{repo2}/cmd/cmd3

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that's a bummer about the @ syntax. I guess one alternative to consider could be ${foo}, although it kind of carries a 'regular old var syntax' connotation, not artifacts. TBH I'm not a huge fan of the proposed syntaxes in the last few comments; < made me think of the > YAML multiline syntax and the rest don't feel like they suggest inputs/outputs/interpolation to me. (TBH I'm not sure why @ did either - maybe from my background in Ruby. 😆)

I like the idea behind @{foo:mount-path} - though one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params. For example, if you're setting up a directory tree to docker build you may be tempted to pass other artifacts as bogus values just to mount them beneath the context dir:

params:
  context: @{concourse:context}
  blah: @{dumb-init:context/dumb-init}

Which of course wouldn't work if the prototype validates its fields. In this case the more explicit inputs: form would probably be favorable.

Just noting, not arguing in favor of it, but the tree: approach from #103 (comment) avoids this issue by representing the root path and its subpaths as one nested value under params:, with only the root path being ultimately passed to the prototype, and the nested values just used for setting up mounts beneath it.

Re: #103 (comment) - explicitly declaring outputs is closer to how the prototype RFC is now, just without special syntax, and it only has a list of output names. (ref.)


Here's a thought: what if instead of having params: implicitly fill in inputs:, we went the other way around and had inputs: fill in params:?

In this mockup, I've added artifact: to the input config which is the artifact that will be passed along, and param is a field to set under params:. This way you can configure artifact mounts however you like, and you don't have to repeat yourself in params:.

run: build
type: oci-image
inputs:
- param: context
  artifact: concourse
  path: .
- artifact: dumb-init
  path: ./dumb-init

With this example the prototype would receive {"context": "."}.

One downside of this approach is that it works best with simple named params, not arbitrary data structures such as arrays or maps of artifacts. I guess we could support path-style things like params: foo.bar but it'd get pretty awkward with arrays; foo[0], foo[1] would be annoying to maintain. But maybe we just don't care about either case and we ain't gonna need it.

Not sure how this approach could be extended to outputs. I'll leave this comment for now and see if anything springs to mind. :)

Copy link
Author

@aoldershaw aoldershaw May 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inputs filling in params is an interesting approach, but as you mention, it does feel a little awkward for certain cases. For instance, suppose you wanted to specify a list of globs, e.g. for creating a GitHub release. Using some syntax for embedding inputs in the params (I'll go with the more explicit syntax in #103 (comment)), it feels pretty natural:

put: gh-release # could also be a `run:` step
params:
  globs: [input{concourse-binaries}/*, input{fly-binaries}/*, input{license}/*]

...whereas with inputs[].param, it's not so easy - in addition to the array indexing issue, how do you add the /* suffix? Would you ever want to add a prefix?

one downside I see is that in situations where you're trying to set up a nested tree, you night not actually have a valid place to pass the value for setting up the nested paths in params

Yeah, that's true - the thinking was that you'd have to use run.inputs for cases where it doesn't make sense to embed the inputs directly in params. Something like the tree syntax could solve that, but there are a few things I'm not sure about there:

  • Would you ever need one of these "sibling" inputs to be up a directory from the root of the tree? e.g. you have a go module which has a replace directive to a sibling directory (e.g. guardian does this - https://github.com/cloudfoundry/guardian/blob/2f945c09a983e4/go.mod#L60-L63). If you want to compile guardian, you'd want params.module to point to the mount path of guardian, but then you also want to mount "implicit" inputs (garden, etc.) up a directory. Do we want to allow going up a directory in tree? e.g.:
run: build
type: go
params:
  module:
    tree:
      .: guardian
      ../garden: garden

Maybe tree isn't the best name in this case, given that it would be able to modify mounts outside of the directory tree that it returns?

  • Is there a syntax we can use to add suffixes (e.g. globs) to the tree result? If it's just defined as a YAML object, it's not obvious how you could append a string

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants