Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Env helper layers #598

Closed
wants to merge 3 commits into from
Closed

Env helper layers #598

wants to merge 3 commits into from

Conversation

schneems
Copy link
Contributor

@schneems schneems commented Jul 16, 2023

When a developer needs to set environment variables only, they have to make a layer struct and pass it to handle_layer. Instead of forcing them to write a boilerplate struct, this commit provides two utility structs:

  • DefaultEnvLayer - Used to set default environment variables
  • ConfigureEnvLayer - Can be used with LayerEnv::new() to configure any environment variables (versus DefaultEnvLayer only creates default environment variables).

An alternative to providing structs could be to provide functions that accept a context and layer name in addition to the required env data.

GUS-W-13815854.

@schneems schneems force-pushed the schneems/env-apis branch 3 times, most recently from 1d1127b to 70f4366 Compare July 16, 2023 21:45
@schneems schneems marked this pull request as ready for review July 16, 2023 22:01
@schneems schneems requested a review from a team as a code owner July 16, 2023 22:01
@schneems schneems force-pushed the schneems/env-apis branch 2 times, most recently from 4e317c3 to 36632a1 Compare July 17, 2023 16:22
@colincasey
Copy link
Contributor

@schneems i like this idea of reducing layer boilerplate for common operations like settings environment variables.

my one complaint is that there are two different layer types (DefaultEnvLayer and ConfigureEnvLayer) required to do roughly the same operation. have you considered some alternatives such as:

  • using a single layer that accepts a list of variants that represent the default vs. advanced env variable definitions?
enum EnvVar {
  Basic(String, String) // key, val
  Full(Scope, ModificationBehavior, String, String)
}

env_layer::create(&context, layer_name!("my_env"), [
  EnvVar::Basic("JRUBY_OPTS", "-Xcompile.invokedynamic=false"),
  EnvVar::Full(Scope::All, ModificationBehavior::Override, "BUNDLE_CLEAN", "1"),
])?;
  • using a builder to allow both default and full configuration values to be supplied?
env_layer::builder()
  .set("JRUBY_OPTS", "-Xcompile.invokedynamic=false")
  .set_with_options("BUNDLE_CLEAN", "1", Scope::All, ModificationBehavior::Override)
  .build()?;

apart from that, i think this is a useful addition.

@schneems
Copy link
Contributor Author

my one complaint is that there are two different layer types (DefaultEnvLayer and ConfigureEnvLayer) required to do roughly the same operation. have you considered some alternatives such as:

My specific need was to set default env vars, but I didn't want to make N different types of ways to handle all the variants (3xScope multiplied by 5xbehavior). I was trying to re-use as much as possible and since LayerEnv::new().chainable_insert() is already a well-established interface that seems to meet everyone's needs, leverage that existing struct. It lets you build an env-only layer that can do everything a hand rolled env-only layer can do, while the default version is optimized for one use case. It's a bit of what I call the "have your cake and eat it too" API pattern.

I would state that my minimum goals are:

  • Introduce a way to set envs without having to define a struct
  • Make the most common operation have an easier interface. For me, this is setting default envs

To that end I've considered the builder (sketch):

let layer = env_layer::builder(
      &context,
      layer_name!("configure_env")
    )
    .default_all(
        "JRUBY_OPTS", 
        "-Xcompile.invokedynamic=false"
    )
    .set_with_options(
        "BUNDLE_CLEAN", 
        "1", 
        Scope::All, ModificationBehavior::Override
    )
    .build()?;

Versus this proposal:

let layer = env_layer::set_envs(
    &context,
    layer_name!("configure_env"),
      LayerEnv::new()
          .chainable_insert(
              Scope::All,
              ModificationBehavior::Default,
              "JRUBY_OPTS",
              "-Xcompile.invokedynamic=false",
          )
          .chainable_insert(
              Scope::All,
              ModificationBehavior::Override,
              "BUNDLE_CLEAN",
              "1",
        )
    )?;

I feel like they're both in the same ballpark. I'm also open to updating my opinion based on use some real world use cases from other existing buildpacks. Maybe there are other operations that are common and itchy enough that they're worth scratching.

One note is that we've got similarly named modules with libcnb::layer_env and libherokubuildpack::env_layer.

@schneems schneems force-pushed the schneems/env-apis branch from 36632a1 to fced2f0 Compare July 24, 2023 19:50
@Malax Malax added libherokubuildpack enhancement New feature or request labels Jul 25, 2023
Copy link
Member

@Malax Malax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not opposed to the feature, it can be useful and is a good first example of a generic layer that is not buildpack specific.

However, the implementation introduces extra APIs for environment variables I'd like to avoid. The way layer environment variables work in CNBs is not immediately obvious to start with and therefore the APIs of libcnb in this area inherit this.

Couldn't we just have an SetEnvironmentLayer struct that only has a LayerEnv field that can be populated using the already existing API? That layer can then be used via context.handle_layer as any other layer, removing the set_envs function which has an IMO misleading name and is really just a handle_layer with extra steps.

I'm not sold on the DefaultEnvLayer, which introduces a lot of code to avoid specifying Scope and ModificationBehavior when calling chainable_insert. In fact, it removes the ability to scope the environment variables. I think this case can be handled via SetEnvironmentLayer.

Bottom line: I think having SetEnvironmentLayer without any extra convenience to build LayerEnv is the way to go for this feature. If we think LayerEnv is too tedious to use we should focus our efforts there so that the improvements are available for everyone, not just users of this layer.

Happy to discuss and/or pair if you want to. :)

///# }
///
///# fn build(&self, context: BuildContext<Self>) -> libcnb::Result<BuildResult, Self::Error> {
/// // Don't forget to apply context.platform.env() too;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads as if you'd always need to do that - which is not the case. If it were, libcnb should do that automatically. Can we remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aiming for a higher level abstraction with this API. For someone who wants just to do one or two things and have it mostly do the right thing. I'm worried that someone like that might not understand the finer points of the difference between the different environments and when they're applied or not and just copy and paste example code.

I think most people would want the user's env vars which is confusingly labeled/buried as platform env. One way to solve this documentation issue could be to provide a simple interface for applying platform env. Maybe something like:

let env = Env::from_current()
    .with_user_vars(context.platform.env())

@schneems
Copy link
Contributor Author

schneems commented Aug 1, 2023

I appreciate the 👍 on SetEnvironmentLayer.

I'm interested in pushing the case on the DefaultEnvLayer and the functions. Partially because I feel I want them. Partially because if they don't make the cut, I want to get a better sense of what it takes to get over the line of providing a helpful abstraction.

Couldn't we just have an SetEnvironmentLayer struct

Yes. That was my original implementation. I added the function when I made the PR.

My thought was that if someone just ™️ wanted to set an environment variable that handle_layer isn't terribly helpful in describing what it does. In short, I think that needing to know handle_layer is an implementation detail, as is the fact that it needs a struct to operate on.

Aspirationally: I want to provide two levels of interfaces:

  • Total flexibility to do anything reasonably expected (Turing tarpit)
  • Minimal complexity (opinionated interfaces for the most common tasks)

SetEnvironmentLayer gives total flexibility versus DefaultEnvLayer paves over complexity to optimize for an opinionated use case. The SetEnvironmentLayer requires you to know and understand the LayerEnv interface, which requires you to think about scopes and all the various different types of possible modifications. There are also some foot guns there, like pretending to PATH but not setting a delimiter.

For the "minimal complexity" case, I'm imagining someone wanting to build a buildpack that only sets the environment variables. If they do that, what are they likely to need? How can we give them what they need while minimizing the number of things they need to learn or be exposed to?

The downside of adding the functions is that they increase the API surface area a developer might have to comb through to get what they want. The upside is that if they want to do one thing and do it succinctly and directly, then it's right there.

@edmorley
Copy link
Member

For the "minimal complexity" case, I'm imagining someone wanting to build a buildpack that only sets the environment variables. If they do that, what are they likely to need? How can we give them what they need while minimizing the number of things they need to learn or be exposed to?

It seems that:

  1. Writing a buildpack that only sets env vars might be a fairly uncommon use-case
  2. For the times when someone needs to do that - if they really want something minimal, then it's perhaps more likely that they'd write a quite bash buildpack instead of creating a libcnb.rs buildpack (and all the associated complexity, such as needing a buildpack compile/packaging step etc)

I'm not saying we shouldn't try and make libcnb.rs both as easy to use as possible and with as little boilerplate, but I do think that "absolute beginner buildpack authors who want something minimal and are easily confused" aren't perhaps our primary target audience? ie: I'm not sure we should be trying to hide things like the layer concept just for the sake of trying to avoid them needing to know about it. (If we can come up with a better env interface in general, great, but I think it's fair for us to expect buildpack authors to know how what layers are in the context of the CNB spec.)

@schneems
Copy link
Contributor Author

Coming back to this. In the case of the DefaultEnvLayer I think the problem (if you could call it that, possibly hesitation) is that it's a one-way abstraction. It only helps you in one way. If you choose to use that abstraction now because you are only setting default values, then later, if you want to append to PATH in the same layer, you must revert the abstraction and use another one. Versus if you choose to use SetEnvironmentLayer, then you already have that flexibility as it relies on LayerEnv, which is standard across all layers. Further, if you wanted to add behavior to that layer by modifying something on disk, you would still need to generate a LayerEnv, so while the SetEnvironmentLayer might not be used, transitioning over the logic to build the LayerEnv would be valuable.

Some light heuristics from this observation process:

  • Prefer transparent abstractions (If you need a LayerEnv, ask for a LayerEnv)
  • Prefer a reusable abstraction with 2 (easy) steps, over an unreusable one with 1 (easy) step

I'll remove the DefaultEnvLayer. I feel Manuel's direction earlier that we should improve LayerEnv to make setting values more ergonomic would be the right way to move forward there, though I don't feel that's a strong need at this time.

I see SetEnvironmentLayer as useful now. I'll work on shaping up the PR.

@schneems
Copy link
Contributor Author

While going down the path to make the modifications I agreed to I'm reminded that with a different layer abstraction, this might not be needed.

For example, if there are more abstractions based on LayerResult rather than LayerEnv (pulling from the high level concept of exposing operations on layers as a read/write like heroku/buildpacks-ruby#203) then we possibly introduce variants:

  • ReadWrite (reads cached data, writes anything)
  • WriteOnly (reads nothing, only writes)
  • ReadOnly (reads cached data, doesn't write anything, limited usefulness, maybe wait to implement)

In that scenario the ConfigureLayerEnv struct I'm trying to introduce here would be redundant as a "WriteOnly" interface would accept a LayerResult which could call env() with a LayerEnv built on the spot. The big headache I'm trying to avoid is having to jump through the boilerplate of defining a struct with a Layer impl on it just to write some data. A "WriteOnly" abstraction seems to satisfy that goal and be more flexible (in aligning with prior identified heuristics).

schneems and others added 3 commits January 18, 2024 16:17
When a developer needs to set environment variables only, they have to make a layer struct and pass it to `handle_layer`. Instead of forcing them to write a boilerplate struct, this commit provides two utility structs:

- `DefaultEnvLayer` - Used to set default environment variables
- `ConfigureEnvLayer` - Can be used with `LayerEnv::new()` to configure any environment variables (versus `DefaultEnvLayer` only creates default environment variables).

An alternative to providing structs could be to provide functions that accept a context and layer name in addition to the required env data.
Added functions so people don't have to bypass the `handle_layer` dance alltogether:

- set_default
- set_envs

After writing that I realized that the `env` namespace could cause confusion since there's commonly an `env` variable:

```rust
// This is confusing
let env = ...
env::set_default(...
```

So I changed the module name to `env_layer` so it avoids confusion.

There's some redundancy in `env_layer::DefaultEnvLayer` but I thought it would be more confusing making it `env_layer::Default` (collision with `std::default::Default`).
More rationale given here #598 (comment).

I am also removing `set_envs` as it does little for ergonomics.
///# }
///# }
/// ```
pub struct ConfigureEnvLayer<B: libcnb::Buildpack> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up keeping this (versus a read/write layer concept), we need to align on naming. We already have a LayerEnv which this takes, so I didn't want to call it EnvLayer. We could use Set instead of Configure. The module name is meh as well, I'm open to suggestions.

@schneems
Copy link
Contributor Author

schneems commented Apr 3, 2024

I'm closing this in favor of #805

@schneems schneems closed this Apr 3, 2024
@edmorley edmorley deleted the schneems/env-apis branch April 8, 2024 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request libherokubuildpack
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants