Use statically linked loader #2500

mering · 2024-12-11T22:20:26Z

🚀 feature request

Relevant Rules

py_binary

Description

With a motivation similar to #691, we would like to package a py_binary (including its runfiles) into an oci_image and run it on a minimal base image such as distroless_base in order to minimize the attack surface. Such an image does not come with a shell or the other tools required by #1929, so that change unfortunately doesn't help us.

Describe the solution you'd like

Use a statically linked executable as loader.

Describe alternatives you've considered

Add more stuff to the base image. This is suboptimal because it increases not only the size but also the attack surface.


mering · 2024-12-11T22:24:40Z

I also noticed that Windows already uses a launcher executable:

rules_python/python/private/py_executable_bazel.bzl

Lines 62 to 68 in e823657

"_launcher": attr.label(
    cfg = "target",
    # NOTE: This is an executable, but is only used for Windows. It
    # can't have executable=True because the backing target is an
    # empty target for other platforms.
    default = "//tools/launcher:launcher",
),

I wrote such a launcher executable (template) for Linux to replace the stage1 bootloader script:

#include <errno.h>
#include <unistd.h>

#include <cstdlib>
#include <cstring>
#include <filesystem>
#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "tools/cpp/runfiles/runfiles.h"
using bazel::tools::cpp::runfiles::Runfiles;

std::string find_python_interpreter(Runfiles& runfiles,
                                    const std::string& interpreter_path) {
  if (interpreter_path.length() > 0 && interpreter_path[0] == '/') {
    // An absolute path, i.e. platform runtime
    return interpreter_path;
  } else if (interpreter_path.find('/') != std::string::npos) {
    // A runfiles-relative path
    return runfiles.Rlocation(interpreter_path);
  } else {
    // A plain word, e.g. "python3". Rely on searching PATH
    return interpreter_path;
  }
}

int main(int argc, char** argv) {
  std::string STAGE1_BOOTSTRAP = argv[0];
  std::string STAGE2_BOOTSTRAP = "%stage2_bootstrap%";
  std::string PYTHON_BINARY = "%python_binary%";

  std::string error;
  std::unique_ptr<Runfiles> runfiles(
      Runfiles::Create(STAGE1_BOOTSTRAP, BAZEL_CURRENT_REPOSITORY, &error));
  if (runfiles == nullptr) {
    std::cerr << "ERROR: Could not resolve runfiles root: " << error << std::endl;
    return 1;
  }

  std::string python_exe = find_python_interpreter(*runfiles, PYTHON_BINARY);
  if (!std::filesystem::is_regular_file(python_exe)) {
    std::cerr << "ERROR: Python interpreter not found: " << python_exe
              << std::endl;
    return 1;
  }
  // TODO check if executable to provide better error

  std::string stage2_bootstrap = runfiles->Rlocation(STAGE2_BOOTSTRAP);

  // Don't prepend a potentially unsafe path to sys.path
  // See: https://docs.python.org/3.11/using/cmdline.html#envvar-PYTHONSAFEPATH
  // NOTE: Only works for 3.11+
  // We inherit the value from the outer environment in case the user wants to
  // opt-out of using PYTHONSAFEPATH. To opt-out, they have to set
  // `PYTHONSAFEPATH=` (empty string). This is because Python treats the empty
  // value as false, and any non-empty value as true.
  int result = setenv("PYTHONSAFEPATH", "1", false);
  if (result != 0) {
    std::cerr << "ERROR: Failed to set PYTHONSAFEPATH: " << strerror(errno)
              << std::endl;
  }

  // TODO set RUNFILES_DIR env var to runfiles root
  // Why does runfiles->Rlocation(".") not work?

  // We use `exec` instead of a child process so that signals sent directly
  // (e.g. using `kill`) to this process (the PID seen by the calling process)
  // are received by the Python process. Otherwise, this process receives the
  // signal and would have to manually propagate it. See
  // https://github.com/bazelbuild/rules_python/issues/2043#issuecomment-2215469971
  // for more information.
  std::vector<const char*> args(argv + 1, argv + argc);
  args.insert(args.begin(), {python_exe.c_str(), stage2_bootstrap.c_str()});
  // execvp requires the argument array to end with a null pointer.
  args.push_back(nullptr);
  //  const_cast is safe: https://stackoverflow.com/a/19505361
  execvp(args[0], const_cast<char**>(args.data()));
  // If execvp returns, there was an error.
  std::cerr << "Error executing command: " << strerror(errno) << "\n";
  return 1;
}
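The two TODOs in the loader above could be handled roughly as follows. This is only a sketch: `is_executable` and `export_env` are hypothetical helper names, not part of rules_python. It also assumes the caller can obtain the variables to export. I believe the Bazel C++ runfiles library exposes them via `Runfiles::EnvVars()`, but that should be verified.

```cpp
#include <unistd.h>

#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns true if the current process is allowed to
// execute `path`. access() also succeeds for searchable directories, so it
// is meant to be combined with the existing is_regular_file() check.
bool is_executable(const std::string& path) {
  return access(path.c_str(), X_OK) == 0;
}

// Hypothetical helper: exports each (name, value) pair into the environment,
// overwriting existing values. Returns false on the first failure.
bool export_env(
    const std::vector<std::pair<std::string, std::string>>& vars) {
  for (const auto& var : vars) {
    if (setenv(var.first.c_str(), var.second.c_str(), /*overwrite=*/1) != 0) {
      return false;
    }
  }
  return true;
}
```

In `main`, one might call `is_executable(python_exe)` right after the regular-file check to print a more specific error, and call `export_env(runfiles->EnvVars())` before `execvp` so that the Python process inherits the runfiles location.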

This template needs to be evaluated to resolve the following variables:

%stage2_bootstrap%
%python_binary%

I tested this with the following BUILD file:

load("@rules_python//python:defs.bzl", "py_binary")
load("@rules_oci//oci:defs.bzl", "oci_image", "oci_load")
load("@rules_pkg//:pkg.bzl", "pkg_tar")
load("@rules_cc//cc:defs.bzl", "cc_binary")

genrule(
    name = "loader_src",
    srcs = ["loader.cc.tmpl"],
    outs = ["loader.cc"],
    # requires `--@rules_python//python/config_settings:bootstrap_impl=script` to create the stage2 bootstrap
    cmd = 'sed -e "s:%stage2_bootstrap%:_main/zz/_zz_stage2_bootstrap.py:" -e "s:%python_binary%:rules_python~~python~python_3_11_x86_64-unknown-linux-gnu/bin/python3:" "$<" > "$@"',
    local = 1,
)

cc_binary(
    name = "loader",
    srcs = ["loader.cc"],
    deps = [
        "@bazel_tools//tools/cpp/runfiles",
    ],
)

py_binary(
    name = "zz",
    srcs = ["zz.py"],
)

pkg_tar(
    name = "zz_layer",
    srcs = [
        "loader",
        ":zz",
    ],
    include_runfiles = True,
    strip_prefix = "/",
)

oci_image(
    name = "zz_image",
    base = "@distroless_base",
    entrypoint = ["/zz/loader"],
    tars = [":zz_layer"],
    workdir = "/",
)

oci_load(
    name = "zz_image.tar",
    image = ":zz_image",
    repo_tags = ["zz/zz_image:latest"],
)

rickeylev · 2024-12-11T23:05:09Z

cc @groodt who I think also liked the idea of a native binary to launch things

groodt · 2024-12-11T23:15:12Z

I made a proposal a while ago, but nothing has really progressed: bazelbuild/proposals#275

I'm supportive of the idea, I'm just concerned about teams having to bring additional toolchains for compiling native launchers. Ideally it would be out of the box with bazel, since I think there are many interpreted languages that could benefit from a native launcher, but that arguably is more challenging to solve than solving it in rules_python.

Now that the rules are fully extracted out of bazelbuild/bazel, I imagine this could be something that is tackled eventually. But it's probably simpler to have a small docker image with python in it, or an approach like the one posted above, which is a neat solution to the problem.

mering · 2024-12-11T23:22:08Z

> I'm supportive of the idea, I'm just concerned about teams having to bring additional toolchains for compiling native launchers. Ideally it would be out of the box with bazel, since I think there are many interpreted languages that could benefit from a native launcher, but that arguably is more challenging to solve than solving it in rules_python.

As of #1929 we have the flag --@rules_python//python/config_settings:bootstrap_impl, so my suggestion would be to just add an additional option there and make it optional in the beginning. So the additional toolchain will only be used if requested explicitly. Also we would not need to add zip support in the beginning but could also add it later. If we move the launcher somewhere else later, this is only an implementation detail.

> But it's probably simpler to have a small docker image with python in it

The problem is that it requires Python twice, once in the image to bootstrap and then additionally packaged as part of the runfiles. A full Python runtime is not "small".

rickeylev · 2024-12-11T23:44:42Z

re: code: That looks pretty promising!

The main case I think is missing is the zip case. I guess statically link zlib into it (and we don't necessarily have to use zip, could use another format)?

For prototyping this, having the py_executable macro create a cc_binary is probably the easiest thing to do. Wiring it in is probably going to be hacky, but such is a prototype.

For the final code, though, I see two options:
(1) Calling the cc APIs (cc_common et al); I'm not entirely sure on how to do that, but there's enough prior art that we can figure it out. All we really need to do is copy/paste some core part of how cc_binary performs compiling and linking.

(2) Use cc_binary as-is, but modifying it after-the-fact, similar to the windows launcher. If we had a way to modify the contents of a binary (to perform the string replacements necessary), then this seems preferable to (1). The windows launcher does some sort of trick to append a couple extra lines onto the binary, which works, but also seems a bit hacky. Being able to e.g. stick the paths in a special elf section or something seems much more appealing.
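To make the "special ELF section" idea from option (2) concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the section name `py_launcher_cfg`, the struct layout, the slot size, and the helper `read_slot` are hypothetical, and the `section` attribute is a GCC/Clang extension shown here for ELF targets. The idea is that the launcher is compiled once with fixed-size placeholder slots, and a post-link step overwrites the slots in that section without relinking.

```cpp
#include <cstddef>
#include <string>

// Hypothetical fixed slot size; paths longer than this would not fit.
constexpr std::size_t kSlotSize = 256;

// Hypothetical layout of the patchable configuration block.
struct LauncherConfig {
  char magic[16];                    // Marker a patching tool can verify.
  char python_binary[kSlotSize];     // NUL-terminated interpreter path.
  char stage2_bootstrap[kSlotSize];  // NUL-terminated stage2 path.
};

// Placed in its own section so a post-link tool can find and rewrite it.
// `volatile` keeps the compiler from folding the placeholder values into
// the code at compile time.
__attribute__((section("py_launcher_cfg"), used))
volatile LauncherConfig g_launcher_config = {
    "PYLAUNCHER_CFG",
    "%python_binary%",
    "%stage2_bootstrap%",
};

// Copies a NUL-terminated slot into a std::string, never reading past `size`.
std::string read_slot(const volatile char* slot, std::size_t size) {
  std::string out;
  for (std::size_t i = 0; i < size && slot[i] != '\0'; ++i) {
    out.push_back(slot[i]);
  }
  return out;
}
```

At startup the launcher would call `read_slot(g_launcher_config.python_binary, kSlotSize)` instead of relying on strings substituted into the source. Fixed-size slots keep the binary layout stable, at the cost of a hard limit on path length.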

Also, this doesn't have to use C++. Anything that produces a native, standalone executable would suffice (e.g rust is all the rage now).

> I also noticed that Windows already uses a launcher executable:

Yeah, it does except it's primitive and out of our control, so it's more of a headache than a help for us. I wanted to replace it with a powershell-based thing after bootstrap=script is made the default to reduce the number of ways we bootstrap programs.

> have a separate bootstrap_impl value

Yep! That's one of the reasons I made that flag a string instead of boolean :)

groodt · 2024-12-11T23:48:17Z

> The problem is that it requires Python twice

There are sneaky things that can be done with the shebang so that it uses the hermetic interpreter, but I'll need to dig it out.

Overall, I agree though. Just noting that there are mechanisms to work around this at the moment if desperate.

rickeylev · 2024-12-12T00:09:15Z

> sneaky things that can be done with the shebang

With the script-based bootstrap, you can probably more easily just use a custom stage1 bootstrap. This avoids any issues with trying to fit a program into the single line that shebang processing accepts.

Also, in Marcel's case, that may not work anyways -- he says his image doesn't have any shells available at all.

Hm, actually, I wonder if you could stick a prebuilt binary in as the stage1 bootstrap file. This might make it easier to prototype a native launcher, at the least.

It'll get passed through ctx.actions.expand_template, though, and I'm not sure if that handles binary data. Worst case, some flag somewhere to disable calling expand_template on it.

> including python twice

An alternative is to use something like the runtime env toolchain or a platform runtime.

The runtime env toolchain's "interpreter" is simply a shell script that basically does exec python3 $@. You could, alternatively, point it to a prebuilt binary that did the same.

A platform runtime is when a fixed path is used, i.e. setting `py_runtime.interpreter_path = "/usr/bin/python"`.
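A prebuilt binary that does the same as the `exec python3 $@` script could look roughly like this. It is a sketch under assumptions: `build_python3_argv` and `exec_python3` are hypothetical names, and `python3` is expected to be found on `PATH` in the image.

```cpp
#include <unistd.h>

#include <cerrno>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

// Builds the argument vector for python3: argv[0] is replaced by "python3",
// the remaining arguments are forwarded unchanged, and the array ends with
// the null pointer that execvp requires.
std::vector<char*> build_python3_argv(int argc, char** argv) {
  static char kPython[] = "python3";
  std::vector<char*> args;
  args.push_back(kPython);
  for (int i = 1; i < argc; ++i) {
    args.push_back(argv[i]);
  }
  args.push_back(nullptr);
  return args;
}

// Replaces the current process with python3, searching PATH. Only returns
// if the exec failed, in which case it reports the error.
int exec_python3(int argc, char** argv) {
  std::vector<char*> args = build_python3_argv(argc, argv);
  execvp(args[0], args.data());
  std::cerr << "ERROR: Could not exec python3: " << std::strerror(errno)
            << std::endl;
  return 127;
}
```

The program's `main(argc, argv)` would simply return `exec_python3(argc, argv)`. As with the loader earlier in the thread, using `exec` rather than a child process means signals sent to this PID reach the Python process directly.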

aignas · 2024-12-12T00:14:45Z

For completeness, if we want to distribute binaries together with rules_python releases, I think Aspect has a blogpost on something related: https://blog.aspect.build/releasing-bazel-rulesets-rust

They are using rust to create their launcher that builds a venv on the fly (if I remember correctly) but having checked their code it seems that they are still depending on a shell bootstrap: https://github.com/aspect-build/rules_py/blob/main/py/private/run.tmpl.sh.

Since we are already depending on rules_cc, I think using C/C++ could be the way to go. We should not be doing anything fancy to warrant the need of Rust. Zig could be also an option because it is easy to cross-compile, but I am not sure if bazel has good support for that one.

mering · 2024-12-12T00:19:43Z

> Also, this doesn't have to use C++. Anything that produces a native, standalone executable would suffice (e.g rust is all the rage now).

While I would have liked to pick another language, I decided on C++ for the following reasons:

The toolchain is most likely already available in a Bazel workspace
Has a decent Bazel runfiles library
Doesn't require a runtime like Go

> including python twice
>
> An alternative is to use something like the runtime env toolchain or a platform runtime.
>
> The runtime env toolchain's "interpreter" is simply a shell script that basically does exec python3 $@. You could, alternatively, point it to a prebuilt binary that did the same.
>
> A platform runtime is when a fixed path is used, i.e. setting `py_runtime.interpreter_path = "/usr/bin/python"`.

We would prefer to just use the runtime as part of the runfiles in order to avoid knowing or figuring out where in the image the runtime is. Also we would like to be independent of the base image and not require others to add Python and configure the paths correctly.

> For prototyping this, having the py_executable macro create a cc_binary is probably the easiest thing to do. Wiring it in is probably going to be hacky, but such is a prototype.
>
> For the final code, though, I see two options: (1) Calling the cc APIs (cc_common et al); I'm not entirely sure on how to do that, but there's enough prior art that we can figure it out. All we really need to do is copy/paste some core part of how cc_binary performs compiling and linking.
>
> (2) Use cc_binary as-is, but modifying it after-the-fact, similar to the windows launcher. If we had a way to modify the contents of a binary (to perform the string replacements necessary), then this seems preferable to (1). The windows launcher does some sort of trick to append a couple extra lines onto the binary, which works, but also seems a bit hacky. Being able to e.g. stick the paths in a special elf section or something seems much more appealing.

I only briefly checked the rules and it looks like there is no real macro as part of py_binary() which could be used to instantiate an additional rule. There is create_executable_rule() which returns a rule. All the other macros seem to be called as part of the rule implementation and also cannot just instantiate a cc_binary.
Interesting insights about the Windows launcher modifying the binary.

rickeylev · 2024-12-12T00:33:16Z

Ah, right, there isn't a common macro for both binaries and tests. Each has its own macro that calls its own rule (this isn't for a particular reason, just something that organically happened). python/private/py_binary_macro.bzl and python/private/py_test_macro.bzl are the macros for binaries and tests, respectively. Those would be the spots to modify to introduce additional targets.

mering · 2024-12-12T00:47:21Z

> Ah, right, there isn't a common macro for both binaries and tests. Each has its own macro that calls its own rule (this isn't for a particular reason, just something that organically happened). python/private/py_binary_macro.bzl and python/private/py_test_macro.bzl are the macros for binaries and tests, respectively. Those would be the spots to modify to introduce additional targets.

This is what I tried, but getting the values for %python_binary% and %stage2_bootstrap% seems quite involved.

rickeylev mentioned this issue Dec 23, 2024

2025 Priorities #2520

Open
