Nix+Bazel = ❤️

2024/04/20

Summary

You, too, can have a fully hermetic, reproducible, and ephemeral bazel build.

Intro

While looking for non-invasive and reproducible ways to create my dev environment, I considered bazel. I wrote up some of the results of this search.

I like bazel because it will not only build my project reproducibly, but also set up the development environment for doing so beforehand. This means that I only need to have bazel installed on the local machine and to clone the repository I need, and I have all that is required to build and test the cloned code. This is in stark contrast to the more “conventional” approaches to building, where you are left to work out how to pick and install the correct versions of the dependencies your code needs to build.

This is great for build and environment reproducibility, and makes it largely irrelevant what physical machine your build is executing on. Which, in turn, is what I am after: a completely fungible dev setup, which can be brought up at will, and destroyed, and re-instantiated somewhere else in a matter of minutes.

This is the hypothesis, at least. Also, not everyone agrees with this take.

The problem

In practice, a difficulty pops up, making it onerous to bring that state about.

To build something with bazel, the code and all its dependencies need to be buildable with bazel. For the longest time this was a big showstopper. Most of the open source world does not care much about bazel, and uses one of the more “conventional” build systems, such as make and cmake. Some (mostly via cmake) support ninja as well. This means that, in order to bring a dependency into bazel, someone has to do the legwork and replace the build system. This is (a) nontrivial, and (b) a distraction from the original purpose of your work. Presumably you want to work on the original project, not support a bespoke build setup for all its dependencies. For me this was especially onerous for python dependencies with deep dependency trees.

And even if you decide to write the rules up, it is still onerous to set up all the dependencies so that they are downloaded and available. You either need to teach bazel how to do this – which is time consuming; or you must somehow rely on pre-installed binaries – which is onerous and not reproducible. Again, python dependencies take the cake here. I always dread having to install python tooling, as that’s the surest way to end up in dependency hell. While there are ways to work around this, such as the various python environment installers, it was always a frustrating experience.

First try: dockerize all the things

As a way around it, I devised a way to package a bazel build step into a custom docker container. This served my purpose for a while, and allowed me to use some quite onerous software in bazel builds, although I cannot show you that bit, as that setup has not been published yet.

Note that bazel has experimental sandboxing based on docker, but that is a different thing altogether. My docker based setup can be used to run a build step in a container that comes pre-packaged with the build tools you need. Other approaches are usually based on running the entire build in a container, which is also something one can do. But it again requires docker.

But while using docker solves some problems, it creates others. Installing docker is onerous in its own right, and running docker containers as the root user does not sound like the best way to go about running things. In some restrictive environments you might be prevented from doing so outright, or might be required to run bazel with sudo. For a build setup, all of that seems excessive, and constrains what you can do.

I have also tried to replace docker with podman, which would ostensibly do the same job as docker but without requiring root privileges. But, many confusing errors later, I came across a confirmation that podman fights with bazel sandboxing and is unlikely ever to work.

What now?

I remain firmly in team bazel.

With all the above, why is bazel still a good value proposition to me? It is because of its uniform setup, and shareable build rules. If you are able to offload the build techniques you use to a set of build rules; or, rather, if you can find the build rules already maintained, you will have one big concern lifted off your shoulders.

Another concern remains, which is providing the tools that the build needs. By default, bazel does not have a strict sandbox, and can use tools that are preinstalled on your machine. This is great for one-offs, but is a problem for build reproducibility.

It is time to go back and take a peek into the toolbox.

Nix

At this point the nix package manager comes in.

Nix is a package manager that uses declarative rules to describe the set of packages that it installs. This controls the packages down to the very specific version, and guarantees that repeated installations through Nix, using the same declaration, end up installing exactly the same binaries. If you are unfamiliar with how this works, please read the excellent documentation at https://nixos.org.
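To make the "same declaration, same binaries" idea concrete, here is a toy model in Python. It is not how Nix computes its store paths; it only illustrates the principle that an install location derived from a hash of the full declaration changes whenever any input changes. The function name and the declarations are hypothetical.

```python
import hashlib
import json


def toy_store_path(declaration: dict) -> str:
    """Map a package declaration to a deterministic, Nix-style store path.

    Toy model only: real Nix hashes the full build recipe, not a JSON dict.
    """
    # Serialize with sorted keys so that key order does not affect the hash.
    canonical = json.dumps(declaration, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(canonical).hexdigest()[:32]
    return f"/nix/store/{digest}-{declaration['name']}-{declaration['version']}"


# Hypothetical declarations, for illustration only.
gcc_a = {"name": "gcc", "version": "13.2.0", "deps": ["glibc-2.38"]}
gcc_b = {"deps": ["glibc-2.38"], "version": "13.2.0", "name": "gcc"}
gcc_c = {"name": "gcc", "version": "13.2.0", "deps": ["glibc-2.39"]}

# Same declaration (in any key order) gives the same path.
same = toy_store_path(gcc_a) == toy_store_path(gcc_b)
# Changing a single dependency gives a different path.
different = toy_store_path(gcc_a) != toy_store_path(gcc_c)
print(same, different)
```

The point of the sketch is that two packages with different inputs can never collide in the store, which is what lets many exact versions coexist on one machine.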

A natural idea arises: combine the imperfectly sandboxed bazel with the perfectly hermetic nix installation and reap the benefits. This is what the kind folks at tweag.io have done. You can see the results of that work on their nix-bazel website. They wrote a set of bazel build rules, called rules_nixpkgs, which use an existing nix installation to bring in arbitrary packages from “nixpkgs”, the repository of all software buildable with nix (currently seemingly the largest repository in the world by several times over).

Problem solved?

Not quite. If it were, then this whole article would have been unnecessary. Note that the rules use an existing nix installation. This means that you already have to use nix in some form in order to use this approach. And as many machines and VMs don’t come with Nix preinstalled, this becomes yet another hurdle in our quest for hermetic, reproducible and ephemeral builds.

Second try: ephemeral Nix

I figured that if I could replace my dockerized build environment with something based on an ephemeral nix installation (i.e. not a system-wide one, but one tied to a specific bazel workspace, or at least instance), then I would attain the holy grail of build setups: hermeticity, reproducibility and ephemerality.

A wrinkle in that plan is that native Nix is not designed to support ephemeral setups. In my view, this is not explained well on the Nix website. It also leads to confusing discussions such as this one on stack overflow, where it isn’t quite clear what a “different location” is, only for the participants to figure out, after many paragraphs, that the answer to the question is some flavor of “no”.

After some exploration, I found out that there are ways to install nix ephemerally. In fact, there are multiple such approaches as evidenced by this article on the NixOS wiki. All approaches create some isolated environment where an arbitrary user directory is mounted to /nix, which then tricks the Nix installation into installing Nix ephemerally. Two classic workhorses for that are nix-user-chroot, and proot. I tried both, but quickly discovered that they each have untenable requirements.

nix-user-chroot requires a privilege to mount a directory, which is not available in all environments, and especially breaks in containerized environments, such as in GitHub Actions. This makes the use of nix-user-chroot not viable for regular open source work, where your builds are expected to cooperate with containerized environments.

proot came with its own set of issues. proot works by intercepting the syscalls that a child binary makes, and rewriting them to make the child binary believe it is seeing a different filesystem than the one it is actually running on. proot, however, has a deep dependency tree, making it onerous to produce a statically linked binary, which is what you need if you want your setups to be portable. It also had issues cooperating with bazel specifically, and would refuse to terminate when bazel forked its server. This is because proot waits for the entire process group to terminate before terminating itself, and a detached daemon just sits there. A way to resolve this would be to instruct the bazel server not to stay around, but that makes each additional build invocation last much longer than it should. That, in turn, made it a nonstarter.

nix-portable to the rescue

At this point, I was about to give up on the quest for the time being. But then, and almost completely by accident, I came across nix-portable. It was easy to try out. It seemed to work out of the box, it did not fight with bazel when used in tandem, and did not require any special privileges to run. A perfect combination! It even had some configuration knobs to handle different runtimes, which made its reuse in the bazel setup extra convenient.

Tying nix-portable into a bazel build

This is all well and good. But, now that we have nix-portable, how do we tie it into a bazel build?

I accidentally found out that when bazel is invoked, it will look for a script at //tools/bazel. If one exists, it will transfer execution to it. From here, the approach was more or less obvious: write a wrapper for bazel invocation, which will call nix-portable to set up the correct environment first, then delegate the build work to the original bazel.
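A minimal sketch of such a wrapper, written in Python, is below. It is not the script from my repository, only an illustration of the idea under some stated assumptions: that the launcher passes the path of the original bazel in the BAZEL_REAL environment variable, that a nix-portable binary is on the PATH and forwards to nix-shell, and that NP_LOCATION tells nix-portable where to keep its store. The package list and the cache directory are hypothetical.

```python
#!/usr/bin/env python3
import os
import shlex
import sys


def wrapped_command(bazel_real, bazel_args, packages, nix_portable="nix-portable"):
    """Return the argv that runs the real bazel inside a nix-portable shell."""
    # Quote the inner command so arguments survive the extra shell layer.
    inner = shlex.join([bazel_real, *bazel_args])
    return [nix_portable, "nix-shell", "-p", *packages, "--run", inner]


def main():
    # Assumption: the launcher exports the original bazel path as BAZEL_REAL.
    bazel_real = os.environ["BAZEL_REAL"]
    env = dict(os.environ)
    # Assumption: NP_LOCATION selects where nix-portable keeps its files.
    # The directory below is a hypothetical choice for this sketch.
    location = os.path.expanduser("~/.cache/nix-portable-sketch")
    os.makedirs(location, exist_ok=True)
    env.setdefault("NP_LOCATION", location)
    # Hypothetical package list; a real setup would pin these precisely.
    argv = wrapped_command(bazel_real, sys.argv[1:], packages=["bash", "gcc"])
    # Replace this process so signals and exit codes pass straight through.
    os.execvpe(argv[0], argv, env)


# Only act as a wrapper when invoked by a launcher that set BAZEL_REAL.
if __name__ == "__main__" and "BAZEL_REAL" in os.environ:
    main()
```

Saved as an executable file at tools/bazel in the workspace, a script of this shape would turn `bazel build //foo` into a call to the original bazel from inside the environment that nix-portable provides. Because the inner command uses the BAZEL_REAL path rather than the name bazel, the wrapper does not call itself recursively.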

This is exactly what I then did, which in the end resulted in the repository at https://github.com/filmil/bazel_local_nix. An example is at https://github.com/filmil/bazel_local_nix/blob/main/integration/README.md. It took a few tries to find the correct division of responsibilities to ensure easy installation and integrity checks for all the dependencies used. The nice folks at tweag.io helped me find the best setup. I think that in its current form the repository above answers the feature request in rules_nixpkgs about ephemeral nix installation.

So, is it now done?

Well, the bulk of the work is done indeed. But there is still one more loose end to tie up.

You can install the above repository into your own bazel build today, and with the correct but very limited additional setup, you too can have a hermetic, reproducible, ephemeral bazel build. Such an example repo is here and you can test it today.

An issue remains, which is that all binaries built this way now depend on shared libraries and the ELF interpreter from an ephemeral nix installation. This means that your C++ binary compiled this way will not be executable on other machines. This, as you may surmise, is suboptimal. Luckily, the kind folks at tweag.io worked out a tool called clodl, and a set of bazel rules which compute a closure of all libraries your newly-built binary depends on, and make a convenient package out of all the needed dependencies. I proposed a set of changes that make clodl’s work more resilient to system variations, and which are known to work with my hermetic, ephemeral, and reproducible build setup.
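The "closure" here is a transitive closure over the shared library dependency graph. The Python sketch below shows that computation on a made-up graph; it is not clodl's implementation, and the function name and the dependency table are hypothetical. A real tool would read each file's dependency list from its ELF headers rather than from a dict.

```python
from collections import deque


def library_closure(binary, needed):
    """Return every library reachable from `binary`, directly or indirectly."""
    seen = set()
    queue = deque(needed.get(binary, []))
    while queue:
        lib = queue.popleft()
        if lib in seen:
            continue  # Already visited via another dependency path.
        seen.add(lib)
        # Libraries have dependencies of their own; follow those too.
        queue.extend(needed.get(lib, []))
    return sorted(seen)


# Hypothetical dependency table: file name -> libraries it needs directly.
NEEDED = {
    "my_binary": ["libstdc++.so.6", "libc.so.6"],
    "libstdc++.so.6": ["libm.so.6", "libc.so.6", "libgcc_s.so.1"],
    "libm.so.6": ["libc.so.6"],
    "libgcc_s.so.1": ["libc.so.6"],
    "libc.so.6": ["ld-linux-x86-64.so.2"],
}

closure = library_closure("my_binary", NEEDED)
print(closure)
```

Note that the binary lists only two direct dependencies, while the closure has five entries. Packaging only the direct dependencies would still leave the binary unable to start on another machine.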

I was also able to make a drop-in addition to their existing repository, to make it work with my nix+bazel setup.

What remains?

At this point, the list of prerequisites that remain is as follows:

  1. You need to have an installation of bazel. Since major bazel versions are quite different from each other, and projects often rely on a specific major version, I recommend installing bazelisk instead of installing bazel directly, and making it available in your $PATH under the name bazel. If you do that, bazelisk will handle the bazel versioning for you, and will do so correctly. It will not otherwise affect the operation of the bazel binary.

  2. You need to clone your code correctly. This means, for example, using --recurse-submodules where needed. I think this is not controversial.

And… that’s it. I find it unlikely that there could be an approach without these two minimal steps.
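As an illustration of the version handling in item 1, the sketch below models, in simplified form, how a launcher like bazelisk can decide which bazel version a project wants: an explicit USE_BAZEL_VERSION environment variable wins, otherwise the first line of a .bazelversion file in the workspace is used. This is a simplified model and not bazelisk's actual code; the function name and the version numbers are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def requested_bazel_version(workspace, env=None):
    """Return the bazel version a workspace asks for, or None if unpinned."""
    env = os.environ if env is None else env
    # An explicit override in the environment takes precedence.
    override = env.get("USE_BAZEL_VERSION")
    if override:
        return override
    # Otherwise fall back to the version pinned in the workspace.
    version_file = Path(workspace) / ".bazelversion"
    if version_file.is_file():
        lines = version_file.read_text().splitlines()
        return lines[0].strip() if lines else None
    return None


# Demonstration in a throwaway directory standing in for a cloned repository.
with tempfile.TemporaryDirectory() as workspace:
    Path(workspace, ".bazelversion").write_text("7.1.1\n")
    pinned = requested_bazel_version(workspace, env={})
    overridden = requested_bazel_version(
        workspace, env={"USE_BAZEL_VERSION": "6.5.0"}
    )

print(pinned, overridden)
```

Keeping the pin in a file that is checked into the repository means that a fresh clone carries its own bazel version requirement, which fits the goal of a setup that can be re-created anywhere.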

For a hermetic, ephemeral, reproducible bazel build, the only remaining step is to issue the command bazel build //.... All the rest of the build environment will be set up by bazel for you, automatically.

So long as you have an Internet connection, that is.

Conclusion

If you read this far, you have a fairly good idea of how the hermetic, ephemeral, reproducible bazel builds work. I like how it is simply reusing the existing tools in a new way.

This is the only such openly available setup that I am aware of at the time of writing. Companies such as Google have such setups built out internally, but none of those have been published in the open, nor are they available for reuse.

I hope this new option advances the state of the art of the build environments, and that someone finds it helpful.