Nuxi The CloudABI Development Blog

CloudABI for Fedora and openSUSE: reproducible builds

May 7, 2016 by Ed Schouten

Instead of publishing the last article of our series on how CloudABI works on Mac OS X, I decided to use this week’s blog post to announce the availability of CloudABI for Fedora and openSUSE! In today’s article I will be discussing a single aspect of these ports, namely how we generate packages (RPMs) for these systems deterministically.

What are deterministic packages and why bother?

The CloudABI Ports Collection requires that the build process for its packages is fully deterministic. In a nutshell, this means that if we don’t make changes to any of the package build rules in our repository, the generated packages should remain the same as well. Though this sounds fairly easy to achieve in theory, you typically see that lots of packages depend on one of the following sources of non-determinism:

There is now a large effort across various Open Source projects to come up with tooling that makes builds reproducible. For CloudABI, we care about achieving this goal as well, for two reasons:

Where we seem to take things a bit further than most distributions is that we expect the generated packages to be byte-for-byte reproducible. There should be no need to use 300+ line shell scripts or specialized utilities to compare the packages for equality. A binary file comparison should be sufficient.
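To make "a binary file comparison should be sufficient" concrete, here is a minimal sketch in Python. The function names are made up for illustration and are not part of CloudABI Ports; the idea is simply that two independently built packages can be checked with a plain content comparison or by comparing digests.

```python
import filecmp
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(65536), b''):
            digest.update(chunk)
    return digest.hexdigest()


def packages_identical(path_a, path_b):
    """Compare two package files byte for byte.

    shallow=False forces filecmp to compare file contents instead of
    trusting size and modification time.
    """
    return filecmp.cmp(path_a, path_b, shallow=False)
```

Publishing the digest next to a package lets anybody who rebuilds it confirm the result without downloading our copy first.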

Fully deterministic RPMs

When I started experimenting with exporting packages from CloudABI Ports to RPMs, I noticed that rpmbuild doesn’t seem to be capable of creating RPMs in a reproducible way. First of all, the headers of the RPMs it generates contain entries like RPMTAG_BUILDTIME and RPMTAG_BUILDHOST, whose values cannot be overridden. Second, the cpio payload holding the contents of the files that need to be installed also stores various file properties that cannot be reproduced easily, such as inode numbers, filesystem IDs and file modification times.

To prevent this from happening, we now generate RPMs without using rpmbuild. This is slightly harder to realize for RPMs than for some of the other formats we support, as RPMs use a home-grown binary format for encoding their headers. We wrote a simple Python module that can be used to construct these headers more easily, which we then use to generate a header that contains exactly those entries that we need.
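To give an impression of what constructing such a header involves, below is a rough sketch of the generic RPM header structure: a magic value, an index of (tag, type, offset, count) entries and a data store. This is not the module from CloudABI Ports. The build_header() function is hypothetical, it only handles single 32-bit integers and single strings, and a complete RPM additionally needs a lead, a signature header and many more tags than shown here.

```python
import struct

# Type codes used in RPM header index entries.
RPM_INT32_TYPE = 4
RPM_STRING_TYPE = 6

# A few well-known tag numbers, used here purely as examples.
RPMTAG_NAME = 1000
RPMTAG_VERSION = 1001
RPMTAG_SIZE = 1009

# Three magic bytes, a version byte and four reserved bytes.
HEADER_MAGIC = b'\x8e\xad\xe8\x01\x00\x00\x00\x00'


def build_header(entries):
    """Serialize a dict of tag -> (int or str) into an RPM header blob.

    Entries are emitted in sorted tag order, so the output depends only on
    the contents of the dict, not on insertion order.
    """
    index = b''
    store = b''
    for tag in sorted(entries):
        value = entries[tag]
        if isinstance(value, int):
            # 32-bit integers must be aligned to 4 bytes within the store.
            store += b'\0' * (-len(store) % 4)
            offset = len(store)
            store += struct.pack('>I', value)
            type_code = RPM_INT32_TYPE
        else:
            # Strings are stored NUL-terminated, without alignment.
            offset = len(store)
            store += value.encode('utf-8') + b'\0'
            type_code = RPM_STRING_TYPE
        # Each index entry: tag, type, offset into the store, item count.
        index += struct.pack('>IIII', tag, type_code, offset, 1)
    counts = struct.pack('>II', len(entries), len(store))
    return HEADER_MAGIC + counts + index + store
```

Because nothing in this function consults the clock, the hostname or the filesystem, calling it twice with the same entries yields the same bytes, which is exactly the property rpmbuild could not give us.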

To generate cpio archive payloads that are fully deterministic, we make use of bsdtar’s support for mtree files. Mtrees allow us to list all of the files that should be archived, exactly specifying which file attributes they should have. It is important that the cpio archive that is created is of the newc format, as this seems to be the only style of cpios supported by the RPM utilities. The POSIX cpio format cannot be used.
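The mtree approach can be sketched roughly as follows. This is an illustration under assumptions, not the code used by CloudABI Ports: mtree_spec() and bsdtar_command() are hypothetical helpers, every file is assumed to be a regular file owned by root with its timestamp pinned to zero, and paths are assumed not to contain whitespace or other characters that need escaping in mtree syntax.

```python
def mtree_spec(files):
    """Build an mtree specification from archive path -> (source, mode).

    Entries are sorted by archive path and carry fixed ownership and
    timestamps, so the archive no longer depends on the build machine's
    filesystem state.
    """
    lines = ['#mtree']
    for path in sorted(files):
        source, mode = files[path]
        lines.append(
            '%s type=file mode=%04o uid=0 gid=0 time=0 contents=%s'
            % (path, mode, source))
    return '\n'.join(lines) + '\n'


def bsdtar_command(spec_path, output_path):
    """Return an argument list asking bsdtar for a newc-format cpio archive.

    The '@' prefix makes bsdtar read its entries from the given
    specification file instead of walking a directory tree.
    """
    return ['bsdtar', '-cf', output_path, '--format=newc', '@' + spec_path]
```

A caller would write the output of mtree_spec() to a file and pass the result of bsdtar_command() to subprocess.check_call(). The exact set of mtree keywords needed may differ per libarchive version, so treat the keyword list above as a starting point.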

Package signing

Traditionally, RPM has always used per-package cryptographic signatures, as opposed to apt-get, which guarantees consistency by signing off the repository index files. RPM’s way of signing packages is a bit problematic for us, again from a reproducibility standpoint, as it embeds PGP signatures in the RPM files directly. This would mean that without handing out our private signing key, there would be no way for our users to exactly reproduce our packages.

It turns out that we’re in luck, as both Fedora’s DNF and openSUSE’s ZYpp nowadays do provide support for signing repository metadata, using the repo_gpgcheck configuration option. For CloudABI Ports we’ve decided to make use of this feature, meaning that the signature header of our RPMs doesn’t contain a PGP signature. Instead, we sign off the repomd.xml file after calling createrepo.
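On the client side, this choice shows up in the repository definition: per-package checking is switched off and metadata checking is switched on. The sketch below generates such a definition with Python's configparser. The repo_config() helper is hypothetical, and the repository ID and URLs in the usage example are placeholders rather than the real CloudABI repository locations.

```python
import configparser
import io


def repo_config(repo_id, name, baseurl, gpgkey):
    """Render a .repo file that verifies signed metadata, not packages."""
    # interpolation=None keeps '%' characters in URLs from being interpreted.
    config = configparser.ConfigParser(interpolation=None)
    config[repo_id] = {
        'name': name,
        'baseurl': baseurl,
        'enabled': '1',
        'gpgcheck': '0',       # packages carry no embedded signature
        'repo_gpgcheck': '1',  # the signature on repomd.xml is verified
        'gpgkey': gpgkey,
    }
    out = io.StringIO()
    config.write(out, space_around_delimiters=False)
    return out.getvalue()


if __name__ == '__main__':
    # Placeholder values; substitute the real repository details.
    print(repo_config('example-repo', 'Example repository',
                      'https://example.org/repo/',
                      'https://example.org/repo/key.asc'))
```

Since the signature lives next to repomd.xml instead of inside each package, users who rebuild a package can still compare it byte for byte against the published one.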

Closing words

Now that we have basic support for using CloudABI on Fedora and openSUSE, we’d like to invite you to give it a try. Be sure to send an email to the mailing list to report your success, or to let us know what we can do to improve the documentation. As you can see, there are still things that could be streamlined, such as having proper packages for both the toolchain and the utilities. Feel free to get in touch in case you’re interested in working on this. We’re always looking for more volunteers!