February 18, 2017 by Ed Schouten
Two weeks ago I gave two talks at FOSDEM: one where I gave a general overview of the CloudABI project and one where I discussed how CloudABI works on FreeBSD. Though I think that both talks give good insight into the project, there are always topics that don’t make the cut, because time on stage is limited.
Today we’re going to cover one such topic, namely how one goes about porting a piece of software to CloudABI. Though there is no universal recipe for this, let’s take a look at the steps I took to port LevelDB. This will give us technical insight into the workings of both CloudABI and LevelDB.
LevelDB is a high-performance persistent key-value store that is implemented as a C++ library. Due to it being a library, it can easily be embedded into programs. It doesn’t need to run as a separate service. It also doesn’t support any clustering or replication, but it’s important to keep in mind that this can be implemented as a layer on top. Distributed databases like Google’s Bigtable partition rows in a table into so-called tablets, which can be spread out across systems in a cluster.
LevelDB uses a data model similar to Bigtable’s internal tablet format. The data structures that LevelDB uses are often collectively referred to as a log-structured merge-tree. Ilya Grigorik wrote a nice article on his weblog that accurately summarises how they work.
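To make the log-structured merge-tree idea a bit more concrete, here is a toy sketch in C++. It is not how LevelDB is implemented: the class `ToyLsm` and all of its methods are made up for illustration, and everything stays in memory. It only shows the general shape: writes land in a small sorted table, full tables are frozen into immutable sorted runs, reads consult the newest data first, and compaction merges runs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy illustration of a log-structured merge-tree. Hypothetical class, not
// part of LevelDB: a real implementation writes its runs to disk and
// compacts them in the background.
class ToyLsm {
 public:
  explicit ToyLsm(std::size_t memtable_limit) : limit_(memtable_limit) {}

  // Stores a value in the in-memory table and freezes the table into an
  // immutable run once it has grown to the configured limit.
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;
    if (memtable_.size() >= limit_) Flush();
  }

  // Looks in the in-memory table first, then in the runs from newest to
  // oldest, so that the most recent write for a key wins.
  bool Get(const std::string& key, std::string* value) const {
    std::map<std::string, std::string>::const_iterator it = memtable_.find(key);
    if (it != memtable_.end()) {
      *value = it->second;
      return true;
    }
    for (std::size_t i = runs_.size(); i > 0; --i) {
      it = runs_[i - 1].find(key);
      if (it != runs_[i - 1].end()) {
        *value = it->second;
        return true;
      }
    }
    return false;
  }

  // Merges all runs into a single one, keeping the newest value per key.
  void Compact() {
    std::map<std::string, std::string> merged;
    for (std::size_t i = 0; i < runs_.size(); ++i) {
      for (const auto& entry : runs_[i]) {
        merged[entry.first] = entry.second;  // newer runs overwrite older ones
      }
    }
    runs_.clear();
    if (!merged.empty()) runs_.push_back(std::move(merged));
  }

  std::size_t run_count() const { return runs_.size(); }

 private:
  void Flush() {
    runs_.push_back(std::move(memtable_));
    memtable_.clear();  // a moved-from map is valid; clear() makes it empty
  }

  std::size_t limit_;
  std::map<std::string, std::string> memtable_;
  std::vector<std::map<std::string, std::string>> runs_;  // oldest first
};
```

With a limit of two entries, every second `Put()` produces a new run; `Get()` still returns the latest value for a key regardless of which run holds older copies.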
Having LevelDB packaged for CloudABI is nice to have, as it allows our users to experiment with building services that need to keep track of persistent data, while still being very strongly sandboxed.
Okay, enough talking about LevelDB. Let’s actually get it ported over to CloudABI. The first step is to download the source tarball and extract it.
$ wget https://github.com/google/leveldb/archive/v1.19.tar.gz
$ tar -xf v1.19.tar.gz
$ cd leveldb-1.19
LevelDB uses a simple Makefile to build its sources. After browsing through this file, we can determine that we have to invoke make with a couple of variables set. First of all, we need to ensure that we’re using the right build tools (AR, CXX, CC). As CloudABI executables are always linked statically, we can disable shared library support (SHARED_LIBS). For now, there is also no need to build any command line utilities (SHARED_PROGRAMS, STATIC_PROGRAMS). Finally, LevelDB’s build infrastructure depends on a variable specifying which operating system to target (TARGET_OS). Even with all of the required variables set, make will fail very early on:
$ make \
    AR=x86_64-unknown-cloudabi-ar CXX=x86_64-unknown-cloudabi-c++ \
    SHARED_LIBS= SHARED_PROGRAMS= STATIC_PROGRAMS= \
    TARGET_OS=CloudABI
Unknown platform!
By searching through the source tree, we can find out that this error message is generated by build_detect_platform. We still need to teach this script what to do when TARGET_OS=CloudABI. After applying a patch that does so, the compilation process may start, but will still not be able to successfully compile any source files:
./port/port_posix.h:38:12: fatal error: 'endian.h' file not found
  #include <endian.h>
           ^~~~~~~~~~
LevelDB has most of its OS-dependent definitions stored in a header file called port_posix.h. Looking through this source file, we need to make two adjustments to it. First of all, CloudABI doesn’t provide the header <endian.h>, which is used by LevelDB to determine the system’s endianness. In this case we can patch up the code to simply use Clang/GCC’s built-in byte order macros.
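As a rough sketch of what relying on the compiler instead of <endian.h> can look like: GCC and Clang predefine `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__`, so no header is needed at all. The constant name `kLittleEndian` and the runtime helper below are chosen for illustration and are not necessarily what the actual patch uses.

```cpp
#include <cstdint>
#include <cstring>

// Compile-time endianness, taken from macros that GCC and Clang predefine.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
constexpr bool kLittleEndian = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
#else
#error "This sketch assumes a GCC- or Clang-compatible compiler"
#endif

// Illustrative runtime cross-check: inspects the first byte of the integer 1
// as it is laid out in memory. On a little-endian system that byte is 1.
inline bool DetectLittleEndianAtRuntime() {
  const std::uint32_t probe = 1;
  unsigned char first_byte = 0;
  std::memcpy(&first_byte, &probe, 1);
  return first_byte == 1;
}
```

The runtime helper is only there to show that the macro-based constant agrees with what the hardware does; a port header would only need the compile-time part.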
Second, by default, LevelDB will make use of several standard I/O functions that are only provided by the GNU C library, such as fread_unlocked(). Definitions in port_posix.h can be used to fall back to their standard counterparts. With that change in place, the build may continue. We can now get to the point where all source files build, with the exception of env_posix.cc:
util/env_posix.cc:229:16: error: use of undeclared identifier 'open'
      int fd = open(dir.c_str(), O_RDONLY);
               ^
What’s interesting about LevelDB’s design is that any interaction with the outside world (in this case, just the file system) is done through an interface called Env. LevelDB ships with two implementations of Env: an in-memory environment used for testing and a POSIX environment that stores data on disk. A singleton instance of the POSIX environment can be obtained by calling Env::Default().
In the case of CloudABI, the POSIX environment fails to build, as it
attempts to make use of functions that use the global file system
namespace, one of the things CloudABI is actively trying to prevent. On
CloudABI, we must access the file system namespace by using the POSIX
*at() system calls (e.g.,
openat()), so that processes can only
access select portions of the namespace.
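A minimal sketch of the difference, using plain POSIX calls. The helper name `OpenBeneath` is hypothetical. Where open() resolves a relative path against the process's working directory (and an absolute path against the global namespace), openat() resolves a relative path against the directory that a file descriptor refers to.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <string>

// Hypothetical helper: opens `name` for reading and writing, creating it if
// needed, relative to the directory referred to by `dirfd`. The caller has
// to close the returned descriptor. Returns -1 on failure, like openat().
inline int OpenBeneath(int dirfd, const std::string& name) {
  return openat(dirfd, name.c_str(), O_RDWR | O_CREAT, 0644);
}
```

On a conventional UNIX system this call alone does not confine anything: an absolute path or a path containing `..` can still lead outside the directory. It is the CloudABI runtime that restricts lookups to what lies beneath the descriptor. Also note that on such a system a directory descriptor would typically be obtained with open(), whereas a CloudABI program has it handed over at startup.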
What distinguishes CloudABI from comparable sandboxing frameworks is that all of the places that need to be patched up to use *at() can be detected by simply building the code. Repeated builds allowed me to derive this patch, which extends the POSIX environment to keep track of a directory file descriptor in a class member. The POSIX environment is thus no longer a singleton object. Multiple environment objects, each potentially using a different base directory, can now be created.
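The idea of keeping a directory file descriptor in a class member can be sketched as follows. This is not the patch itself: the class `DirectoryEnv` and its methods are hypothetical and far smaller than LevelDB's real Env interface. It only illustrates how every file operation can be routed through the stored descriptor.

```cpp
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#include <cstddef>
#include <string>

// Hypothetical, heavily reduced stand-in for an environment that is bound to
// a base directory through a file descriptor held in a class member.
class DirectoryEnv {
 public:
  // Does not take ownership of `dirfd`; the caller keeps it open.
  explicit DirectoryEnv(int dirfd) : dirfd_(dirfd) {}

  // Writes `data` to `name`, replacing any existing contents.
  bool WriteFile(const std::string& name, const std::string& data) const {
    int fd = openat(dirfd_, name.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    const char* p = data.data();
    std::size_t left = data.size();
    while (left > 0) {
      ssize_t n = write(fd, p, left);
      if (n <= 0) {  // kept simple: any short-circuit counts as failure
        close(fd);
        return false;
      }
      p += n;
      left -= static_cast<std::size_t>(n);
    }
    return close(fd) == 0;
  }

  bool FileExists(const std::string& name) const {
    return faccessat(dirfd_, name.c_str(), F_OK, 0) == 0;
  }

  bool DeleteFile(const std::string& name) const {
    return unlinkat(dirfd_, name.c_str(), 0) == 0;
  }

  bool RenameFile(const std::string& from, const std::string& to) const {
    return renameat(dirfd_, from.c_str(), dirfd_, to.c_str()) == 0;
  }

 private:
  int dirfd_;  // base directory that all relative names are resolved against
};
```

Two such objects constructed from descriptors of different directories operate independently of each other, which is exactly what a process-wide singleton cannot offer.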
In addition to this specific change, we have to make two more tweaks to
env_posix.cc for it to build:
util/env_posix.cc:259:16: error: variable has incomplete type 'struct flock'
  struct flock f;
               ^
CloudABI doesn’t support any file locks. The reason for this is that due to their semantics (them being per-process, as opposed to per-descriptor), they don’t work well with composition and decomposition of processes (privilege separation). In the case of LevelDB, we can get away with disabling file locking entirely.
util/env_posix.cc:478:61: error: use of undeclared identifier 'geteuid'; did you mean 'gettid'?
      snprintf(buf, sizeof(buf), "/tmp/leveldbtest-%d", int(geteuid()));
                                                            ^~~~~~~
As CloudABI doesn’t support any traditional UNIX credentials management,
geteuid() is no longer present. LevelDB uses this function to attempt
to generate a unique directory for testing. As this code doesn’t end up
getting used in our case, we can work around this by
replacing the UID by a constant.
With these patches applied, all of the LevelDB C++ code builds successfully, but still fails to link, giving us this nasty error message:
x86_64-unknown-cloudabi-ar: Unknown command line argument '-rs'. Try: 'x86_64-unknown-cloudabi-ar -help'
x86_64-unknown-cloudabi-ar: Did you mean '-M'?
This error message is due to a missing feature in LLVM’s copy of the ar utility, making it unable to parse the provided command line arguments properly. We can deal with this problem by removing the hyphen in front of the arguments.
Running make one more time now gives us a copy of LevelDB built for CloudABI:
x86_64-unknown-cloudabi-ar: creating out-static/libleveldb.a
x86_64-unknown-cloudabi-ar: creating out-static/libmemenv.a
If we were to copy these libraries and LevelDB’s header files into the CloudABI toolchain’s prefix, we could make use of them. That said, at this point it would make more sense to just install a prebuilt copy from our CloudABI Ports repository.
To demonstrate that our port of LevelDB actually works, I’ve written a tiny LevelDB editor. This utility reads key-value pairs from a terminal and stores them in a LevelDB database. Upon detecting end-of-file, all of the entries in the database get printed back to the terminal. In addition to depending on the LevelDB library, it makes use of a small number of components that are part of Boost. It can be built as follows:
$ wget https://nuxi.nl/blog/assets/cloudabi-edit-leveldb.cc
$ x86_64-unknown-cloudabi-c++ -std=c++1z -O2 \
    -o cloudabi-edit-leveldb cloudabi-edit-leveldb.cc \
    -lleveldb -lsnappy -lboost_iostreams
To run this CloudABI program, we make use of a very simple configuration file for cloudabi-run, which grants the program access to only the directory storing our LevelDB (./db/) and the terminal.
$ cat cloudabi-edit-leveldb.yaml
%TAG ! tag:nuxi.nl,2015:cloudabi/
---
database: !file
  path: db
terminal: !fd stdout
Below is a transcript of what an invocation of this utility looks like:
$ mkdir db
$ cloudabi-run cloudabi-edit-leveldb < cloudabi-edit-leveldb.yaml
Key: apple
Value: green
Key: banana
Value: yellow
Key: orange
Value: orange
Key: strawberry
Value: red
Key: ^D
|apple| -> |green|
|banana| -> |yellow|
|orange| -> |orange|
|strawberry| -> |red|
Our key-value pairs should get written to disk, meaning that a second invocation will start off with the existing dataset.
$ cloudabi-run cloudabi-edit-leveldb < cloudabi-edit-leveldb.yaml
Key: banana
Value: brown
Key: ^D
|apple| -> |green|
|banana| -> |brown|
|orange| -> |orange|
|strawberry| -> |red|
After chatting with a lot of people at FOSDEM, I’ve observed that many tend to be under the impression that systems like CloudABI are impractical to use, as they require us to make excessive changes to existing code. With this article, I hope that I’ve demonstrated the opposite, as we’ve managed to get a working sandboxing-aware copy of LevelDB without a lot of effort.
The effort of porting more software over to CloudABI is going on as we speak. One of the things I’m working on right now is getting Django to work. Expect to see an article on that project in the near future!