Nuxi The CloudABI Development Blog

Porting LevelDB to CloudABI

February 18, 2017 by Ed Schouten

Two weeks ago I gave two talks at FOSDEM: one giving a general overview of the CloudABI project and one discussing how CloudABI works on FreeBSD. Though I think both talks give good insight into the project, there are always topics that don’t make the cut because time on stage is limited.

Today we’re going to cover one such topic, namely how one goes about porting a piece of software to CloudABI. Though there is no universal recipe for this, let’s take a look at the steps I took to port LevelDB. This will give us technical insight into the workings of both CloudABI and LevelDB.

Introducing LevelDB

LevelDB is a high-performance persistent key-value store that is implemented as a C++ library. Due to it being a library, it can easily be embedded into programs. It doesn’t need to run as a separate service. It also doesn’t support any clustering or replication, but it’s important to keep in mind that this can be implemented as a layer on top. Distributed databases like Google’s Bigtable partition rows in a table into so-called tablets, which can be spread out across systems in a cluster.

LevelDB uses a data model similar to Bigtable’s internal tablet format. The data structures that LevelDB uses are often collectively referred to as a log-structured merge-tree. Ilya Grigorik wrote a nice article on his weblog that accurately summarises how they work.

Having LevelDB packaged for CloudABI is useful, as it allows our users to experiment with building services that need to keep track of persistent data, while still being very strongly sandboxed.

Getting LevelDB to build

Okay, enough talking about LevelDB. Let’s actually get it ported over to CloudABI. The first step is to download the source tarball and extract it.

$ wget https://github.com/google/leveldb/archive/v1.19.tar.gz
$ tar -xf v1.19.tar.gz
$ cd leveldb-1.19

LevelDB uses a simple Makefile to build its sources. After browsing through this file, we can determine that we have to invoke make with a couple of variables set. First of all, we need to ensure that we’re using the right build tools (AR and CXX). As CloudABI executables are always linked statically, we can disable shared library support (SHARED_LIBS). For now, there is also no need to build any command line utilities (SHARED_PROGRAMS and STATIC_PROGRAMS). Finally, LevelDB’s build infrastructure depends on a variable specifying which operating system to target (TARGET_OS).

Even with all of the required variables set, make will fail very early on.

$ make \
      AR=x86_64-unknown-cloudabi-ar CXX=x86_64-unknown-cloudabi-c++ \
      SHARED_LIBS= SHARED_PROGRAMS= STATIC_PROGRAMS= \
      TARGET_OS=CloudABI
Unknown platform!

By searching through the source tree, we can find out that this error message is generated by build_detect_platform. We still need to teach this script what to do when TARGET_OS=CloudABI. After applying this patch, the compilation process may start, but will still not be able to successfully compile any source files.

./port/port_posix.h:38:12: fatal error: 'endian.h' file not found
  #include <endian.h>
           ^~~~~~~~~~

LevelDB has most of its OS-dependent definitions stored in a header file called port_posix.h. Looking through this source file, we need to make two adjustments to it. First of all, CloudABI doesn’t provide the Linux-specific header <endian.h>, which is used by LevelDB to determine the system’s endianness. In this case we can patch up the code to simply use Clang/GCC’s built-in __BYTE_ORDER__ definition.

Second, by default, LevelDB will make use of several standard I/O functions that are only provided by the GNU C library, such as fread_unlocked(), fwrite_unlocked() and fflush_unlocked(). An option in port_posix.h can be used to fall back to their non-_unlocked() counterparts.
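The two adjustments above can be sketched in a few lines. This is an illustration rather than the actual patch: the names kSketchLittleEndian, RuntimeLittleEndian(), RoundTrip() and the SKETCH_NO_UNLOCKED_STDIO switch are made up for this example, and LevelDB’s real header keys both decisions off its own platform macros.

```cpp
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <string>

// Adjustment 1: take the byte order from the macros that Clang and GCC
// predefine, so that <endian.h> is not needed at all.
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
constexpr bool kSketchLittleEndian =
    (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
#else
#error "This sketch needs a compiler that predefines __BYTE_ORDER__"
#endif

// Runtime probe, used only to cross-check the compile-time answer.
inline bool RuntimeLittleEndian() {
  const uint32_t probe = 1;
  unsigned char first = 0;
  memcpy(&first, &probe, 1);
  return first == 1;
}

// Adjustment 2: HYPOTHETICAL switch for this sketch. When set, the
// GNU-specific *_unlocked() names are routed to the standard functions.
#define SKETCH_NO_UNLOCKED_STDIO 1

#if SKETCH_NO_UNLOCKED_STDIO
#undef fread_unlocked
#undef fwrite_unlocked
#undef fflush_unlocked
#define fread_unlocked fread
#define fwrite_unlocked fwrite
#define fflush_unlocked fflush
#endif

// Writes `data` to a temporary file and reads it back through the
// *_unlocked names, which now resolve to fread()/fwrite()/fflush().
// Returns an empty string if the temporary file cannot be created.
inline std::string RoundTrip(const std::string& data) {
  FILE* f = tmpfile();
  if (f == nullptr) return std::string();
  fwrite_unlocked(data.data(), 1, data.size(), f);
  fflush_unlocked(f);
  rewind(f);
  std::string out(data.size(), '\0');
  const size_t n = fread_unlocked(&out[0], 1, out.size(), f);
  out.resize(n);
  fclose(f);
  return out;
}
```

Code that calls the *_unlocked() names keeps compiling unchanged; it simply ends up using the locking variants, which is slower but portable.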

After applying this patch, the build may continue. We can now get to the point where all source files build, with the exception of env_posix.cc:

util/env_posix.cc:229:16: error: use of undeclared identifier 'open'
      int fd = open(dir.c_str(), O_RDONLY);
               ^

What’s interesting about LevelDB’s design is that any interaction with the outside world (in this case, just the file system) is done through an interface, called Env. LevelDB ships with two implementations of Env: an in-memory environment used for testing and a POSIX environment that stores data on disk. A Singleton instance of the POSIX environment can be obtained by calling Env::Default().

In the case of CloudABI, the POSIX environment fails to build, as it attempts to make use of functions that use the global file system namespace, one of the things CloudABI is actively trying to prevent. On CloudABI, we must access the file system namespace by using the POSIX *at() system calls (e.g., openat()), so that processes can only access select portions of the namespace.
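As a rough illustration of the difference, the sketch below accesses files purely through a directory file descriptor. The helper names WriteFileAt() and ReadFileAt() are hypothetical and not part of LevelDB. On CloudABI the directory descriptor would be handed to the program at startup; on an ordinary POSIX system you could obtain one with open() and O_DIRECTORY to try this out.

```cpp
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>
#include <string>

// Creates (or truncates) `name` inside the directory referred to by
// `dirfd` and writes `data` to it. No absolute path is involved.
inline bool WriteFileAt(int dirfd, const char* name,
                        const std::string& data) {
  const int fd = openat(dirfd, name, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  const ssize_t n = write(fd, data.data(), data.size());
  close(fd);
  return n == static_cast<ssize_t>(data.size());
}

// Reads the whole contents of `name`, again resolved relative to `dirfd`.
// Returns an empty string if the file cannot be opened.
inline std::string ReadFileAt(int dirfd, const char* name) {
  const int fd = openat(dirfd, name, O_RDONLY);
  if (fd < 0) return std::string();
  std::string out;
  char buf[4096];
  ssize_t n;
  while ((n = read(fd, buf, sizeof(buf))) > 0) {
    out.append(buf, static_cast<size_t>(n));
  }
  close(fd);
  return out;
}
```

Every function takes the directory descriptor explicitly, so what a piece of code can reach is decided by which descriptors it has been given.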

What distinguishes CloudABI from comparable sandboxing frameworks is that all of the places that need to be patched up to use *at() can be detected by simply building the code. Repeated builds allowed me to derive this patch, which extends the POSIX environment to keep track of a directory file descriptor in a class member. The POSIX environment is thus no longer a Singleton object. Multiple environment objects, each potentially using a different base directory, can now be created using Env::DefaultWithDirectory().
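The shape of that change can be sketched with a much-reduced stand-in for the real interface. Both MiniEnv and DirectoryEnv below are hypothetical names invented for this example; LevelDB’s actual Env has many more methods and different signatures. The point is only that the directory descriptor lives in the object, so two objects can operate on two different directories.

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <string>

// HYPOTHETICAL, heavily simplified stand-in for an Env-style interface.
class MiniEnv {
 public:
  virtual ~MiniEnv() = default;
  virtual bool CreateFile(const std::string& name) = 0;
  virtual bool FileExists(const std::string& name) = 0;
  virtual bool DeleteFile(const std::string& name) = 0;
};

// Resolves every name relative to the directory descriptor it was
// constructed with, instead of relying on a global path. Note that on
// an ordinary POSIX system a name containing "../" can still leave the
// directory; it is the CloudABI sandbox that enforces confinement.
class DirectoryEnv : public MiniEnv {
 public:
  // `dirfd` is borrowed: the caller remains responsible for closing it.
  explicit DirectoryEnv(int dirfd) : dirfd_(dirfd) {}

  bool CreateFile(const std::string& name) override {
    const int fd = openat(dirfd_, name.c_str(), O_WRONLY | O_CREAT, 0644);
    if (fd < 0) return false;
    close(fd);
    return true;
  }

  bool FileExists(const std::string& name) override {
    return faccessat(dirfd_, name.c_str(), F_OK, 0) == 0;
  }

  bool DeleteFile(const std::string& name) override {
    return unlinkat(dirfd_, name.c_str(), 0) == 0;
  }

 private:
  const int dirfd_;
};
```

Because the descriptor is per-object state rather than process-wide state, a program holding two directory descriptors can construct two independent environments, which is what makes the Singleton unnecessary.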

In addition to this specific change, we have to make two more tweaks to env_posix.cc for it to build:

util/env_posix.cc:259:16: error: variable has incomplete type 'struct flock'
  struct flock f;
               ^

CloudABI doesn’t support any file locks. The reason for this is their semantics: because these locks are per-process, as opposed to per-descriptor, they don’t work well with composition and decomposition of processes (privilege separation). In the case of LevelDB, we can get away with disabling file locking entirely.

util/env_posix.cc:478:61: error: use of undeclared identifier 'geteuid'; did you mean 'gettid'?
      snprintf(buf, sizeof(buf), "/tmp/leveldbtest-%d", int(geteuid()));
                                                            ^~~~~~~

As CloudABI doesn’t support any traditional UNIX credentials management, geteuid() is no longer present. LevelDB uses this function to attempt to generate a unique directory for testing. As this code doesn’t end up getting used in our case, we can work around this by replacing the UID with a constant.

With these patches applied, all of the LevelDB C++ code builds successfully, but still fails to link, giving us this nasty error message:

x86_64-unknown-cloudabi-ar: Unknown command line argument '-rs'.  Try: 'x86_64-unknown-cloudabi-ar -help'
x86_64-unknown-cloudabi-ar: Did you mean '-M'?

This error message is due to a missing feature in LLVM’s copy of the ar utility, making it unable to parse the provided command line arguments properly. We can deal with this problem by removing the hyphen in front of the arguments. Running make one more time now gives us a copy of LevelDB built for CloudABI.

x86_64-unknown-cloudabi-ar: creating out-static/libleveldb.a
x86_64-unknown-cloudabi-ar: creating out-static/libmemenv.a

If we were to copy these libraries and LevelDB’s header files into the CloudABI toolchain’s prefix, we could make use of them. That said, at this point it would make more sense to just install a prebuilt copy from our CloudABI Ports repository.

Testing our copy of LevelDB

To demonstrate that our port of LevelDB actually works, I’ve written a tiny LevelDB editor. This utility reads key-value pairs from a terminal and stores them in a LevelDB. Upon detecting end-of-file, all of the entries in the LevelDB get printed back to the terminal. In addition to depending on the LevelDB library, it makes use of a small number of iostreams classes that are part of Boost. It can be built as follows:

$ wget https://nuxi.nl/blog/assets/cloudabi-edit-leveldb.cc
$ x86_64-unknown-cloudabi-c++ -std=c++1z -O2 \
      -o cloudabi-edit-leveldb cloudabi-edit-leveldb.cc \
      -lleveldb -lsnappy -lboost_iostreams

To run this CloudABI program, we make use of a very simple configuration file for cloudabi-run, which grants the program access to only the directory storing our LevelDB (./db/) and the terminal.

$ cat cloudabi-edit-leveldb.yaml
%TAG ! tag:nuxi.nl,2015:cloudabi/
---
database: !file
  path: db
terminal: !fd stdout

Below is a transcript of what an invocation of this utility looks like:

$ mkdir db
$ cloudabi-run cloudabi-edit-leveldb < cloudabi-edit-leveldb.yaml
  Key: apple
Value: green
  Key: banana
Value: yellow
  Key: orange
Value: orange
  Key: strawberry
Value: red
  Key: ^D
|apple| -> |green|
|banana| -> |yellow|
|orange| -> |orange|
|strawberry| -> |red|

Our key-value pairs should get written to disk, meaning that a second invocation will start off with the existing dataset.

$ cloudabi-run cloudabi-edit-leveldb < cloudabi-edit-leveldb.yaml
  Key: banana
Value: brown
  Key: ^D
|apple| -> |green|
|banana| -> |brown|
|orange| -> |orange|
|strawberry| -> |red|

Closing words

After chatting with a lot of people at FOSDEM, I’ve observed that many tend to be under the impression that systems like CloudABI are impractical to use, as they require us to make excessive changes to existing code. With this article, I hope that I’ve demonstrated the opposite, as we’ve managed to get a working sandboxing-aware copy of LevelDB without a lot of effort.

The effort of porting more software over to CloudABI is going on as we speak. One of the things I’m working on right now is getting Django to work. Expect to see an article on that project in the near future!