Nuxi The CloudABI Development Blog

An embedded implementation of the C time conversion functions

June 29, 2016 by Ed Schouten

During my talk about CloudABI at BSDCan in 2015, I mentioned that CloudABI’s C library ships with a custom implementation of the C time conversion functions. As I only had the time to discuss this briefly, I thought it would be interesting to publish an article on how these functions work on most UNIX-like systems and what makes CloudABI’s implementation different.

A quick recap of the C time conversion API

The C standard defines (more than) two different datatypes for expressing time values. The first one is time_t, which is a numerical type that can be used to store the time as the number of seconds relative to a fixed starting point. POSIX requires that time_t is an integer type and 1970-01-01T00:00Z is used as the starting point. As an int32_t would only allow you to express time values between 1901-12-13 and 2038-01-19, you see that most modern operating systems define time_t as an int64_t, which should be sufficient to last the lifetime of our solar system.

Though time_t values allow for easy comparison, addition and subtraction, they are of course not intended to be displayed to users directly. To solve this, the standard also defines the tm structure, which contains an integer field for every Gregorian date and time component:

struct tm {
  int tm_sec;    // Seconds [0,60].
  int tm_min;    // Minutes [0,59].
  int tm_hour;   // Hour [0,23].
  int tm_mday;   // Day of month [1,31].
  int tm_mon;    // Month of year [0,11].
  int tm_year;   // Years since 1900.
  int tm_wday;   // Day of week [0,6] (Sunday = 0).
  int tm_yday;   // Day of year [0,365].
  int tm_isdst;  // Daylight saving time flag.
};

This structure can be formatted into a human-readable string by using strftime(). As expected, the standard also provides a couple of utility functions that can be used to convert between time_t and the tm structure:

struct tm *gmtime(const time_t *);
struct tm *localtime(const time_t *);
time_t mktime(struct tm *);

The first two functions can be used to decompose a time_t value into a tm structure, using UTC or the local system’s time zone, respectively. mktime() does the opposite, packing a tm structure into a time_t value. A less commonly used, but quite useful feature of mktime() is that it normalises any fields that are out of bounds. The code below shows how this feature can be used to obtain a timestamp that corresponds to the last second of the month:

#include <time.h>

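time_t end_of_month(time_t t) {
  struct tm *tm = localtime(&t);
  tm->tm_sec = -1;    // One second before midnight...
  tm->tm_min = 0;
  tm->tm_hour = 0;
  tm->tm_mday = 1;    // ...on the first day...
  tm->tm_mon += 1;    // ...of the following month.
  tm->tm_isdst = -1;  // Let mktime() work out whether DST applies.
  return mktime(tm);
}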

A problem with the gmtime() and localtime() functions as specified by the C standard is that they are not thread-safe. They return a pointer to an object that may be shared between different threads and overwritten by subsequent calls. POSIX solved this by adding the gmtime_r() and localtime_r() functions, which store their result in a buffer provided by the caller. Windows added gmtime_s() and localtime_s() instead.
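
To illustrate the POSIX variants, here is a minimal sketch that decomposes a timestamp with gmtime_r() into storage owned by the caller and formats it with strftime(). The function name format_utc() is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L  // Exposes gmtime_r() under strict ISO C modes.

#include <stddef.h>
#include <time.h>

// Hypothetical helper: formats t as "YYYY-MM-DD HH:MM:SS" in UTC.
// Returns the number of characters written, or 0 on failure.
size_t format_utc(time_t t, char *buf, size_t buflen) {
  struct tm tm;  // Lives on our own stack, so nothing is shared between threads.
  if (gmtime_r(&t, &tm) == NULL)
    return 0;
  return strftime(buf, buflen, "%Y-%m-%d %H:%M:%S", &tm);
}
```

As every call works on its own tm structure, this function can safely be called from multiple threads at once.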

Another feature that is missing is that there is no counterpart of mktime() that uses UTC instead of the system’s time zone. Many POSIX-like systems provide an extension for this purpose called timegm(). Windows provides a similar function called _mkgmtime().
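
On systems that lack both extensions, the UTC case is simple enough to compute by hand, as no time zone data is involved. The sketch below uses a well-known days-from-civil calculation. The name utc_seconds_from_tm() is made up for this example. Unlike mktime(), it does not normalise out-of-range fields, and it ignores leap seconds.

```c
#include <stdint.h>
#include <time.h>

// Number of days between 1970-01-01 and the given proleptic Gregorian
// date. The month is 1-based (1 = January).
static int64_t days_from_civil(int64_t y, int m, int d) {
  y -= m <= 2;  // Treat January and February as part of the previous year.
  int64_t era = (y >= 0 ? y : y - 399) / 400;
  int64_t yoe = y - era * 400;                                   // [0, 399]
  int64_t doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;  // [0, 365]
  int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           // [0, 146096]
  return era * 146097 + doe - 719468;
}

// Hypothetical stand-in for timegm(): interprets the fields of tm as UTC.
// Assumes that all fields are already within their normal ranges.
int64_t utc_seconds_from_tm(const struct tm *tm) {
  int64_t days = days_from_civil((int64_t)tm->tm_year + 1900,
                                 tm->tm_mon + 1, tm->tm_mday);
  return days * 86400 + tm->tm_hour * 3600 + tm->tm_min * 60 + tm->tm_sec;
}
```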

The IANA time zone database

In my opinion, the most important aspect of providing a usable implementation of these functions is that they use accurate, complete and up-to-date time zone information. As maintaining this data requires quite a lot of effort, you see that most Open Source projects make use of the IANA time zone database, which is released into the public domain.

The tzdata package provided by IANA contains a number of text files filled with time zone rules for many major cities around the world. For each of those cities, these text files contain snippets that look like this:

# Zone  NAME            GMTOFF  RULES   FORMAT  [UNTIL]
Zone Europe/Amsterdam   0:19:32 -       LMT     1835
                        0:19:32 Neth    %s      1937 Jul  1
                        0:20    Neth    NE%sT   1940 May 16  0:00
                        1:00    C-Eur   CE%sT   1945 Apr  2  2:00
                        1:00    Neth    CE%sT   1977
                        1:00    EU      CE%sT

As you can see, the file format is relatively simple, listing the different UTC offsets that are used over time in the GMTOFF column. In this case, it shows that the Netherlands changed its offset on a couple of occasions, both for practical and political reasons. For every entry, it can also refer to a set of daylight saving time rules (RULES) and provide an abbreviation string (FORMAT). This abbreviation can either be constant (‘LMT’) or depend on whether daylight saving time is used (‘CET’/‘CEST’). The timestamp specified in the UNTIL column determines when the next line becomes active.
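
To make the role of the UNTIL column concrete, the Zone entry above could be represented in C along the following lines. This is a simplified sketch: the zone_era structure and find_era() are names invented for this example, and UNTIL is compared at day granularity only, ignoring the time of day that some lines specify.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical in-memory form of a single line of a Zone entry.
struct zone_era {
  int32_t gmtoff;      // GMTOFF: standard UTC offset in seconds.
  const char *rules;   // RULES: name of the rule set, or "-" for none.
  const char *format;  // FORMAT: abbreviation, possibly containing %s.
  int32_t until;       // UNTIL as year * 10000 + month * 100 + day.
};

// Europe/Amsterdam, transcribed from the Zone entry shown above.
static const struct zone_era europe_amsterdam[] = {
  {1172, "-",     "LMT",   18350101},
  {1172, "Neth",  "%s",    19370701},
  {1200, "Neth",  "NE%sT", 19400516},
  {3600, "C-Eur", "CE%sT", 19450402},
  {3600, "Neth",  "CE%sT", 19770101},
  {3600, "EU",    "CE%sT", INT32_MAX},  // No UNTIL: still in effect.
};

// Returns the line that is active on the given local date, being the
// first one whose UNTIL date lies after it.
const struct zone_era *find_era(const struct zone_era *eras, size_t count,
                                int year, int month, int day) {
  int32_t date = year * 10000 + month * 100 + day;
  for (size_t i = 0; i + 1 < count; ++i) {
    if (date < eras[i].until)
      return &eras[i];
  }
  return &eras[count - 1];  // The last line is open-ended.
}
```

Looking up a date in 2016 yields the last line, which refers to the EU rule set. A date in 1938 yields the line with the 0:20 offset.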

Daylight saving time rules are provided in the same file and may look like this:

# Rule  NAME    FROM    TO      TYPE    IN      ON      AT      SAVE    LETTER/S
Rule    EU      1977    1980    -       Apr     Sun>=1   1:00u  1:00    S
Rule    EU      1977    only    -       Sep     lastSun  1:00u  0       -
Rule    EU      1978    only    -       Oct      1       1:00u  0       -
Rule    EU      1979    1995    -       Sep     lastSun  1:00u  0       -
Rule    EU      1981    max     -       Mar     lastSun  1:00u  1:00    S
Rule    EU      1996    max     -       Oct     lastSun  1:00u  0       -

The FROM and TO columns specify the years to which these rules apply. The IN, ON and AT columns specify at which moment within these years these rules become active. Finally, the SAVE and LETTER/S columns declare the amount of time that is added to the standard offset while the rule is in effect and the midfix that is substituted for %s in the abbreviation string.
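
Expressions in the ON column such as lastSun and Sun>=1 need to be resolved to a concrete day of the month for every year. The following sketch shows one way of doing that. The function names are made up for this example, and it does not handle the case where a >= expression rolls over into the next month.

```c
#include <stdbool.h>

// Day of the week (0 = Sunday) of a Gregorian date, using Sakamoto's
// method. The month is 1-based.
static int day_of_week(int y, int m, int d) {
  static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
  y -= m < 3;
  return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

static int days_in_month(int y, int m) {
  static const int len[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
  bool leap = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
  return len[m - 1] + (m == 2 && leap);
}

// Resolves "lastSun"-style expressions: the last occurrence of the given
// weekday (0 = Sunday) within the month.
int last_weekday(int y, int m, int wday) {
  int last = days_in_month(y, m);
  return last - (day_of_week(y, m, last) - wday + 7) % 7;
}

// Resolves "Sun>=1"-style expressions: the first occurrence of the given
// weekday on or after the given day of the month.
int weekday_on_or_after(int y, int m, int wday, int day) {
  return day + (wday - day_of_week(y, m, day) + 7) % 7;
}
```

For 2016, the two EU rules that extend to max resolve to 27 March and 30 October.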

These text files may be processed by the zone information compiler (zic) that is part of the tzcode package. This compiler is capable of flattening the Zone and Rule directives into a chronological sequence of events and writing them into binary files, one for every time zone. These binary files are more efficient to use than the unprocessed text files. On most systems, they are installed in /usr/share/zoneinfo.

The tzcode package also provides a reference implementation of the C functions that we discussed previously, capable of parsing the binary files generated by zic. This implementation is used by most of the C libraries shipped with the BSDs. Glibc and musl include their own implementations that are capable of parsing the same binary files.

CloudABI’s implementation of the C time conversion functions

What simplifies things a lot for us is that due to the strong isolation that CloudABI provides, there is no such thing as a system-wide time zone that is visible to the application. In our environment, gmtime_r(), localtime_r(), timegm() and mktime() all use UTC unconditionally. The underlying conversion routines they use are called __localtime_utc() and __mktime_utc().
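
Without time zone data, decomposing a timestamp is pure calendar arithmetic. The following is a sketch of that idea and not the actual source of __localtime_utc(). The name utc_tm_from_seconds() is made up for this example, and leap seconds are ignored.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

static bool is_leap(int64_t y) {
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Hypothetical UTC-only decomposition of seconds since 1970-01-01T00:00Z.
void utc_tm_from_seconds(int64_t t, struct tm *out) {
  // Split into whole days and seconds within the day, rounding towards
  // negative infinity so that timestamps before 1970 work as well.
  int64_t days = t / 86400;
  int64_t rem = t % 86400;
  if (rem < 0) {
    rem += 86400;
    --days;
  }
  out->tm_hour = (int)(rem / 3600);
  out->tm_min = (int)(rem % 3600 / 60);
  out->tm_sec = (int)(rem % 60);
  out->tm_wday = (int)(((days + 4) % 7 + 7) % 7);  // 1970-01-01: Thursday.

  // Count days from 0000-03-01 instead, so that leap days fall at the
  // end of a year, and work within 400-year cycles of 146097 days.
  int64_t z = days + 719468;
  int64_t era = (z >= 0 ? z : z - 146096) / 146097;
  int64_t doe = z - era * 146097;
  int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
  int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  // 0 = 1 March.
  int64_t mp = (5 * doy + 2) / 153;                       // 0 = March.
  int64_t month = mp < 10 ? mp + 3 : mp - 9;              // 1 = January.
  int64_t year = yoe + era * 400 + (month <= 2);

  out->tm_year = (int)(year - 1900);
  out->tm_mon = (int)(month - 1);
  out->tm_mday = (int)(doy - (153 * mp + 2) / 5 + 1);
  out->tm_yday = (int)(month > 2 ? doy + 59 + is_leap(year) : doy - 306);
  out->tm_isdst = 0;  // UTC never observes daylight saving time.
}
```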

That said, providing support for UTC only would be rather limiting. Applications like web services often want to personalise responses by displaying timestamps in the user’s time zone, for example. This is why our C library has support for time zones as part of its localisation framework (category LC_TIMEZONE_MASK). Two new functions can be used to convert timestamps to arbitrary time zones in a thread-safe way, now using the timespec structure to offer nanosecond precision:

int localtime_l(const struct timespec *, struct tm *, locale_t);
int mktime_l(const struct tm *, struct timespec *, locale_t);

For CloudABI, it makes far more sense to have the time zone database integrated into the C library as opposed to loading the binary time zone files from disk. As CloudABI programs cannot access any global paths by design, programs would otherwise need to be configured explicitly to gain access to /usr/share/zoneinfo, which would be impractical. Integration has the advantage of providing a good out-of-the-box experience, while also making it easy to get consistent behaviour when migrating CloudABI programs across systems.

While implementing this feature, we realised that importing the binary files generated by zic into the C library directly would be infeasible, for the reason that for all time zones combined, these consume approximately 2 MB of space. To solve this, we have implemented localtime_l() and mktime_l() in such a way that they don’t require the dataset to be flattened. They can work on a dataset that is structured similarly to the text files, while still providing a good running time complexity.

New releases of the IANA time zone database are imported into cloudlibc’s source tree regularly. The text files with the definitions are converted by a not-so-pretty Python script into a C header file. When compiled, the time conversion routines together with the dataset account for approximately 115 KB of space. This is remarkably small, as IANA’s reference implementation that has to load time zone definitions from disk is already 70 KB in size. Having very few dependencies, this implementation is also easily usable in embedded/freestanding environments.

Testing our new implementation

Coming up with high-quality test vectors for our implementation turned out to be easier than expected. The tzcode package ships with a little-known utility called zdump. When called with the -v flag, this utility walks over time and prints pairs of consecutive timestamps at which changes to the time zone’s offset, abbreviation or daylight saving time flag occur:

$ zdump -v Europe/Amsterdam
Europe/Amsterdam  Sun Mar 27 00:59:59 2016 UTC = Sun Mar 27 01:59:59 2016 CET isdst=0 gmtoff=3600
Europe/Amsterdam  Sun Mar 27 01:00:00 2016 UTC = Sun Mar 27 03:00:00 2016 CEST isdst=1 gmtoff=7200
Europe/Amsterdam  Sun Oct 30 00:59:59 2016 UTC = Sun Oct 30 02:59:59 2016 CEST isdst=1 gmtoff=7200
Europe/Amsterdam  Sun Oct 30 01:00:00 2016 UTC = Sun Oct 30 02:00:00 2016 CET isdst=0 gmtoff=3600
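
Lines in this format are easy to take apart. The sketch below parses a single line with sscanf() and performs a basic sanity check on it. It is not the script we actually used. The zdump_vector structure and both function names are made up for this example, and it assumes the exact output format shown above.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical container for the fields of one line of "zdump -v" output.
struct zdump_vector {
  char zone[64];
  char utc_month[4], local_month[4];
  int utc_mday, utc_hour, utc_min, utc_sec, utc_year;
  int local_mday, local_hour, local_min, local_sec, local_year;
  char abbreviation[16];
  int isdst;
  long gmtoff;
};

// Parses one line of output. Returns true only if all fields were found.
bool parse_zdump_line(const char *line, struct zdump_vector *v) {
  int n = sscanf(line,
                 "%63s %*s %3s %d %d:%d:%d %d UTC = "
                 "%*s %3s %d %d:%d:%d %d %15s isdst=%d gmtoff=%ld",
                 v->zone, v->utc_month, &v->utc_mday, &v->utc_hour,
                 &v->utc_min, &v->utc_sec, &v->utc_year, v->local_month,
                 &v->local_mday, &v->local_hour, &v->local_min,
                 &v->local_sec, &v->local_year, v->abbreviation, &v->isdst,
                 &v->gmtoff);
  return n == 16;  // The two weekday names are skipped, not stored.
}

// Checks that the local time of day equals the UTC time of day plus
// gmtoff, modulo one day.
bool offset_matches_wall_clock(const struct zdump_vector *v) {
  long utc = v->utc_hour * 3600L + v->utc_min * 60L + v->utc_sec;
  long local = v->local_hour * 3600L + v->local_min * 60L + v->local_sec;
  long diff = ((local - utc) % 86400 + 86400) % 86400;
  long offset = (v->gmtoff % 86400 + 86400) % 86400;
  return diff == offset;
}
```

Every parsed line can then be emitted as a test case that feeds the UTC timestamp into the conversion routine and compares the result against the local fields, abbreviation, isdst and gmtoff.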

With some scripting, we can convert the output of this utility to a 50 MB C file, containing test vectors for every change in every supported time zone. This C file allowed us to uncover a lot of small implementation bugs. As this source file is too large to distribute, only tests for time zones that triggered bugs during development have been checked in permanently.