Serialization

Serialization is primarily provided through the rbuf interface, which is a custom IDL format.

A typical rbuf file looks something like this:

namespace ark::example;

schema MyMessage
{
    uint32 identifier;
    string payload;
}

The IDL format supports a handful of features:

use primitive types (see below for a full list)
define “schemas” (objects)
define enumerations (with varying sizes, such as 8-bit or 32-bit)
reference other schemas or enums
specify a namespace
create dynamic or fixed-size lists of objects
create dictionaries of objects
create variants of schemas
mark fields as optional
include other rbuf files
define global constant strings/integers
define constant primitive types within “schemas”
use forwards/backwards compatibility with “groups” and field retiring

Additional functionality will be added as needed. The rbuf compiler will generate C++, JavaScript, or Python code that includes both the structure definition and the code to serialize/deserialize the structure.

In addition to basic serialization, Ark also uses rbuf internally as the backend for ROS serialization, along with the ability to deserialize YAML-based configuration into rbuf-generated structures.

CMake

You can specify one or more rbuf files to be compiled into a messages library with this CMake definition:

add_rbuf(my_message_lib my_message1.rbuf my_message2.rbuf)

Supported Field Types

The normal primitives are all supported (uint8, uint16, uint32, uint64, int8, int16, int32, int64, float, double, and bool). Additionally, a few other types are supported:

string - A std::string equivalent.
guid - A core::Guid equivalent.
duration - A std::chrono::nanoseconds equivalent.
steady_time_point - A std::chrono::steady_clock::time_point equivalent.
system_time_point - A std::chrono::system_clock::time_point equivalent.
byte_buffer - A core::ByteBuffer equivalent.
arraylist<T> - A std::vector<T> equivalent.
dictionary<K, V> - A std::map<K, V> equivalent.
array<T, S> - A fixed-size array (std::array<T, S>) equivalent.
variant<T1, T2, ... Tn> - Exactly one of the listed types (a std::variant equivalent).

Constant Fields

You can also include constant primitives (including string) in a schema.

schema MyObject
{
    const string NAME = "name";
    const float VALUE = 4.2;

    string str;
}

These values are stored in the schema and not serialized into individual messages.

NOTE: Floating point values must include a number before and after the decimal.

Enumerations

You can also create enumerations, and make use of those in your schema definitions. For example:

enum MyEnum
{
    Empty = 0;
    GoodValue = 1;
    BadValue = 2;
}

schema MyObject
{
    MyEnum value;
}

Enums resolve to 32-bit integers by default, and use enum classes in C++.

You can also specify the storage type (the “class type”) explicitly, just as you would give a C++ enum class an underlying type:

enum MyEnum : int16
{
    Negative = -100;
    Positive = 100;
}

Class types supported are all signed and unsigned integers (8-bit through 64-bit).

When using reflection, you will need to use std::holds_alternative to discover the underlying type (or look at the schema).

Note that the class type (storage type) is not versioned (nor are values). To preserve binary compatibility, you cannot change the size or existing values after you have written data to disk.

You can fetch all potential values an enum can supply by using the ark::serialization::values() API. For example, you can use ark::serialization::values<MyEnum>() from the example above, and you will receive a std::vector<int16_t> which contains the values -100 and 100.

The ark::serialization::count() API is provided to return a count of the number of values contained in an enum. This is only the count of the enum as compiled by rbuf. Forwards/backwards compatible messages may have values outside this range, if enumerations were added/removed. In the example above, ark::serialization::count<MyEnum>() would yield the value 2.

The ark::serialization::max() API is provided to return the maximum value seen in an enumeration, as a convenience over defining it manually. This is only the maximum value of the enum as compiled by rbuf. Forwards/backwards compatible messages may have values outside this range, if enumerations were added/removed. In the example above, ark::serialization::max<MyEnum>() would yield the value 100.
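
The behavior of these three helpers can be illustrated with a small stand-alone sketch. Everything in the `sketch` namespace below is hypothetical and hand-written for illustration; it is not the code rbufc generates, and `is_known` is an extra helper that the Ark API is not claimed to provide. It only shows why a value received from a newer or older peer can fall outside the compiled table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hand-written stand-in for the enum class that rbufc would generate.
enum class MyEnum : std::int16_t
{
    Negative = -100,
    Positive = 100,
};

namespace sketch
{
// Hypothetical helpers; they only mimic the documented behavior.
template <typename E>
std::vector<std::underlying_type_t<E>> values();

// The value table is fixed at the time the rbuf file is compiled.
template <>
inline std::vector<std::int16_t> values<MyEnum>()
{
    return {-100, 100};
}

template <typename E>
std::size_t count()
{
    return values<E>().size();
}

template <typename E>
std::underlying_type_t<E> max()
{
    const auto all = values<E>();
    return *std::max_element(all.begin(), all.end());
}

// True only for values that this build of the schema knows about.
template <typename E>
bool is_known(std::underlying_type_t<E> raw)
{
    const auto all = values<E>();
    return std::find(all.begin(), all.end(), raw) != all.end();
}
} // namespace sketch

inline void enum_helpers_example()
{
    assert(sketch::count<MyEnum>() == 2);
    assert(sketch::max<MyEnum>() == 100);
    // A newer peer might send 7; it fits in int16 but is not a known value here.
    assert(!sketch::is_known<MyEnum>(7));
}
```

A check along the lines of `is_known` is one way to guard code that switches over an enum received from another software version.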

Bitmasks

You can mark an enumeration as a bitmask, which causes the code generators to emit some additional functionality for string conversion and the combination operators (|, &, |=, &=, and ~).

enum bitmask MyBitmask : uint16
{
    FirstSlot = 0x0001;
    SecondSlot = 0x0002;
    TopSlot = 0x8000;
}

You can use the combination operators to set or unset bits. Note that it is possible to get into a situation (particularly with ~) where some bits that are set are not actually defined in your bitmask.

You can then use code like this:

// Set bitmask equal to 0x03
MyBitmask bitmask = MyBitmask::FirstSlot | MyBitmask::SecondSlot;

// Remove the first bit, so it is now 0x02.
bitmask &= ~MyBitmask::FirstSlot;

// And with 0x8000, so it is now zero.
bitmask &= MyBitmask::TopSlot;

Using the to_string or from_string APIs would yield strings that contain all of the values, broken down, such as “FirstSlot, TopSlot”, which would be converted back to a mask of MyBitmask::FirstSlot | MyBitmask::TopSlot.
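
To make the operator and string behavior concrete, here is a minimal hand-written sketch of what bitmask support could look like for MyBitmask. The `raw` helper and the name table are assumptions made for this example; the code rbufc actually emits may be structured differently.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

enum class MyBitmask : std::uint16_t
{
    FirstSlot = 0x0001,
    SecondSlot = 0x0002,
    TopSlot = 0x8000,
};

// Hypothetical helper that exposes the underlying bits.
constexpr std::uint16_t raw(MyBitmask mask)
{
    return static_cast<std::uint16_t>(mask);
}

constexpr MyBitmask operator|(MyBitmask a, MyBitmask b)
{
    return static_cast<MyBitmask>(static_cast<std::uint16_t>(raw(a) | raw(b)));
}

constexpr MyBitmask operator&(MyBitmask a, MyBitmask b)
{
    return static_cast<MyBitmask>(static_cast<std::uint16_t>(raw(a) & raw(b)));
}

// Note: this also sets bits that have no named value in the enum.
constexpr MyBitmask operator~(MyBitmask a)
{
    return static_cast<MyBitmask>(static_cast<std::uint16_t>(~raw(a)));
}

inline MyBitmask &operator|=(MyBitmask &a, MyBitmask b) { return a = a | b; }
inline MyBitmask &operator&=(MyBitmask &a, MyBitmask b) { return a = a & b; }

// Builds a comma-separated list of the named bits that are set.
inline std::string to_string(MyBitmask mask)
{
    struct Named
    {
        MyBitmask bit;
        const char *name;
    };
    static constexpr Named kNames[] = {
        {MyBitmask::FirstSlot, "FirstSlot"},
        {MyBitmask::SecondSlot, "SecondSlot"},
        {MyBitmask::TopSlot, "TopSlot"},
    };

    std::string out;
    for (const auto &entry : kNames)
    {
        if (raw(mask & entry.bit) != 0)
        {
            if (!out.empty())
            {
                out += ", ";
            }
            out += entry.name;
        }
    }
    return out;
}

inline void bitmask_example()
{
    MyBitmask mask = MyBitmask::FirstSlot | MyBitmask::SecondSlot; // 0x0003
    mask &= ~MyBitmask::FirstSlot;                                 // 0x0002
    assert(raw(mask) == 0x0002);
    assert(to_string(MyBitmask::FirstSlot | MyBitmask::TopSlot) == "FirstSlot, TopSlot");
}
```

In this sketch `~MyBitmask::FirstSlot` evaluates to 0xFFFE, which is the "undefined bits" situation described above: thirteen of the set bits have no name in the bitmask.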

Nested Objects

Schemas can reference each other. As an example:

schema FirstObject
{
    uint64 identifier;
}

schema SecondObject
{
    FirstObject primary;
    arraylist<FirstObject> additional;
}

schema ThirdObject
{
    dictionary<string, SecondObject> name_to_object_mapping;
}

This will get compiled into three separate structures, with the second object referencing the first and the third referencing the second, as you’d expect.

Variants

A variant allows you to specify a type that contains exactly one of a list of potential types. It’s equivalent to the C++ std::variant class.

When specifying types for the variant, you must tag them with an integer ‘index’. This ‘index’ must be unique over the lifetime of the schema – it is intended to allow for backwards and forwards compatibility, so you can remove or add variant members at later times without breaking message compatibility.

For example:

schema ObjectA
{
    string content;
}

schema ObjectB
{
    uint64 content;
}

schema Message
{
    variant<ObjectA = 1, ObjectB = 2> object;
}

This creates a variant with two types, ObjectA and ObjectB. The former has a tag of 1, while the latter has a tag of 2.

This will yield a C++ type of std::variant<ObjectA, ObjectB>, and serialize only the object that is actually used. In this example, this allows you to store either a string or a uint64 depending on your needs.

You could then remove ObjectA and add an ObjectC like so:

schema Message
{
    variant<ObjectB = 2, ObjectC = 3> object;
}

Variants in rbuf can only contain “object” types – they cannot contain lists, maps, or primitives like bare strings, byte buffers, or integers. In those cases, you should wrap the type in an object and make use of that.

For readability, some people prefer to specify each subtype on their own line. This is also permitted:

schema Message
{
    variant<
      ObjectA = 1,
      ObjectB = 2,
      ObjectC = 3
    > object;
}

Normal whitespace rules apply within the <> subtype list.
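
On the C++ side, a variant field is consumed like any other std::variant. The sketch below uses hand-written structs that mirror the ObjectA/ObjectB example (they stand in for the generated types, whose exact layout is an assumption here) and shows one common way to branch on the active member with std::visit.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hand-written stand-ins for the generated structures.
struct ObjectA
{
    std::string content;
};

struct ObjectB
{
    std::uint64_t content = 0;
};

struct Message
{
    std::variant<ObjectA, ObjectB> object;
};

// Produces a short description of whichever member is active.
inline std::string describe(const Message &message)
{
    return std::visit(
        [](const auto &value) -> std::string {
            using T = std::decay_t<decltype(value)>;
            if constexpr (std::is_same_v<T, ObjectA>)
            {
                return "text: " + value.content;
            }
            else
            {
                return "number: " + std::to_string(value.content);
            }
        },
        message.object);
}

inline void variant_example()
{
    Message message;
    // A default-constructed std::variant holds its first alternative.
    assert(std::holds_alternative<ObjectA>(message.object));

    message.object = ObjectB{42};
    assert(describe(message) == "number: 42");
}
```

Because each member is wrapped in its own object type, adding a new member later only requires a new branch in the visitor.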

Arrays

You can use either the arraylist or array keyword to generate an array in rbuf. The arraylist keyword creates a dynamically-sized array, whereas the array keyword creates a fixed-size array.

You should consider using fixed-size arrays only if you know the size of the array ahead of time, and you expect that size to never change. If you change the size of the array, it will break backwards compatibility. The primary benefit is that the on-wire size will be a bit smaller, as the size of the array does not need to be written out.

Examples:

schema MySchema
{
    arraylist<uint32> dynamic_primitive_array;
    arraylist<Object> dynamic_object_array;
    array<uint32, 8> fixed_primitive_array;
    array<Object, 3> fixed_object_array;
}
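
The size trade-off between the two array kinds can be sketched with a toy encoder. This is not rbuf's wire format: the 4-byte element count used for the dynamic list is an assumption chosen for illustration, and the `encode` functions are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Appends the raw bytes of a value or buffer to the output.
inline void append_raw(Bytes &out, const void *data, std::size_t size)
{
    const auto *bytes = static_cast<const std::uint8_t *>(data);
    out.insert(out.end(), bytes, bytes + size);
}

// Dynamic list: the reader cannot know the length, so it is written first.
// (The 4-byte count is an assumption of this sketch.)
inline void encode(Bytes &out, const std::vector<std::uint32_t> &list)
{
    const auto count = static_cast<std::uint32_t>(list.size());
    append_raw(out, &count, sizeof(count));
    append_raw(out, list.data(), list.size() * sizeof(std::uint32_t));
}

// Fixed-size array: the length is part of the schema, so only elements are written.
template <std::size_t N>
void encode(Bytes &out, const std::array<std::uint32_t, N> &fixed)
{
    append_raw(out, fixed.data(), N * sizeof(std::uint32_t));
}

inline void array_size_example()
{
    Bytes dynamic_bytes;
    encode(dynamic_bytes, std::vector<std::uint32_t>(8, 0u));

    Bytes fixed_bytes;
    encode(fixed_bytes, std::array<std::uint32_t, 8>{});

    assert(dynamic_bytes.size() == 36); // 4-byte count + 8 * 4 bytes
    assert(fixed_bytes.size() == 32);   // 8 * 4 bytes, no count
}
```

The sketch also shows why resizing a fixed array breaks compatibility: a reader compiled for 8 elements has no way to learn that a writer emitted a different number.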

Including

The rbufc compiler will obey include search paths (-I on the command line). You can then use include statements in your code:

include "path/to/my.rbuf"

schema MySchema
{
    ObjectInMyRbuf object;
}

This assumes ObjectInMyRbuf is defined in the my.rbuf file. The generated code will handle emitting the proper C++ includes. Normal cmake rules apply: if the rbuf you are including is in a different library, you will need to define that in your CMakeLists.txt file explicitly or you may get non-deterministic build errors.

For example, if you have an rbuf in the rbuf library project::perception_messages that includes an rbuf from the rbuf library project::geometry_messages, you would add this to your CMakeLists.txt:

target_link_libraries(
    project_perception_messages
    PUBLIC project::geometry_messages)

Versioning

rbuf supports both forwards and backwards compatible serialization, with some caveats.

The language structures messages into “groups”, such that any group may or may not be present when deserializing the structure.

For example:

schema VersionedObject
{
    uint32 identifier;
    string text;

    group 0
    {
        arraylist<string> metadata;
    }
}

In this example, the fields identifier and text are present in every version of the object. The field metadata is only set if your software both knows about group #0 and if the incoming data has group #0 in it.

Older software (without knowledge of group #0) can still read the object, but the metadata field will not be available to them.

If a group is removed, older software will see default values for all of the fields contained within that group. The fields are default-initialized with C++ logic (containers and strings are empty, primitives are zero initialized, variants contain the first element).

You cannot change the fields that are present (either the “base” fields of a structure, or the fields that are in an established group). You may retire groups (just remove them from the schema) without issue.

You can use group identifiers 0 through 255.
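
The idea behind groups can be illustrated with a toy encoding. The layout below (base fields, then a one-byte group identifier and a four-byte payload length per group) is purely an assumption for this sketch and is not rbuf's real wire format; the `priority` field is also hypothetical. The sketch does no bounds checking.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

inline void put_u32(Bytes &out, std::uint32_t value)
{
    std::uint8_t raw[sizeof(value)];
    std::memcpy(raw, &value, sizeof(value));
    out.insert(out.end(), raw, raw + sizeof(value));
}

inline std::uint32_t get_u32(const Bytes &in, std::size_t &offset)
{
    std::uint32_t value = 0;
    std::memcpy(&value, in.data() + offset, sizeof(value));
    offset += sizeof(value);
    return value;
}

// A writer that knows about group 0, which holds one extra field.
inline Bytes write_with_group0(std::uint32_t identifier, std::uint32_t priority)
{
    Bytes out;
    put_u32(out, identifier); // base field, always present
    out.push_back(0);         // group identifier
    put_u32(out, 4u);         // payload length of the group, in bytes
    put_u32(out, priority);   // field inside group 0
    return out;
}

struct Decoded
{
    std::uint32_t identifier = 0;
    std::uint32_t priority = 0; // stays default if group 0 is absent or unknown
    bool has_group0 = false;
};

// A reader that may or may not have been compiled with knowledge of group 0.
inline Decoded read(const Bytes &in, bool knows_group0)
{
    Decoded result;
    std::size_t offset = 0;
    result.identifier = get_u32(in, offset);

    while (offset < in.size())
    {
        const std::uint8_t group = in[offset++];
        const std::uint32_t length = get_u32(in, offset);
        if (group == 0 && knows_group0)
        {
            std::size_t cursor = offset;
            result.priority = get_u32(in, cursor);
            result.has_group0 = true;
        }
        offset += length; // step past the payload, whether it was understood or not
    }
    return result;
}

inline void group_example()
{
    const Bytes wire = write_with_group0(7, 3);
    assert(read(wire, true).priority == 3);     // newer reader sees the field
    assert(read(wire, false).identifier == 7);  // older reader still reads base fields
    assert(!read(wire, false).has_group0);      // and skips the unknown group
}
```

The length prefix is what lets an older reader skip a group it has never heard of, which is the property the versioning rules above rely on.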

If you wish to omit versioning information altogether, you can apply the [[final]] attribute to your schema. This will prevent you from using groups at all, and your schema must never change (or it will break backwards compatibility).

An example:

[[final]]
schema FinalObject
{
    array<float, 3> position;
    array<float, 4> colors;
}

The upside to this is that the on-wire representation is more succinct, which can make a big difference if a particular schema is never expected to change and is repeated in lists many times. The above example will only consume 28 bytes on the wire (whereas if you used dynamic lists and non-final, it would consume around 37 bytes).
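
The 28-byte figure matches the raw payload of the schema: seven 32-bit floats and nothing else. The sketch below checks that arithmetic with a hand-written stand-in for the structure, assuming a platform where float is 4 bytes. The roughly 37-byte figure for the non-final, dynamic-list case depends on how rbuf encodes list lengths and versioning metadata, which is not specified here, so it is not reproduced.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hand-written stand-in for the generated FinalObject structure.
struct FinalObject
{
    std::array<float, 3> position{};
    std::array<float, 4> colors{};
};

// Payload size when only the element data is written (no lengths, no group info).
constexpr std::size_t final_payload_bytes()
{
    return 3 * sizeof(float) + 4 * sizeof(float);
}

inline void final_size_example()
{
    // Assumes a 4-byte float, which holds on common platforms.
    assert(sizeof(float) == 4);
    assert(final_payload_bytes() == 28);
}
```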

Renaming

In binary serialization, you can rename your fields to anything you like without worrying about versioning. The only thing that is relevant is the field type and the ordering.

For JSON or YAML (config) serialization, you can rename fields, but any existing plain-text serialized data will drop data from the renamed field.

In other words, if you had:

config:
  default:
    my_field_value: 10

And renamed my_field_value to new_name, then new_name would be empty when deserializing the above YAML, rather than containing the value 10. You would need to manually rename the fields in all existing configuration structures.

Removing Fields

Right now, you can safely remove any group without hindering binary compatibility. The entire group must be removed.

If you wish to remove an individual field, you must leave it in your rbuf definition, but prefix it with the [[removed]] attribute, like so:

schema Message
{
    [[removed]] arraylist<double> my_old_array;

    arraylist<double> my_new_array;
}

This will prevent my_old_array from being a public member in your structure. Data will be skipped from that structure when deserializing, and when serializing, it will be as if the default value is serialized. This allows you to maintain forwards/backwards compatibility.

Optional Fields

Any field can be marked as ‘optional’ with the [[optional]] attribute. Note that changing the optional attribute will break forwards/backwards compatibility. Once something is marked as optional, it is always optional (or vice-versa).

For example:

schema Message
{
    [[optional]] string name;
    [[optional]] string address;
}

These two fields are entirely optional. In C++, they will be generated with the std::optional type wrapping the string (so, std::optional<std::string>). In JavaScript the fields are undefined if they are not present, and in Python they are set to None if they are not present.

When emitted as JSON, the field is entirely missing if it is not set. Similarly, if it is missing from the JSON, the optional is not set in the parsing language.

You can use this to reduce storage costs for types if they are not present, as the type is not emitted at all if it is optional and not set.
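
In C++, consuming such a message comes down to ordinary std::optional handling. The struct below is a hand-written stand-in for the generated Message type, and `display_name` is a hypothetical helper added for the example.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hand-written stand-in for the generated structure.
struct Message
{
    std::optional<std::string> name;
    std::optional<std::string> address;
};

// Falls back to a placeholder when the optional field was never set.
inline std::string display_name(const Message &message)
{
    return message.name.value_or("(no name)");
}

inline void optional_example()
{
    Message message;
    assert(!message.name.has_value()); // unset by default
    assert(display_name(message) == "(no name)");

    message.name = "Ada";
    assert(display_name(message) == "Ada");
    assert(!message.address.has_value()); // untouched fields stay unset
}
```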

Combining Attributes

You can combine attributes with a comma operator, like so: [[removed, optional]].

Constant Strings

It can be useful to define a constant string, so that you can reference things like channel names in a more type-safe way. For example:

const CAMERA_CHANNEL = "camera_image/raw";

This would generate C++ code that allows you to reference the constant CAMERA_CHANNEL by name.

Constant Expressions

You can do very basic constant expressions when assigning constant values to either variables or enum values. For example:

const BASE_VALUE = 0x1000;

const FAULT_CODE_1 = BASE_VALUE | 0x0001;
const FAULT_CODE_2 = BASE_VALUE | 0x0002;

enum FaultCodes : int32
{
    Code1 = FAULT_CODE_1 + 0x100;
    Code2 = FAULT_CODE_2 + 0x200;
}

In this case, FAULT_CODE_1 would become 0x1001 and FaultCodes::Code1 would become 0x1101. Today, only single-operation expressions are supported. Expressions may reference other constants or enum values (from within the rbuf or an included rbuf). Standard namespace resolution rules apply.
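
The resulting values can be double-checked with equivalent C++ constant expressions. This is a hand-written mirror of the rbuf above, not generated output; the int32 type chosen for the global constants and the `to_raw` helper are assumptions of the sketch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int32_t BASE_VALUE = 0x1000;

constexpr std::int32_t FAULT_CODE_1 = BASE_VALUE | 0x0001;
constexpr std::int32_t FAULT_CODE_2 = BASE_VALUE | 0x0002;

enum class FaultCodes : std::int32_t
{
    Code1 = FAULT_CODE_1 + 0x100,
    Code2 = FAULT_CODE_2 + 0x200,
};

// Hypothetical helper that exposes the underlying integer.
constexpr std::int32_t to_raw(FaultCodes code)
{
    return static_cast<std::int32_t>(code);
}

// Each expression uses a single operation, as in the rbuf example.
static_assert(FAULT_CODE_1 == 0x1001, "0x1000 | 0x0001");
static_assert(FAULT_CODE_2 == 0x1002, "0x1000 | 0x0002");
static_assert(to_raw(FaultCodes::Code1) == 0x1101, "0x1001 + 0x100");
static_assert(to_raw(FaultCodes::Code2) == 0x1202, "0x1002 + 0x200");

inline void constant_expression_example()
{
    assert(to_raw(FaultCodes::Code1) == 0x1101);
}
```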

Serializing/Deserializing

You can serialize and deserialize with one-liners if you like:

#include "ark/serialization/helpers.hh"

// Create a structure and serialize it.
MySerializableStruct structure;
auto bytes = ark::serialization::serialize(structure);

// Roundtrip it by deserializing into a different structure.
auto roundtrip = ark::serialization::deserialize<MySerializableStruct>(bytes);

At this point, structure and roundtrip should be identical.

Under the hood, these helpers are actually instantiating and managing input and output streams for you. You can manually manage these streams yourself, if that makes it easier:

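MySerializableStruct structure;
// Write the object into an in-memory output stream.
ark::serialization::OutputStream output;
output.write(structure);

// Read it back through an input stream that wraps the output stream.
MySerializableStruct roundtrip;
ark::serialization::InputStream input(output);
input.read(roundtrip);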

You can instantiate input streams against arrays of bytes, vectors, byte buffers, or other output streams. They simply wrap existing data; they don’t take ownership of that data or copy it, so the source must not go out of scope while the input stream is in use.
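
The lifetime rule is the same one that applies to any non-owning view. The `ByteView` class below is a hypothetical stand-in written for this example (it is not the Ark InputStream); it stores only a pointer and a size, which is why the source buffer has to outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning reader: it remembers where the bytes live, nothing more.
class ByteView
{
public:
    explicit ByteView(const std::vector<std::uint8_t> &source)
        : data_(source.data()), size_(source.size())
    {
    }

    std::uint8_t read_byte()
    {
        assert(position_ < size_);
        return data_[position_++];
    }

    std::size_t remaining() const { return size_ - position_; }

private:
    const std::uint8_t *data_;
    std::size_t size_;
    std::size_t position_ = 0;
};

// Safe: the buffer is declared before the view and outlives it.
inline std::uint8_t first_byte_safe()
{
    const std::vector<std::uint8_t> bytes{0x2A, 0x01};
    ByteView view(bytes);
    return view.read_byte();
}

// Unsafe pattern, shown only as a comment because it leaves a dangling pointer:
//   ByteView make_view()
//   {
//       std::vector<std::uint8_t> temporary{0x2A};
//       return ByteView(temporary); // 'temporary' is destroyed on return
//   }
```

Keeping the byte source and the stream in the same scope, with the source declared first, is a simple way to satisfy the rule.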

JSON

Any rbuf type can be serialized to or from JSON automatically. You can call these helper routines:

to_json - Converts the object into a JsonBlob
from_json - Converts a JsonBlob into the destination object

The JsonBlob type exists to abstract out the use of JSON libraries internally. You can call str() on it to get a string representation of the JSON.

Example code for serializing to and from JSON:

MySerializableStruct structure;
auto json = ark::serialization::to_json(structure);

MySerializableStruct roundtrip;
ark::serialization::from_json(json, roundtrip);

std::cout << json.str();

At this point structure and roundtrip should be identical, and you should have a JSON string printed to stdout.

There’s also a one-liner to make moving from JSON easier:

#include "ark/serialization/helpers.hh"

auto message = ark::serialization::from_json<MySerializableStruct>("... my json...");

That will treat the string passed in as JSON, and deserialize it into the type specified in your template.

Reflection

Every rbuf object has an associated Schema. This schema is a structure that you can use to serialize and deserialize an object without knowing its type at compile time.

This works by storing the structure of the rbuf in a programmatic structure, and then parsing input/output streams manually.

This is not a very performant way to serialize/deserialize objects, but it has the advantage that you can deserialize types at runtime without knowing (at compile time) what they are.

For example, logging and ark-spy can simply fetch the schemas live to deserialize content for you, rather than needing to dynamically load C++ code.

An example of using this:

// Assuming you have a "registry" from a source...
ark::serialization::Object object(registry, "MySerializableStruct");

object.deserialize(input_bitstream, registry);

// You can now lookup fields...
auto name = object.field<uint32_t>("name");

// Or output JSON:
std::cout << object.json().str() << std::endl;

Schema provides all of the internal fields in its fields member. These are made up of FieldDescriptors which contain enough metadata to understand the name and type of a field.
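
As a rough mental model of how descriptors drive schema-based access, consider the toy sketch below. All of the types in it (`FieldType`, `FieldDescriptor`, `Schema`, `DynamicObject`) are hypothetical, simplified stand-ins written for this example and do not reflect the actual Ark class layouts; only two field types are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <variant>
#include <vector>

enum class FieldType
{
    UInt32,
    String,
};

// Enough metadata to know a field's name and type.
struct FieldDescriptor
{
    std::string name;
    FieldType type;
};

struct Schema
{
    std::string name;
    std::vector<FieldDescriptor> fields;
};

using Value = std::variant<std::uint32_t, std::string>;

// A runtime object whose fields are looked up by name instead of by member.
class DynamicObject
{
public:
    void set(const std::string &name, Value value) { values_[name] = std::move(value); }

    template <typename T>
    const T &field(const std::string &name) const
    {
        return std::get<T>(values_.at(name));
    }

    // Discover the stored type without knowing it up front.
    bool is_string(const std::string &name) const
    {
        return std::holds_alternative<std::string>(values_.at(name));
    }

private:
    std::map<std::string, Value> values_;
};

// Walks the descriptors and creates a default value for each field.
inline DynamicObject make_default(const Schema &schema)
{
    DynamicObject object;
    for (const auto &descriptor : schema.fields)
    {
        if (descriptor.type == FieldType::String)
        {
            object.set(descriptor.name, std::string{});
        }
        else
        {
            object.set(descriptor.name, std::uint32_t{0});
        }
    }
    return object;
}

inline void reflection_example()
{
    const Schema schema{"MyMessage",
                        {{"identifier", FieldType::UInt32}, {"payload", FieldType::String}}};
    DynamicObject object = make_default(schema);
    object.set("identifier", std::uint32_t{42});

    assert(object.field<std::uint32_t>("identifier") == 42);
    assert(object.is_string("payload"));
}
```

A real deserializer would read each value from the input stream while walking the descriptors, rather than creating defaults, but the lookup-by-name pattern is the same.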

You can retrieve nested objects by getting their ObjectPtr:

auto nested = object.field<ark::serialization::ObjectPtr>("nested_object");

std::cout << nested->json().str() << std::endl;

That code will retrieve a nested object, and then print out its JSON contents.

Finally, a SchemaRegistry exists as a mechanism to group many schemas together into one place. This is both convenient and necessary for deserializing objects that may reference multiple object types.

You can typically get a SchemaRegistry by querying the log or the HTTP server: these will return a JSON-serialized registry, which can be deserialized into a concrete SchemaRegistry.

You can also get a serialized SchemaRegistry for any type by using this API:

ark::serialization::schema<MySerializableStruct>();

This provides a registry instead of a schema, and that registry will consist of all of the schemas necessary to serialize/deserialize that type.

Supported Languages

The rbufc compiler supports compiling rbuf files and emitting code in C++, JavaScript, and Python. In all languages, you are capable of fully serializing and deserializing objects using native code (pure C++, pure JavaScript, or pure Python).

Note that Python serialization/deserialization is achieved without the use of C bindings. This means that it is typically slower than just using something like pickle or msgpack (for simple benchmarks, it is typically 1.6x slower). As an example, a small structure with a few strings and primitives takes about 3us to serialize/deserialize in pickle, and about 4.5us to do the same in rbuf. The C++ and JavaScript code is considerably faster.

Use the --language command line option to rbufc to generate in a particular language (C++ is chosen by default). See --help for more details.

Writing Files

You can write rbuf objects out to files (and read them back in) using the APIs in the ark/serialization/file.hh header. For example, say you wanted to record your MySerializableStruct to disk:

ark::serialization::write_serializable_type_to_file("test.rbin", my_rbuf_type);

auto content = ark::serialization::read_serializable_type_from_file<MySerializableStruct>("test.rbin");

At this point, content and my_rbuf_type will be identical. Note that you can pass along the ark::core::WriteOptions flags to the write_serializable_type_to_file API, so you can write files out atomically if desired.

This API writes out the binary form of the rbuf to disk, and also includes additional schema information and other data, allowing you to read an rbuf file even if you don’t know the type:

auto content = ark::serialization::read_generic_object_from_serialized_file("test.rbin");

std::cout << content->json().str();

This will cause whatever is in test.rbin to be deserialized and output as a JSON object. There is a commandline tool, ark-cat-rbuf-file, which will do this for you.

Note that ark::serialization::read_serializable_type_from_file can also read rbuf files that were simply written to disk directly (say, using ark::core::write_string_to_disk APIs). In that case, you cannot invoke the read_generic_object_from_serialized_file APIs, as the schema information was not written out.

The opposite is also true: files written out with write_serializable_type_to_file can be read in and deserialized using standard APIs. The serialized data is written out first, followed by a header block, which would be ignored.

Writing to Parquet

You can write simple rbuf files out as rows in a Parquet file. As long as your rbuf only contains basic (non-nested) fields, such as fundamental types, timestamps, strings, or GUIDs, you can write it out as a row in a Parquet file.

For example:

#include "ark/serialization/parquet/parquet_writer.hh"
#include "my_rbuf_type.hh"

auto writer = ParquetWriter::Create<MyRbufType>("path/to/file.parquet");

for (size_t row = 0; row < 10; ++row)
{
    MyRbufType type{.my_value = row};

    writer.write(type);
}

writer.close();

Please note that you can also provide a compression type to the constructor, allowing you to create compressed Parquet files if desired.