Length |
---|
Type Tag |
Message |
Checksum |
Checksum. As checksum, the plan is to use a CRC-32. We don't want it to be too large to affect performance, and we want it to catch reasonable losses of integrity. I'm considering storing this as a varint after the actual message, but for the time being, it is given as 4 raw bytes (it is not implemented at all yet). Please give me feedback on this: if we make it a varint, we can stuff the checksum in there, but that will also run the risk of not being able to read the checksum due to corruption of the checksum, so offhand, I would say that a fixed number of bytes is preferable.
Type Tag. The type tag is a single byte giving the type of the event. This means that we are limited to 255 events, but considering that we don't even have 26 events in 5.1 right now, I don't see that we will run into that limit very soon. It is possible to put the type tag in the message as well, and specifying it as an enum inside the protobuf specification, but that will just provide the information in two places, so it is better to keep it separately.
Length. The length is the length of the message, that is, it does not include the type tag, checksum, nor the length itself. This simplifies the normal processing, and in the event that one needs to skip an event, it is easy to compute the next position of a transport frame by just decoding the length (see below).
Length encoding
The length is encoded using a special scheme to allow for very little overhead for small events while still leaving room for giving the length for very large events. This scheme currently allows for a compact representation of lengths from 2 bytes 4 GiB (Gigabytes) and, if you don't need to have a compact representation of 2, you can represent lengths in the range 3 bytes to to 16 EiB (Exabytes) [sic].The basic idea is to note that the length of a message can never be zero, and the minimal length in this case is actually 16 bytes. Since we will never have a length that is less than 16 for an event, that leaves the lengths 0-16 available for denoting other information. The obvious solution is to let the first byte denote either the length, if it is, say greater than 8, or the number of bytes following the byte that gives the length if it is less than or equal to 8, but we can actually do better.
Storing Length requires ceil(log256(Length)) bytes, so if we let the first byte L be
Bytes | Value | ||||
---|---|---|---|---|---|
2A | 42 | ||||
00 | FF | 02 | 767 | ||
01 | FF | 01 | 01 | 00 | 66047 |
Computing the number of bytes can be done by computing 1
<< (L + 1)
, but computing the inverse is a little more
involved.
The following two functions does the job. Although it looks like a lot
of code, length_encode()
is actually only 29 instructions
on my machine (no function calls), while length_decode()
is about 7 instructions. The trick here is to compute the logarithm at
the same time as we serialize the bytes into the memory, so the overhead
compared to just serializing the length is just a few instructions.
inline unsigned char * length_encode(size_t length, unsigned char *buf) { unsigned char *ptr= buf; assert(length > 1); if (length < 256) *ptr++= length & 0xFF; else { int_fast8_t log2m1= -1; // ceil(log2(ptr - buf)) - 1 uint_fast8_t pow2= 1; // pow2(log2m1 + 1) while (length > 0) { // Check the invariants assert(pow2 == (1 << (log2m1 + 1))); assert((ptr - buf) <= (1 << (log2m1 + 1))); // Write the least significant byte of the current // length. Prefix increment is used to make space for the first // byte that will hold log2m1. *++ptr= length & 0xFF; length >>= 8; // Ensure the invariant holds by correcting it if it doesn't, // that is, the number of bytes written is greater than the // nearest power of two. if (ptr - buf > pow2) { ++log2m1; pow2 <<= 1; } } // Clear the remaining bytes up to the next power of two while (++ptr < buf + pow2 + 1) *ptr= 0; *buf= log2m1; assert(ptr == buf + pow2 + 1); } return ptr; } inline const unsigned char * length_decode(const unsigned char *buf, size_t *plen) { if (*buf > 1) { *plen = *buf; return buf + 1; } size_t bytes= 1U << (*buf + 1); const unsigned char *ptr= buf + 1; size_t length= 0; for (unsigned int i = 0 ; i < bytes ; ++i) length |= *ptr++ << (8 * i); *plen= length; return ptr; }