In short, I needed a miniature library for microcontrollers that would serialize data into a binary form for transmission over low-speed communication lines, and the usual formats (XML, JSON, BSON, YAML, protobuf, Thrift, ASN.1, etc.) did not fit for various reasons.
As expected, my solution turned out to be little more than a reinvented wheel. Nevertheless, publishing the article on Habr helped me a lot: during my initial survey of possible libraries, I had for some reason overlooked the MessagePack, CBOR and UBJSON serializers.
Readers sent me links to them in the comments after the article was published, and I immediately realized that CBOR or UBJSON could most likely solve my problem easily, and do it much better than my own development.
After that, I attached my interface to a CBOR library (so as not to rework the existing sources), and ... decided to abandon this format in favor of MessagePack :-)
CBOR vs. MessagePack
In fact, the CBOR and MessagePack formats use the same data serialization principle. Both are based on the practical TLV (tag-length-value) scheme. In its classic form, a TLV record always contains a tag field and a data length field, while the field with the data itself may be absent (if the data size is zero).
The developers of these serializers went even further and designed almost ingenious formats: the presence of a data-size field depends on the data type, so it is not required for fixed-size values, and the first byte stores both the field type and, where the bit width allows, the data size or the value itself.
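For contrast, here is a minimal sketch of a classic TLV record. The one-byte tag and one-byte length are assumptions chosen purely for illustration (real TLV schemes vary), and the names `tlv_encode` and `tlv_decode` are hypothetical.

```python
def tlv_encode(tag: int, value: bytes) -> bytes:
    """Pack one record as: 1-byte tag, 1-byte length, then the data."""
    if not 0 <= tag <= 0xFF:
        raise ValueError("tag must fit in one byte")
    if len(value) > 0xFF:
        raise ValueError("value is too long for a one-byte length field")
    return bytes([tag, len(value)]) + value


def tlv_decode(buf: bytes, offset: int = 0):
    """Read one record at `offset`; return (tag, value, next_offset)."""
    if offset + 2 > len(buf):
        raise ValueError("truncated header")
    tag = buf[offset]
    length = buf[offset + 1]
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated value")
    return tag, buf[start:end], end


if __name__ == "__main__":
    # Even a one-byte value costs three bytes on the wire.
    print(tlv_encode(0x01, b"\x05").hex())
    # A zero-length value still carries the tag and the length.
    print(tlv_encode(0x02, b"").hex())
```

With this layout every value pays a fixed two-byte header, which is exactly the overhead that CBOR and MessagePack avoid for small values.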
In the original article, I wrote that I need binary data packed as tightly as possible, and both formats handle this task very well. They are very similar to each other and differ mainly in the number of bits used to store the field type.
In the CBOR format, the minimum storage overhead for each field is three bits: the first three bits of the first byte of each field define the content type, which determines the presence and size of the other fields, and the remaining 5 bits may already contain the field value itself.
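The CBOR layout described above can be sketched for integers as follows. This is an illustrative encoder written from the CBOR specification (major type 0 for unsigned and 1 for negative integers), not the code of any particular library; the function names are hypothetical.

```python
import struct


def cbor_split_initial_byte(byte: int):
    """Split the first byte into (3-bit major type, 5-bit additional info)."""
    return byte >> 5, byte & 0x1F


def cbor_encode_int(value: int) -> bytes:
    """Encode an integer as a CBOR item (major type 0 or 1)."""
    if value >= 0:
        major, arg = 0, value
    else:
        major, arg = 1, -1 - value  # negative n is stored as -1 - n
    head = major << 5
    if arg < 24:
        return bytes([head | arg])  # value sits in the 5 low bits
    if arg <= 0xFF:
        return bytes([head | 24]) + struct.pack(">B", arg)
    if arg <= 0xFFFF:
        return bytes([head | 25]) + struct.pack(">H", arg)
    if arg <= 0xFFFFFFFF:
        return bytes([head | 26]) + struct.pack(">I", arg)
    if arg <= 0xFFFFFFFFFFFFFFFF:
        return bytes([head | 27]) + struct.pack(">Q", arg)
    raise OverflowError("integer does not fit in 64 bits")


if __name__ == "__main__":
    print(cbor_encode_int(23).hex())  # 17: still one byte
    print(cbor_encode_int(24).hex())  # 1818: needs a second byte
    print(cbor_split_initial_byte(0x17))
```

Of the 32 values available in the five low bits, only 0 to 23 are direct values; the rest signal that the value follows in 1, 2, 4 or 8 extra bytes or are used for other purposes.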
But MessagePack goes even further! In this format, the minimum storage overhead for a value is only 1 (ONE!) bit. If the most significant bit of the first byte is zero, the remaining 7 bits hold a positive integer directly, while values with the most significant bit set carry additional information about the field type.
Clearly, with this encoding the range of negative values is reduced in favor of positive numbers (only 32 negative numbers fit in one byte; other values require a second byte). But this imbalance is the right one, and it is shifted in the right direction, because in practice positive numbers are used much more often than negative ones.
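The same idea for MessagePack integers can be sketched like this. The type bytes (0xcc to 0xcf for unsigned, 0xd0 to 0xd3 for signed) are taken from the MessagePack specification; the function name is hypothetical, and the choice of the shortest form for each value is an assumption of this sketch, not a requirement quoted from the article.

```python
import struct


def msgpack_encode_int(value: int) -> bytes:
    """Encode an integer in the shortest MessagePack integer form."""
    if 0 <= value <= 0x7F:
        return bytes([value])  # positive fixint: 0xxxxxxx
    if -32 <= value < 0:
        return bytes([value & 0xFF])  # negative fixint: 111xxxxx
    if value > 0:
        if value <= 0xFF:
            return b"\xcc" + struct.pack(">B", value)
        if value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)
        if value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)
        if value <= 0xFFFFFFFFFFFFFFFF:
            return b"\xcf" + struct.pack(">Q", value)
    else:
        if value >= -0x80:
            return b"\xd0" + struct.pack(">b", value)
        if value >= -0x8000:
            return b"\xd1" + struct.pack(">h", value)
        if value >= -0x80000000:
            return b"\xd2" + struct.pack(">i", value)
        if value >= -0x8000000000000000:
            return b"\xd3" + struct.pack(">q", value)
    raise OverflowError("integer does not fit in 64 bits")


if __name__ == "__main__":
    for v in (127, 128, -32, -33):
        print(v, msgpack_encode_int(v).hex())
```

The boundaries are visible directly in the branches: 127 and -32 are the last values that fit in a single byte, while 128 and -33 each need a type byte plus one data byte.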
In other words, a single byte holds integer values from 0 to 23 in the CBOR format, and from 0 to 127 in the MessagePack format!
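To make the difference concrete, the encoded size of a non-negative integer in each format can be computed with a small sketch like the one below. The size rules follow the two specifications, the shortest encoding is assumed for both formats, and the function names are hypothetical.

```python
def _check_uint64(value: int) -> None:
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("expected an unsigned 64-bit integer")


def _size_with_header(value: int) -> int:
    """Size when a type byte is followed by 1, 2, 4 or 8 data bytes."""
    if value <= 0xFF:
        return 2
    if value <= 0xFFFF:
        return 3
    if value <= 0xFFFFFFFF:
        return 5
    return 9


def cbor_uint_size(value: int) -> int:
    """Bytes needed for a non-negative integer in CBOR."""
    _check_uint64(value)
    return 1 if value < 24 else _size_with_header(value)


def msgpack_uint_size(value: int) -> int:
    """Bytes needed for a non-negative integer in MessagePack."""
    _check_uint64(value)
    return 1 if value <= 0x7F else _size_with_header(value)


if __name__ == "__main__":
    for v in (0, 23, 24, 127, 128, 255, 256):
        print(v, cbor_uint_size(v), msgpack_uint_size(v))
```

The two formats differ only for values from 24 to 127, where CBOR spends two bytes and MessagePack one; from 128 upward the sizes are the same.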
It was this point, together with the availability of a solid library implementing the format in a dozen different languages, that determined my final choice in favor of the MessagePack format. I am probably not the only one interested in these implementation details of the two formats, so it seems right to share this information.
As a result, my original serializer format became even more compact, partly thanks to a few conventions (for example, the structure of the encoded data is limited to a flat list, and unneeded types are not used), and I sleep more soundly, because compatibility of the message formats exchanged between devices is no longer a headache.
Many thanks to Habr users Spym and edo1h, who responded to the previous post and thus helped me find a solution to a really serious problem with so little effort!
Primary sources:
CBOR specification. There is a good article describing it on Habr.
The MessagePack specification is very easy to read in the documentation and does not require any translation or additional explanations.