Micro Property: a minimalistic binary data serializer for embedded systems

Micro Property is a library for serializing data with minimal overhead. It is designed for use in microcontrollers and a variety of memory-constrained embedded devices that have to operate over low-speed communication lines.



Of course, I know about formats such as XML, JSON, BSON, YAML, Protocol Buffers, Thrift, and ASN.1. I even came across the exotic Tree format, which bills itself as a killer of JSON, XML, YAML, and the like.



So why did none of them fit? Why was I forced to write yet another serializer?



After the article was published, commenters posted links to the CBOR, UBJSON, and MessagePack formats, which I had missed. They would likely have solved my problem without my reinventing the wheel.

It's a shame I couldn't find these specifications earlier, so I'll keep this paragraph for readers and as a reminder to myself not to rush to write code ;-).

Reviews of the formats on Habr: CBOR, UBJSON








Initial requirements



Imagine that you need to modify a distributed system consisting of several hundred devices of more than ten different types, each performing different functions. They are combined into groups that exchange data with each other over serial communication lines using the Modbus RTU protocol.



Some of these devices are also connected to a common CAN bus, which provides data transfer across the system as a whole. The data rate on the Modbus lines is up to 115200 baud, while the CAN bus is limited to 50 kBaud because of its length and serious industrial interference.



The overwhelming majority of devices are built on microcontrollers of the STM32F1x and STM32F2x series, though some run on STM32F4x. And of course, there are Windows/Linux systems with x86 processors acting as top-level controllers.



To give a sense of the volume of data that is processed, transmitted between devices, or stored as settings and operating parameters: in one case it is two 1-byte numbers and six 4-byte numbers; in another, eleven 1-byte numbers and one 4-byte number; and so on. For reference, a standard CAN frame carries up to 8 bytes of data, and a Modbus frame up to 252 bytes of payload.



If you have not yet grasped the depth of the rabbit hole, add to these inputs the need to track protocol versions and firmware versions for the different device types, as well as the requirement to stay compatible not only with the data formats that exist now, but also with future generations of devices, which do not stand still either: they are constantly evolving and being reworked as functionality grows and implementation bugs surface. Plus interaction with external systems, expanding requirements, and so on.



Initially, because of limited resources and low-speed communication lines, a binary exchange format tied directly to Modbus registers was used. But that implementation failed its first test of compatibility and extensibility.



Therefore, when redesigning the architecture, we had to abandon the use of standard Modbus registers. And not so much because other communication lines are used besides this protocol, but because organizing data structures around 16-bit registers is far too limiting.



Indeed, as the system inevitably evolves, it may become necessary (in fact, it already has) to transfer text strings or arrays. In theory these too can be mapped onto the Modbus register map, but that just piles abstraction on top of abstraction.



Of course, you can transfer data as a binary blob tied to a protocol version and a block type. At first glance the idea even seems sound: by fixing certain architectural requirements, you can define the data formats once and for all, saving significantly on the overhead that is inevitable with formats such as XML or JSON.



To make it easier to compare the options, I drew up a pros-and-cons list for myself; the bottom line was that none of the existing approaches fit all of the constraints above.


Now just imagine several hundred devices exchanging binary data with each other. Even with every message tied to a protocol version and/or device type, the need for a serializer with named fields immediately becomes obvious: a simple extrapolation of the complexity of supporting such a solution, even over a fairly short time, makes you clutch your head.



And that is without taking into account the customer's expected requests for more functionality, the inevitable implementation bugs, and the "minor", at first glance, improvements that are certain to add a special piquancy to hunting down recurring bugs in the smoothly coordinated work of such a zoo...






What are the options?



After such reasoning, you inevitably conclude that universal identification of binary data must be designed in from the very beginning, including for packet exchange over low-speed communication lines.



Having concluded that a serializer was unavoidable, I first looked at existing solutions that have already proven themselves and are used in many projects.



The basic formats XML, JSON, YAML and other text variants, with their convenient and simple formal syntax, are well suited to processing documents and are easy for humans to read and edit, but they had to be dropped immediately: precisely because of that convenience and simplicity, they carry a very large overhead for storing binary data, which was exactly what I needed to handle.



Therefore, given the limited resources and low-speed communication lines, a binary data representation was the obvious choice. But even with formats that can encode data in binary, such as Protocol Buffers, FlatBuffers, ASN.1 or Apache Thrift, the serialization overhead, as well as the overall convenience of use, did not argue for adopting any of these libraries.



The BSON format, with its minimal overhead, was the best fit for this combination of requirements, and I seriously considered using it. In the end, though, I decided against it: all other things being equal, even BSON's overhead would be unacceptable.

It may seem strange to worry about a dozen extra bytes, but unfortunately, that dozen bytes would have to be transmitted with every message. And when working over low-speed communication lines, even an extra ten bytes in each packet matter.



In other words, when you operate with tens of bytes, you start counting every one of them. After all, along with the data, the network also carries device addresses, packet checksums, and other information specific to each communication line and protocol.

What happened



As a result of some thought and a few experiments, I ended up with a serializer with the following features and characteristics:



  • Overhead for fixed-size data is 1 byte (not counting the length of the data field name).
  • Overhead for variable-length data, such as strings and blobs, is 2 bytes (including the stored data size); given the frame sizes of CAN and Modbus, this is enough.
  • The maximum length of a field name is 16 bytes.
  • The maximum size of a variable-length field (string or blob) is 252 bytes, i.e. it fits in a single Modbus frame.
  • Integers from 8 to 64 bits are supported.
  • Integers are stored in network byte order, so serialized data is portable between machines with different endianness.


I would like to note separately



The implementation is written in C++11 as a single header file, using the SFINAE (Substitution Failure Is Not An Error) template technique.



Correct reading of data into a variable (buffer) of a larger size than the stored data type is supported. For example, an 8-bit integer can be read into any variable from 8 to 64 bits wide. I am also thinking it might be worth adding packing of integers wider than 8 bits, so that small values can be transmitted in fewer bytes.



Serialized arrays can be read either by copying into a specified memory area or by obtaining an ordinary reference to the data in the original buffer, which avoids copying when it is not required. This feature should be used with caution, though, because arrays of integers are stored in network byte order, which may differ from the machine's native order.



Serialization of structures or more complex objects was never planned. Transferring structures in binary form is dangerous in general because of the possible alignment of their fields. And even if that problem can be solved relatively simply, there remains the problem of converting every integer field of the object to network byte order and back.



That said, as a last resort, structures can always be saved and restored as an array of bytes. Naturally, in that case the conversion of integers has to be done manually.



Implementation



The implementation is here: https://github.com/rsashka/microprop



How to use it is shown in the examples below, with varying degrees of detail:



Fast use
#include "microprop.h"

Microprop prop(buffer, sizeof(buffer)); // Attach the serializer to a data buffer

prop.FieldExist(string || integer); // Check whether a field with the given ID exists
prop.FieldType(string || integer); // Get the data type of a field

prop.Append(string || integer, value); // Append data
prop.Read(string || integer, value); // Read data



Slow and thoughtful use
#include "microprop.h"

Microprop prop(buffer, sizeof(buffer)); // Create a serializer on a buffer

prop.AssignBuffer(buffer, sizeof(buffer)); // Assign a buffer for reading and writing
prop.AssignBuffer((const)buffer, sizeof(buffer)); // Assign a read-only buffer
prop.AssignBuffer(buffer, sizeof(buffer), true); // Assign a buffer as read-only

prop.FieldNext(ptr); // Iterate over the fields in the buffer
prop.FieldName(string || integer, size_t *length = nullptr); // Get a field's name or ID
prop.FieldDataSize(string || integer); // Get the size of a field's data

// Append and read data
prop.Append(string || blob || integer, value || array);
prop.Read(string || blob || integer, value || array);

prop.Append(string || blob || integer, uint8_t *, size_t);
prop.Read(string || blob || integer, uint8_t *, size_t);

prop.AppendAsString(string || blob || integer, string);
const char * ReadAsString(string || blob || integer);




Example implementation using enum as data identifier
class Property : public Microprop {
public:
  enum ID {
    ID1, ID2, ID3
  };

  template <typename ... Types>
  inline const uint8_t * FieldExist(ID id, Types ... arg) {
    return Microprop::FieldExist((uint8_t) id, arg...);
  }

  template <typename ... Types>
  inline size_t Append(ID id, Types ... arg) {
    return Microprop::Append((uint8_t) id, arg...);
  }

  template <typename T>
  inline size_t Read(ID id, T & val) {
    return Microprop::Read((uint8_t) id, val);
  }

  inline size_t Read(ID id, uint8_t *data, size_t size) {
    return Microprop::Read((uint8_t) id, data, size);
  }

    
  template <typename ... Types>
  inline size_t AppendAsString(ID id, Types ... arg) {
    return Microprop::AppendAsString((uint8_t) id, arg...);
  }

  template <typename ... Types>
  inline const char * ReadAsString(ID id, Types... arg) {
    return Microprop::ReadAsString((uint8_t) id, arg...);
  }
};




The code is published under the MIT license, so feel free to use it.



I will be glad to receive any feedback, including comments and/or suggestions.



Update: I was not mistaken in choosing a picture for the article ;-)


