Efficient and flexible parsing of binary data

I have an external device that sends out UDP packets of binary data, and software running on an embedded system that should read that data stream, parse it, and do something useful with it. The binary data is also logged to a file. I would like to write a parser that can take input directly from either the UDP stream or a file, parse the data according to a specific format, and then pipe the output to a file (such as a MATLAB .dat file) or to another process that does some real-time processing. Are there any resources that would help me with this, and what is the best way to do it? I think it makes sense to use C++ streams, but I'm not familiar with creating custom output streams. Does this seem like a good approach, or is there a better way?

Thanks.

+2




2 answers


The beauty of binary data is that it tends to be in a very fixed format. A typical way to parse it is to declare a struct that maps onto the received packets, and then simply use a cast to read the fields as struct members.

The beauty is that it doesn't require any parsing.
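A minimal sketch of the struct-mapping idea, assuming a hypothetical 8-byte packet (the SensorPacket name and its fields are invented here for illustration, not taken from the question). It copies the bytes with std::memcpy instead of a pointer cast, which gives the same field access while avoiding alignment and aliasing problems. #pragma pack is a compiler extension, supported by GCC, Clang, and MSVC.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical packet layout, invented for illustration only.
#pragma pack(push, 1)            // no padding between members
struct SensorPacket {
    std::uint8_t  type;
    std::uint8_t  flags;
    std::uint16_t sequence;      // still in the sender's byte order
    std::uint32_t value;         // still in the sender's byte order
};
#pragma pack(pop)

// Fails at compile time if the compiler laid the struct out differently.
static_assert(sizeof(SensorPacket) == 8, "SensorPacket must be exactly 8 bytes");

// Copies one packet out of a raw buffer (for example a UDP payload or a
// chunk read from the log file). Returns false if the buffer is too short.
bool read_packet(const unsigned char* data, std::size_t len, SensorPacket& out) {
    if (len < sizeof(SensorPacket)) {
        return false;
    }
    std::memcpy(&out, data, sizeof(SensorPacket));
    return true;
}
```

Because the function only needs a pointer and a length, the same code can serve both input paths: pass it the buffer filled by the socket read or the buffer filled by the file read. The multi-byte fields are not byte-swapped here.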



You have to be careful with packing rules and endianness so that the struct maps onto the packet exactly. The C macros "offsetof" and "sizeof" are useful for emitting some debugging information to check that your struct is actually laid out the way you think it is.
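As a rough illustration of that kind of layout check, the sketch below builds a one-line report with "sizeof" and "offsetof". The StatusPacket struct and the describe_layout function are hypothetical names made up for this example. The struct is deliberately left unpacked, so on most compilers the report will show padding after the first member.

```cpp
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical packet, left with the compiler's default (unpacked) layout.
struct StatusPacket {
    std::uint8_t  type;
    std::uint32_t timestamp;
    std::uint16_t level;
};

// Builds a debugging string with the total size and each member's offset,
// so it can be compared against the device's documented packet format.
std::string describe_layout() {
    std::ostringstream os;
    os << "sizeof=" << sizeof(StatusPacket)
       << " type@" << offsetof(StatusPacket, type)
       << " timestamp@" << offsetof(StatusPacket, timestamp)
       << " level@" << offsetof(StatusPacket, level);
    return os.str();
}
```

The exact numbers depend on the compiler and target. If the wire format puts timestamp directly after the one-byte type field but the report shows it at a larger offset, the struct needs packing before it can be mapped onto the packet.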

Packing rules can usually be changed either with directives (such as #pragma) or with command-line options. The endianness you are stuck with. If it differs from what your embedded system uses, either declare all fields as bytes or use something like the "ntoh" macros to swap bytes.
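A sketch of the "declare all fields as bytes" option, assuming the device sends multi-byte fields in big-endian (network) order. That byte order, the WirePacket layout, and the load_be16/load_be32 helpers are all assumptions for illustration. Since every member is a byte, there is no padding to worry about, and the shifts give the same result on any host.

```cpp
#include <cstdint>

// Hypothetical wire layout with every field declared as raw bytes.
struct WirePacket {
    std::uint8_t type;
    std::uint8_t flags;
    std::uint8_t sequence[2];    // assumed big-endian on the wire
    std::uint8_t value[4];       // assumed big-endian on the wire
};

static_assert(sizeof(WirePacket) == 8, "WirePacket must be exactly 8 bytes");

// Assembles a 16-bit value from two big-endian bytes.
inline std::uint16_t load_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assembles a 32-bit value from four big-endian bytes.
inline std::uint32_t load_be32(const std::uint8_t* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8)  |
            static_cast<std::uint32_t>(p[3]);
}
```

On POSIX systems, ntohs and ntohl from <arpa/inet.h> do the equivalent swap on fields that were read as uint16_t or uint32_t. The byte-wise helpers are used here only because they do not depend on the platform.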

+4




The New Jersey Machine Code Toolkit is a scheme for decoding arbitrary binary patterns. It was originally designed to decode instruction sets, but it should work fine for decoding message formats. You provide a description of the binary format, and it synthesizes code to access the fields of that format (when valid). You can then refer to message fields through generated function calls rather than thinking about where a field is located or how it is encoded.



0

