Decode Byte Stream

I have a series of messages that are defined by independent structs. These structs share a common header, and the messages are sent between applications. I am creating a decoder that will take raw data captures of messages built from these structs and decode / parse them into plain text.

I have over 1000 different messages that need to be decoded, so I'm not sure whether defining all of the struct formats in XML and then using XSL or some other kind of translation is the way to go, or whether there is a better way to do it.

Sometimes I will need to decode logs containing over a million messages, so performance is an issue.

Any advice on techniques / tools / algorithms for making a decoder / parser?

struct:
struct {
  dword messageid;
  dword datavalue1;
  dword datavalue2;
} struct1;

      

Example raw data:

010101010A0A0A0A0F0F0F0F

      

Decoded message (desired output):

message id: 0x01010101, datavalue1: 0x0A0A0A0A, datavalue2: 0x0F0F0F0F

      

I am using C++ for this development.



4 answers


In terms of performance: if you are doing disk I/O (and possibly mapped I/O), I doubt your parser / decoder will have much of an effect unless you use a truly terrible algorithm.

I'm also not sure what the problem is. As the question stands, you have 3 DWORDs in your structure, yet you say there are over 1000 unique messages based on these values.

Your decoded message does not suggest that you need any parsing; direct output seems to be enough (converting each byte to the ASCII representation of its hex value).
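As a rough illustration of that direct output, here is a minimal sketch for the three-dword struct from the question. The helper names (hexToBytes, readDword, formatStruct1) are made up for this example, and it assumes the capture stores each dword most-significant byte first, which the question does not state.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Convert a hex string such as "0A0F" into raw bytes.
// Assumes an even number of valid hex digits; std::stoul throws otherwise.
std::vector<std::uint8_t> hexToBytes(const std::string& hex) {
    assert(hex.size() % 2 == 0);
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i < hex.size(); i += 2) {
        bytes.push_back(static_cast<std::uint8_t>(std::stoul(hex.substr(i, 2), nullptr, 16)));
    }
    return bytes;
}

// Read one 32-bit value, most-significant byte first (assumed byte order).
std::uint32_t readDword(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Format the three dwords of one message as a single line of text.
std::string formatStruct1(const std::vector<std::uint8_t>& raw) {
    if (raw.size() < 12) return "<truncated message>";
    char line[96];
    std::snprintf(line, sizeof line,
                  "message id: 0x%08lX, datavalue1: 0x%08lX, datavalue2: 0x%08lX",
                  static_cast<unsigned long>(readDword(&raw[0])),
                  static_cast<unsigned long>(readDword(&raw[4])),
                  static_cast<unsigned long>(readDword(&raw[8])));
    return line;
}
```

Calling formatStruct1(hexToBytes("010101010A0A0A0A0F0F0F0F")) yields the line shown as the desired output in the question. If the log is already binary, hexToBytes is not needed at all.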



If you have a value-to-string mapping, a big switch statement is the simple option. Alternatively, if you want mappings to be added dynamically or changed later, I would put the key / value pairs (the mapping) in a config file (text, XML, etc.) and look them up as the log file / raw data is read.

A map is what I would use in this case.
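A minimal sketch of such a lookup, assuming a hypothetical config format of one `hexid=name` pair per line (for example `0x01010101=struct1`). The names NameTable, loadTable, loadTableFromText and lookupName are invented for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

using NameTable = std::unordered_map<std::uint32_t, std::string>;

// Read "hexid=name" lines into a table; lines without '=' are skipped.
// Assumes the key is valid hex; std::stoul throws otherwise.
NameTable loadTable(std::istream& in) {
    NameTable table;
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;
        const unsigned long id = std::stoul(line.substr(0, eq), nullptr, 16);
        assert(id <= 0xFFFFFFFFul && "message ids are assumed to fit in a dword");
        table[static_cast<std::uint32_t>(id)] = line.substr(eq + 1);
    }
    return table;
}

// Convenience wrapper so the same parser can run on in-memory text.
NameTable loadTableFromText(const std::string& text) {
    std::istringstream in(text);
    return loadTable(in);
}

// Return the mapped name, or "unknown" when the id is not in the table.
std::string lookupName(const NameTable& table, std::uint32_t id) {
    const auto it = table.find(id);
    return it != table.end() ? it->second : std::string("unknown");
}
```

The table is built once at startup, so each of the million log entries costs a single hash lookup. For a real file, pass a std::ifstream to loadTable.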

Perhaps if you provide another specific example of values and decoded output, I can offer a more appropriate suggestion.



source


If you already have the message definitions in the syntax you used in your example, you definitely don't need to convert them manually to a different syntax (XML or otherwise).

Instead, you should try to write a compiler that takes these message definitions and compiles them into a decoder function.



These days I would recommend ANTLR as the parser generator, writing the actual compiler in any of the languages ANTLR supports (Java, Python, Ruby, C#, C++). This compiler then has to output the C code that does all the decoding and pretty-printing.
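To make the compiler idea concrete, here is a deliberately small sketch that swaps ANTLR for a hand-written tokenizer. It only accepts the exact syntax from the question with whitespace-separated tokens and dword members. StructDef, parseStruct and generateDecoder are illustrative names, and rd32 in the generated text is a hypothetical helper that the generated C file would have to provide.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

struct StructDef {
    std::string name;
    std::vector<std::string> fields;  // field names; every field is assumed to be a dword
};

// Remove a trailing ';' from a token such as "messageid;".
std::string stripSemicolon(std::string token) {
    if (!token.empty() && token.back() == ';') token.pop_back();
    return token;
}

// Parse one definition of the form: struct { dword a; dword b; } name;
// Input errors are only caught by assert here; a real tool needs proper diagnostics.
StructDef parseStruct(const std::string& text) {
    std::istringstream in(text);
    std::string token;
    StructDef def;
    in >> token;
    assert(token == "struct");
    in >> token;
    assert(token == "{");
    while (in >> token && token != "}") {
        assert(token == "dword");
        in >> token;
        def.fields.push_back(stripSemicolon(token));
    }
    in >> token;
    def.name = stripSemicolon(token);
    return def;
}

// Emit C source for a function that prints every field of one message.
std::string generateDecoder(const StructDef& def) {
    std::ostringstream format, args;
    for (std::size_t i = 0; i < def.fields.size(); ++i) {
        if (i != 0) format << ", ";
        format << def.fields[i] << ": 0x%08lX";
        args << ", rd32(p + " << 4 * i << ")";
    }
    std::ostringstream out;
    out << "void decode_" << def.name << "(const unsigned char *p, FILE *out) {\n"
        << "    fprintf(out, \"" << format.str() << "\\n\"" << args.str() << ");\n"
        << "}\n";
    return out.str();
}
```

Running generateDecoder over every definition and writing the results to one C file gives a decoder that is compiled once and does no parsing at run time. A grammar-based parser becomes worthwhile once the definitions contain nested structs, arrays or other member types.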



source


You can use yacc or ANTLR: add the appropriate parsing rules, populate some data structure (a tree, maybe) while parsing, then traverse that data structure and do whatever you like with it.
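The traversal half of that approach could look like the sketch below. It assumes the parser has already produced a flat list of field names per message id (a simpler structure than a tree), that every field is a dword stored most-significant byte first, and that the first dword of each message is its id. Layout, LayoutTable and decodeStream are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// One message layout as a parser might build it: field names in wire order.
using Layout = std::vector<std::string>;
using LayoutTable = std::map<std::uint32_t, Layout>;  // keyed by message id

// Read one 32-bit value, most-significant byte first (assumed byte order).
std::uint32_t readDword(const std::uint8_t* p) {
    return (std::uint32_t{p[0]} << 24) | (std::uint32_t{p[1]} << 16) |
           (std::uint32_t{p[2]} << 8) | std::uint32_t{p[3]};
}

// Walk the byte stream, look up each message's layout by its id, and append
// one text line per message. Stops at an unknown id or a truncated message.
std::string decodeStream(const LayoutTable& layouts, const std::vector<std::uint8_t>& data) {
    std::string text;
    std::size_t pos = 0;
    while (data.size() - pos >= 4) {
        const auto found = layouts.find(readDword(&data[pos]));
        if (found == layouts.end()) break;
        const Layout& fields = found->second;
        assert(!fields.empty());  // an empty layout would never advance pos
        if (data.size() - pos < 4 * fields.size()) break;
        for (std::size_t i = 0; i < fields.size(); ++i) {
            char value[16];
            std::snprintf(value, sizeof value, "0x%08lX",
                          static_cast<unsigned long>(readDword(&data[pos + 4 * i])));
            text += (i ? ", " : "") + fields[i] + ": " + value;
        }
        text += '\n';
        pos += 4 * fields.size();
    }
    return text;
}
```

Because the layouts are plain data, new message types can be added without recompiling the decoder, at the cost of a table lookup and a loop per message.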



source


I'm going to assume that all you have to do is format the records and output them.

Use a dedicated code generator. The generated code will look something like this:

typedef struct { dword messageid; } Header;

//repeated for each record type
typedef struct {
    dword messageid;
    // <members here>
} Record_##;
//END


void Process(Input inp, Output out) {
    char buffer[BIG_ENOUGH];
    char *offset;

    // start at the end so the first pass through the loop fills the buffer
    offset = &buffer[BIG_ENOUGH];

    while(notEnd) {
        if(&offset[sizeof(LargestStruct)] >= &buffer[BIG_ENOUGH]) {
            // move remaining buffer to start, fill tail from inp,
            // and reset offset to the start of the buffer
        }

        Header *hpt = (Header*)offset;

        switch(hpt->messageid)
        {
            //repeated for each record type
            case <record ID for given type>:
            {
                Record_##* rpt = (Record_##*)offset;
                // %t is a placeholder for each member's format specifier
                out.format("name1: %t, ...\n", rpt->name1, ...);
                offset += sizeof(Record_##);
                break;
            }
            //END
            default:
                // unknown message id: the record size is unknown, so stop
                return;
        }
    }
}

      

Most of this is boilerplate, so it shouldn't be difficult to write a program to generate it.

If you need more processing, I think this idea could be modified somewhat to make that work.


Edit: After re-reading the question, it looks like you may already have the structures defined. In that case, you can simply #include them and use them directly. However, you then have the problem of how to parse the structures to generate the input for the formatting function. Awk or sed might be helpful there.
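Using an existing definition directly might look like the following sketch. The struct is written out here in place of the #include, `dword` is assumed to be a 32-bit unsigned integer, and the capture is assumed to have the same byte order and padding as the machine running the decoder. The function name formatStruct1 is made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

using dword = std::uint32_t;  // assumed definition of the question's dword type

// Stand-in for the definition that would normally come from the shared header.
struct Struct1 {
    dword messageid;
    dword datavalue1;
    dword datavalue2;
};
static_assert(sizeof(Struct1) == 12, "this sketch assumes the struct has no padding");

// Copy the raw bytes into the struct, then format its members.
// memcpy sidesteps the alignment and aliasing problems of casting the buffer pointer.
std::string formatStruct1(const unsigned char* raw) {
    assert(raw != nullptr);
    Struct1 msg;
    std::memcpy(&msg, raw, sizeof msg);
    char line[96];
    std::snprintf(line, sizeof line,
                  "message id: 0x%08lX, datavalue1: 0x%08lX, datavalue2: 0x%08lX",
                  static_cast<unsigned long>(msg.messageid),
                  static_cast<unsigned long>(msg.datavalue1),
                  static_cast<unsigned long>(msg.datavalue2));
    return line;
}
```

The formatting function is still written by hand for each struct, which is the part the answer suggests generating with a script.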



source






