Convert Binary Data to Printable Hexadecimal

In this thread, someone pointed out that the following code should only be used in "toy" projects. Unfortunately, they didn't come back to explain why it isn't production quality, so I was hoping someone in the community could either reassure me that the code is OK (because I really like it) or determine what is wrong ...

template< class T1, class T2 >
void hexascii( T1& out, const T2& in )
{
    out.resize( in.size() * 2 );
    const char hexDigits[] = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
    typename T1::iterator outit = out.begin();
    for( typename T2::const_iterator it = in.begin(); it != in.end(); ++it )
    {
        *outit++ = hexDigits[*it >> 4];
        *outit++ = hexDigits[*it & 0xF];
    }
}

template< class T1, class T2 >
void asciihex( T1& out, const T2& in )
{
    size_t size = in.size();
    assert( !(size % 2) );

    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    for( typename T2::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
        *outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) +
                 (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);
    }
}

      

Edit: Thanks for the help; you've pointed out some big improvements. I wrote functions in the two styles suggested in your answers. Some rough tests show the second method is slightly faster than the first, but IMO this is outweighed by the improved readability of the first.

template< class T1 >
void asciihex2( T1& out, const std::string& in )
{
    dassert( sizeof(typename T1::value_type) == 1 );  // dassert: project-specific debug assert
    size_t size = in.size();
    assert( !(size % 2) );
    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    for( size_t i = 0; i < in.size(); i += 2 )
    {
        unsigned int tmp;
        sscanf( in.c_str() + i, "%2X", &tmp );  // parse exactly two hex digits
        *outit++ = tmp;
    }
}

template< class T1 >
void asciihex3( T1& out, const std::string& in )
{
    dassert( sizeof(typename T1::value_type) == 1 );  // dassert: project-specific debug assert
    size_t size = in.size();
    assert( !(size % 2) );
    out.resize( size / 2 );
    typename T1::iterator outit = out.begin();
    // Lookup table indexed by (c - '0') & 0x1f: '0'-'9' map to 0-9,
    // 'A'-'F' and 'a'-'f' both map to 0x0A-0x0F after the mask.
    const char hexDigits[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                              0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
                              0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F};
    for( std::string::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
    {
        *outit = (hexDigits[(*it - 0x30) & 0x1f] << 4) +
                  hexDigits[(*(it+1) - 0x30) & 0x1f];
    }
}

      

Some of the assumptions behind this code:

1. The functions are not intended to be generic; they sit in an anonymous namespace and translate data for one specific class.
2. A template is required because two different container types are involved (one is std::vector, the other is a similar byte-array container from a third-party library).
3. The goal is to convert binary data of arbitrary length to hex strings and back.
4. It needs comments.

Any other ideas are appreciated.

+1




9 replies


That seems like a lot of code to achieve very little, given that the standard C scanf and printf functions already provide direct hex conversion. Why bother?
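For illustration, here is a minimal sketch of what this answer is suggesting (my own code, not the answerer's): leaning on the stdio conversions. The names to_hex/from_hex and the choice of std::string/std::vector are assumptions made for the example.

#include <cstdio>
#include <string>
#include <vector>

// Encode a byte buffer as uppercase hex via snprintf.
std::string to_hex( const std::vector<unsigned char>& in )
{
    std::string out;
    out.reserve( in.size() * 2 );
    char buf[3];                                   // two digits plus the terminating '\0'
    for( std::size_t i = 0; i < in.size(); ++i )
    {
        std::snprintf( buf, sizeof buf, "%02X", static_cast<unsigned>( in[i] ) );
        out.append( buf, 2 );
    }
    return out;
}

// Decode a hex string back into bytes via sscanf; a trailing odd character is ignored.
std::vector<unsigned char> from_hex( const std::string& in )
{
    std::vector<unsigned char> out;
    out.reserve( in.size() / 2 );
    for( std::size_t i = 0; i + 1 < in.size(); i += 2 )
    {
        unsigned int byte = 0;
        std::sscanf( in.c_str() + i, "%2X", &byte );
        out.push_back( static_cast<unsigned char>( byte ) );
    }
    return out;
}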



+8




My main comment is that it is very difficult to read.

In particular:



*outit = ((( (*it > '9' ? *it - 0x07 : *it)  - 0x30) << 4) & 0x00f0) + 
            (((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f)

      

I would have to stop and think to figure out what this does, and it would annoy me if I inherited this code.

+6




What is it supposed to do? There is no widely accepted meaning for hexascii or asciihex, so the names need to change.

[edit] A conversion from binary to hexadecimal notation should probably not be called "ascii" at all, since ASCII is a 7-bit character encoding.

+4




I don't mind this. It is generic (within reason), it uses constants, references where needed, etc. It lacks documentation, and the purpose of the *outit expression in asciihex is not entirely clear at a glance.

resize initializes output elements that are only going to be overwritten anyway (use reserve instead, as sketched below).

Maybe too much flexibility: you can instantiate the algorithms with any value type you like, although they only make sense for byte-sized hexadecimal data (a vector of doubles, for example, does not).

And indeed, it can be a bit overkill given the good library functions.
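A rough sketch of the reserve suggestion (my own code, under the assumption of a std::string output and std::vector<unsigned char> input): grow the output with a back inserter so nothing is value-initialized only to be overwritten.

#include <iterator>
#include <string>
#include <vector>

void hexascii_reserve( std::string& out, const std::vector<unsigned char>& in )
{
    static const char hexDigits[] = "0123456789ABCDEF";
    out.clear();
    out.reserve( in.size() * 2 );          // one allocation, no zero-filled placeholders
    std::back_insert_iterator<std::string> outit = std::back_inserter( out );
    for( std::vector<unsigned char>::const_iterator it = in.begin(); it != in.end(); ++it )
    {
        *outit++ = hexDigits[*it >> 4];    // high nibble
        *outit++ = hexDigits[*it & 0xF];   // low nibble
    }
}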

+3




What happened to the simple lookup

*outit = hexDigits[*it]

Why can't these two functions share a common hexDigits lookup table and eliminate the tricky (and slow) computation of the ASCII character?
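One way to read this suggestion (a sketch of my own, not the answerer's code) is a single 256-entry reverse table indexed directly by the character, so decoding becomes a plain lookup as well:

#include <cctype>

// Hypothetical reverse table: hexValue['7'] == 7, hexValue['A'] == hexValue['a'] == 10, etc.
// Non-hex characters stay 0 here; real code would want to reject them.
static unsigned char hexValue[256];

void initHexValue()
{
    const char digits[] = "0123456789ABCDEF";
    for( int i = 0; i < 16; ++i )
    {
        hexValue[ static_cast<unsigned char>( digits[i] ) ] = static_cast<unsigned char>( i );
        hexValue[ std::tolower( static_cast<unsigned char>( digits[i] ) ) ] = static_cast<unsigned char>( i );
    }
}

// Decoding one byte then reads:
//   *outit = (hexValue[(unsigned char)*it] << 4) | hexValue[(unsigned char)*(it + 1)];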

+3




  • There are assert statements in the code instead of proper handling of the error condition (and if the asserts are compiled out, the code can blow up).

  • The for loop has a dangerous double increment of the iterator (it += 2), especially if your assertion didn't fire. What happens when the iterator is one element before end() and you advance it by two? (See the sketch after this list.)

  • The code is templated, but all you are doing is converting characters to numbers and back. This is cargo cult programming: you hope the blessings of template programming will rub off on you just by using templates. You even tagged this as a template question, although the template aspect is completely irrelevant to your functions.

  • The *outit = line is too complex.

  • The code reinvents the wheel. To a large extent.
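As promised above, a small sketch of a loop that does not rely on the assert (my own code, assuming a std::string input and uppercase hex): the length is validated up front and the iterator is advanced in two single steps, so it can never run past end().

#include <stdexcept>
#include <string>
#include <vector>

std::vector<unsigned char> decode_hex_checked( const std::string& in )
{
    if( in.size() % 2 != 0 )
        throw std::invalid_argument( "hex string must have an even length" );

    std::vector<unsigned char> out;
    out.reserve( in.size() / 2 );
    for( std::string::const_iterator it = in.begin(); it != in.end(); )
    {
        char hi = *it++;                   // two single increments instead of it += 2
        char lo = *it++;
        // Assumes valid uppercase hex digits; digit validation omitted for brevity.
        int high = hi > '9' ? hi - 'A' + 10 : hi - '0';
        int low  = lo > '9' ? lo - 'A' + 10 : lo - '0';
        out.push_back( static_cast<unsigned char>( (high << 4) | low ) );
    }
    return out;
}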

+3




Some problems I see:

This will only work correctly if the input container stores 8-bit types, e.g. char or unsigned char. For example, the following line will misbehave with a 32-bit type whose value is greater than 15 after the right shift; it is advisable to always mask so that the lookup index stays within the bounds of the table.

*outit++ = hexDigits[*it >> 4];

      

What is the expected behavior if you pass a container of unsigned longs? For this to be a truly generic routine, it probably has to handle converting 32-bit values to hex strings as well.

This only works when the input is a container - what if I just want to convert a single byte? My suggestion would be to refactor the code into a core function that handles one byte (hex => ascii and ascii => hex) and then provide additional functions that use that core function to cover containers of bytes, etc. (roughly along the lines of the sketch below).
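Something along these lines, perhaps (the names byteToHex/hexToByte are my own, not from the post): single-byte helpers that the container-level functions can call in a loop.

#include <utility>

// Core conversion: one byte to its two hex characters.
std::pair<char, char> byteToHex( unsigned char byte )
{
    static const char hexDigits[] = "0123456789ABCDEF";
    return std::make_pair( hexDigits[byte >> 4], hexDigits[byte & 0xF] );
}

// Core conversion: two hex characters back to one byte (assumes valid uppercase hex).
unsigned char hexToByte( char hi, char lo )
{
    int high = hi > '9' ? hi - 'A' + 10 : hi - '0';
    int low  = lo > '9' ? lo - 'A' + 10 : lo - '0';
    return static_cast<unsigned char>( (high << 4) | low );
}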

In asciihex() bad things will happen if the size of the input container is not divisible by 2. Using:

it != in.end(); it += 2

      

is dangerous, because if the size of the container is not divisible by 2, incrementing by two will step the iterator past the end of the container and the comparison with end() will never succeed. The assert offers some protection, but asserts can be compiled out (they often are in release builds), so it would be much better to make this an if statement.

+2




The problems I see:

hexascii does not check that sizeof(T2::value_type) == 1.

hexascii dereferences it twice, and asciihex even more often. There is no reason for this, since you could save the dereferenced value once. It also means you cannot use an istream_iterator as input.

asciihex requires a random access iterator as input because (it + 1) and (it += 2) are used. The algorithm could work with a plain forward iterator if it only used ++it.

(*it > '9' ? *it - 0x07 : *it) - 0x30

can be simplified to

*it - (*it > '9' ? 0x37 : 0x30)

so that only a single subtraction remains. However, an array lookup will be more efficient: subtract 0x30, so that '0' becomes 0, 'A' becomes 0x11 and 'a' becomes 0x31; mask with 0x1f to make it case-insensitive, and you can do the resulting lookup in a char[0x20] table with no risk of reading out of bounds. Non-hex characters will simply give you odd values.
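For illustration, a sketch of the table described here (my own code): a full char[0x20], so any index produced by the 0x1f mask is in range.

// Indexed by (c - 0x30) & 0x1f; 32 entries, so a masked index can never overflow.
// '0'..'9' land on indices 0..9; 'A'..'F' and 'a'..'f' both land on 0x11..0x16.
static const unsigned char hexLookup[0x20] = {
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9,   // '0'..'9'
    0, 0, 0, 0, 0, 0, 0,            // ':'..'@' (not hex digits)
    10, 11, 12, 13, 14, 15,         // 'A'..'F' (and 'a'..'f' after the mask)
    0, 0, 0, 0, 0, 0, 0, 0, 0       // padding out to 0x20 entries
};

// Decoding one pair of characters:
//   unsigned char byte = (hexLookup[(hi - 0x30) & 0x1f] << 4) | hexLookup[(lo - 0x30) & 0x1f];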

+1




The reason I would put this down as toy code is the error checking. I could hand it two arbitrary vectors and it would happily try to do something with them, making a complete mess and producing garbage.

0








