Convert Binary Data to Printable Hexadecimal
In this thread, someone pointed out that the following code should only be used in "toys" projects. Unfortunately, he didn't come back to say why it's not product quality, so I was hoping someone in the community could either reassure me that the code is ok (because I really like it) or determine what is wrong ...
template< class T1, class T2>
void hexascii( T1& out, const T2& in )
{
out.resize( in.size() * 2 );
const char hexDigits[] = {'0', '1', '2', '3', '4', '5', '6', '7','8', '9', 'A', 'B', 'C', 'D', 'E', 'F'};
T1::iterator outit = out.begin();
for( T2::const_iterator it = in.begin(); it != in.end(); ++it )
{
*outit++ = hexDigits[*it >> 4];
*outit++ = hexDigits[*it & 0xF];
}
}
template<class T1, class T2>
void asciihex( T1& out, const T2& in )
{
size_t size = in.size;
assert( !(size % 2) );
out.resize( size / 2 );
T1::iterator outit = out.begin();
for( T2::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
{
*outit = ((( (*it > '9' ? *it - 0x07 : *it) - 0x30) << 4) & 0x00f0) +
(((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f);
}
}
Edit: Thanks for the help, you've made some big improvements. I wrote functions in the two suggested styles from your answers. Some rough tests show the second method is slightly faster than the first, but IMO this is outweighed by the improved readability of the first.
template<class T1>
void asciihex2( T1& out, const std::string& in )
{
dassert( sizeof(T1::value_type)==1 );
size_t size = in.size();
assert( !(size % 2) );
out.resize( size / 2 );
T1::iterator outit = out.begin();
for( size_t i = 0; i < in.size(); i += 2 )
{
int tmp;
sscanf( in.c_str() + i, "%02X", &tmp );
*outit++ = tmp;
}
}
template<class T1>
void asciihex3( T1& out, const std::string& in )
{
dassert( sizeof(T1::value_type)==1 );
size_t size = in.size();
assert( !(size % 2) );
out.resize( size / 2 );
T1::iterator outit = out.begin();
const char hexDigits[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F};
for( std::string::const_iterator it = in.begin(); it != in.end(); it += 2, ++outit )
{
*outit = (hexDigits[(*it - 0x30) & 0x1f] << 4) +
hexDigits[((*(it+1) - 0x30) & 0x1f)];
}
}
Some of the assumptions referencing this code are: 1: they are not intended to be generic, but are used in the anonymous namespace to translate data for a specific class. 2: a template is required since two different container types are used (one is std :: vector, the other is a similar byte array type container from a third party library. 3: The goal is to be able to convert binary data of undefined length to strings and vice versa 6 : He needs commenting
Any other ideas are appreciated.
source to share
My main comment is that it is very difficult to read.
In particular:
*outit = ((( (*it > '9' ? *it - 0x07 : *it) - 0x30) << 4) & 0x00f0) +
(((*(it+1) > '9' ? *(it+1) - 0x07 : *(it+1)) - 0x30) & 0x000f)
I would need my brain to figure this out, and annoy me if I inherited the code.
source to share
I don't mind this. It is generic (within), it uses constants, references where needed, etc. It lacks documentation, and the purpose is asciihex
*outit
not entirely clear at a glance.
resize
initializes unneeded output elements (use instead reserve
).
Maybe too much flexibility: you can combine algorithms with any data type you like, while you only have to specify hexadecimal numbers (like a vector
of double
s)
And indeed, it can be a bit overkill given the good library functions.
source to share
-
There are assert statements in the code instead of handling the error condition correctly (and if your assert is disabled the code could explode)
-
for a loop has a dangerous double increment of the iterator (it + = 2). Especially in case your assertion didn't work. What happens when your iterator is at the end and you are ++ this?
-
The code is templated, but what you are doing is just converting characters to numbers or vice versa. This is cargo cult programming . You hope that the blessings of template programming will come to you through templates. You even flagged this as a boilerplate question, although the boilerplate aspect is completely irrelevant in your functions.
-
the line * outit = is too complex.
-
the code reinvents the wheel. To a large extent.
source to share
Some problems I see:
This will work great if only used for an input container that stores 8 bit types - eg. char or unsigned char. For example, the following code will not work if used with a 32-bit type that is greater than 15 after the right shift - it is recommended to always use a mask to ensure that the search index is always within the range.
*outit++ = hexDigits[*it >> 4];
What is the expected behavior if you pass a container containing unsigned longs - in order for this to be a generic class, it probably has to handle converting 32-bit numbers to headstrings as well.
This only works when the input is a container - what if I just want to convert one byte? Here's a suggestion to refactor the code into a main function that can hide one byte (hex => ascii and ascii => hex) and then provide additional functions to use that main function to cover containers of bytes, etc.
In asciihex () bad things will happen if the size of the input container is not divisible by 2. Usage:
it != in.end(); it += 2
is dangerous, because if the size of the container is not divisible by 2, then an increment of two will cause the iterator to go past the end of the container, and comparison with end () will never work. This is somewhat shielded from calling assert, but the assertion can be compiled (for example, it is often compiled in release builds), so it would be much better to make this if statement.
source to share
The problems I see:
hexascii does not check if sizeof(T2::value_type)==1
hexascii dereferences it
twice, asciihex even more. There is no reason for this, as you can save the result. This means that you cannot use istream_iterator.
asciihex needs a random iterator as input because (it + 1) and (it + = 2) are used. The algorithm can work on a cross-iterator if you only use (++ it).
(*it > '9' ? *it - 0x07 : *it) - 0x30
can be simplified to *it - (*it > '9' ? 0x37 : 0x30)
so that there is only one unconditional subtraction left. However, finding an array will be more efficient. Subtract 0x30. '0' will become 0; 'A' will become 0x11 and 'a' will become 0x31. Mask with 0x1f to make it case insensitive and you can perform the resulting lookup in char [0x20] without the risk of overflow. Non-hex characters will just give you weird meanings.
source to share