Efficient binary to hexadecimal string conversion
I am writing a program that converts the hexadecimal representation of a binary value to a regular string. That is, each hexadecimal digit (nibble) becomes one ASCII character, which is itself two hexadecimal digits when written in hex. This means that the result will be twice as large: the hexadecimal representation of 1 byte requires two bytes in the string.
Hexadecimal characters
0123456789 ;0x30 - 0x39
ABCDEF ;0x41 - 0x46
Example
0xF05C1E3A ;hex
4032568890 ;dec
will become
0x4630354331453341 ;hex
5057600944242766657 ;dec
Question?
Are there any neat / alternative (/ interesting) methods for converting between these states other than a lookup table (bitwise operations, shifts, modulo, etc.)? I am not looking for a library function, but rather for how it would / should be implemented. Any ideas?
Here is a solution that uses nothing but shifts, and/or, and add/subtract. There are no loops.
uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x += 0x0606060606060606LL;
m = ((x & 0x1010101010101010LL) >> 4) + 0x7f7f7f7f7f7f7f7fLL;
x += (m & 0x2a2a2a2a2a2a2a2aLL) | (~m & 0x3131313131313131LL);
The above is a simplified version that I came up with after a little time to ponder. Below is the original answer.
uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x += 0x3636363636363636LL;
m = (x & 0x4040404040404040LL) >> 6;
x += m;
m = m ^ 0x0101010101010101LL;
x -= (m << 2) | (m << 1);
See it in action: http://ideone.com/nMhJ2q
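For anyone who wants to drop the simplified version into a program, here is a minimal sketch of it wrapped in functions. The names hex8 and storeHex8 are made up for this example, and the byte-by-byte store is just one way to get a C string out of the packed result without depending on the machine's byte order.

```c
#include <stdint.h>

/* Hypothetical helper: converts a 32-bit value into eight ASCII hex digits
   packed into a 64-bit integer, first digit in the most significant byte. */
uint64_t hex8(uint32_t value)
{
    uint64_t x = value, m;

    /* Spread the eight nibbles so that each one sits in its own byte. */
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);

    /* After adding 6, bit 4 of a byte is set exactly when its nibble was >= 10. */
    x += 0x0606060606060606ULL;
    m = ((x & 0x1010101010101010ULL) >> 4) + 0x7f7f7f7f7f7f7f7fULL;

    /* Digits get 0x2a more (0x30 in total), letters get 0x31 more (0x37 in total). */
    x += (m & 0x2a2a2a2a2a2a2a2aULL) | (~m & 0x3131313131313131ULL);
    return x;
}

/* Hypothetical helper: writes the eight characters plus a terminator to out,
   most significant digit first, independent of host endianness. */
void storeHex8(char out[9], uint32_t value)
{
    uint64_t packed = hex8(value);
    for (int i = 0; i < 8; ++i)
        out[i] = (char)((packed >> (56 - 8 * i)) & 0xFF);
    out[8] = '\0';
}
```

With the question's input, storeHex8(buf, 0xF05C1E3A) should leave "F05C1E3A" in buf.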
Spreading the nibbles out into bytes is easy with pdep:
spread = _pdep_u64(raw, 0x0F0F0F0F0F0F0F0F);
Now we need to add 0x30 to bytes in the range 0-9 and 0x37 to higher bytes (so that 10 becomes 0x41, 'A'). This can be done by SWAR-subtracting 10 from each byte and then using the sign to choose which number to add, for example (not tested)
uint64_t H = 0x8080808080808080;
uint64_t ten = 0x0A0A0A0A0A0A0A0A;
uint64_t cmp = ((spread | H) - (ten & ~H)) ^ ((spread ^ ~ten) & H); // SWAR subtract
uint64_t masks = ((cmp & H) >> 7) * 255; // 0xFF in every byte where x-10 is negative
// if x-10 is negative, take 0x30, else 0x37
uint64_t add = (masks & 0x3030303030303030) | (~masks & 0x3737373737373737);
uint64_t asString = spread + add;
The SWAR comparison can be optimized, since implementing it does not require a complete subtraction.
There are several different suggestions here, including SIMD: http://wm.ite.pl/articles/convert-to-hex.html
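In case it helps, here is a sketch of the SWAR-subtract idea as a compilable function. It is not the pdep version: since _pdep_u64 needs BMI2 hardware, I have swapped in a plain shift-and-mask spread (the hypothetical spreadNibbles) that should give the same layout for a 32-bit input. The function names are assumptions made for this example.

```c
#include <stdint.h>

/* Portable stand-in for _pdep_u64(raw, 0x0F0F0F0F0F0F0F0F) on a 32-bit input:
   each nibble of raw ends up in the low half of its own byte. */
uint64_t spreadNibbles(uint32_t raw)
{
    uint64_t x = raw;
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);
    return x;
}

/* Hypothetical wrapper around the SWAR subtract-and-select steps. */
uint64_t hex8SwarCompare(uint32_t raw)
{
    const uint64_t H   = 0x8080808080808080ULL;
    const uint64_t ten = 0x0A0A0A0A0A0A0A0AULL;
    uint64_t spread = spreadNibbles(raw);

    /* Per-byte subtraction spread - 10; the top bit of each byte is the sign. */
    uint64_t cmp = ((spread | H) - (ten & ~H)) ^ ((spread ^ ~ten) & H);

    /* 0xFF in every byte whose nibble is below 10, 0x00 elsewhere. */
    uint64_t masks = ((cmp & H) >> 7) * 255;

    /* Digits get 0x30 added, letters get 0x37 added. */
    uint64_t add = (masks & 0x3030303030303030ULL) | (~masks & 0x3737373737373737ULL);
    return spread + add;
}
```

For the input from the question, hex8SwarCompare(0xF05C1E3A) should return 0x4630354331453341.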
A somewhat simpler version based on Mark Ransom's:
uint64_t x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x = (x + 0x3030303030303030LL) + (((x + 0x0606060606060606LL) & 0x1010101010101010LL) >> 4) * 7;
And if you want to avoid multiplication:
uint64_t m, x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
m = (x + 0x0606060606060606LL) & 0x1010101010101010LL;
x = (x + 0x3030303030303030LL) + (m >> 1) - (m >> 4);
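Bit tricks like these are easy to get subtly wrong, so here is a rough self-check sketch that compares the multiplication-free variant against snprintf with "%08X". The names hex8NoMul and countMismatches, the sample count, and the little LCG used to vary the inputs are all assumptions made for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The multiplication-free variant from above, wrapped in a function. */
uint64_t hex8NoMul(uint32_t value)
{
    uint64_t m, x = value;
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);
    m = (x + 0x0606060606060606ULL) & 0x1010101010101010ULL;
    /* m has 0x10 in every letter byte; 0x08 - 0x01 adds the extra 7. */
    return (x + 0x3030303030303030ULL) + (m >> 1) - (m >> 4);
}

/* Hypothetical self-check: counts sample inputs for which the SWAR result
   differs from what snprintf produces for the same value. */
unsigned long countMismatches(void)
{
    unsigned long bad = 0;
    uint32_t v = 0;
    for (unsigned long i = 0; i < 100000UL; ++i) {
        char expect[9], got[9];
        uint64_t packed = hex8NoMul(v);
        snprintf(expect, sizeof expect, "%08lX", (unsigned long)v);
        for (int k = 0; k < 8; ++k)
            got[k] = (char)((packed >> (56 - 8 * k)) & 0xFF);
        got[8] = '\0';
        if (strcmp(expect, got) != 0)
            ++bad;
        v = (uint32_t)(v * 1664525u + 1013904223u); /* simple LCG step */
    }
    return bad;
}
```

If the variant is right, countMismatches() should return 0.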
A LUT (lookup table) C++ variant. I haven't checked the resulting machine code, but I believe that any modern C++ compiler can catch the idea and compile it well.
static const char nibble2hexChar[] { "0123456789ABCDEF" };
// 17B in total, because I'm lazy to init it per char

void byteToHex(std::ostream & out, const uint8_t value) {
    out << nibble2hexChar[value>>4] << nibble2hexChar[value&0xF];
}

// this one is actually written more toward short+simple source, than performance
void dwordToHex(std::ostream & out, uint32_t value) {
    int i = 8;
    while (i--) {
        out << nibble2hexChar[value>>28];
        value <<= 4;
    }
}
EDIT: For C code, you just need to switch from std::ostream to some other means of output. Unfortunately, your question does not go into any detail on what you are actually trying to achieve and why you are not using the built-in printf family of C functions.
For example, C like this could write to some char* output buffer, converting an arbitrary number of bytes:
/**
 * Writes hexadecimally formatted "n" bytes array "values" into "outputBuffer".
 * Make sure there is enough space allocated in the output buffer (2 * n chars),
 * and add the zero terminator yourself if you plan to use it as a C-string.
 *
 * @Returns: pointer after the last character written.
 */
char* dataToHex(char* outputBuffer, const size_t n, const unsigned char* values) {
    for (size_t i = 0; i < n; ++i) {
        *outputBuffer++ = nibble2hexChar[values[i]>>4];
        *outputBuffer++ = nibble2hexChar[values[i]&0xF];
    }
    return outputBuffer;
}
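To make the two caveats in that comment concrete (buffer size and terminator), here is a small sketch of how a caller might wrap dataToHex in plain C. The wrapper name dataToHexString and its return convention are assumptions for this example; the table and dataToHex are repeated only so the snippet compiles on its own.

```c
#include <stddef.h>

static const char nibble2hexChar[] = "0123456789ABCDEF";

/* Same routine as above, repeated so that this sketch is self-contained. */
char* dataToHex(char* outputBuffer, const size_t n, const unsigned char* values) {
    for (size_t i = 0; i < n; ++i) {
        *outputBuffer++ = nibble2hexChar[values[i] >> 4];
        *outputBuffer++ = nibble2hexChar[values[i] & 0xF];
    }
    return outputBuffer;
}

/* Hypothetical wrapper: checks that out can hold 2 * n characters plus the
   terminator, converts, and terminates the string.
   Returns 0 on success, -1 if the arguments are unusable. */
int dataToHexString(char* out, size_t outSize, const unsigned char* values, size_t n) {
    if (out == NULL || values == NULL || outSize == 0 || n > (outSize - 1) / 2)
        return -1;
    *dataToHex(out, n, values) = '\0';
    return 0;
}
```

For the four bytes F0 5C 1E 3A and a 9-byte buffer this should produce the string "F05C1E3A"; with an 8-byte buffer the wrapper refuses, because there would be no room for the terminator.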
And finally, I once helped someone on Code Review who had a performance bottleneck in exactly this kind of hexadecimal formatting, although there I did the conversion in code, without a LUT. The whole process and the other answer there may be instructive for you, as you can see that the fastest solution does not just blindly convert the result, but actually mixes the conversion with the main operation to achieve the best overall performance. That is why I'm wondering what you are trying to solve, as the whole problem can often allow for a better solution. If you are just asking about the conversion, printf("%x",..) is a safe bet.
Here's another approach to "hex" conversion: fast C++ XOR function
A slightly more decent integer-to-string conversion, for any base from 2 up to the length of the digits array:
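#include <stddef.h>
#include <string.h>

char *reverse(char *);
const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* Writes number in the given base into buff (the caller provides enough room).
   Returns buff, or NULL if buff is NULL or base is out of range.
   Note: number = -number overflows for LLONG_MIN. */
char *convert(long long number, char *buff, int base)
{
    char *result = (buff == NULL || base < 2 || base > (int)strlen(digits)) ? NULL : buff;
    char sign = 0;

    if (number < 0)
    {
        sign = '-';
        number = -number;
    }
    if (result != NULL)
    {
        /* Digits come out least significant first, so reverse at the end. */
        do
        {
            *buff++ = digits[number % base];
            number /= base;
        } while (number);
        if (sign) *buff++ = sign;
        *buff = 0;
        reverse(result);
    }
    return result;
}

/* Reverses a NUL-terminated string in place. */
char *reverse(char *str)
{
    char tmp;
    int len;

    if (str != NULL)
    {
        len = strlen(str);
        for (int i = 0; i < len / 2; i++)
        {
            tmp = *(str + i);
            *(str + i) = *(str + len - i - 1);
            *(str + len - i - 1) = tmp;
        }
    }
    return str;
}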
Example: counting from -50 to 50 (decimal) in base 23
-24 -23 -22 -21 -20 -1M -1L -1K -1J -1I -1H -1G -1F -1E -1D
-1C -1B -1A -19 -18 -17 -16 -15 -14 -13 -12 -11 -10 -M -L
-K -J -I -H -G -F -E -D -C -B -A -9 -8 -7 -6
-5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
A B C D E F G H I J K L M 10 11
12 13 14 15 16 17 18 19 1A 1B 1C 1D 1E 1F 1G
1H 1I 1J 1K 1L 1M 20 21 22 23 24
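As a possible variation on the same idea, the digits can be written from the end of the buffer towards the front, which makes the reverse() pass unnecessary. This is only a sketch: the name convertBackwards, the extra size parameter, and the choice to return a pointer into the middle of the buffer are assumptions for this example and not part of the code above.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical variant of convert(): fills buff from its end, so the digits
   come out in the right order without reversing. Returns a pointer to the
   first character of the result inside buff, or NULL on bad arguments. */
char *convertBackwards(long long number, char *buff, size_t size, int base)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    /* Worst case is base 2: one character per bit, plus sign and terminator. */
    if (buff == NULL || base < 2 || base > 36 ||
        size < sizeof(long long) * CHAR_BIT + 2)
        return NULL;

    /* Negate in unsigned arithmetic so that LLONG_MIN does not overflow. */
    unsigned long long n = number < 0 ? 0ULL - (unsigned long long)number
                                      : (unsigned long long)number;
    char *p = buff + size;

    *--p = '\0';
    do {
        *--p = digits[n % (unsigned)base];
        n /= (unsigned)base;
    } while (n);
    if (number < 0)
        *--p = '-';
    return p;
}
```

With an 80-byte buffer, convertBackwards(-50, buf, sizeof buf, 23) should give "-24", matching the first entry of the table above.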
- Decimal -> Hex
Just iterate over the string and convert each character to int, then you can do
printf("%02x", c);
or use sprintf to store the result in another variable.
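A minimal sketch of the loop described above, using snprintf (the bounded sibling of sprintf) to collect the output in a buffer. The function name stringToHex and its return convention are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes two lowercase hex digits per input character
   into out. Returns 0 on success, -1 if out is too small. */
int stringToHex(const char *in, char *out, size_t outSize)
{
    size_t n = strlen(in);

    if (outSize < 2 * n + 1)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        /* The cast keeps characters above 0x7F from being sign-extended. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned)(unsigned char)in[i]);
    }
    out[2 * n] = '\0';
    return 0;
}
```

For the input "Hi" this should produce "4869" ('H' is 0x48 and 'i' is 0x69).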
- Hex -> Decimal
Code:
printf("%c",16 * hexToInt('F') + hexToInt('0'));

int hexToInt(char c)
{
    if(c >= 'a' && c <= 'z')
        c = c - ('a' - 'A');
    int sum;
    sum = c / 16 - 3;
    sum *= 10;
    sum += c % 16;
    return (sum > 9) ? sum - 1 : sum;
}
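Note that hexToInt does no validation, so a character such as 'G' silently yields a value. If that matters, a more conventional decoder with explicit range checks might look like the sketch below. It is a different technique from the arithmetic trick above, and the names hexDigitValue and hexToBytes are assumptions for this example.

```c
#include <stddef.h>

/* Hypothetical helper: value of one hex digit, or -1 if c is not a hex digit. */
int hexDigitValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes 2 * n hex characters from hex into n bytes.
   Returns 0 on success, -1 on a non-hex character (including a string
   that ends too early, since the terminator is not a hex digit). */
int hexToBytes(const char *hex, unsigned char *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int hi = hexDigitValue(hex[2 * i]);
        if (hi < 0)
            return -1;
        int lo = hexDigitValue(hex[2 * i + 1]);
        if (lo < 0)
            return -1;
        out[i] = (unsigned char)(16 * hi + lo);
    }
    return 0;
}
```

Decoding "F05C1E3A" with n = 4 should give the bytes F0 5C 1E 3A.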
The articles below compare different ways of converting integers to strings. Hexadecimal numbers are not covered, but switching from decimal to hex does not seem to be a big problem.
@EDIT Thanks for pointing out that the answer above is irrelevant. The usual way without a LUT is to split the integer into nibbles and map them to ASCII:
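#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <string.h>

#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)

/* Copies the 8 bytes of val in host byte order; the loop in main() assumes little-endian. */
void int64_to_char(char carr[], int64_t val){
    memcpy(carr, &val, 8);
}

uint64_t inp = 0xF05C1E3A;
char tmp_st[8];

int main()
{
    int started = 0; /* set once the first non-zero byte has been seen */
    int64_to_char(tmp_st,inp);
    printf("Sample: %" PRIx64 "\n", inp); /* %x would be wrong for a 64-bit value */
    printf("Result: 0x");
    for (unsigned int k = 8; k; k--){
        unsigned char tmp_ch = (unsigned char)tmp_st[k-1];
        char hi_nib = HI_NIBBLE(tmp_ch);
        char lo_nib = LO_NIBBLE(tmp_ch);
        if (hi_nib || lo_nib) started = 1;
        /* skip leading zero bytes only, so that interior 00 bytes are still printed */
        if (started){
            printf("%c%c",hi_nib+((hi_nib>9)?55:48),lo_nib+((lo_nib>9)?55:48));
        }
    }
    printf("\n");
    return 0;
}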
Another way is to use Allison's Algorithm. I am a complete noob at ASM, so I am posting the code in the form I found it.
Option 1:
ADD AL,90h
DAA
ADC AL,40h
DAA
Option 2:
CMP AL, 0Ah
SBB AL, 69h
DAS
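To see why Option 1 works without an assembler at hand, here is a small C sketch that emulates the four instructions. It is only a model under assumptions: the struct and function names are made up, only AL plus the carry (CF) and auxiliary carry (AF) flags are tracked, and the DAA rule is my reading of the documented x86 behaviour (add 6 if the low nibble is above 9 or AF is set, then add 60h if the original AL was above 99h or CF was set).

```c
#include <stdint.h>

/* Minimal model of the x86 state that the sequence touches. */
typedef struct { uint8_t al; int cf; int af; } Flags8;

/* ADD / ADC AL, imm8: updates AL, the carry flag and the nibble carry. */
static void adc(Flags8 *s, uint8_t imm, int carryIn)
{
    unsigned sum = (unsigned)s->al + imm + (unsigned)carryIn;
    s->af = ((s->al & 0x0F) + (imm & 0x0F) + carryIn) > 0x0F;
    s->cf = sum > 0xFF;
    s->al = (uint8_t)sum;
}

/* DAA: decimal adjust AL after addition. */
static void daa(Flags8 *s)
{
    uint8_t oldAl = s->al;
    int oldCf = s->cf;

    if ((s->al & 0x0F) > 9 || s->af) {
        s->al = (uint8_t)(s->al + 6);
        s->af = 1;
    } else {
        s->af = 0;
    }
    if (oldAl > 0x99 || oldCf) {
        s->al = (uint8_t)(s->al + 0x60);
        s->cf = 1;
    } else {
        s->cf = 0;
    }
}

/* Emulates Option 1 for a nibble value 0..15 and returns the ASCII digit. */
char nibbleToHexDaa(uint8_t nibble)
{
    Flags8 s = { (uint8_t)(nibble & 0x0F), 0, 0 };

    adc(&s, 0x90, 0);    /* ADD AL,90h */
    daa(&s);             /* DAA: sets CF only for nibbles 10..15 */
    adc(&s, 0x40, s.cf); /* ADC AL,40h: the carry supplies the extra +1 for letters */
    daa(&s);             /* DAA: folds D0h..D9h down to 30h..39h for digits */
    return (char)s.al;
}
```

In this model the first DAA turns 9Ah..9Fh into 00h..05h with the carry set, so the ADC lands on 41h..46h ('A'..'F'), while 90h..99h pass through unchanged and end up as 30h..39h ('0'..'9').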