Efficient binary to hexadecimal string conversion

Question

I am writing a program that converts a binary value to its hexadecimal representation as a regular string. Each byte is therefore converted to two hexadecimal characters in the string. This means that the result will be twice as large; the hexadecimal representation of 1 byte requires two bytes in the string.

Hexadecimal characters

0123456789                    ;0x30 - 0x39
ABCDEF                        ;0x41 - 0x46

Example

0xF05C1E3A                    ;hex
4032568890                    ;dec

will become

0x4630354331453341            ;hex
5057600944242766657           ;dec

Question?

Are there any neat / alternative (/ interesting) methods for converting between these states other than a lookup table (bitwise operations, shifts, modulo, etc.)? I am not looking for a library function, but rather for how it could / should be implemented. Any ideas?

-2

c assembly windows 64bit

Will 09 Aug 17 at 18:53

7 replies

Spreading the nibbles out into bytes is easy with pdep:

spread = _pdep_u64(raw, 0x0F0F0F0F0F0F0F0F);

Now we need to add 0x30 to bytes in the range 0-9 and 0x37 to higher bytes (so that 10 becomes 0x41, 'A'). This can be done by SWAR-subtracting 10 from each byte and then using the sign to choose which number to add, for example (not tested):

uint64_t H = 0x8080808080808080;
uint64_t ten = 0x0A0A0A0A0A0A0A0A;
uint64_t cmp = ((spread | H) - (ten &~H)) ^ ((spread ^~ten) & H); // SWAR subtract
uint64_t masks = ((cmp & H) >> 7) * 255;
// if x-10 is negative, add 0x30, else add 0x37 (0x41 - 10)
uint64_t add = (masks & 0x3030303030303030) | (~masks & 0x3737373737373737);
uint64_t asString = spread + add;

The SWAR comparison can be optimized as it does not require a complete subtraction to implement.
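One way that optimization might look, as a sketch only: every byte of the spread value is at most 15, so adding 6 to each byte sets bit 4 exactly in the bytes that hold 10 or more, and no full SWAR subtraction is needed. The function name `nibbles_to_hex_digits` is made up for this illustration, and the input is assumed to already hold one nibble per byte.

```c
#include <stdint.h>

/* Hypothetical helper: converts eight nibbles (one per byte, each 0..15)
 * to their ASCII hex digits, all eight bytes at once. */
uint64_t nibbles_to_hex_digits(uint64_t spread)
{
    /* Bytes are at most 15, so adding 6 cannot carry into the next byte.
     * Bit 4 of a byte ends up set exactly when that byte was >= 10. */
    uint64_t ge10 = ((spread + 0x0606060606060606ULL)
                     & 0x1010101010101010ULL) >> 4;

    /* Widen each 0/1 flag to 0x00/0xFF (1 * 255 stays inside its byte). */
    uint64_t mask = ge10 * 255;

    /* Add '0' (0x30) to 0..9 and 'A' - 10 (0x37) to 10..15. */
    uint64_t add = (~mask & 0x3030303030303030ULL)
                 | ( mask & 0x3737373737373737ULL);
    return spread + add;
}
```

With the question's example, the spread value 0x0F00050C010E030A should come out as 0x4630354331453341.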

There are several different suggestions here, including SIMD: http://wm.ite.pl/articles/convert-to-hex.html

+4

harold 09 Aug 17 at 19:01

A somewhat simpler version based on Mark Ransom's:

uint64_t x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8)  | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4)  | (x & 0x000f000f000f000fLL);
x =  (x + 0x3030303030303030LL) +
   (((x + 0x0606060606060606LL) & 0x1010101010101010LL) >> 4) * 7;

And if you want to avoid multiplication:

uint64_t m, x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8)  | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4)  | (x & 0x000f000f000f000fLL);
m =  (x + 0x0606060606060606LL) & 0x1010101010101010LL;
x =  (x + 0x3030303030303030LL) + (m >> 1) - (m >> 4);
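To actually get a string out of the packed result, the bytes still have to be stored somewhere. A minimal sketch, assuming a caller-provided 9-byte buffer; the wrapper name `u32_to_hex` is hypothetical. Because the first digit ends up in the most significant byte, the sketch emits the bytes from the top down rather than copying the integer directly, which would reverse the digits on a little-endian machine.

```c
#include <stdint.h>

/* Hypothetical wrapper: writes 8 uppercase hex digits and a terminating
 * NUL into out, which must have room for 9 chars. */
void u32_to_hex(uint32_t value, char out[9])
{
    uint64_t x = value;

    /* Spread the eight nibbles into eight bytes. */
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);

    /* Add '0' everywhere, plus 7 more where the nibble is 10..15. */
    x =  (x + 0x3030303030303030ULL) +
       (((x + 0x0606060606060606ULL) & 0x1010101010101010ULL) >> 4) * 7;

    /* Emit the most significant byte first; this does not depend on
     * the byte order of the host. */
    for (int i = 0; i < 8; i++)
        out[i] = (char)((x >> (56 - 8 * i)) & 0xFF);
    out[8] = '\0';
}
```

Calling `u32_to_hex(0xF05C1E3A, buf)` should leave the string "F05C1E3A" in `buf`.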

+3

chqrlie 12 Aug 17 at 12:43

A LUT (lookup table) C++ variant. I haven't checked the resulting machine code, but I believe that any modern C++ compiler can catch the idea and compile it well.

static const char nibble2hexChar[] { "0123456789ABCDEF" };
     // 17B in total, because I'm lazy to init it per char

void byteToHex(std::ostream & out, const uint8_t value) {
    out << nibble2hexChar[value>>4] << nibble2hexChar[value&0xF];
}

// this one is actually written more toward short+simple source, than performance
void dwordToHex(std::ostream & out, uint32_t value) {
    int i = 8;
    while (i--) {
        out << nibble2hexChar[value>>28];
        value <<= 4;
    }
}

EDIT: For C code, you just need to switch from std::ostream to some other means of output. Unfortunately, your question does not go into any detail on what you are actually trying to achieve and why you are not using the built-in printf family of C functions.

For example, C code like this could write to some char* output buffer, converting an arbitrary number of bytes:

/**
 * Writes hexadecimally formatted "n" bytes array "values" into "outputBuffer".
 * Make sure there is enough space allocated in the output buffer, and add the zero
 * terminator yourself, if you plan to use it as C-string.
 * 
 * @Returns: pointer after the last character written.
 */
char* dataToHex(char* outputBuffer, const size_t n, const unsigned char* values) {
    for (size_t i = 0; i < n; ++i) {
        *outputBuffer++ = nibble2hexChar[values[i]>>4];
        *outputBuffer++ = nibble2hexChar[values[i]&0xF];
    }
    return outputBuffer;
}
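A possible way to call it, as a sketch: the wrapper `bytesToHexString` below is hypothetical and only adds the capacity check and the terminator that the comment above leaves to the caller. The table and `dataToHex` are repeated (with a C-style initializer) so that the block compiles on its own.

```c
#include <stddef.h>

static const char nibble2hexChar[] = "0123456789ABCDEF";

char* dataToHex(char* outputBuffer, const size_t n, const unsigned char* values) {
    for (size_t i = 0; i < n; ++i) {
        *outputBuffer++ = nibble2hexChar[values[i] >> 4];
        *outputBuffer++ = nibble2hexChar[values[i] & 0xF];
    }
    return outputBuffer;
}

/* Hypothetical convenience wrapper: needs 2 chars per byte plus the NUL.
 * Returns 0 on success, -1 if the output buffer is too small. */
int bytesToHexString(char* out, size_t outSize,
                     const unsigned char* values, size_t n) {
    if (outSize == 0 || n > (outSize - 1) / 2)
        return -1;
    *dataToHex(out, n, values) = '\0';   /* terminate right after the last digit */
    return 0;
}
```

For the four bytes F0 5C 1E 3A and a 9-byte buffer this should produce "F05C1E3A"; with an 8-byte buffer it reports failure instead of overrunning.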

And finally, I recently helped someone in a code review where hexadecimal formatting was the performance bottleneck, and I did the conversion there without a LUT. The whole process, and the other answer there, may be instructive for you: the fastest solution does not just blindly convert the result, but merges the conversion with the main operation to achieve the best overall performance. That is why I'm wondering what you are trying to solve, as the whole problem often allows for a better solution. If you are just asking about the conversion, printf("%x",..) is a safe bet.

Here's another approach to "hex" conversion: C++ Swift XOR Function

+2

Ped7g 09 Aug 17 at 19:45

A slightly more general integer-to-string conversion, for any base from 2 up to the length of the digits string:

#include <string.h>   /* strlen */

char *reverse(char *);

const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
char *convert(long long number, char *buff, int base)
{
    char *result = (buff == NULL || base > (int)strlen(digits) || base < 2) ? NULL : buff;
    char sign = 0;

    if (number < 0)
    {
        sign = '-';
        number = -number;   /* note: overflows for LLONG_MIN */
    }
    if (result != NULL)
    {
        do
        {
            *buff++ = digits[number % base];
            number /= base;
        } while (number);
        if(sign) *buff++ = sign;
        *buff = 0;
        reverse(result);
    }
    return result;
}


char *reverse(char *str)
{
    char tmp;
    int len;

    if (str != NULL)
    {
        len = strlen(str);
        for (int i = 0; i < len / 2; i++)
        {
            tmp = *(str + i);
            *(str + i) = *(str + len - i - 1);
            *(str + len - i - 1) = tmp;

        }
    }
    return str;
}

Example: counting from -50 to 50 (decimal) in base 23

-24     -23     -22     -21     -20     -1M     -1L     -1K     -1J     -1I     -1H     -1G     -1F     -1E     -1D
-1C     -1B     -1A     -19     -18     -17     -16     -15     -14     -13     -12     -11     -10     -M      -L
-K      -J      -I      -H      -G      -F      -E      -D      -C      -B      -A      -9      -8      -7      -6
-5      -4      -3      -2      -1      0       1       2       3       4       5       6       7       8       9
A       B       C       D       E       F       G       H       I       J       K       L       M       10      11
12      13      14      15      16      17      18      19      1A      1B      1C      1D      1E      1F      1G
1H      1I      1J      1K      1L      1M      20      21      22      23      24
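The conversion above negates the input, which overflows for the most negative value. A hedged variant that sidesteps this by working on the unsigned magnitude might look as follows; `convert_safe` is a hypothetical name, and the temporary buffer size assumes that long long is 64 bits wide.

```c
#include <stddef.h>

/* Hypothetical variant of convert(): uses the unsigned magnitude, so the
 * most negative value does not overflow on negation. buff must hold at
 * least 66 chars for base 2 (64 digits, sign, terminator), assuming a
 * 64-bit long long. Returns buff, or NULL on bad arguments. */
char *convert_safe(long long number, char *buff, int base)
{
    static const char digits[] = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    char tmp[65];                 /* digits, least significant first */
    size_t n = 0;
    unsigned long long mag;
    char *p = buff;

    if (buff == NULL || base < 2 || base > 36)
        return NULL;

    /* 0 - (unsigned)x is well defined even for the most negative value. */
    mag = number < 0 ? 0ULL - (unsigned long long)number
                     : (unsigned long long)number;
    do {
        tmp[n++] = digits[mag % (unsigned)base];
        mag /= (unsigned)base;
    } while (mag);

    if (number < 0)
        *p++ = '-';
    while (n)                     /* copy out in reverse: most significant first */
        *p++ = tmp[--n];
    *p = '\0';
    return buff;
}
```

Collecting the digits in a small temporary array also removes the need for the separate reverse pass; whether that is preferable is a matter of taste.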

+2

PeterJ_01 09 Aug 17 at 20:53

Decimal -> Hex

Just iterate over the string, converting each character to int; then you can do

printf("%02x", c);

or use sprintf to store the result in another variable.
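As a rough sketch of the buffer-based variant, using snprintf rather than sprintf so that each write is bounded. The helper name `string_to_hex` and its return convention are assumptions made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats each byte of s as two lowercase hex digits.
 * Returns 0 on success, -1 if out cannot hold 2 chars per byte plus NUL. */
int string_to_hex(const char *s, char *out, size_t out_size)
{
    size_t len = strlen(s);

    if (out_size == 0 || len > (out_size - 1) / 2)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Cast to unsigned char so bytes >= 0x80 are not sign-extended. */
        snprintf(out + 2 * i, 3, "%02x", (unsigned char)s[i]);
    }
    out[2 * len] = '\0';   /* also covers the empty-string case */
    return 0;
}
```

For the input "Hi" this should give "4869", since 'H' is 0x48 and 'i' is 0x69 in ASCII.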

Hex -> Decimal

code

printf("%c",16 * hexToInt('F') + hexToInt('0'));


int hexToInt(char c)
{
    if(c >= 'a' && c <= 'z')
        c = c - ('a' - 'A');

    int sum;

    sum = c / 16 - 3;
    sum *= 10;
    sum += c % 16;

    return (sum > 9) ? sum - 1 : sum;
}
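For comparison, a branch-free sketch of the same hex-digit decoding, in the spirit of the bitwise tricks the question asks about. It assumes ASCII and a valid hex digit as input (no validation), and both function names are hypothetical. It relies on the letters 'A'-'F' and 'a'-'f' having bit 6 set and their low nibble running from 1 to 6.

```c
#include <stdint.h>

/* Hypothetical alternative to hexToInt(): correct only for the ASCII
 * characters '0'-'9', 'A'-'F' and 'a'-'f'; other input gives garbage. */
unsigned hex_digit_value(unsigned char c)
{
    /* Digits: bit 6 clear, low nibble is the value itself.
     * Letters: bit 6 set, low nibble is 1..6, so add 9 to get 10..15. */
    return (c & 0x0Fu) + 9u * (c >> 6);
}

/* Combines two hex digits into one byte, e.g. 'F','0' into 0xF0. */
uint8_t hex_pair_to_byte(char hi, char lo)
{
    return (uint8_t)(16u * hex_digit_value((unsigned char)hi)
                         + hex_digit_value((unsigned char)lo));
}
```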

+1

Filip Kočica 09 Aug 17 at 19:06

The articles below compare different ways to convert numbers to strings. Hex numbers are not covered, but switching from decimal to hex does not seem to be a big problem.

Whole numbers

Fixed and floating point

EDIT: Thanks for pointing out that the answer above is irrelevant. The usual way without a LUT is to split the integer into nibbles and map them to ASCII:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define HI_NIBBLE(b) (((b) >> 4) & 0x0F)
#define LO_NIBBLE(b) ((b) & 0x0F)

void int64_to_char(char carr[], int64_t val){
    memcpy(carr, &val, 8);
}

uint64_t inp = 0xF05C1E3A;
char tmp_st[8];

int main()
{
    int64_to_char(tmp_st,inp);
    printf("Sample: %llx\n", (unsigned long long)inp);  /* %x expects unsigned int, not a 64-bit value */
    printf("Result: 0x");
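    int started = 0;  /* set once the first non-zero byte has been seen */
    for (unsigned int k = 8; k; k--){
        char tmp_ch = *(tmp_st+k-1);  /* walks from the top byte down; assumes little-endian */
        char hi_nib = HI_NIBBLE(tmp_ch);
        char lo_nib = LO_NIBBLE(tmp_ch);
        started |= (hi_nib || lo_nib);
        if (started){  /* skip leading zero bytes only, not zero bytes inside the value */
            printf("%c%c",hi_nib+((hi_nib>9)?55:48),lo_nib+((lo_nib>9)?55:48));
        }
     }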
     printf("\n");
    return 0;
}

Another way is to use Allison's algorithm. I am a complete noob at ASM, so I am posting the code as I found it.

Option 1:

ADD AL,90h
DAA
ADC AL,40h
DAA

Option 2:

CMP  AL, 0Ah
SBB  AL, 69h
DAS
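To see why Option 1 works, it can help to step through it in C. The following is only an emulation sketch: the `cpu_t` struct and the `add8`, `daa` and `nibble_to_hex_daa` names are invented for this illustration, and the flag handling follows the documented behavior of ADD, ADC and DAA for the AL register (only the carry and auxiliary-carry flags are modeled).

```c
#include <stdint.h>

/* Minimal model of the state the sequence touches (names are made up). */
typedef struct { uint8_t al; int cf; int af; } cpu_t;

/* ADD/ADC AL, imm8: carry_in is 0 for ADD and the current CF for ADC. */
void add8(cpu_t *s, uint8_t imm, int carry_in)
{
    unsigned sum = (unsigned)s->al + imm + (unsigned)carry_in;
    s->af = ((s->al & 0x0F) + (imm & 0x0F) + carry_in) > 0x0F;
    s->cf = sum > 0xFF;
    s->al = (uint8_t)sum;
}

/* DAA: decimal adjust AL after addition. */
void daa(cpu_t *s)
{
    uint8_t old_al = s->al;
    int old_cf = s->cf;

    s->cf = 0;
    if ((old_al & 0x0F) > 9 || s->af) {
        unsigned sum = (unsigned)s->al + 6;
        s->cf = old_cf || sum > 0xFF;
        s->al = (uint8_t)sum;
        s->af = 1;
    } else {
        s->af = 0;
    }
    if (old_al > 0x99 || old_cf) {
        s->al = (uint8_t)(s->al + 0x60);
        s->cf = 1;
    } else {
        s->cf = 0;
    }
}

/* Option 1: ADD AL,90h / DAA / ADC AL,40h / DAA, for a nibble 0..15. */
char nibble_to_hex_daa(uint8_t nibble)
{
    cpu_t s = { nibble, 0, 0 };
    add8(&s, 0x90, 0);      /* ADD AL,90h */
    daa(&s);                /* for 10..15 this wraps AL to 0..5 and sets CF */
    add8(&s, 0x40, s.cf);   /* ADC AL,40h: the carry supplies the extra 1 */
    daa(&s);                /* for 0..9 this adds 60h, wrapping D0h+n to 30h+n */
    return (char)s.al;
}
```

In this model a nibble of 9 passes through 99h and D9h and ends at 39h ('9'), while 15 passes through 9Fh, wraps to 05h with the carry set, and ends at 46h ('F').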

+1

Gryphon 09 Aug 17 at 19:24

Mark Ransom · Accepted Answer · 2017-08-09T23:08:50+0000

Here is a solution that uses nothing but shifts, and/or, and add/subtract. There are no loops.

uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8)  | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4)  | (x & 0x000f000f000f000fLL);
x += 0x0606060606060606LL;
m = ((x & 0x1010101010101010LL) >> 4) + 0x7f7f7f7f7f7f7f7fLL;
x += (m & 0x2a2a2a2a2a2a2a2aLL) | (~m & 0x3131313131313131LL);
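A simple way to gain confidence in bit tricks like these is to compare them against a straightforward table-based reference. The sketch below wraps the steps above in a function and adds such a reference; both function names are hypothetical, and the reference builds the same packed layout, with the first digit in the most significant byte.

```c
#include <stdint.h>

/* The simplified steps above, wrapped in a function (name is made up). */
uint64_t hex8_packed(uint32_t value)
{
    uint64_t x = value, m;
    x = ((x & 0x00000000ffff0000ULL) << 16) | (x & 0x000000000000ffffULL);
    x = ((x & 0x0000ff000000ff00ULL) << 8)  | (x & 0x000000ff000000ffULL);
    x = ((x & 0x00f000f000f000f0ULL) << 4)  | (x & 0x000f000f000f000fULL);
    x += 0x0606060606060606ULL;          /* bit 4 set where nibble >= 10 */
    /* Per byte: 0x80 where the nibble was >= 10, 0x7f otherwise. */
    m = ((x & 0x1010101010101010ULL) >> 4) + 0x7f7f7f7f7f7f7f7fULL;
    /* Add 0x2a for 0..9 (6 + 0x2a = 0x30) or 0x31 for 10..15 (6 + 0x31 = 0x37). */
    x += (m & 0x2a2a2a2a2a2a2a2aULL) | (~m & 0x3131313131313131ULL);
    return x;
}

/* Reference: the same packed layout, built one nibble at a time. */
uint64_t hex8_reference(uint32_t value)
{
    static const char digits[] = "0123456789ABCDEF";
    uint64_t r = 0;
    for (int i = 7; i >= 0; i--)   /* most significant nibble first */
        r = (r << 8) | (unsigned char)digits[(value >> (4 * i)) & 0xF];
    return r;
}
```

Looping over a range of inputs and checking that the two functions agree is then a one-liner in a test program.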

The above is a simplified version that I came up with after a little time to ponder. Below is the original answer.

uint64_t x, m;
x = 0xF05C1E3A;
x = ((x & 0x00000000ffff0000LL) << 16) | (x & 0x000000000000ffffLL);
x = ((x & 0x0000ff000000ff00LL) << 8) | (x & 0x000000ff000000ffLL);
x = ((x & 0x00f000f000f000f0LL) << 4) | (x & 0x000f000f000f000fLL);
x += 0x3636363636363636LL;
m = (x & 0x4040404040404040LL) >> 6;
x += m;
m = m ^ 0x0101010101010101LL;
x -= (m << 2) | (m << 1);

See it in action: http://ideone.com/nMhJ2q
