De-interleave bits from a 16-bit value

I have a 16-bit value whose bits are interleaved.

I want to get an array of 8 elements (values 0 to 3) that stores the bits in the following order:

  • element 0: bits 7 and 15
  • element 1: bits 6 and 14
  • element 2: bits 5 and 13
  • ...
  • element 7: bits 0 and 8

This is a trivial solution:

function uninterlace(n) {
  return [((n>>7)&1)|((n>>14)&2), // bits 7 and 15
          ((n>>6)&1)|((n>>13)&2), // bits 6 and 14
          ((n>>5)&1)|((n>>12)&2), // bits 5 and 13
          ((n>>4)&1)|((n>>11)&2), // bits 4 and 12
          ((n>>3)&1)|((n>>10)&2), // bits 3 and 11
          ((n>>2)&1)|((n>> 9)&2), // bits 2 and 10
          ((n>>1)&1)|((n>> 8)&2), // bits 1 and 9
          ((n>>0)&1)|((n>> 7)&2)];// bits 0 and 8
}
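For concreteness, here is the trivial version applied to a sample input; 0x8001 has only bits 0 and 15 set, so by the mapping above element 0 should be 2 and element 7 should be 1:

```javascript
// The question's function, unchanged, run on a sample value.
function uninterlace(n) {
  return [((n>>7)&1)|((n>>14)&2), ((n>>6)&1)|((n>>13)&2),
          ((n>>5)&1)|((n>>12)&2), ((n>>4)&1)|((n>>11)&2),
          ((n>>3)&1)|((n>>10)&2), ((n>>2)&1)|((n>> 9)&2),
          ((n>>1)&1)|((n>> 8)&2), ((n>>0)&1)|((n>> 7)&2)];
}

console.log(uninterlace(0x8001)); // [2, 0, 0, 0, 0, 0, 0, 1]
```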


Does anyone know a better (faster) way to do this?

Edit:

Notes:

  • Building a pre-calculated table is not an option.
  • Assembler and CPU-specific optimizations cannot be used.


4 answers


OK, now with 3 operations per element (tested, and it works).

This is a variation on Novolog's answer. It uses varying mask and slide constants instead of shifting the input.



function uninterlace(n) {
     return [((n & 0x8080) + 0x3FFF) >> 14,
             ((n & 0x4040) + 0x1FFF) >> 13,
             ((n & 0x2020) + 0x0FFF) >> 12,
             ((n & 0x1010) + 0x07FF) >> 11,
             ((n & 0x0808) + 0x03FF) >> 10,
             ((n & 0x0404) + 0x01FF) >> 9,
             ((n & 0x0202) + 0x00FF) >> 8,
             ((n & 0x0101) + 0x007F) >> 7];
}
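To see why three operations suffice, it helps to trace the first array entry for a sample input (my own trace, using the constants from the answer above):

```javascript
// Trace of the first array entry for n = 0x8080 (bits 7 and 15 both set).
const n = 0x8080;
const masked = n & 0x8080;      // 0x8080: isolates bits 7 and 15
const summed = masked + 0x3FFF; // 0xC07F: the carry ripples bit 7 up to bit 14
const result = summed >> 14;    // bit 14 contributes 1, bit 15 contributes 2

console.log(result); // 3
```

The addend 0x3FFF has bits 0 through 13 set, so an isolated bit 7 ripples a carry up into bit 14, landing it right next to bit 15 before the final shift.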




Faster than a manually unrolled loop? I doubt it.



The code can be made less repetitive with a for loop, but that won't make it run faster.



function uninterlace(n) {
    var mask  = 0x0101; // 0b0000_0001_0000_0001
    var slide = 0x7f;   // 0b0111_1111
    return [(((n >> 7) & mask) + slide) >> 7, // element 0: bits 7 and 15
            (((n >> 6) & mask) + slide) >> 7,
            (((n >> 5) & mask) + slide) >> 7,
            (((n >> 4) & mask) + slide) >> 7,
            (((n >> 3) & mask) + slide) >> 7,
            (((n >> 2) & mask) + slide) >> 7,
            (((n >> 1) & mask) + slide) >> 7,
            (((n >> 0) & mask) + slide) >> 7]; // element 7: bits 0 and 8
}
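Both the question's version and this one can be cross-checked by brute force over the whole 16-bit input space; a sketch (loop-based rewrites of the two functions, names mine):

```javascript
// Reference implementation, equivalent to the question's unrolled version.
function uninterlaceRef(n) {
  const out = [];
  for (let i = 0; i < 8; i++) {
    out.push(((n >> (7 - i)) & 1) | ((n >> (14 - i)) & 2));
  }
  return out;
}

// Mask-and-slide variant, same element order as the question.
function uninterlaceSlide(n) {
  const mask = 0x0101, slide = 0x7f;
  const out = [];
  for (let i = 7; i >= 0; i--) {
    out.push((((n >> i) & mask) + slide) >> 7);
  }
  return out;
}

// Exhaustive comparison over all 65536 possible inputs.
for (let n = 0; n <= 0xFFFF; n++) {
  const a = uninterlaceRef(n), b = uninterlaceSlide(n);
  for (let i = 0; i < 8; i++) {
    if (a[i] !== b[i]) throw new Error("mismatch at n=" + n);
  }
}
console.log("all 65536 inputs match");
```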


This is just four operations per element instead of five. The trick is to reuse the shifted value: the mask isolates the two target bits, adding slide moves them adjacent to each other, and the final shift by 7 drops them into the low-order position. The use of + could be a weak point.

The big weakness might be that the operations for each element must execute strictly in sequence, giving a 4-instruction latency chain from the time an element's computation enters the processor pipeline to the time it leaves. The elements can be fully pipelined against each other, but each will still have some latency. The version in the question offers some instruction-level parallelism within each element and could potentially have a latency of only 3 instructions per element, given sufficient execution resources.

It might be possible to combine multiple extractions into fewer operations, but I haven't seen a way to do it yet. The interleaving really makes it hard.

Edit: A two-pass approach that de-interleaves the low- and high-order bits separately into zero-spread values, shifts them one bit apart, and combines the results could be much faster, and it would scale to longer bit strings.
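That two-pass idea can be sketched with the standard interleave-with-zeros bit trick; the spread() helper and its magic constants below are my assumption of what was meant, not the answerer's actual code:

```javascript
// Spread an 8-bit value so its bits land on even positions 0, 2, ..., 14.
// Classic "interleave with zeros" masks; my sketch, not the answer's code.
function spread(b) {
  b = (b | (b << 4)) & 0x0F0F;
  b = (b | (b << 2)) & 0x3333;
  b = (b | (b << 1)) & 0x5555;
  return b;
}

function uninterlace2(n) {
  // Low byte supplies the 1-valued bits, high byte the 2-valued bits.
  const packed = spread(n & 0xFF) | (spread(n >> 8) << 1);
  const out = [];
  for (let i = 0; i < 8; i++) {
    out.push((packed >> (14 - 2 * i)) & 3); // element i: bits 7-i and 15-i
  }
  return out;
}

console.log(uninterlace2(0x8001)); // [2, 0, 0, 0, 0, 0, 0, 1]
```

The two spread() calls do the bulk of the work in 6 shifts, 6 ANDs, and a final OR, shared across all 8 elements, which is where the scaling advantage for longer bit strings would come from.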

Edited to correct slide, as per Pedro's comment. Apologies for my poor base-conversion skills: it was originally 0xef, which put the 0 bits in the wrong place.



How about a pair of small pre-computed lookup tables?

int[129] b1 = { 2, 3, 3, .. 3 }; // 129 entries: the index (n & 0x80) can be 128
int[129] b0 = { 0, 1, 1, .. 1 };

function uninterlace(n) {
  return [((n & 0x8000) ? b1 : b0)[n & 0x80],
          ((n & 0x4000) ? b1 : b0)[n & 0x40],
          ((n & 0x2000) ? b1 : b0)[n & 0x20],
          ((n & 0x1000) ? b1 : b0)[n & 0x10],
          ((n & 0x0800) ? b1 : b0)[n & 0x08],
          ((n & 0x0400) ? b1 : b0)[n & 0x04],
          ((n & 0x0200) ? b1 : b0)[n & 0x02],
          ((n & 0x0100) ? b1 : b0)[n & 0x01]
         ];
}

      

This uses bit masking and table lookups instead of shifts and additions and may be faster.
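A sketch of how the tables could be built and used (names and construction are mine); note that the first array entry indexes with n & 0x80, which can be as large as 128, so the tables need 129 slots:

```javascript
// Table construction: b0[x] is the element value when the high bit is clear
// (0 or 1 depending on x), b1[x] when the high bit is set (2 or 3).
const b0 = new Array(129).fill(1); b0[0] = 0;
const b1 = new Array(129).fill(3); b1[0] = 2;

function uninterlaceLUT(n) {
  const out = [];
  for (let i = 7; i >= 0; i--) {
    // Pick the table by the high bit (15-i down to 8), index by the low bit.
    out.push(((n & (0x100 << i)) ? b1 : b0)[n & (1 << i)]);
  }
  return out;
}

console.log(uninterlaceLUT(0x8001)); // [2, 0, 0, 0, 0, 0, 0, 1]
```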


