Bits interleaved from 16-bit value
I have a 16 bit value and its bits are interleaved.
I want to get an array of 8 elements (values 0 to 3) that store the bits in the following order:
- element 0: bits 7 and 15
- position 1: bits 6 and 14
- element 2: bits 5 and 13
- ...
- element 7: bits 0 and 8
This is a trivial solution:
function uninterlace(n) {
return [((n>>7)&1)|((n>>14)&2), // bits 7 and 15
((n>>6)&1)|((n>>13)&2), // bits 6 and 14
((n>>5)&1)|((n>>12)&2), // bits 5 and 13
((n>>4)&1)|((n>>11)&2), // bits 4 and 12
((n>>3)&1)|((n>>10)&2), // bits 3 and 11
((n>>2)&1)|((n>> 9)&2), // bits 2 and 10
((n>>1)&1)|((n>> 8)&2), // bits 1 and 9
((n>>0)&1)|((n>> 7)&2)];// bits 0 and 8
}
Does anyone know a better (faster) way to do this?
Edit:
Notes:
- Building a pre-calculated table is not an option.
- Cannot use assembler or cpu optimizations.
source to share
Ok now with 3 operations per element (tested and works).
This is a variation on Novolog's answer. It uses variable masks and slides.
function uninterlace(n) {
return [((n & 0x8080) + 0x3FFF) >> 14,
((n & 0x4040) + 0x1FFF) >> 13,
((n & 0x2020) + 0x0FFF) >> 12,
((n & 0x1010) + 0x07FF) >> 11,
((n & 0x0808) + 0x03FF) >> 10,
((n & 0x0404) + 0x01FF) >> 9,
((n & 0x0202) + 0x00FF) >> 8,
((n & 0x0101) + 0x007F) >> 7];
}
source to share
def uninterlace(n) {
mask = 0x0101 // 0b0000_0001_0000_0001
slide = 0x7f // 0b0111_1111
return [(((n >> 0) & mask) + slide) >> 7,
(((n >> 1) & mask) + slide) >> 7,
(((n >> 2) & mask) + slide) >> 7,
(((n >> 3) & mask) + slide) >> 7,
(((n >> 4) & mask) + slide) >> 7,
(((n >> 5) & mask) + slide) >> 7,
(((n >> 6) & mask) + slide) >> 7,
(((n >> 7) & mask) + slide) >> 7]
}
This is just four operations for each record instead of 5. The trick is to reuse the shifted value. Adding slide
moves the corresponding bits adjacent to each other, and a shift of 7 puts them in the low order position. Use +
can be weak.
The big weakness might be that each input must be done completely in sequence, creating a 4 instruction delay from entering the processor pipeline to leave it. They can be fully pipelined, but will still have some latency. The variant of the question provides some level of parallelism at the instruction level and could potentially have a latency of only 3 instructions per write, given sufficient execution resources.
It might be possible to combine multiple selections into fewer operations, but I haven't seen a way to do this yet. The weave really makes it hard.
Edit: A two-pass approach to compute low and high order bits symmetrically with 0 interleaving, shifting them one apart, and the result can be much faster and scaled to longer bitstrings.
Edited to correct comment slide
for comment in Pedro. Sorry to waste my time on my poor base conversion skills. It was originally one 0xef
that puts 0 bits in the wrong place.
source to share
How about a small pre-computed table of 128 records 2 times?
int[128] b1 = { 2, 3, 3, .. 3};
int[128] b0 = { 0, 1, 1, .. 1};
function uninterlace(n) {
return [(n & 0x8000) ? b1 : b0)[n & 0x80],
(n & 0x4000) ? b1 : b0)[n & 0x40],
(n & 0x2000) ? b1 : b0)[n & 0x20],
(n & 0x1000) ? b1 : b0)[n & 0x10],
(n & 0x0800) ? b1 : b0)[n & 0x08],
(n & 0x0400) ? b1 : b0)[n & 0x04],
(n & 0x0200) ? b1 : b0)[n & 0x02],
(n & 0x0100) ? b1 : b0)[n & 0x01]
];
}
This uses bit masking and table lookups instead of shifts and additions and may be faster.
source to share