Why is this loop so slow in Cython?

This code reorders the bits of every pixel in a 534x713 RGBA4 texture.

cpdef bytes toDDSrgba4(bytearray data):
    cdef bytes new_data = b''

    cdef int pixel
    cdef int red
    cdef int green
    cdef int blue
    cdef int alpha
    cdef int new_pixel
    cdef int i

    for i in range(len(data) // 2):
        pixel = int.from_bytes(data[2*i:2*i+2], "big")

        red = (pixel >> 12) & 0xF
        green = (pixel >> 8) & 0xF
        blue = (pixel >> 4) & 0xF
        alpha = pixel & 0xF

        new_pixel = (red << 8) | (green << 4) | blue | (alpha << 12)

        new_data += (new_pixel).to_bytes(2, "big")

    return new_data

      

This is no faster than the pure Python equivalent, which is:

def toDDSrgba4(data):
    new_data = b''

    for i in range(len(data) // 2):
        pixel = int.from_bytes(data[2*i:2*i+2], "big")

        red = (pixel >> 12) & 0xF
        green = (pixel >> 8) & 0xF
        blue = (pixel >> 4) & 0xF
        alpha = pixel & 0xF

        new_pixel = (red << 8) | (green << 4) | blue | (alpha << 12)

        new_data += (new_pixel).to_bytes(2, "big")

    return new_data

      

Both are very slow.

I have written a really complicated swizzling routine that is not even optimized, tested it on this texture, and it is still way faster than this.

+3




2 answers


You are appending to a bytes object with +=. This is very slow, because it has to copy the entire existing bytes object every time.



Do not do this. One of the better options is to use a bytearray and only build the bytes object from the bytearray at the end.
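As a rough sketch of that suggestion, the pure Python version from the question could accumulate into a bytearray and convert once at the end. The function name to_dds_rgba4_bytearray is made up for illustration; the pixel arithmetic is copied unchanged from the question.

```python
def to_dds_rgba4_bytearray(data):
    # Hypothetical name; same bit shuffling as the question's toDDSrgba4.
    # A bytearray grows in place, so += does not copy the whole buffer each time.
    new_data = bytearray()

    for i in range(len(data) // 2):
        pixel = int.from_bytes(data[2*i:2*i+2], "big")

        red = (pixel >> 12) & 0xF
        green = (pixel >> 8) & 0xF
        blue = (pixel >> 4) & 0xF
        alpha = pixel & 0xF

        new_pixel = (red << 8) | (green << 4) | blue | (alpha << 12)

        new_data += new_pixel.to_bytes(2, "big")

    # Build the immutable bytes object only once, at the very end.
    return bytes(new_data)
```

The same change should carry over to the Cython version by declaring new_data as a bytearray instead of bytes, although that variant has not been timed here.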

+4




I guess from_bytes and to_bytes are just too slow. Try this instead of from_bytes:

pixel = (data[2*i]) << 8 | (data[2*i+1])

      



It really is faster than your code; I tested it. As for to_bytes, I can't think of a fast replacement right now.
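One possible replacement for to_bytes, offered only as an untimed sketch, is to write the high and low bytes of each pixel directly into a preallocated bytearray using the same shift-and-mask idea. The function name to_dds_rgba4_shifts is hypothetical, and it assumes data is a bytes or bytearray object, so that indexing yields integers.

```python
def to_dds_rgba4_shifts(data):
    # Hypothetical name; assumes data is bytes or bytearray (indexing gives ints).
    count = len(data) // 2
    # Preallocate the output so every pixel is written in place.
    out = bytearray(count * 2)

    for i in range(count):
        # Big-endian read without int.from_bytes.
        pixel = (data[2*i] << 8) | data[2*i + 1]

        red = (pixel >> 12) & 0xF
        green = (pixel >> 8) & 0xF
        blue = (pixel >> 4) & 0xF
        alpha = pixel & 0xF

        new_pixel = (red << 8) | (green << 4) | blue | (alpha << 12)

        # Big-endian write without to_bytes: high byte first, then low byte.
        out[2*i] = new_pixel >> 8
        out[2*i + 1] = new_pixel & 0xFF

    return bytes(out)
```

Since new_pixel never exceeds 0xFFFF, new_pixel >> 8 always fits in a single byte, so the two assignments cannot overflow.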

0

