Split a bytes object into n equal-sized blocks based on index

I am writing a script to break a repeating-key XOR (Vigenère) cipher.

This involves choosing some number n (0 < n < possibly 50) and then splitting the bytes object into n smaller blocks, where the first block contains (from the original object) indices 0, n, 2n, 3n, ..., the next contains 1, n + 1, 2n + 1, 3n + 1, ..., and in general block y contains y, n + y, 2n + y, 3n + y, ..., where y < n.

If n = 3, bytes [0, 3, 6, 9, etc.] must be in one block, bytes [1, 4, 7, 10] in the next block, and bytes [2, 5, 8, 11] in the final block.

I could implement this easily with strings, but I don't know how to make it work with bytes objects. I searched, found, and adapted this code:

blocks = [ciphertext[i:i+most_likely_keylength] for i in range(0, len(ciphertext)+1, most_likely_keylength)]

transposedBlocks = list(zip_longest(*blocks, fillvalue=0))

##ciphertext is a bytes object resulting from the following line:
##ciphertext = base64.b64decode(open('Q6.txt', 'r').read())

      

This, however, returns a list of tuples filled with integers, and I don't know how to "concatenate" those integers again so that they are long binary blobs like they used to be (so that I can run something like Crypto.Util.strxor_c on each tuple).

Any help with this "string manipulation" for byte objects?

Note: I am working on the repeating-key XOR challenge at cryptopals.com. I have looked at other people's solutions on GitHub, but they mostly use specialized crypto modules, and I want to see the guts of what I am doing.


1 answer


Conceptually, a bytes object is a sequence of integers:

>>> tuple(b'ciphertext')
(99, 105, 112, 104, 101, 114, 116, 101, 120, 116)

      

... so its constructor will happily accept one:

>>> bytes((99, 105, 112, 104, 101, 114, 116, 101, 120, 116))
b'ciphertext'
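
As a small sketch of why the tuples in the question are full of integers: indexing or iterating over a bytes object yields ints, while slicing yields bytes. The sample value and variable names below are only for illustration.

```python
data = b'cip'

# Indexing and iteration give integers in range(256)...
first = data[0]              # 99
as_ints = list(data)         # [99, 105, 112]

# ...while slicing keeps the bytes type.
head = data[:2]              # b'ci'

# bytes() accepts any iterable of such integers, so the round trip is lossless.
round_trip = bytes(as_ints)  # b'cip'
```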

      

Knowing that, you can change your second line like this:

transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]

      

... and you get bytes objects back:

from itertools import zip_longest

ciphertext = b'ciphertext'
keylength = 3

blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext)+1, keylength)]
# [b'cip', b'her', b'tex', b't']

transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'chtt', b'iee\x00', b'prx\x00']

      



However, there is a bug in your code: because you use len(ciphertext)+1 rather than just len(ciphertext) in your call to range(), you get a final empty bytes object in blocks if the length of the ciphertext is an exact multiple of keylength:

ciphertext = b'SplitsEvenly'

blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext)+1, keylength)]
# [b'Spl', b'its', b'Eve', b'nly', b'']

      

... which results in an extra null byte at the end of every element in transposed:

transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'SiEn\x00', b'ptvl\x00', b'lsey\x00']
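
To see where the empty block comes from, it may help to look at the start indices that range() produces in each case. This is only an illustrative sketch; the names starts_buggy, starts_fixed and tail are made up for the example.

```python
ciphertext = b'SplitsEvenly'   # 12 bytes, an exact multiple of 3
keylength = 3

# With the +1, range() also yields 12, which equals len(ciphertext).
starts_buggy = list(range(0, len(ciphertext) + 1, keylength))  # [0, 3, 6, 9, 12]
starts_fixed = list(range(0, len(ciphertext), keylength))      # [0, 3, 6, 9]

# Slicing that starts at the end of a bytes object does not raise; it returns b''.
tail = ciphertext[12:12 + keylength]  # b''
```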

      

If you drop the +1, it works correctly in both cases:

ciphertext = b'ciphertext'

blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext), keylength)]
# [b'cip', b'her', b'tex', b't']

transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'chtt', b'iee\x00', b'prx\x00']

      

ciphertext = b'SplitsEvenly'

blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext), keylength)]
# [b'Spl', b'its', b'Eve', b'nly']

transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'SiEn', b'ptvl', b'lsey']
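
As a possible alternative (not part of the approach above), extended slicing with a step can produce the same columns directly, without zip_longest and without null-byte padding in the shorter columns. The helper names transpose and xor_single_byte are hypothetical, and xor_single_byte is only a pure-Python stand-in for the kind of single-byte XOR mentioned in the question, not a drop-in replacement for any library function.

```python
def transpose(ciphertext, keylength):
    """Return keylength blocks; block i holds bytes i, i+keylength, i+2*keylength, ..."""
    return [ciphertext[i::keylength] for i in range(keylength)]


def xor_single_byte(block, key):
    """XOR every byte of block with the integer key (0-255) and return bytes."""
    return bytes(b ^ key for b in block)


columns = transpose(b'ciphertext', 3)
# [b'chtt', b'iee', b'prx']

even_columns = transpose(b'SplitsEvenly', 3)
# [b'SiEn', b'ptvl', b'lsey']

flipped = xor_single_byte(columns[0], 0x20)
# b'CHTT'
```

Because each column keeps its true length, there are no padding bytes to account for when scoring candidate keys. If keylength exceeds len(ciphertext), the trailing columns are simply empty.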

      
