Split a bytes object into n equal-sized blocks based on index
I am writing a script to break a repeating-key XOR (Vigenère) cipher.
This involves choosing some number n (0 < n < possibly 50) and then splitting the bytes object into n smaller blocks, where the first block contains (from the original object) indices 0, n, 2n, 3n, ..., the next contains 1, n + 1, 2n + 1, 3n + 1, ..., and so on up to y, n + y, 2n + y, 3n + y, ..., where y < n.
If n = 3, bytes [0, 3, 6, 9, etc.] must be in one block, bytes [1, 4, 7, 10] in the next block, and bytes [2, 5, 8, 11] in the final block.
I could implement this easily with strings, but I don't know how to make it work with bytes objects. I searched around and adapted this code:
blocks = [ciphertext[i:i+most_likely_keylength] for i in range(0, len(ciphertext)+1, most_likely_keylength)]
transposedBlocks = list(zip_longest(*blocks, fillvalue=0))
##ciphertext is a bytes object resulting from the following line:
##ciphertext = base64.b64decode(open('Q6.txt', 'r').read())
This, however, returns a list of tuples filled with integers, and I don't know how to "concatenate" those integers again so that they are bytes objects like they were before (so that I can run something like Crypto.Util.strxor_c on each tuple).
Any help with this "string manipulation" for bytes objects?
Note: I am working on the repeating-key XOR challenge at cryptopals.com. I have looked at other people's solutions on GitHub, but they mostly use specialized crypto modules, and I want to see the guts of what I am doing.
Conceptually, a bytes object is a sequence of integers:
>>> tuple(b'ciphertext')
(99, 105, 112, 104, 101, 114, 116, 101, 120, 116)
... so its constructor will happily accept one:
>>> bytes((99, 105, 112, 104, 101, 114, 116, 101, 120, 116))
b'ciphertext'
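The same thing happens whenever you index or iterate: each element comes out as an int, while a slice stays a bytes object. A small sketch of why zipping your blocks gives tuples of integers (the variable names here are only for illustration):

```python
data = b'ciphertext'

first = data[0]        # indexing gives an int: 99
head = data[:3]        # slicing gives bytes: b'cip'
as_ints = list(head)   # iterating gives ints: [99, 105, 112]

# zip() iterates over each block, so it yields tuples of ints too
columns = list(zip(b'cip', b'her'))
# [(99, 104), (105, 101), (112, 114)]

# bytes() turns any iterable of ints in range(256) back into bytes
rebuilt = bytes(columns[0])
# b'ch'
```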
Knowing that, you can change your second line like this:
transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
... and you get bytes objects back:
from itertools import zip_longest
ciphertext = b'ciphertext'
keylength = 3
blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext)+1, keylength)]
# [b'cip', b'her', b'tex', b't']
transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'chtt', b'iee\x00', b'prx\x00']
However, there is a bug in your code: because you use len(ciphertext)+1 rather than just len(ciphertext) in your call to range(), you get a final empty bytes object in blocks if the length of the ciphertext is an exact multiple of keylength:
ciphertext = b'SplitsEvenly'
blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext)+1, keylength)]
# [b'Spl', b'its', b'Eve', b'nly', b'']
... which results in an extra null byte at the end of every element of transposed:
transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'SiEn\x00', b'ptvl\x00', b'lsey\x00']
If you drop the +1, it works correctly in both cases:
ciphertext = b'ciphertext'
blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext), keylength)]
# [b'cip', b'her', b'tex', b't']
transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'chtt', b'iee\x00', b'prx\x00']
ciphertext = b'SplitsEvenly'
blocks = [ciphertext[i:i+keylength] for i in range(0, len(ciphertext), keylength)]
# [b'Spl', b'its', b'Eve', b'nly']
transposed = [bytes(t) for t in zip_longest(*blocks, fillvalue=0)]
# [b'SiEn', b'ptvl', b'lsey']
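If the zero padding gets in the way (the \x00 fill bytes in the shorter columns would be XORed along with the real data), one alternative that is not taken from your code is extended slicing: ciphertext[i::keylength] picks every keylength-th byte starting at offset i, and the result is already a bytes object. A minimal sketch, where transpose and xor_single_byte are hypothetical helper names and the key byte 0x20 is arbitrary:

```python
def transpose(ciphertext, keylength):
    """Return keylength bytes objects; column i holds bytes i, i+keylength, ..."""
    return [ciphertext[i::keylength] for i in range(keylength)]


def xor_single_byte(block, key_byte):
    """XOR every byte of block with one key byte (an int in range(256))."""
    return bytes(b ^ key_byte for b in block)


columns = transpose(b'ciphertext', 3)
# [b'chtt', b'iee', b'prx']

flipped = xor_single_byte(columns[0], 0x20)
# b'CHTT'
```

Since XORing twice with the same byte gives back the original block, xor_single_byte(flipped, 0x20) should return b'chtt' again, which makes for a quick sanity check without any crypto module.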