Get key-string knowing only XOR byte arrays and key size

I have a key of a known size, for example:

String key = "A B C"; // Unknown / This is what I need to guess in the end
int keySize = key.length(); // Known

      

I know that both the key and the texts only contain the following characters:

String AVAILABLE_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ .,!?-"; // Known

      

I have some texts that have been XOR-ing the text with a key. The method encode

does the following: Checks if the UPPERCASE key and text are not null, not empty, and contain only valid characters, and then creates UTF-8 string and XOR byte arrays along with one byte []. (If the text is longer than the key, the key is repeated again.)

byte[][] encryptedTexts = new byte[5][];
// The original texts are Unknown, the encrypted byte-arrays are Known
encryptedTexts[0] = encode(key, "THIS IS A TEST");
encryptedTexts[1] = encode(key, "This is another test!"); // Note: encode first makes the String UPPERCASE, so this encrypts correctly.
encryptedTexts[2] = encode(key, "SOME OTHER RANDOM TEXT");
encryptedTexts[3] = encode(key, "AND LET SEE HOW THIS GOES"); // Should return null since ' in LET isn't valid
encryptedTexts[0] = encode(key, "OK, THAT WILL BE ENOUGH FOR NOW..");

      

After encoding, I have the following encrypted byte arrays ( Arrays.toString(byte_array)

):

ENCRYPTED TEXT 1: [21, 104, 11, 115, 99, 8, 115, 98, 97, 99, 21, 101, 17, 116]
ENCRYPTED TEXT 2: [21, 104, 11, 115, 99, 8, 115, 98, 97, 13, 14, 116, 10, 101, 17, 97, 116, 7, 115, 23, 96]
ENCRYPTED TEXT 3: [18, 111, 15, 101, 99, 14, 116, 10, 101, 17, 97, 114, 3, 110, 7, 14, 109, 98, 116, 6, 25, 116]
ENCRYPTED TEXT 4: null
ENCRYPTED TEXT 5: [14, 107, 110, 0, 23, 9, 97, 22, 0, 20, 8, 108, 14, 0, 1, 4, 0, 7, 110, 12, 20, 103, 10, 0, 5, 14, 114, 98, 110, 12, 22, 14, 108]

      

So now my question is, how can I get the key knowing only the ciphers and the key size?

Some thoughts:

Is it even possible to get the key when you only know the encrypted byte arrays (their number is unspecified) and the key size? And if so, what would be the best approach?

Some NOTES:

  • I don't need to decrypt the ciphertexts, my goal is to get the key-string.
  • If you are going to post sample code, please do it in Java, as this is the programming language I am working with.
  • This is just an assignment (not for school, but for Java cursus), so I'm not going to hack anything. (Although I will probably laugh at people using XOR encryption with the same key. XOR encryption should only be done with a truly random generated key the same size as text or larger, also known as a one-time "Keyed which is truly random, the result is a one-off panel that is indestructible even in theory. "[ source ].)

EDIT 1:

Ok, forget about the random generated plaintext, just assume I have a large English text that has been encrypted. If I know in advance that the text is English, I can use the Frequency Response Table . Therefore, I not only know the cipher texts and the key size, but also these letter frequencies. How can I use these additional frequencies to get the key. (Suppose I have an infinite amount of ciphertext for my use to recreate / guess the key using XOR decryption.)

+3


source to share


2 answers


You may only be interested in the key, but instead try to focus on getting one of the plaintext. This will of course trivially give a clue.

Start by matching pairs of plaintext at the same time (if they have different lengths, truncate the longest one). This removes the key and leaves you with pairs of English sentences (-fragments) grouped together.

Assuming unlimited ciphers, we can take a simple approach:

Take one ciphertext and compare it with 1000 other ciphertexts. Look at all positions where the 6th bit is 1 in about 90% of the pairs. These positions must have one of [.,!? -] in the first ciphertext with approximately 80% chance of being space. Let's assume this is a space and figure out what the equivalent key byte should be if true.



Repeat this for a bunch of other ciphertexts and you should be able to decide which of the [.,!?] Was actually a space (~ 80% will have the same key value at this point).

Here is a Java implementation. Usually several thousand messages are used to find the key:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class MultitimePad {
    private final static int SPACE_TEST_NUM = 10;
    private final static int SPACE_TEST_MIN = 8;
    private final static int KEY_GUESS_MIN = 10;
    private final static double KEY_GUESS_MIN_PERCENTAGE = 0.8;

    public static void main(String[] args) throws IOException {
        MultitimePad p = new MultitimePad();
        byte[] key = new byte[256];
        new Random().nextBytes(key);
        byte[][] messages = p.generate(key);
        byte[] solvedKey = p.solve(key.length, messages);
        if (compareBytes(key, solvedKey)) {
            System.out.println("Success");
        } else {
            System.out.println("Failure");
        }
    }

    private byte[][] generate(byte[] key) throws IOException {
        byte[] data = Files.readAllBytes(Paths.get("src/ulysses.txt"));
        byte[] filteredData = new byte[data.length];
        int filteredDataLength = 0;
        for (int i = 0; i < data.length; i++) {
            byte p = data[i];
            if (p >= 'a' && p <= 'z') {
                filteredData[filteredDataLength] = (byte) (p - 'a' + 'A');
                filteredDataLength++;
            } else if (p >= 'A' && p <= 'Z') {
                filteredData[filteredDataLength] = p;
                filteredDataLength++;
            } else if (p == ' ' || p == '.' || p == ',' || p == '!' || p == '?' || p == '-') {
                filteredData[filteredDataLength] = p;
                filteredDataLength++;
            }
        }
        int numMessages = filteredDataLength / key.length;
        byte[][] messages = new byte[numMessages][];
        for (int i = 0; i < numMessages; i++) {
            messages[i] = new byte[key.length];
            for (int j = 0; j < key.length; j++) {
                byte p = filteredData[i * key.length + j];
                messages[i][j] = (byte) (p ^ key[j]);
            }
        }
        return messages;
    }

    private static boolean compareBytes(byte[] b1, byte[] b2) {
        if (b1.length != b2.length) {
            return false;
        }
        for (int i = 0; i < b1.length; i++) {
            if (b1[i] != b2[i]) {
                return false;
            }
        }
        return true;
    }

    private byte[] solve(int length, byte[][] messages) {
        byte[] key = new byte[length];
        for (int i = 0; i < length; i++) {
            key[i] = solvePosition(i, messages);
        }
        return key;
    }

    private byte solvePosition(int pos, byte[][] messages) {
        int[] keyGuessCount = new int[256];
        int totalKeyGuess = 0;
        for (int i = 0; i < messages.length - SPACE_TEST_NUM; i++) {
            int success = 0;
            for (int j = 0; j < SPACE_TEST_NUM; j++) {
                if (((messages[i][pos] ^ messages[i + j][pos]) & ' ') != 0) {
                    success++;
                }
            }
            if (success >= SPACE_TEST_MIN) {
                int keyGuess = (messages[i][pos] ^ ' ') & 0xFF;
                keyGuessCount[keyGuess]++;
                totalKeyGuess++;
                if (keyGuessCount[keyGuess] >= KEY_GUESS_MIN && keyGuessCount[keyGuess] > totalKeyGuess *
                        KEY_GUESS_MIN_PERCENTAGE) {
                    System.out.println("Found " + pos + " using " + (i + 1 + SPACE_TEST_NUM) + " messages");
                    return (byte) keyGuess;
                }
            }
        }
        throw new IllegalArgumentException("Too few messages");
    }
}

      

+2


source


Since you only allow a subset of characters in keys and data, ciphertext wraps around information about both. Take a look at the binary representation of valid input:

           01000001 : A              01010001 : Q    
           01000010 : B              01010010 : R    
           01000011 : C              01010011 : S    
           01000100 : D              01010100 : T    
           01000101 : E              01010101 : U    
           01000110 : F              01010110 : V    
           01000111 : G              01010111 : W    
           01001000 : H              01011000 : X    
           01001001 : I              01011001 : Y                           
           01001010 : J              01011010 : Z                           
           01001011 : K          >   00100000 :     <  7th bit is 0
           01001100 : L          >   00101110 : .   <      ""
           01001101 : M          >   00101100 : ,   <      ""
           01001110 : N          >   00100001 : !   <      "" 
           01001111 : O          >   00111111 : ?   <      "" 
           01010000 : P          >   00101101 : -   <      "" 

      

Pay attention to the location of the bits. The pattern is that 6 legal characters have a 7th bit equal to 0, where the rest of the legal characters have this bit equal to 1.

Now consider the first encrypted string:

ENCRYPTED TEXT 1:        21       104        11       115        99         8  ...
Binary:            00010101  01101000  00001011  01110011  01100011  00001000  ...
                    ^         ^         ^         ^         ^         ^
  Bit 7             0         1         0         1         0         1

      



Notice how the encrypted data has a toggle bit at position 7. In the first byte, the 7th bit is set to 0, this can only happen if both keys and data have 0 in bit 7, or both keys and data have 1 in position 7.With this we can subtract that either:

  • the first char of the key, and the first char of the data is in the range [AZ]

or

  • the first char of the key, and the first char of the data included [0.,!? -]

This only shows the most obvious pattern, but the method can be applied to all bits, and if repetition can be used to build a statistical model of possible key and data values. If you have a duplicate key, you can leak enough in such a way that it is only the actual key and data that is possible.

+1


source







All Articles