Is it possible to search row-level encrypted data using Java?

I am developing a Java application that uses algorithms to import data from other sources into a database. The application also searches for records in the database.

How can I implement row-level encryption such that the database doesn't even know the data is encrypted, while still allowing the database to be searched using queries issued from Java code?

I can use BouncyCastle to encrypt every field in every row of data before it is inserted into the database. But then how can I search for strings if each row and field in the database is encrypted separately? Is the answer as simple as encrypting each search parameter using the same keys before the search parameters are passed to SQL or JPA SELECT queries? Or is a more complex approach required?

I am using MySQL at the moment, but it would be nice if it were database vendor agnostic.

1 answer


One of the most important properties of good encryption is that similar plaintexts encrypt to completely different ciphertexts: only about half of the bits of the two ciphertexts will match. This property makes it difficult (in practice impossible) to formulate a query that searches for substrings through LIKE or that checks whether field values are greater or less than a given value.

There is another property, and that is semantic security: when the same plaintext is encrypted twice under the same key, the resulting ciphertexts must be different. This property prevents an attacker from gaining meta-information about the plaintext blocks, but it has to be given up for the proposed solution to work.

Take, for example, AES in CBC mode as the basic encryption primitive. It has a block size of 16 bytes, so the ciphertexts will be short. If this is still too much overhead, you could use Triple DES (8-byte block size) with three different keys, i.e. a 24-byte key holding 168 key bits, although its effective strength is closer to 112 bits.

Simple example:

All table cells are encrypted using the same key. Now you want to query the table to get the rows where one column has a specific value. First, you encrypt the value to be matched with the same key, and since we gave up semantic security, the resulting ciphertext will exactly match the ciphertext stored in the table.

query("SELECT * FROM table WHERE col = '" + encrypt(x) + "';");

      

Then you iterate over the result set and decrypt each value. Warning: the query is not parameterized, for simplicity. Use prepared statements to prevent SQL injection.
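A parameterized version of the query above might look roughly like this. It is only a sketch: the table `customers`, the column `email`, and the `encrypt`/`decrypt` functions passed in are hypothetical placeholders, and the ciphertext is assumed to be stored as text (for example Base64).

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class EncryptedLookup {
    // Hypothetical schema: customers(email) holds deterministic ciphertext as text.
    public static final String SQL = "SELECT email FROM customers WHERE email = ?";

    // encrypt/decrypt are assumed to be deterministic and to share one key.
    public static List<String> findByEmail(Connection conn, String email,
                                           UnaryOperator<String> encrypt,
                                           UnaryOperator<String> decrypt) throws SQLException {
        List<String> matches = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            // The search term is encrypted first, then bound as a parameter,
            // so it is never concatenated into the SQL text.
            ps.setString(1, encrypt.apply(email));
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    // Decrypt each returned cell on the client side.
                    matches.add(decrypt.apply(rs.getString("email")));
                }
            }
        }
        return matches;
    }
}
```

The database only ever sees ciphertext on both sides of the comparison, which is why plain JDBC (and therefore any vendor) is enough here.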

Giving up semantic security (deterministic encryption):

ECB mode is a bad idea for security, so I would suggest using CBC mode with a static IV (perhaps all 0x00 bytes: new byte[16]). There are other modes of operation that are also deterministic, but more on that later.
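As a minimal sketch of the static-IV idea using the standard JCE classes (BouncyCastle would work similarly): the class name, the Base64 storage format, and the hard-coded demo key are assumptions made for illustration, not part of the original answer.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;

final class DeterministicCbc {
    // Static all-zero IV: this is what makes the encryption deterministic.
    private static final byte[] STATIC_IV = new byte[16];

    private static Cipher cipher(int mode, byte[] key) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(STATIC_IV));
        return c;
    }

    // Returns Base64 text so the value fits an ordinary VARCHAR column.
    public static String encrypt(byte[] key, String plaintext) {
        try {
            byte[] ct = cipher(Cipher.ENCRYPT_MODE, key)
                    .doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static String decrypt(byte[] key, String stored) {
        try {
            byte[] pt = cipher(Cipher.DECRYPT_MODE, key)
                    .doFinal(Base64.getDecoder().decode(stored));
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Demo key only; a real key would come from a key store.
        byte[] key = "0123456789abcdef".getBytes(StandardCharsets.UTF_8);
        String a = encrypt(key, "alice@example.com");
        String b = encrypt(key, "alice@example.com");
        System.out.println(a.equals(b));      // same plaintext, same ciphertext
        System.out.println(decrypt(key, a));
    }
}
```

One thing to keep in mind with this construction: two values that share their first 16-byte block(s) also share the corresponding ciphertext blocks, so common prefixes are visible to anyone reading the table.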

Limitations:

  • No ORDER BY
  • No computation with values directly in SQL
  • No <, >, <=, or >= in the WHERE clause
  • ... (I can't think of more right now)

Make it interesting:



There are several things you can do to improve security.

If you know ahead of time that you will never need to check whether two columns hold the same value, you can use a semi-randomized approach in which each column of each table is assigned a different random initialization vector (IV). That way, an attacker cannot match ciphertexts from one column against ciphertexts from another column to find similarities and learn some meta-information about the plaintext.
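The per-column IV idea could be sketched as follows. The in-memory registry and the "table.column" naming are assumptions for illustration; a real application would have to persist the IVs next to the key, otherwise existing rows could no longer be searched after a restart.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PerColumnIv {
    private static final SecureRandom RNG = new SecureRandom();

    // Hypothetical registry: "table.column" -> IV, generated once per column.
    private final Map<String, byte[]> ivs = new ConcurrentHashMap<>();
    private final SecretKeySpec key;

    public PerColumnIv(byte[] key) {
        this.key = new SecretKeySpec(key, "AES");
    }

    private byte[] ivFor(String column) {
        return ivs.computeIfAbsent(column, c -> {
            byte[] iv = new byte[16];
            RNG.nextBytes(iv);   // random, but fixed for the lifetime of the column
            return iv;
        });
    }

    // Deterministic within one column, unrelated across columns.
    public String encrypt(String column, String plaintext) {
        try {
            Cipher c = Cipher.getInstance("AES/CBC/PKCS5Padding");
            c.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(ivFor(column)));
            byte[] ct = c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(ct);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Equality searches still work inside a column because the search term is encrypted with that column's IV, while the same value stored in two different columns no longer produces matching ciphertexts.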

If a little extra overhead is not a big deal, you can opt for a deterministic authenticated encryption mode such as SIV (but not CCM or GCM; I am not sure about EAX). Its only overhead is the authentication tag (16 bytes for AES). With it, you can always check whether a ciphertext has been manipulated, and you can detect whether a ciphertext has been moved in from another column of the table, because you can simply use the column name as associated data. It is still hard to detect whether a value has been moved within a column without severely affecting performance.
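To illustrate the idea only, here is a hand-rolled SIV-style composition (synthetic IV from a MAC, then CTR encryption under that IV). It is not RFC 5297 AES-SIV: it swaps in HMAC-SHA256 for the S2V function, uses two separate keys, and the class and method names are hypothetical. For production use, a vetted library implementation of AES-SIV would be the safer choice.

```java
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;

final class SivStyleCell {
    private final SecretKeySpec macKey;
    private final SecretKeySpec encKey;

    public SivStyleCell(byte[] macKey, byte[] encKey) {
        this.macKey = new SecretKeySpec(macKey, "HmacSHA256");
        this.encKey = new SecretKeySpec(encKey, "AES");
    }

    // Synthetic IV: first 16 bytes of HMAC(len(column) || column || plaintext).
    private byte[] tag(byte[] column, byte[] plaintext) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(macKey);
        mac.update(ByteBuffer.allocate(4).putInt(column.length).array());
        mac.update(column);   // column name acts as associated data
        mac.update(plaintext);
        return Arrays.copyOf(mac.doFinal(), 16);
    }

    private byte[] ctr(int mode, byte[] iv, byte[] data) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, encKey, new IvParameterSpec(iv));
        return c.doFinal(data);
    }

    // Output layout: 16-byte tag followed by the CTR ciphertext.
    public byte[] encrypt(String column, String plaintext) {
        try {
            byte[] pt = plaintext.getBytes(StandardCharsets.UTF_8);
            byte[] iv = tag(column.getBytes(StandardCharsets.UTF_8), pt);
            byte[] body = ctr(Cipher.ENCRYPT_MODE, iv, pt);
            byte[] out = Arrays.copyOf(iv, 16 + body.length);
            System.arraycopy(body, 0, out, 16, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Throws SecurityException if the cell was modified or belongs to another column.
    public String decrypt(String column, byte[] cell) {
        if (cell.length < 16) throw new SecurityException("cell too short");
        try {
            byte[] iv = Arrays.copyOfRange(cell, 0, 16);
            byte[] pt = ctr(Cipher.DECRYPT_MODE, iv, Arrays.copyOfRange(cell, 16, cell.length));
            byte[] expected = tag(column.getBytes(StandardCharsets.UTF_8), pt);
            if (!MessageDigest.isEqual(iv, expected)) {
                throw new SecurityException("authentication failed");
            }
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because the IV is derived from the column name and the plaintext, equal values in the same column still produce equal cells, so equality searches keep working, while a cell copied into a different column fails the tag check on decryption.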

Fancier ways to lift the limitations:

Order-preserving encryption can be used to lift the first limitation above, but you compromise security because:

Intuitively, it says that some attacker can learn about half of the bits of the plaintext given its ciphertext.

Source: How does order-preserving encryption work?

The second limitation (and possibly others) can be avoided if the database provides encryption functions directly in SQL, but this is probably too slow to be used at scale.

Public-key crypto:

You may have noticed that I only mentioned symmetric crypto. It is not necessary to use only symmetric crypto, but the problem with, for example, RSA is that the ciphertexts are huge (256 bytes for a 2048-bit key) compared to the small overhead of AES. The footprint of ECC-based encryption is much smaller (e.g. ElGamal encryption).

Another nice thing about public-key crypto is that you can query all the data you want, but you cannot decrypt it without the private key. This way you can always insert data (using the public key), but only read it back using the private key.
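A rough sketch of that write-with-public-key, read-with-private-key split, using RSA-OAEP from the standard JCE. The class and method names are made up for illustration, and the key pair is generated on the fly only for the demo.

```java
import javax.crypto.Cipher;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;

final class PublicKeyCell {
    private static final String TRANSFORM = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    public static KeyPair newKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The importing side needs only the public key.
    public static byte[] encrypt(PublicKey pub, String plaintext) {
        try {
            Cipher c = Cipher.getInstance(TRANSFORM);
            c.init(Cipher.ENCRYPT_MODE, pub);
            return c.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Only the holder of the private key can read a cell back.
    public static String decrypt(PrivateKey priv, byte[] cell) {
        try {
            Cipher c = Cipher.getInstance(TRANSFORM);
            c.init(Cipher.DECRYPT_MODE, priv);
            return new String(c.doFinal(cell), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair kp = newKeyPair();
        byte[] cell = encrypt(kp.getPublic(), "alice@example.com");
        System.out.println(cell.length);   // ciphertext size in bytes for a 2048-bit key
        System.out.println(decrypt(kp.getPrivate(), cell));
    }
}
```

Note that OAEP padding is randomized, so this particular sketch gives up the equality search described earlier: encrypting the same value twice yields different cells.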
