Python public key log encryption using pycrypto

I am developing a web application (using gevent, but that is not essential) that needs to write some sensitive information to a log. The obvious idea is to encrypt the sensitive information with a public key that is hardcoded in my application. The private key is needed to read it, and 2048-bit RSA seems to be secure enough. I chose pycrypto (I also tried M2Crypto but found no difference for my purpose) and implemented log encryption as a subclass of logging.Formatter. However, I'm new to pycrypto and cryptography, and I'm not sure whether my choice of how to encrypt the data is reasonable. Is the PKCS1_OAEP module what I need? Or are there friendlier ways to encrypt without dividing the data into small chunks?

Here is what I did:

import logging
import sys

from Crypto.Cipher import PKCS1_OAEP as pkcs1
from Crypto.PublicKey import RSA

PUBLIC_KEY = """ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDe2mtK03UhymB+SrIbJJUwCPhWNMl8/gA9d7jex0ciSuFfShDaqJ4wYWG4OOl\
VqKMxPrPcZ/PMSwtc021yI8TXfgewb65H/YQw4JzzGANq2+mFT8jWRDn+xUc6vcWnXIG3OPg5DvIipGQvIPNIUUP3qE7yDHnS5xdVdFrVe2bUUXmZJ9\
0xJpyqlTuRtIgfIfEQC9cggrdr1G50tXdXZjS0M1WXl5P6599oH/ykjpDFrCnh5fz9WDwUc0mNJ+11Qh+yfDp3k7AhzhRaROKLVWnfkklFaFm7LsdVX\
KPjp7dPRcTb84c2OnlIjU0ykL74Fy0K3eaPvM6TLe/K1XuD3933 pupkin@pupkin"""

PUBLIC_KEY = RSA.importKey(PUBLIC_KEY)

LOG_FORMAT = '[%(asctime)-15s - %(levelname)s: %(message)s]'

# May be more, but there is a limit.
# I suppose, the algorithm requires enough padding,
# and size of padding depends on key length.
MAX_MSG_LEN = 128

# Size of a block encoded with padding. For a 2048-bit key seems to be OK.
ENCODED_CHUNK_LEN = 256


def encode_msg(msg):
    res = []
    k = pkcs1.new(PUBLIC_KEY)
    for i in xrange(0, len(msg), MAX_MSG_LEN):
        v = k.encrypt(msg[i : i+MAX_MSG_LEN])
        # There are nicer ways to make a readable line from data than using hex. However, using
        # hex representation requires no extra code, so let it be hex.
        res.append(v.encode('hex'))
        assert len(v) == ENCODED_CHUNK_LEN
    return ''.join(res)


def decode_msg(msg, private_key):
    msg = msg.decode('hex')
    res = []
    k = pkcs1.new(private_key)
    for i in xrange(0, len(msg), ENCODED_CHUNK_LEN):
        res.append(k.decrypt(msg[i : i+ENCODED_CHUNK_LEN]))
    return ''.join(res)


class CryptoFormatter(logging.Formatter):
    NOT_SECRET = ('CRITICAL',)
    def format(self, record):
        """
        If needed, I may encode only certain types of messages.
        """
        try:
            msg = logging.Formatter.format(self, record)
            if record.levelname not in self.NOT_SECRET:
                # Reuse the already formatted message instead of formatting twice.
                msg = encode_msg(msg)
            return msg
        except Exception:
            import traceback
            return traceback.format_exc()


def decrypt_file(key_fname, data_fname):
    """
    The function decrypts logs and never runs on server. In fact,
    server does not have a private key at all. The only key owner
    is server admin.
    """
    res = ''
    with open(key_fname, 'r') as kf:
        pkey = RSA.importKey(kf.read())
    with open(data_fname, 'r') as f:
        for l in f:
            l = l.strip()
            if l:
                try:
                    res += decode_msg(l, pkey) + '\n'
                except Exception: # A line may be unencrypted
                    res += l + '\n'
    return res

# Unfortunately dictConfig() does not support altering formatter class.
# Anyway, in demo code I am not going to use dictConfig().


logger = logging.getLogger()
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(CryptoFormatter(LOG_FORMAT))
logger.handlers = []
logger.addHandler(handler)

logging.warning("This is secret")
logging.critical("This is not secret")

      

UPDATE: Thanks to the accepted answer below, I can now see:

  • My solution seems quite adequate for the moment (very few log entries, no performance considerations, more or less trusted storage). As far as security is concerned, the best thing I can do right now is to remember to prevent the user that runs my daemon from writing to the program's .py and .pyc files. :-) However, if that user is compromised, the attacker can still try to attach a debugger to my daemon process, so I have to disable login for that user as well. Quite obvious points, but very important.

  • Of course, there are solutions that are much more scalable. A very common method is to encrypt AES keys with slow but strong RSA, and to encrypt the data with AES, which is pretty fast. Data encryption in that case is symmetric, but getting the AES key requires either breaking RSA or reading it from memory while my program is running. Stream encryption with higher-level libraries and a binary log file format is also a way to go, although a binary log encrypted as a stream would be very vulnerable to log corruption: even a sudden reboot due to a power outage can be a problem unless I do some things at a lower level (at least rotating the log every time the daemon starts).

  • I changed .encode('hex') to .encode('base64').replace('\n', '').replace('\r', ''). Fortunately, the base64 codec decodes fine without line breaks. This saves space.

  • Using an untrusted store may require signing records, but that seems to be a different story.

  • Detecting whether a line is encrypted by catching exceptions is OK, because unless the log has been tampered with by an attacker, it is the base64 codec that raises the exception, not the RSA decryption.
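
The space saving from switching hex to base64, mentioned in the list above, can be checked with a few lines of standard-library Python 3. This is only a sketch: the random bytes stand in for one 256-byte RSA-2048 ciphertext block, and base64.b64encode is used in place of the question's 'base64' codec because it emits no line breaks in the first place.

```python
import base64
import binascii
import os

# Stand-in for one RSA-2048 ciphertext block (always 256 bytes).
chunk = os.urandom(256)

hex_line = binascii.hexlify(chunk)   # 2 characters per byte
b64_line = base64.b64encode(chunk)   # 4 characters per 3 bytes, no newlines

print(len(hex_line))   # 512
print(len(b64_line))   # 344

# The single-line form decodes back to the original bytes.
assert base64.b64decode(b64_line) == chunk
```

For a 256-byte block that is roughly a third less text per encrypted chunk.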


1 answer


You seem to be encrypting the data directly with RSA. This is relatively slow, and it has the problem that you can only encrypt small chunks of data at a time. Distinguishing encrypted data from plaintext based on "decryption does not work" is also not a very clean solution, although it will probably work. You are using OAEP, which is good. You could use base64 instead of hex to save space.
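
To put a number on the chunk size limit: RSA-OAEP can encrypt at most k - 2*hLen - 2 bytes per block, where k is the key size in bytes and hLen is the digest size of the hash used by the padding. A minimal sketch of that arithmetic, assuming the SHA-1 default of pycrypto's PKCS1_OAEP (the helper name oaep_max_plaintext is made up for illustration):

```python
import hashlib


def oaep_max_plaintext(key_bits, hash_name="sha1"):
    """Largest plaintext, in bytes, that fits into one RSA-OAEP block."""
    key_bytes = (key_bits + 7) // 8
    hash_len = hashlib.new(hash_name).digest_size
    # OAEP overhead: two hash-sized fields plus two fixed bytes.
    return key_bytes - 2 * hash_len - 2


print(oaep_max_plaintext(2048))            # 214
print(oaep_max_plaintext(2048, "sha256"))  # 190
```

So the MAX_MSG_LEN of 128 in the question is safely below the limit for a 2048-bit key, and every chunk still produces a full 256-byte ciphertext.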

However, crypto is easy to get wrong. For this reason, you should always use high-level cryptographic libraries where possible. Anything where you need to specify the padding scheme yourself is not "high-level". I'm not sure whether you can build an efficient line-based log encryption system without resorting to low-level libraries, though.

That said, unless you have a reason to encrypt only portions of the log, consider encrypting the entire log.



If you really need line-by-line encryption, you could do the following: generate a random symmetric AES key from a secure source of randomness and give it a short but unique identifier. Encrypt this key with RSA and write the result to the log file on a line prefixed with a tag, e.g. "KEY", along with the identifier. For each log line, generate a random IV, encrypt the message with AES-256 in CBC mode using that IV (you have no per-line length limit now!), and write the key ID, the IV and the encrypted message to the log, prefixed with a tag such as "ENC". After a while, destroy the symmetric key and repeat (generate a new one, write it to the log).

The disadvantage of this approach is that an attacker who can recover the symmetric key from memory can read the messages encrypted with that key. The advantage is that you can use higher-level building blocks and that it is much faster (on my CPU, you can encrypt 70,000 log lines of 1 KB each per second with AES-128, but only about 3,500 chunks of at most 256 bytes with RSA-2048). RSA decryption is REALLY slow, by the way (about 100 chunks per second).
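
The line format of that scheme could look roughly like the following Python 3 sketch. It covers only the framing: the "KEY" and "ENC" tags come from the description above, while the helper names (b64, key_line, enc_line, parse_line), the field order and the base64 encoding are assumptions made for illustration. The actual RSA and AES calls are left to whichever library is used and appear here only as placeholder bytes.

```python
import base64
import os


def b64(data):
    """Encode bytes as single-line base64 text."""
    return base64.b64encode(data).decode("ascii")


def key_line(key_id, wrapped_key):
    """Line announcing a new AES key; wrapped_key is the RSA-encrypted key."""
    return "KEY %s %s" % (key_id, b64(wrapped_key))


def enc_line(key_id, iv, ciphertext):
    """Line carrying one encrypted message with its key ID and IV."""
    return "ENC %s %s %s" % (key_id, b64(iv), b64(ciphertext))


def parse_line(line):
    """Split a log line into a tagged tuple; untagged lines are plaintext."""
    parts = line.split(" ")
    try:
        if parts[0] == "KEY" and len(parts) == 3:
            return ("KEY", parts[1],
                    base64.b64decode(parts[2], validate=True))
        if parts[0] == "ENC" and len(parts) == 4:
            return ("ENC", parts[1],
                    base64.b64decode(parts[2], validate=True),
                    base64.b64decode(parts[3], validate=True))
    except ValueError:
        pass  # the tag matched, but the payload was not valid base64
    return ("PLAIN", line)


# Placeholder bytes stand in for real cipher output in this demo.
key_id = os.urandom(4).hex()    # short random identifier
wrapped_key = os.urandom(256)   # would be the AES key encrypted with RSA
iv = os.urandom(16)             # fresh random IV per line (AES block size)
ciphertext = os.urandom(48)     # would be the AES-CBC output for one message

print(key_line(key_id, wrapped_key))
print(enc_line(key_id, iv, ciphertext))
print(parse_line("[CRITICAL: This is not secret]"))
```

Because every encrypted line carries an explicit tag, the reader no longer has to guess whether a line is encrypted by waiting for decryption to fail.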

Please note that you have no authentication, i.e. you won't notice modifications to your logs. For this reason, I assume you trust the log store. Otherwise, see RFC 5848.
