What's the fastest way to fix up line endings for sending via SMTP?

I am coding an email application that builds messages to send via SMTP. That means I need to convert all the lone \n and \r characters into the canonical \r\n sequence that we all know and love. Here is the code I have now:

import re

CRLF = '\r\n'
msg = re.sub(r'(?<!\r)\n', CRLF, msg)
msg = re.sub(r'\r(?!\n)', CRLF, msg)

      

The problem is, it's not very fast. On large messages (about 80k), it accounts for about 30% of the time it takes to send a message.

Can you do better? I am looking forward to your Python gymnastics.

+2




5 answers


This regex did the trick:

re.sub(r'\r\n|\r|\n', '\r\n', msg)

But this code turned out to be a win:



msg.replace('\r\n','\n').replace('\r','\n').replace('\n','\r\n')

The original regex took 0.6s to convert /usr/share/dict/words from \n to \r\n, the new regex took 0.3s, and the replace() calls took 0.08s.
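
A rough way to reproduce this kind of comparison is sketched below. It is only a sketch: the function names and the synthetic sample text are made up for illustration (the original measurements used /usr/share/dict/words), and absolute timings will vary by machine and Python version.

```python
import re
import timeit

CRLF = '\r\n'

def fix_two_regexes(msg):
    # The approach from the question: two lookaround-based passes.
    msg = re.sub(r'(?<!\r)\n', CRLF, msg)
    return re.sub(r'\r(?!\n)', CRLF, msg)

def fix_one_regex(msg):
    # Single alternation; \r\n is listed first so pairs match as a unit.
    return re.sub(r'\r\n|\r|\n', CRLF, msg)

def fix_replace(msg):
    # Collapse everything to \n first, then expand every \n to \r\n.
    return msg.replace('\r\n', '\n').replace('\r', '\n').replace('\n', CRLF)

if __name__ == '__main__':
    # Hypothetical sample with mixed line endings, roughly 80k characters.
    sample = 'line one\nline two\r\nline three\rline four\n' * 2000
    for fn in (fix_two_regexes, fix_one_regex, fix_replace):
        seconds = timeit.timeit(lambda: fn(sample), number=20)
        print(f'{fn.__name__}: {seconds:.4f}s for 20 runs')
```

All three functions should produce identical output, so the choice between them comes down to speed and readability.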

+2




Perhaps it is the insertion of an extra character in the middle of the string that is killing the performance.

When you fix up the text "hello\rworld", the whole string has to grow by one character to become "hello\r\nworld".

I would suggest looping over the string and looking at the characters one by one. If a character is not \r or \n, just append it to a new string. If it is \r or \n, append the correct line ending to the new string instead.

Code in C# (converting it to Python should be trivial):

    string FixLineEndings(string input)
    {
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        StringBuilder rv = new StringBuilder(input.Length);

        for(int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c != '\r' && c != '\n')
            {
                rv.Append(c);
            }
            else if (c == '\n')
            {
                rv.Append("\r\n");
            }
            else if (c == '\r')
            {
                if (i == input.Length - 1)
                {
                    rv.Append("\r\n"); //a \r at the end of the string
                }
                else if (input[i + 1] != '\n')
                {
                    rv.Append("\r\n");
                }

            }
        }

        return rv.ToString();
    }
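
For reference, a direct Python port of the C# function above might look like the following. This is a sketch, not a tuned implementation: the name fix_line_endings is made up for illustration, and a character-by-character loop in pure Python is likely to be slower than built-ins such as str.replace, so it is worth timing before relying on it.

```python
def fix_line_endings(text):
    """Convert lone \\r and lone \\n to \\r\\n, leaving existing \\r\\n pairs alone."""
    if not text:
        return ''
    out = []  # collect pieces and join once, like StringBuilder in the C# version
    last = len(text) - 1
    for i, c in enumerate(text):
        if c == '\n':
            # Covers both a lone \n and the \n half of a \r\n pair.
            out.append('\r\n')
        elif c == '\r':
            # Emit CRLF only for a lone \r; a following \n will emit it instead.
            if i == last or text[i + 1] != '\n':
                out.append('\r\n')
        else:
            out.append(c)
    return ''.join(out)
```

For example, fix_line_endings('hello\rworld') returns 'hello\r\nworld'.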

      

This was interesting enough that I wrote a sample program to test it. I used the regex mentioned in the other answer; the code that uses the regex is:



static readonly Regex _r1 = new Regex (@ "(?

I tried with a bunch of test cases. Outputs:

------------------------
Size: 1000 characters
All \r
        String: 00:00:00.0038237
        Regex:  00:00:00.0047669
All \r\n
        String: 00:00:00.0001745
        Regex:  00:00:00.0009238
All \n
        String: 00:00:00.0024014
        Regex:  00:00:00.0029281
No \r or \n
        String: 00:00:00.0000904
        Regex:  00:00:00.0000628
\r at every 100th position and \n at every 102th position
        String: 00:00:00.0002232
        Regex:  00:00:00.0001937
------------------------
Size: 10000 characters
All \r
        String: 00:00:00.0010271
        Regex:  00:00:00.0096480
All \r\n
        String: 00:00:00.0006441
        Regex:  00:00:00.0038943
All \n
        String: 00:00:00.0010618
        Regex:  00:00:00.0136604
No \r or \n
        String: 00:00:00.0006781
        Regex:  00:00:00.0001943
\r at every 100th position and \n at every 102th position
        String: 00:00:00.0006537
        Regex:  00:00:00.0005838

which shows that the string-building function does better when the number of \r and \n characters is large. For typical input, though, the original regex approach is faster (see the last two test cases in each set: the one with no \r or \n at all, and the one with only a few).

This was of course coded in C# and not Python, but I assume the relative timings will be similar across languages.

+1




Replace them on the fly as you write the string to wherever it is going. If you use a regex or anything similar, you make two passes: one to replace the characters and another to write the result. Deriving a new Stream class and wrapping it around whatever you write to is quite efficient; that is how we do it with System.Net.Mail, and it means I can use the same stream encoder to write to both files and network streams. I would have to see some of your code to suggest a really good way to do this.

Also, keep in mind that the replacement itself will not really be any faster. The overall execution time goes down because you make one pass instead of two (assuming you actually write the email output somewhere).
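
In Python, the same idea could be sketched as a thin wrapper around any writable binary stream. This is an illustration under assumptions: the class name CRLFWriter is made up, it is not an API from smtplib or any other library, and it presumes the message is written in byte chunks.

```python
import io

class CRLFWriter:
    """Normalize line endings to \\r\\n while writing to an underlying stream."""

    def __init__(self, raw):
        self._raw = raw            # any object with a write(bytes) method
        self._pending_cr = False   # True if the previous chunk ended with \r

    def write(self, data):
        if not data:
            return
        # The previous chunk ended with \r, for which \r\n was already emitted,
        # so a leading \n here is the second half of that same pair.
        if self._pending_cr and data[:1] == b'\n':
            data = data[1:]
        self._pending_cr = data.endswith(b'\r')
        data = data.replace(b'\r\n', b'\n').replace(b'\r', b'\n')
        self._raw.write(data.replace(b'\n', b'\r\n'))

if __name__ == '__main__':
    sink = io.BytesIO()  # stands in for a file or a socket file object
    writer = CRLFWriter(sink)
    for chunk in (b'Subject: hi\r', b'\nbody line\n', b'last line\r'):
        writer.write(chunk)
    print(sink.getvalue())
```

The pending flag handles a \r\n pair that is split across two write calls, which is the main pitfall when normalizing chunk by chunk.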

+1




You can start by pre-compiling the regular expressions, e.g.

FIXCR = re.compile(r'\r(?!\n)')
FIXLN = re.compile(r'(?<!\r)\n')

      

Then use FIXCR.sub and FIXLN.sub. After that, you can try combining the two regular expressions into one with the | alternation operator, which should also help.
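
Put together, that suggestion could look like the sketch below. FIXCR and FIXLN come from the snippet above; FIXANY and the two function names are made up for illustration. Note that Python's re module caches recently compiled patterns, so the gain from pre-compiling alone may be small.

```python
import re

CRLF = '\r\n'
FIXCR = re.compile(r'\r(?!\n)')      # a \r that is not followed by \n
FIXLN = re.compile(r'(?<!\r)\n')     # a \n that is not preceded by \r
FIXANY = re.compile(r'\r\n|\r|\n')   # hypothetical combined pattern

def fix_two_pass(msg):
    # Two passes with the pre-compiled patterns.
    return FIXCR.sub(CRLF, FIXLN.sub(CRLF, msg))

def fix_one_pass(msg):
    # One pass; \r\n comes first in the alternation so pairs match as a unit.
    return FIXANY.sub(CRLF, msg)
```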

0




Something like this? Compile your regular expression.

CRLF = '\r\n'
cr_or_lf_regex = re.compile(r'(?:(?<!\r)\n)|(?:\r(?!\n))')

      

Then, when you want to replace the line endings, use this:

cr_or_lf_regex.sub(CRLF, msg)

      

EDIT: Since the above turned out to be slower, let me take another shot.

def fix_crlf(msg):
    # Walk the message one character at a time and emit \r\n for every
    # \r\n pair, lone \r, or lone \n.
    out = []
    last_chr = ''
    for input_chr in msg:
        if input_chr == '\r':
            out.append('\r\n')
        elif input_chr == '\n':
            # If the previous character was \r, the pair was already emitted.
            if last_chr != '\r':
                out.append('\r\n')
        else:
            out.append(input_chr)
        last_chr = input_chr
    return ''.join(out)

fixed_msg = fix_crlf(msg)

      

-1

