What's the fastest way to fix up line endings for sending via SMTP?
I am coding an email application that creates messages to send via SMTP. This means that I need to change all the lone \n and \r characters to the canonical \r\n sequence that we all know and love. Here is the code I have now:
import re
CRLF = '\r\n'
msg = re.sub(r'(?<!\r)\n', CRLF, msg)
msg = re.sub(r'\r(?!\n)', CRLF, msg)
The problem is, it's not very fast. On large messages (about 80k), it accounts for about 30% of the time it takes to send the message.
Can you do better? I am looking forward to your Python gymnastics.
This regex did the trick:
re.sub(r'\r\n|\r|\n', '\r\n', msg)
But this code turned out to be a win:
msg.replace('\r\n','\n').replace('\r','\n').replace('\n','\r\n')
The original regex took 0.6s to convert /usr/share/dict/words from \n to \r\n, the new regex took 0.3s, and the replace() calls took 0.08s.
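To reproduce this kind of comparison on your own data, here is a minimal sketch that puts the three variants side by side and times them with timeit. The function names and the sample string are made up for illustration, and the timings will depend on your machine and input, so treat any numbers it prints as indicative only.

```python
import re
import timeit

CRLF = '\r\n'

def two_lookaround_subs(msg):
    # The approach from the question: two passes with lookbehind/lookahead.
    msg = re.sub(r'(?<!\r)\n', CRLF, msg)
    return re.sub(r'\r(?!\n)', CRLF, msg)

def single_alternation_sub(msg):
    # One pass; \r\n comes first in the alternation so a pair matches as a unit.
    return re.sub(r'\r\n|\r|\n', CRLF, msg)

def chained_replace(msg):
    # Collapse every line ending to \n first, then expand every \n to \r\n.
    return msg.replace('\r\n', '\n').replace('\r', '\n').replace('\n', CRLF)

if __name__ == '__main__':
    # Made-up sample mixing all three kinds of line ending.
    sample = 'one\ntwo\rthree\r\nfour\n\rfive\r' * 2000
    for fn in (two_lookaround_subs, single_alternation_sub, chained_replace):
        seconds = timeit.timeit(lambda: fn(sample), number=20)
        print(f'{fn.__name__}: {seconds:.3f}s')
```

All three should produce identical output for any mix of \r, \n and \r\n, so the choice between them comes down to speed.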
Perhaps it's the fact that inserting an extra character into the middle of the string is what kills it.
When you replace the text "hello \r world", it actually has to grow the entire string by one character, to "hello \r\n world".
I would suggest looping over the string and looking at the characters one by one. If a character is not \r or \n, just append it to a new string. If it is \r or \n, append the correct line ending to the new string instead.
Code in C# (converting to Python should be trivial):
string FixLineEndings(string input)
{
    if (string.IsNullOrEmpty(input))
        return string.Empty;
    StringBuilder rv = new StringBuilder(input.Length);
    for (int i = 0; i < input.Length; i++)
    {
        char c = input[i];
        if (c != '\r' && c != '\n')
        {
            rv.Append(c);
        }
        else if (c == '\n')
        {
            rv.Append("\r\n");
        }
        else if (c == '\r')
        {
            if (i == input.Length - 1)
            {
                rv.Append("\r\n"); // a \r at the end of the string
            }
            else if (input[i + 1] != '\n')
            {
                rv.Append("\r\n");
            }
        }
    }
    return rv.ToString();
}
This was interesting enough that I wrote a sample program to test it. I used the regex mentioned in the other answer; the code that uses the regex is:
static readonly Regex _r1 = new Regex (@ "(?
I tried with a bunch of test cases. Outputs:
------------------------
Size: 1000 characters
All \r
    String: 00:00:00.0038237
    Regex:  00:00:00.0047669
All \r\n
    String: 00:00:00.0001745
    Regex:  00:00:00.0009238
All \n
    String: 00:00:00.0024014
    Regex:  00:00:00.0029281
No \r or \n
    String: 00:00:00.0000904
    Regex:  00:00:00.0000628
\r at every 100th position and \n at every 102th position
    String: 00:00:00.0002232
    Regex:  00:00:00.0001937
------------------------
Size: 10000 characters
All \r
    String: 00:00:00.0010271
    Regex:  00:00:00.0096480
All \r\n
    String: 00:00:00.0006441
    Regex:  00:00:00.0038943
All \n
    String: 00:00:00.0010618
    Regex:  00:00:00.0136604
No \r or \n
    String: 00:00:00.0006781
    Regex:  00:00:00.0001943
\r at every 100th position and \n at every 102th position
    String: 00:00:00.0006537
    Regex:  00:00:00.0005838
These show that the string-building function does better when the number of \r and \n characters is large. For normal use, though, the original regex approach is faster (see the last two test cases in each set: the ones with no \r or \n and with only a few \r and \n).
This was of course coded in C# and not Python, but I assume there will be runtime similarities across languages.
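To check that assumption in Python, the C# loop above could be ported roughly as follows. This is a sketch, not a tuned implementation: the name fix_line_endings is made up here, and a list joined at the end stands in for StringBuilder.

```python
def fix_line_endings(text):
    """Rough Python port of the C# FixLineEndings loop."""
    if not text:
        return ''
    pieces = []  # list + ''.join plays the role of StringBuilder
    last = len(text) - 1
    for i, c in enumerate(text):
        if c == '\n':
            # Every \n ends up as \r\n, whether or not a \r came before it.
            pieces.append('\r\n')
        elif c == '\r':
            # Emit CRLF only for a lone \r; in a \r\n pair the \n emits it.
            if i == last or text[i + 1] != '\n':
                pieces.append('\r\n')
        else:
            pieces.append(c)
    return ''.join(pieces)
```

In CPython a per-character loop like this runs in the interpreter, while str.replace and re.sub do their scanning in C, so it is worth timing this port against the other approaches before assuming the C# results carry over.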
Replace them on the fly as you write the string to wherever it is going. If you use a regex or anything similar, you make two passes: one to replace the characters and one to write the result. Deriving a new Stream class and wrapping it around whatever you write to is pretty efficient; that is how we do it with System.Net.Mail, and it means I can use the same stream encoder for both files and network streams. I would have to see some of your code to suggest a really good way to do this. Also, keep in mind that the replacement itself will not really be any faster; the overall execution time goes down because you make one pass instead of two (assuming you are actually writing the email output somewhere).
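In Python, the same idea could be sketched as a small wrapper around any object with a write() method. CRLFWriter is a hypothetical name, not a standard library class, and the sketch assumes text (str) chunks; a real SMTP path would write bytes to a socket. The one subtle part is a \r\n pair split across two write() calls, which the _pending_cr flag handles.

```python
import io

class CRLFWriter:
    """Hypothetical wrapper that normalizes line endings as text is written."""

    def __init__(self, stream):
        self._stream = stream      # any object with a write(str) method
        self._pending_cr = False   # True if the previous chunk ended in \r

    def write(self, chunk):
        if not chunk:
            return
        if self._pending_cr and chunk.startswith('\n'):
            # The \r that ended the previous chunk was already written as
            # CRLF, so this \n is the second half of that pair: drop it.
            chunk = chunk[1:]
        self._pending_cr = chunk.endswith('\r')
        # Normalize within the chunk; a trailing \r is written as CRLF now.
        fixed = chunk.replace('\r\n', '\n').replace('\r', '\n')
        self._stream.write(fixed.replace('\n', '\r\n'))

if __name__ == '__main__':
    out = io.StringIO()
    writer = CRLFWriter(out)
    for part in ('Subject: hi\r', '\nbody line\n', 'last line\r'):
        writer.write(part)
    print(repr(out.getvalue()))
```

Whether this beats normalizing the whole message up front depends on how the message is produced; it mainly saves building a second full copy of a large string before sending.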
Something like this? Compile your regular expression.
CRLF = '\r\n'
cr_or_lf_regex = re.compile(r'(?:(?<!\r)\n)|(?:\r(?!\n))')
Then when you want to replace stuff use this:
cr_or_lf_regex.sub(CRLF, msg)
EDIT: Since the above is actually slower, let me take another stab at it.
last_chr = ''  # previous character seen; reset to '' before each new message
def fix_crlf(input_chr):
    global last_chr
    if input_chr == '\r':
        # Emit CRLF right away so a trailing or repeated \r is not lost.
        result = '\r\n'
    elif input_chr == '\n':
        # A \n directly after \r was already covered by the CRLF emitted above.
        result = '' if last_chr == '\r' else '\r\n'
    else:
        result = input_chr
    last_chr = input_chr
    return result
fixed_msg = ''.join([fix_crlf(c) for c in msg])