Emails sent from mobile devices are decoded strangely by the email library
I am using the imaplib and email modules in Python to grab a list of emails from an IMAP server and then do something with them. This is the snippet I am using to fetch and decode the messages:
import imaplib
import email
# Connect to server
box = imaplib.IMAP4(CSMTP_SERVER)
box.login(CSMTP_USERNAME, CSMTP_PASSWORD)
# List inbox
box.select('INBOX')
# Retrieve email list ID matching search patterns
# Return from search is this:
# ('OK', ['1 2 3 4 5 6 7 8 9 10 11 12 13 14'])
data = box.search(None, 'ALL')[1]
for num in data[0].split():
    # Retrieve message headers and body
    headers = email.message_from_string(box.fetch(num, '(RFC822)')[1][0][1])
    body = headers.get_payload()
    if not isinstance(body, str):
        body = headers.get_payload()[0].get_payload()
    print headers, body
It works like a charm when an email is sent from Hotmail or Gmail, but whenever an email is sent from, for example, the default Android mail app, the message looks like this:
=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z? U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki IDxwakBtYWtoZmkuY29tPgpTdWJqZWN0OiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z CkRhdGU6IFdlZCwgU2VwIDEwLCAyMDE0IDk6MDYgUE0KCkhpIFBlam1hbiwKCkkgd2FzIHBsYXlp bmcgd2l0aCBDYXBzaGFyZSB0b2RheSBhbmQgZm91bmQgc29tZXRoaW5nIG1pc3NpbmcuIEkgZ3Vl c3MgeW91CmhhdmUgcGxhbnMgZm9yIGl0LCBidXQgaXQgZG9lc24ndCBodXJ0IHRvIG1lbnRpb24g aXQsIGp1c3Qgb24gY2FzZS4uLgoKV2hlbiBpbXBvcnRpbmcgcGhvdG9zLCBJIGhhdmUgdGhlIG9w dGlvbiB0byBlaXRoZXIgZ2V0IG9uZSBvZiB0aGUgaW1hZ2VzCnRoYXQgYXJlIGRvd25sb2FkZWQg b24gbXkgcGhvbmUsIG9yIHRvIHRha2UgYSBuZXcgcGljdHVyZS92aWRlby4gV2hhdCdzCm1pc3Np bmcgaXMgYWJpbGl0eSB0byBnZXQgcGhvdG9zIGZyb20gbXkwcyBJJ3ZlIHVzZWQgZG9uJ3Qg Y2FyZSB3aGVyZSB0aGUgcGhvdG8gaXMgbG9jYXRlZCBhbmQgYWxsCnBpY3R1cmVzIGFyZSBlcXVh bGx5IGFjY2Vzc2libGUgKG9yIG1heWJlIHRoaXMgYXBwbGllcyBvbmx5IHRvIEdvb2dsZQphcHBz PykuCgpOb3QgaW1wb3J0YW50LCBubyBpZGVhIGlmIGl0IGlzIGp1c3QgYSBsaW5lIG9yIHR3byBm aXggb3Igc29tZXRoaW5nIG1vcmUKY29tcGxpY2F0ZWQuCgpUYWtlIGNhcmUsCgotIEFsZXgsIGJl dGEgdGVzdGVyLCBRQSB2b2x1bnRlZXIsIGFuZCBzZW5pb3IgcGVza3kgc3RpY2tsZXI=
This is the message I received when I sent an email from my mobile device. I doubt this is how it is supposed to work; it looks more like some email clients don't build the email headers correctly according to RFC822. Either way, I need to handle this somehow and read every email.
I would appreciate any advice on how to deal with this. Thanks in advance.
This is a MIME message. MIME is not specified in RFC 822 but in the newer RFCs 2045-2047.
The vast majority of emails today use MIME in some way, so you should definitely support it.
Of particular relevance to this message is RFC 2047, which specifies the Encoded-Word syntax. There is a good overview on Wikipedia, which I will partially quote:
The form is: "=?charset?encoding?encoded text?=".
The encoding can be either "Q" for Q-encoding, which is similar to quoted-printable encoding, or "B" for base64 encoding.
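For reference, Python's standard library can parse this encoded-word form through email.header.decode_header. The following is a minimal sketch for Python 3 and is not part of the original answer; the helper name decode_mime_header and the Q-encoded sample string are made up for illustration.

```python
from email.header import decode_header


def decode_mime_header(value):
    """Decode a header value that may contain RFC 2047 encoded-words.

    decode_mime_header is a hypothetical helper name, not a library function.
    """
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded chunks come back as bytes together with their charset;
        # a header with no encoded-words at all comes back as a plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# "B" (base64) example, taken from the first part of the message in the question
subject_b = decode_mime_header(
    "=?utf-8?B?RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?="
)

# "Q" example, an illustrative string that is not from the question
subject_q = decode_mime_header("=?iso-8859-1?Q?Caf=E9?=")

print(subject_b)
print(subject_q)
```

Both encodings go through the same call, so the caller does not need to inspect the "B" or "Q" marker itself.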
So, for this particular message, you have Base64 (B) encoded utf-8. The actual message starts right after B?, not on the second line.
Here is some simple Python code to handle this:
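import base64

if body.startswith("=?"):
    i1 = body.index("?")          # the "?" of the leading "=?"
    i2 = body.index("?", i1 + 1)  # the "?" that ends the charset
    i3 = i2 + 2
    charset = body[i1 + 1:i2]
    assert body[i2:i3] == "?B"    # Q encoding is not handled here
    # skip the "?" after "B" and decode everything that follows
    body = base64.b64decode(body[i3 + 1:]).decode(charset)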
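As an alternative to parsing the string by hand, the email package can undo the transfer encoding itself through get_payload(decode=True), provided the message carries the usual Content-Type and Content-Transfer-Encoding headers. Whether the Android message in the question does is an assumption. The sketch below targets Python 3 and expects the raw message as bytes; extract_text and the sample message are hypothetical and only serve the illustration.

```python
import email


def extract_text(raw_bytes):
    """Return the first text/plain part of a raw message as str, or None.

    extract_text is a hypothetical helper name.
    """
    msg = email.message_from_bytes(raw_bytes)
    # walk() yields the message itself and then every nested MIME part
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        # decode=True reverses base64 / quoted-printable and returns bytes
        payload = part.get_payload(decode=True)
        # Falling back to utf-8 is an assumption for parts without a charset
        charset = part.get_content_charset() or "utf-8"
        return payload.decode(charset, errors="replace")
    return None


# Small hand-built message with a base64 body (not the one from the question)
raw = (
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    b"U2VudCBmcm9tIG15IEhUQw==\r\n"
)
text = extract_text(raw)
print(text)
```

Because walk() also visits nested parts, the same function covers the multipart case that the question handles with get_payload()[0].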
The strange encoding is base64:
>>> import base64
>>> base64.decodestring('RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z?').decode('utf8')
u'Fwd: Capshare: importing from Photos'
>>> base64.decodestring('''U2VudCBmcm9tIG15IEhUQwoKLS0tLS0gRm9yd2FyZGVkIG1lc3NhZ2UgLS0tLS0KRnJvbTogIkFs
... ZXhhbmRlciBBdnRhbnNraSIgPGFsZXhAYXZ0YW5za2kuY29tPgpUbzogIlBlam1hbiBNYWtoZmki
... IDxwakBtYWtoZmkuY29tPgpTdWJqZWN0OiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z
... CkRhdGU6IFdlZCwgU2VwIDEwLCAyMDE0IDk6MDYgUE0KCkhpIFBlam1hbiwKCkkgd2FzIHBsYXlp
... bmcgd2l0aCBDYXBzaGFyZSB0b2RheSBhbmQgZm91bmQgc29tZXRoaW5nIG1pc3NpbmcuIEkgZ3Vl
... c3MgeW91CmhhdmUgcGxhbnMgZm9yIGl0LCBidXQgaXQgZG9lc24ndCBodXJ0IHRvIG1lbnRpb24g
... aXQsIGp1c3Qgb24gY2FzZS4uLgoKV2hlbiBpbXBvcnRpbmcgcGhvdG9zLCBJIGhhdmUgdGhlIG9w
... dGlvbiB0byBlaXRoZXIgZ2V0IG9uZSBvZiB0aGUgaW1hZ2VzCnRoYXQgYXJlIGRvd25sb2FkZWQg
... b24gbXkgcGhvbmUsIG9yIHRvIHRha2UgYSBuZXcgcGljdHVyZS92aWRlby4gV2hhdCdzCm1pc3Np
... bmcgaXMgYWJpbGl0eSB0byBnZXQgcGhvdG9zIGZyb20gbXkgYWxidW1zIG9yIGJhY2tlZC11cCBw
... aG90b3MgdGhhdAphcmUgbm90IHBoeXNpY2FsbHkgc3RvcmVkIG9uIHRoZSBkZXZpY2UgLSBmb3Ig
... ZXhhbXBsZSB0aG9zZSBvbiBHb29nbGUKZHJpdmUuIE1vcbmQgYWxsCnBpY3R1cmVzIGFyZSBlcXVh
... bGx5IGFjY2Vzc2libGUgKG9yIG1heWJlIHRoaXMgYXBwbGllcyBvbmx5IHRvIEdvb2dsZQphcHBz
... PykuCgpOb3QgaW1wb3J0YW50LCBubyBpZGVhIGlmIGl0IGlzIGp1c3QgYSBsaW5lIG9yIHR3byBm
... aXggb3Igc29tZXRoaW5nIG1vcmUKY29tcGxpY2F0ZWQuCgpUYWtlIGNhcmUsCgotIEFsZXgsIGJl
... dGEgdGVzdGVyLCBRQSB2b2x1bnRlZXIsIGFuZCBzZW5pb3IgcGVza3kgc3RpY2tsZXI=''').decode('utf8')
u'Sent from my HTC\n\n----- Forwarded message -----\n....
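The session above is Python 2. On Python 3 the same check looks slightly different: base64.decodestring was removed in Python 3.9, and base64.b64decode returns bytes that still have to be decoded. A minimal sketch, reusing the first string from the session and a short wrapped sample built for illustration:

```python
import base64

# First part of the message, without the stray trailing "?"
subject = base64.b64decode(
    "RndkOiBDYXBzaGFyZTogaW1wb3J0aW5nIGZyb20gUGhvdG9z"
).decode("utf-8")

# By default b64decode skips characters outside the base64 alphabet,
# so line breaks inside a wrapped payload do not need to be stripped first.
wrapped = "U2VudCBmcm9t\nIG15IEhUQw=="
first_line = base64.b64decode(wrapped).decode("utf-8")

print(subject)
print(first_line)
```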