Python: separate the conversations in an email thread

I want to split an email thread into its individual replies and forwarded messages.

An example is this:

On July 31, 2013 at 5:15 pm, John Doe wrote:

> example email text
>
>
> *From:* Me [mailto:me@gmail.com]
> *Sent:* Thursday, May 31, 2012 3:54 PM
> *To:* John Doe
> *Subject:* RE: subject
>
> example email text
>
>> Dear David,
>> 
>> Greetings from Doha!
>> Kindly enlighten me. I am confused.
>> 
>> With regards,
>> Smith
>>
>>> Dear Smith,
>>>
>>> Happy New year!
>>> Love
>>>
>>>> Dear Mr Wong,
>>>> Greetings!
>>>> Yours,
>>>> O

      

The above example is made up, but the format is realistic. Some emails contain multiple conversations.

I have tried https://github.com/zapier/email-reply-parser and other packages, but unfortunately they cannot be put into production as performance is unstable.

The pattern is clear enough: the conversations can be divided by counting the number of ">" characters. My initial idea is to go through the whole document, find out how many levels of ">" there are, and then extract the ">", ">>", ">>>" and ">>>>" lines as separate conversations.

I want to know if there is a better way out there?

Many thanks!



1 answer


Here's one extremely simple solution with itertools.groupby, assuming the email bodies do not contain '>':

import itertools

# text holds the raw email body as a single string
for _, v in itertools.groupby(text.splitlines(), key=lambda x: x.count('>')):
    print('\n'.join(v))
    print('-' * 20)

      

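groupby does the grouping for you: consecutive lines with the same number of '>' end up in the same group. For a more robust solution, count only the leading '>' characters of each line, for example with key=lambda x: len(re.match(r'>*', x).group(0)) (this needs import re). The pattern r'>*' also matches lines with no '>' at all, so the key returns 0 for them instead of raising an error.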



Output:

> example email text
>
>
> *From:* Me [mailto:me@gmail.com]
> *Sent:* Thursday, May 31, 2012 3:54 PM
> *To:* John Doe
> *Subject:* RE: subject
>
> example email text
>
--------------------
>> Dear David,
>> 
>> Greetings from Doha!
>> Kindly enlighten me. I am confused.
>> 
>> With regards,
>> Smith
>>
--------------------
>>> Dear Smith,
>>>
>>> Happy New year!
>>> Love
>>>
--------------------
>>>> Dear Mr Wong,
>>>> Greetings!
>>>> Yours,
>>>> O
--------------------
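
The idea of counting only the leading '>' markers can be sketched as follows. This is a minimal illustration, not a tested production parser: the names quote_depth and split_by_depth are made up for this example, and it assumes that quote markers may be separated by spaces (as in "> > text") and that a change in depth always starts a new conversation.

```python
import itertools
import re

# Hypothetical helper pattern: the run of '>' markers at the start of a line,
# allowing optional whitespace before each marker (e.g. "> > text").
_QUOTE_PREFIX = re.compile(r'^(?:\s*>)*')


def quote_depth(line):
    """Return the number of leading '>' markers on a line (0 if none)."""
    return _QUOTE_PREFIX.match(line).group(0).count('>')


def split_by_depth(text):
    """Group consecutive lines that share the same quote depth.

    Returns a list of (depth, body) tuples with the quote markers removed.
    """
    segments = []
    for depth, lines in itertools.groupby(text.splitlines(), key=quote_depth):
        # Strip the leading markers only; a '>' inside the body is left alone.
        body = '\n'.join(
            _QUOTE_PREFIX.sub('', line, count=1).strip() for line in lines
        )
        segments.append((depth, body.strip()))
    return segments
```

Unlike x.count('>'), this key ignores any '>' that appears inside the message text, so a line such as ">> a > b" still counts as depth 2. Unquoted lines (depth 0), including blank ones between quoted blocks, form their own groups, so real emails may need an extra step to merge or drop them.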

      
