Getting only the last sent message from a forwarded email using regex in PHP

I need to parse the content of a forwarded email and split its messages into two categories:

  • Primary content (the last email)
  • Secondary content (all other emails)

This is the email body:

---------- Forwarded message ----------
From: Khalil Ahmad <otheremail@gmail.com>
Date: Fri, May 12, 2017 at 10:27 AM
Subject: Re: Discussion
To: Hammad Rasheed <myemail@gmail.com>


ok no problem

On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:

> Great.
>
> I just want to check how this reply thing works
>
> On Fri, May 12, 2017 at 10:23 AM, Khalil Ahmad <otheremail@gmail.com> wrote:
>
>> yup fine
>>
>> On Fri, May 12, 2017 at 10:23 AM, Hammad Rasheed <myemail@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How are you doing?
>>>
>>
>>
>
>
> --
> ---------------------------------------------------------------
> Hammad Rasheed
> Ph: 0332-123456
> myemail@gmail.com <myemail@gmail.com>
> Connect with me on Linkedin:
> http://www.linkedin.com/in/xxxxxxx
>

-- 
---------------------------------------------------------------
Hammad Rasheed
Ph: 0332-12345852
myemail@gmail.com <myemail@gmail.com>
Connect with me on Linkedin:
http://www.linkedin.com/in/xxxxxxx

--001a114b102aa2bdb7054f4d014a

I was able to extract all the quoted (earlier) messages using the following regex:

preg_match_all('~^>.*~m', $body, $secondary);
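For illustration, here is a minimal runnable sketch of that call on a shortened copy of the sample body. It assumes LF line endings; with CRLF input, .* would also capture the trailing \r on each line.

```php
<?php
// Shortened copy of the sample email above (LF line endings assumed).
$body = <<<'EOT'
To: Hammad Rasheed <myemail@gmail.com>

ok no problem

On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:

> Great.
>
> I just want to check how this reply thing works
EOT;

// The m modifier lets ^ match at the start of every line, so each
// quoted line (one starting with ">") becomes a separate match.
preg_match_all('~^>.*~m', $body, $secondary);

// $secondary[0] holds the full matches; print one per line.
echo implode("\n", $secondary[0]), "\n";
```

Each quoted line comes back as its own match, which is why the unquoted reply has to be found with a different pattern.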

Now I need to return the last message that was sent, i.e. "ok no problem": the text whose lines do not start with the ">" character. But I can't figure out a regex for this.

Can anyone please help?


2 answers


The solution I propose matches any text between the first line starting with To:, followed by something containing an email-like string inside <...>, and the first line starting with On, followed by a space, some letters, spaces and digits (date/time). This can be made stricter, but the idea stays the same:

if (preg_match('~^To:[^<>]*\s+<[^>@]+@[^><]+>\s*(.*?)\ROn [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+~ms', $email, $m)) {
    echo $m[1];
}
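The snippet assumes $email already holds the message. A self-contained sketch, assuming a condensed copy of the question's email with LF line endings, might look like this. Group 1 stops right before the line break that precedes "On ...", so it still ends with one newline; trim() removes it.

```php
<?php
// Condensed copy of the forwarded email from the question (LF line endings assumed).
$email = <<<'EOT'
---------- Forwarded message ----------
From: Khalil Ahmad <otheremail@gmail.com>
Date: Fri, May 12, 2017 at 10:27 AM
Subject: Re: Discussion
To: Hammad Rasheed <myemail@gmail.com>


ok no problem

On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:

> Great.
EOT;

$pattern = '~^To:[^<>]*\s+<[^>@]+@[^><]+>\s*(.*?)\ROn [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+~ms';

if (preg_match($pattern, $email, $m)) {
    // Group 1 is "ok no problem\n"; trim() drops the leftover newline.
    $primary = trim($m[1]);
    echo $primary, "\n"; // ok no problem
}
```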




Pay attention to the /ms modifiers. The m modifier makes ^ match at the beginning of every line (not only at the start of the string), and s makes . match line terminators as well.
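To make the effect of the two modifiers concrete, here is a small sketch on a made-up two-line string. It is unrelated to the email itself and shows only what m and s change.

```php
<?php
$text = "first line\nTo: second line";

// Without m, ^ only matches at the very start of the subject.
var_dump(preg_match('~^To:~', $text));   // int(0)
// With m, ^ also matches right after each newline.
var_dump(preg_match('~^To:~m', $text));  // int(1)

// Without s, . does not match "\n", so .* cannot cross the line break.
var_dump(preg_match('~first.*second~', $text));  // int(0)
// With s, . matches "\n" too.
var_dump(preg_match('~first.*second~s', $text)); // int(1)
```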

Pattern details:

  • ^ - beginning of a line
  • To: - literal substring
  • [^<>]* - 0+ characters other than < and > (add \r\n to the class, i.e. [^<>\r\n]*, to stay on the same line if necessary)
  • \s+ - 1+ whitespace characters (replace with \h+ to stay on one line, matching only horizontal whitespace)
  • <[^>@]+@[^><]+> - an email-like substring inside <...>
  • \s* - 0+ whitespace characters
  • (.*?) - Group 1: any 0+ characters, as few as possible (*? is lazy, so it matches only as many characters as needed to complete the match)
  • \R - a line break
  • On [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+ - the literal substring On, a space and a date/time pattern ([A-Za-z]+ matches 1+ letters and \d+ matches 1+ digits)
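The same pattern can be written with the x (extended) modifier so that each part from the list above carries its own comment. This is a sketch of an equivalent rewrite, not the answer's original form. In x mode unescaped whitespace is ignored, so the literal spaces in the "On ..." part are escaped as "\ ", and the comments avoid the ~ delimiter character.

```php
<?php
// Same pattern as above, annotated using the x modifier (plus m and s).
$pattern = <<<'RE'
~
^To:                 # "To:" at the start of a line (needs m)
[^<>]*               # 0+ characters other than < and >
\s+                  # 1+ whitespace characters
<[^>@]+@[^><]+>      # an email-like substring inside <...>
\s*                  # 0+ whitespace characters, including blank lines
(.*?)                # Group 1: the primary message, matched lazily
\R                   # a line break
On\ [A-Za-z]+,\ [A-Za-z]+\ \d+,\ \d+\ at\ \d+:\d+   # e.g. "On Fri, May 12, 2017 at 10:26"
~msx
RE;

$email = "To: Hammad Rasheed <myemail@gmail.com>\n\n\nok no problem\n\n"
       . "On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>\nwrote:\n";

if (preg_match($pattern, $email, $m)) {
    echo trim($m[1]), "\n"; // ok no problem
}
```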


You could use the following pattern:

(?P<primary>(?:(?!^>).*\R)+)
(?P<secondary>(?:^>.*\R)+)

Use it in verbose (x) and multiline (m) mode; see the demo at regex101.com.
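A minimal PHP sketch of that approach, assuming a shortened sample with LF line endings. The + is placed inside the primary group, so the group spans all consecutive non-quoted lines instead of only the last repetition. Note that primary captures everything before the first quoted line, including the "On ... wrote:" attribution, so isolating "ok no problem" still needs an extra step.

```php
<?php
// Sample: a non-quoted reply followed by quoted lines (LF line endings assumed).
$body = "ok no problem\n"
      . "\n"
      . "On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>\n"
      . "wrote:\n"
      . "\n"
      . "> Great.\n"
      . ">\n"
      . "> I just want to check how this reply thing works\n";

// x: whitespace inside the pattern is ignored; m: ^ matches at every line start.
// The + sits inside "primary" so the group spans all non-quoted lines.
$pattern = '~
    (?P<primary>(?:(?!^>).*\R)+)
    (?P<secondary>(?:^>.*\R)+)
~xm';

if (preg_match($pattern, $body, $m)) {
    echo "Primary:\n", $m['primary'];
    echo "Secondary:\n", $m['secondary'];
}
```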
