Getting only the last sent message from a forwarded email using regex in PHP
I need to parse the content of a forwarded email and split the messages in it into two categories:
- Primary content (last email)
- Secondary content (All other emails)
This is my email:
---------- Forwarded message ----------
From: Khalil Ahmad <otheremail@gmail.com>
Date: Fri, May 12, 2017 at 10:27 AM
Subject: Re: Discussion
To: Hammad Rasheed <myemail@gmail.com>
ok no problem
On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:
> Great.
>
> I just want to check how this reply thing works
>
> On Fri, May 12, 2017 at 10:23 AM, Khalil Ahmad <otheremail@gmail.com> wrote:
>
>> yup fine
>>
>> On Fri, May 12, 2017 at 10:23 AM, Hammad Rasheed <myemail@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> How are you doing?
>>>
>>
>>
>
>
> --
> ---------------------------------------------------------------
> Hammad Rasheed
> Ph: 0332-123456
> myemail@gmail.com <myemail@gmail.com>
> Connect with me on Linkedin:
> http://www.linkedin.com/in/xxxxxxx
>
--
---------------------------------------------------------------
Hammad Rasheed
Ph: 0332-12345852
myemail@gmail.com <myemail@gmail.com>
Connect with me on Linkedin:
http://www.linkedin.com/in/xxxxxxx
--001a114b102aa2bdb7054f4d014a
I was able to extract all the quoted (older) messages using the following regex:
preg_match_all('~^>.*~m', $body, $secondary);
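A minimal sketch of what that call collects, run on a trimmed copy of the sample message (the shortened $body below is illustrative, not the full email). Each quoted line becomes one entry in $secondary[0], without its line break:

```php
<?php
// Trimmed copy of the forwarded message from the question (illustrative only).
$body = <<<'EOT'
To: Hammad Rasheed <myemail@gmail.com>
ok no problem
On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:
> Great.
>
>> yup fine
EOT;

// With the m modifier, ^ anchors at every line start, so each line that
// begins with ">" is collected as one match. Without s, .* stops at the line break.
preg_match_all('~^>.*~m', $body, $secondary);

// $secondary[0] holds the full matches, one per quoted line.
print_r($secondary[0]);
```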
Now I need to get the last message that was sent, i.e. "ok no problem", which is the text whose lines do not start with the ">" character. But I can't come up with a regex for this.
Can anyone please help?
The solution I propose matches any text between the first line that starts with To: followed by an email-like string inside <...>, and the first line that starts with On followed by a space and a date/time made of letters, spaces and digits. The pattern can be made stricter, but the idea stays the same:
if (preg_match('~^To:[^<>]*\s+<[^>@]+@[^><]+>\s*(.*?)\ROn [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+~ms', $email, $m)) {
echo $m[1];
}
Pay attention to the /ms modifiers. The m modifier makes ^ match at the beginning of every line, and s makes . match line terminators too.
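A small sketch of what each modifier changes, using a two-line test string (the string and variable names are only for illustration):

```php
<?php
$text = "first line\nsecond line";

// Without m, ^ only matches at the very start of the subject string.
$withoutM = preg_match('~^second~', $text);     // 0
// With m, ^ also matches right after each line break.
$withM = preg_match('~^second~m', $text);       // 1

// Without s, . does not match "\n", so "line.second" cannot match.
$withoutS = preg_match('~line.second~', $text); // 0
// With s, . also matches "\n".
$withS = preg_match('~line.second~s', $text);   // 1

var_dump($withoutM, $withM, $withoutS, $withS);
```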
Pattern details:
- ^ - start of a line
- To: - a literal substring
- [^<>]* - 0+ characters other than < and > (use [^<>\n\r]* instead to stay on the same line if necessary)
- \s+ - 1+ whitespace characters (replace with \h+ to stay on one line, matching only horizontal whitespace)
- <[^>@]+@[^><]+> - an email-like substring inside <...>
- \s* - 0+ whitespace characters
- (.*?) - Group 1: any 0+ characters, as few as possible (*? is lazy, so it matches only as many characters as the rest of the pattern needs to succeed)
- \R - a line break
- On [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+ - the literal substring On followed by a space and a date/time pattern ([A-Za-z]+ matches 1+ letters and \d+ matches 1+ digits)
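A self-contained sketch of the preg_match call from this answer, run on the first part of the sample message from the question (the input is trimmed for brevity; the pattern is the same one, split into commented pieces with string concatenation):

```php
<?php
// First part of the forwarded message from the question, used as input.
$email = <<<'EOT'
---------- Forwarded message ----------
From: Khalil Ahmad <otheremail@gmail.com>
Date: Fri, May 12, 2017 at 10:27 AM
Subject: Re: Discussion
To: Hammad Rasheed <myemail@gmail.com>
ok no problem
On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>
wrote:
> Great.
>
> I just want to check how this reply thing works
EOT;

// Same pattern as in the answer; single quotes keep \s, \R and \d literal.
$pattern = '~^To:[^<>]*\s+<[^>@]+@[^><]+>\s*'            // "To:" header with an <email>
         . '(.*?)'                                       // Group 1: lazily captured body
         . '\ROn [A-Za-z]+, [A-Za-z]+ \d+, \d+ at \d+:\d+~ms'; // "On <date> at <time>" line

$primary = null;
if (preg_match($pattern, $email, $m)) {
    $primary = $m[1];
}
echo $primary, PHP_EOL;
```

Because \R matches \r\n as well as \n, the same pattern should also cope with messages that use Windows line endings.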
You can go for
(?P<primary>(?:(?!^>).*\R)+)
(?P<secondary>(?:^>.*\R)+)
in verbose and multiline mode (the x and m modifiers); see the demo at regex101.com.
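A minimal PHP sketch of the named-group approach, assuming a shortened input that starts at the reply body. It places the + inside the primary group so that the group captures all repetitions rather than only the last non-quoted line, and it uses the comments that the x modifier allows. Note that primary also picks up the "On ... wrote:" attribution lines before the quoted block (and, on the full message, the forwarded-message header), so some trimming is still needed:

```php
<?php
$lines = [
    'ok no problem',
    'On Fri, May 12, 2017 at 10:26 AM, Hammad Rasheed <myemail@gmail.com>',
    'wrote:',
    '> Great.',
    '>',
    '>> yup fine',
];
// Every line, including the last, ends with "\n" because each \R in the pattern needs one.
$body = implode("\n", $lines) . "\n";

// x: whitespace and # comments are ignored; m: ^ matches at every line start.
$pattern = '~
    (?P<primary>(?:(?!^>).*\R)+)    # consecutive lines that do not start with ">"
    (?P<secondary>(?:^>.*\R)+)      # the following block of quoted lines
~xm';

$m = [];
if (preg_match($pattern, $body, $m)) {
    echo "Primary:\n", $m['primary'];
    echo "Secondary:\n", $m['secondary'];
}
```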