How can I remove HTML and attachments from emails?

I am using the following program to sort and eventually print out emails. Some messages may contain attachments or HTML that is not suitable for printing. Is there an easy way to strip the attachments and the HTML markup, while keeping the text of HTML-formatted messages?

#!/usr/bin/perl
use warnings;
use strict;
use Mail::Box::Manager;

open( MYFILE, '>>', 'data.txt' )
    or die "data.txt: Unable to open: $!\n";
binmode( MYFILE, ':encoding(UTF-8)' );


my $file = shift || $ENV{MAIL};
my $mgr = Mail::Box::Manager->new(
    access          => 'r',
);

my $folder = $mgr->open( folder => $file )
or die "$file: Unable to open: $!\n";

for my $msg ( sort { $a->timestamp <=> $b->timestamp } $folder->messages)
{
    my $to          = join( ', ', map { $_->format } $msg->to );
    my $from        = join( ', ', map { $_->format } $msg->from );
    my $date        = localtime( $msg->timestamp );
    my $subject     = $msg->subject;
    my $body        = $msg->decoded->string;

    # Strip all quoted text
    $body =~ s/^>.*$//msg;

    print MYFILE <<"";
From: $from
To: $to
Date: $date
Subject: $subject
\n
$body

}

      



4 answers


Mail::Message::isMultipart will tell you whether the message is multipart, which is how attachments are carried. Mail::Message::parts will provide you with a list of the parts of the message.

Thus:



if ( $msg->isMultipart ) {
    foreach my $part ( $msg->parts ) {
        if ( $part->contentType eq 'text/html' ) {
           # deal with html here.
        }
        elsif ( $part->contentType eq 'text/plain' ) {
           # deal with text here.
        }
        else {
           # well?
        }
    }
}
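Building on the check above, one possible way to collect only the printable text is to walk the parts recursively, since a multipart message can itself contain multipart parts. This is a minimal sketch, not a prescribed approach: `printable_parts` is a hypothetical helper name, and it relies only on the `isMultipart`, `parts`, `contentType` and `decoded->string` calls already used in this thread. Treating everything that is not `text/plain` or `text/html` as an attachment is an assumption; a text file sent as an attachment would still get through.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a list of [ content type, decoded text ]
# pairs for the text parts of a message, skipping everything else.
sub printable_parts {
    my ($msg) = @_;

    # A multipart message (or part) carries its content in sub-parts,
    # so descend into each of them.
    if ( $msg->isMultipart ) {
        my @found;
        push @found, printable_parts($_) for $msg->parts;
        return @found;
    }

    # Assumption: only these two types are worth printing.
    my $type = lc $msg->contentType;
    return unless $type eq 'text/plain' || $type eq 'text/html';

    return [ $type, $msg->decoded->string ];
}
```

In the question's loop, `$body` could then be built from the `text/plain` entries returned by `printable_parts($msg)`, falling back to the `text/html` entries (after converting them to text) when no plain part exists.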

      



The HTML-stripping aspect is explained in perlfaq9 (or in the first item from perldoc -q html). In short, the relevant modules are HTML::Parser and HTML::FormatText.
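As a rough illustration of the HTML::FormatText route, the sketch below renders an HTML string as plain text. `html_to_text` is a hypothetical helper name, and the margin values and the sample HTML are arbitrary choices for the example, not anything taken from the FAQ.

```perl
use strict;
use warnings;
use HTML::FormatText;    # CPAN module; parses the HTML via HTML::TreeBuilder

# Hypothetical helper: render an HTML string as plain text.
sub html_to_text {
    my ($html) = @_;
    return HTML::FormatText->format_string(
        $html,
        leftmargin  => 0,     # arbitrary margins chosen for printing
        rightmargin => 72,
    );
}

# Sample input made up for this example.
print html_to_text('<h1>Report</h1><p>Totals are <b>up</b> this week.</p>');
```

The formatter keeps the text content and drops the tags, which matches the goal of removing the markup while keeping the words of an HTML-formatted message.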



As for attachments, emails that carry them are sent as MIME messages. From this example, you can see that the format is simple enough that you could either work out a solution yourself or learn the MIME modules on CPAN.
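To make the MIME route more concrete, here is a minimal sketch using MIME::Parser from the MIME-tools distribution. The sample message is hand-written for illustration only, and `plain_text_of` is a hypothetical helper name. It keeps the `text/plain` leaf parts and ignores everything else, so it does not look at the Content-Disposition header at all.

```perl
use strict;
use warnings;
use MIME::Parser;    # from the MIME-tools distribution on CPAN

# A tiny hand-written multipart message, only for illustration.
my $raw = join "\n",
    'From: alice@example.com',
    'To: bob@example.com',
    'Subject: Report',
    'MIME-Version: 1.0',
    'Content-Type: multipart/mixed; boundary="XYZ"',
    '',
    '--XYZ',
    'Content-Type: text/plain; charset="us-ascii"',
    '',
    'The report is attached.',
    '--XYZ',
    'Content-Type: application/octet-stream; name="report.bin"',
    'Content-Disposition: attachment; filename="report.bin"',
    'Content-Transfer-Encoding: base64',
    '',
    'AAECAw==',
    '--XYZ--',
    '';

# Hypothetical helper: collect the bodies of all text/plain leaf parts.
sub plain_text_of {
    my ($entity) = @_;

    # A multipart entity has sub-parts; descend into each one.
    my @parts = $entity->parts;
    if (@parts) {
        my @text;
        push @text, plain_text_of($_) for @parts;
        return @text;
    }

    # Leaf entity: keep it only if it is plain text.
    return unless $entity->effective_type eq 'text/plain';
    my $body = $entity->bodyhandle or return;
    return $body->as_string;
}

my $parser = MIME::Parser->new;
$parser->output_to_core(1);    # keep decoded bodies in memory, not temp files
my $entity = $parser->parse_data($raw);

print plain_text_of($entity), "\n";
```

Searching for `text/plain` by type, rather than assuming it is the first part, is a design choice made here so that messages with a different part order are still handled.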



It looks like someone has already solved this on the linuxquestions forum.

From the forum:

# This is part of Mail::POP3Client: get the headers and body of the POP3 mail in question
$body = $connection->HeadAndBody($i);
# Parse the message with MIME::Parser; the result is a MIME entity
$msg = $parser->parse_data($body);
# Find out whether this is a multipart MIME message or just plain text
$num_parts = $msg->parts;
# If it has 0 parts, it is plain text
if ( $num_parts == 0 ) {
    # Get the message body via POP3Client
    $message = $connection->Body($i);
    # Use this series of regular expressions to make sure it is OK for MySQL
    $message =~ s/</&lt;/g;
    $message =~ s/>/&gt;/g;
    $message =~ s/'//g;
}
else {
    # If it is MIME, put the first part (assumed to be the plain text) into a string
    $message = $msg->parts(0)->bodyhandle->as_string;
}

      
