HTML looks good in browser, but not in email

I'm having problems with email encoding. I am reading an HTML file from disk and sending it via Gmail. When I open the HTML in a browser, it looks great. When I copy the HTML string from Visual Studio and save it as an HTML file, it looks great. But when I receive the email, it contains a bunch of invalid characters. Even the list bullets are messed up! I'm pretty sure this is an encoding issue, but the file is encoded as UTF-8 and looks fine until it is converted to RAW and sent via Gmail.

Here is the process: we read the .docx using the OpenXML SDK, then we use HtmlConverter to save the document as HTML. The HTML is later read from the file, converted to RAW format, and sent through the Gmail API.

Here are some relevant (abbreviated) code excerpts:

Here we save our HTML file using the HtmlConverter.

HtmlConverterSettings settings = new HtmlConverterSettings()
{
    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; padding: 0; }",
    FabricateCssClasses = true,
    RestrictToSupportedLanguages = false,
    RestrictToSupportedNumberingFormats = false,
};

XElement htmlElement = HtmlConverter.ConvertToHtml( wdWordDocument, settings );
var html = new XDocument(
    new XDocumentType( "html", null, null, null ),
    htmlElement );

var htmlString = html.ToString( SaveOptions.DisableFormatting );
File.WriteAllText( destFileName.FullName, htmlString, Encoding.UTF8 );
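One way to check the claim that the saved file really is UTF-8 is to inspect its raw bytes and round-trip a few of the problem characters (a curly apostrophe and a bullet). The following is a minimal sketch; the class and method names are hypothetical and not part of the original code, which only calls File.WriteAllText with Encoding.UTF8.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic: writes text the same way the question does
// (File.WriteAllText with Encoding.UTF8), then inspects the raw bytes.
public static class Utf8FileCheck
{
    // Returns true if the file starts with the UTF-8 byte order mark EF BB BF.
    public static bool HasUtf8Bom(string path)
    {
        byte[] bytes = File.ReadAllBytes(path);
        return bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF;
    }

    // Writes the text as UTF-8 (Encoding.UTF8 emits a BOM) and reads it back
    // with the same encoding, returning the decoded string.
    public static string RoundTrip(string path, string text)
    {
        File.WriteAllText(path, text, Encoding.UTF8);
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

If both checks pass for the real file, the bytes on disk are probably fine and the corruption is more likely introduced later, when the file is read back or the message is serialized.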

Here we read the saved HTML and convert it to send via Gmail. (We are using MimeKit for the conversion.)

// Create the message using MimeKit/System.Net.Mail.MailMessage
MailMessage msg = new MailMessage();
msg.Subject = strEmailSubject; // Subject
msg.From = new MailAddress( strUserEmail ); // Sender
msg.To.Add( new MailAddress( row.email ) ); // Recipient
msg.BodyEncoding = Encoding.UTF8;
msg.IsBodyHtml = true; 

// We need to loop through our HTML Document and replace the images with a CID so that they will display inline
var vHtmlDoc = new HtmlAgilityPack.HtmlDocument();
vHtmlDoc.Load( row.file ); // Read the body, from HTML file
...
msg.Body = vHtmlDoc.DocumentNode.OuterHtml;

// Convert our System.Net.Mail.MailMessage to RAW with Base64 encoding for Gmail
MimeMessage mimeMessage = MimeMessage.CreateFromMailMessage( msg );

Google.Apis.Gmail.v1.Data.Message message = new Google.Apis.Gmail.v1.Data.Message();
message.Raw = Base64UrlEncode( mimeMessage.ToString() );
var result = vGMailService.Users.Messages.Send( message, "me" ).Execute();

And this is how we base64 encode:

private static string Base64UrlEncode( string input )
{
    var inputBytes = System.Text.Encoding.UTF8.GetBytes( input );
    // Special "url-safe" base64 encode.
    return Convert.ToBase64String( inputBytes )
        .Replace( '+', '-' )
        .Replace( '/', '_' )
        .Replace( "=", "" );
}
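A byte-based variant of this helper is sketched below as a hypothetical alternative: it takes bytes that have already been serialized (for example, from a MemoryStream) instead of a string, so the helper itself performs no text-to-bytes conversion. The class name is an assumption for illustration; the url-safe alphabet and padding removal are the same as in the helper above.

```csharp
using System;

// Hypothetical helper: base64url-encodes raw bytes (RFC 4648, section 5)
// and strips the '=' padding, matching the question's string-based helper.
public static class Base64Url
{
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data)
            .TrimEnd('=')       // drop padding
            .Replace('+', '-')  // url-safe replacement for index 62
            .Replace('/', '_'); // url-safe replacement for index 63
    }
}
```

For example, `Base64Url.Encode(stream.ToArray())` could be given the bytes written to a MemoryStream by `mimeMessage.WriteTo(stream)`.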

The email ends up as "Content-Type: multipart/mixed" with two alternatives. One of them is

Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

and the other is

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Both the text and HTML parts contain strings such as =C3=A2=E2=82=AC=E2=84=A2 for the apostrophe, and the HTML part's head contains strange "3D" characters:

<meta charset=3D"UTF-8"><title></title><meta name=3D"Generator"=
 content=3D"PowerTools for Open XML">

None of these oddities were in the HTML before it was converted to Base64 and sent.

Any ideas what the problem might be? Does this have anything to do with UTF-8 and MimeKit?

2 answers


This is how your code should look to get the raw message data for use with the Gmail API:

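using (var stream = new MemoryStream ()) {
    // Serialize the MimeMessage to bytes instead of calling ToString()
    mimeMessage.WriteTo (stream);

    var buffer = stream.ToArray ();
    var base64 = Convert.ToBase64String (buffer)
        .Replace( '+', '-' )
        .Replace( '/', '_' )
        .Replace( "=", "" );

    // message is the Google.Apis.Gmail.v1.Data.Message from the question
    message.Raw = base64;
}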

As brand927 pointed out, the content of the text/html MIME part is encoded as quoted-printable. This is a MIME transfer encoding used to make sure the content stays within the 7-bit ASCII range during transport.



You will need to decode this to get the original HTML.

With MimeKit, this is done for you if you use either mimeMessage.HtmlBody, or if you cast the MimeEntity representing the text/html part to TextPart and access its Text property.
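To show what that decoding involves, here is a minimal quoted-printable decoder sketch following RFC 2045, section 6.7. It is for illustration only (the class name is hypothetical and it is not hardened against malformed input); in practice the MimeKit properties mentioned above already do this.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Minimal quoted-printable decoder for illustration:
// "=XX" becomes the byte 0xXX, and "=" at the end of a line is a soft line break.
public static class QuotedPrintable
{
    public static byte[] Decode(string input)
    {
        var output = new List<byte>();
        int i = 0;
        while (i < input.Length)
        {
            char c = input[i];
            if (c != '=')
            {
                output.Add((byte)c); // quoted-printable text is 7-bit ASCII
                i++;
            }
            else if (i + 1 < input.Length && input[i + 1] == '\n')
            {
                i += 2; // soft line break "=\n"
            }
            else if (i + 2 < input.Length && input[i + 1] == '\r' && input[i + 2] == '\n')
            {
                i += 3; // soft line break "=\r\n"
            }
            else if (i + 2 < input.Length && Uri.IsHexDigit(input[i + 1]) && Uri.IsHexDigit(input[i + 2]))
            {
                output.Add(Convert.ToByte(input.Substring(i + 1, 2), 16));
                i += 3;
            }
            else
            {
                output.Add((byte)'='); // leave a stray '=' as-is
                i++;
            }
        }
        return output.ToArray();
    }
}
```

Running `Encoding.UTF8.GetString(QuotedPrintable.Decode(...))` on `charset=3D"UTF-8"` gives back `charset="UTF-8"`. Decoding the apostrophe sequence from the question, =C3=A2=E2=82=AC=E2=84=A2, gives the text "â€™", which is the pattern the right single quotation mark (U+2019, UTF-8 bytes E2 80 99) produces when those bytes are read as Windows-1252 and then re-encoded as UTF-8.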

The answer to your question is: there is no problem. It's just the raw quoted-printable encoded representation. Gmail shows the same thing if you send a mail and look at its source.
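To illustrate where the "=3D" sequences in the question come from, here is a minimal quoted-printable encoder sketch. It is a simplification (the class name is hypothetical) and not how MimeKit encodes internally: real encoders also keep text line breaks as CRLF, wrap lines at 76 characters with soft line breaks, and treat trailing whitespace specially.

```csharp
using System;
using System.Text;

// Minimal quoted-printable encoder for illustration (RFC 2045, section 6.7).
// Printable ASCII (33..126) and space pass through, except '=' itself;
// every other UTF-8 byte, including CR and LF here, is written as "=XX".
public static class QuotedPrintableEncoder
{
    public static string Encode(string text)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(text))
        {
            bool literal = (b >= 33 && b <= 126 && b != (byte)'=') || b == (byte)' ';
            if (literal)
                sb.Append((char)b);
            else
                sb.Append('=').Append(b.ToString("X2")); // uppercase hex, e.g. "=3D"
        }
        return sb.ToString();
    }
}
```

This is why `charset="UTF-8"` appears as `charset=3D"UTF-8"` in the raw message, and why the `=` at the end of a line there is a soft line break rather than part of the HTML.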


