Special characters garbled in HTML output from an XSL transformation

I am having trouble converting certain characters from an XML feed to XHTML.

I am using the following example to demonstrate the problem.

Here is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<paragraph>some text including the –, ã and ’ characters</paragraph>

      

Here's the XSLT I'm using:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"
                encoding="UTF-8"
                indent="yes"
                doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
                doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
    <xsl:template match="paragraph">
        <html xmlns="http://www.w3.org/1999/xhtml">
            <head></head>
            <body>
                <p><xsl:apply-templates/></p>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

      

Here is the XHTML output:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
    <head></head>
    <body>
    <p>some text including the â€", ã and ’ characters</p>
    </body>
</html>

      

Some of the original characters from the XML are replaced with different ones in the output.

First, is something wrong with my encoding setup that is causing this?

Should I be doing something with entities if I want special characters to display correctly in XHTML? If so, how do I use them in XSLT, and would I need to know in advance every possible character my XML feed might contain?

2 answers


I agree with kdgregory: the output file looks like UTF-8, but whatever is reading it assumes a different encoding (ISO-8859-1, or CP-1252, the Windows default). Try declaring the content type directly in the HTML head element:

<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

      



and see if that helps.
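
To make the diagnosis concrete, here is a minimal Python sketch (Python is only my choice for illustration; it is not part of the original setup). It takes the three characters from the question, encodes them as UTF-8, and then decodes those bytes as CP-1252, which I am assuming is the encoding the reader falls back to. The variable names are hypothetical.

```python
# The three special characters from the question's XML sample.
original = "\u2013, \u00e3 and \u2019"  # en dash, a with tilde, right single quote

# The bytes a processor would write for encoding="UTF-8".
utf8_bytes = original.encode("utf-8")

# What a reader shows if it wrongly assumes CP-1252 (assumption: that is
# the fallback encoding in use) for those same bytes.
misread = utf8_bytes.decode("cp1252")

# The en dash (bytes E2 80 93) turns into U+00E2 U+20AC U+201C,
# the three-character sequence seen in the question's output.
print(misread)

# Reversing the wrong decode recovers the text, so no data was lost.
recovered = misread.encode("cp1252").decode("utf-8")
print(recovered == original)
```

If the round trip in the last lines restores the text, the bytes were fine all along and only the encoding declared to, or assumed by, the reader needs fixing.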


This may sound silly, but are you sure the XML file is actually UTF-8? It's one thing to declare it in the prolog, but the file itself may have been saved in a different encoding.
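
One way to check this, sketched in Python under the assumption that you can get at the raw bytes of the feed: a strict UTF-8 decode fails on most non-ASCII text that was saved in a single-byte encoding. `is_valid_utf8` and the sample data are hypothetical names for illustration; in practice you would read the bytes from your own file in binary mode.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (strict mode)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Sample document mirroring the question: the prolog claims UTF-8.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<paragraph>some text including the \u2013, \u00e3 and \u2019 characters</paragraph>"
)

saved_as_utf8 = sample.encode("utf-8")      # file really is UTF-8
saved_as_cp1252 = sample.encode("cp1252")   # prolog lies about the encoding

print(is_valid_utf8(saved_as_utf8))
print(is_valid_utf8(saved_as_cp1252))
```

A failed decode is strong evidence that the prolog is wrong. A successful one is weaker evidence, since a pure-ASCII file passes regardless of how it was saved.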


