Special characters garbled in HTML output from an XSL transformation
I am having trouble converting certain characters from an XML feed to XHTML.
I am using the following example to demonstrate the problem.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<paragraph>some text including the –, ã and ’ characters</paragraph>
Here's the XSLT I'm using:
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html"
encoding="UTF-8"
indent="yes"
doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
<xsl:template match="paragraph">
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
<p><xsl:apply-templates/></p>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
Here is the XHTML output:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html>
<head></head>
<body>
<p>some text including the â€", ã and ’ characters</p>
</body>
</html>
The original characters from the XML are replaced with different ones.
First, is something wrong with my encoding that is causing this problem?
Should I be using entities if I want special characters to display correctly in XHTML? If so, how do I use them in XSLT, and do I need to know in advance every possible character my XML feed might contain?
I agree with kdgregory: the output file looks like UTF-8, but whatever is reading it assumes a different encoding (ISO-8859-1 or CP-1252, the Windows default). Try declaring the content type directly in the HTML head element:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>
and see if that helps.
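To check that this is an encoding mismatch and not an XSLT problem, here is a minimal Python sketch. It is not part of the original setup, and the helper name `misread_utf8` is made up for illustration. It encodes the three characters from the question as UTF-8 and then decodes those bytes as CP-1252, the way a misconfigured reader would:

```python
# Characters from the sample XML: en dash, a with tilde, right single quote.
SAMPLE = "\u2013, \u00e3 and \u2019"


def misread_utf8(text, wrong_encoding="cp1252"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding, errors="replace")


garbled = misread_utf8(SAMPLE)
print(garbled)

# The en dash (bytes E2 80 93) turns into three separate characters:
# U+00E2, U+20AC and U+201C.
print([hex(ord(ch)) for ch in misread_utf8("\u2013")])

# The damage is reversible as long as every byte maps to a CP-1252 character,
# which shows the underlying bytes were valid UTF-8 all along.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == SAMPLE)
```

The en dash comes out as â€ followed by a curly quote, which matches the garbled output in the question. That suggests the transformation is writing correct UTF-8 and only the reader's assumed charset is wrong.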