Encoding to UTF-8 from PHP

I'm not very good with encodings, and I'm stumbling even on the basics here.

I am trying to create a file that is recognized as UTF-8:

header("Content-Type: text/plain; charset=utf-8");
header("Content-disposition: attachment; filename=test.txt");
echo "test";
exit();

      

I also tried:

header("Content-Type: text/plain; charset=utf-8");
header("Content-disposition: attachment; filename=test.txt");
echo utf8_encode("test");
exit();

      

Then I open the file with Notepad++ and it reports the current encoding as ANSI, not UTF-8. What am I missing? How should I output this file?

Eventually I will be outputting a product XML file for the Affiliate Window program. In case it helps, my web server runs CentOS, Apache 2 and PHP 5.2.8.

Thanks in advance for your help!

+2




6 answers


As Philip said, the encoding is not an intrinsic attribute of a file; it is implicit. This means that if you don't know which encoding a file is supposed to be interpreted in, there is no way to determine it for certain. The best you can do is make a guess, and that is probably what programs like Notepad++ do. Since the actual data you sent can be interpreted in many different encodings, it just picks the candidate it likes best. To Notepad++ it looks like ANSI (which is a rather imprecise classification in itself), while other programs may default to something else.

The reason you need to specify the encoding in the HTTP header is precisely that the file itself does not contain this information, so the browser needs to be told about it. Once you have saved the file to disk, that information is no longer available.

If the file you intend to serve is an XML document, you have the option of putting the encoding information inside the document itself. That way it is preserved after the file is saved to disk. For example, if you are using UTF-8, you should put this at the top of your document:



<?xml version="1.0" encoding="utf-8" ?>
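As a rough sketch of how that declaration could be generated rather than typed by hand, PHP's DOM extension writes it for you when you pass the encoding to the DOMDocument constructor. The helper name and the element names (products, product, name) are made up for illustration; they are not the Affiliate Window schema. The input strings are assumed to be UTF-8 already.

```php
<?php
// Hypothetical helper: builds a tiny product feed.
// Element names are placeholders, not the Affiliate Window schema.
function build_product_xml(array $names)
{
    // Passing 'UTF-8' here makes saveXML() emit encoding="UTF-8" in the declaration.
    $doc = new DOMDocument('1.0', 'UTF-8');
    $root = $doc->createElement('products');
    $doc->appendChild($root);

    foreach ($names as $name) {
        $product = $doc->createElement('product');
        $nameEl = $doc->createElement('name');
        // The DOM API expects UTF-8 input; createTextNode() escapes &, < and >.
        $nameEl->appendChild($doc->createTextNode($name));
        $product->appendChild($nameEl);
        $root->appendChild($product);
    }

    return $doc->saveXML();
}

// "Caf\xC3\xA9" is "Café" spelled out as UTF-8 bytes.
$xml = build_product_xml(array("Caf\xC3\xA9 table"));
echo $xml;
```

When serving this over HTTP you would still send a matching header, for example header('Content-Type: application/xml; charset=utf-8'), before echoing the document.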

      

Note that in addition to getting the meta-information about the encoding right, you also need to make sure that the data you are serving is actually UTF-8 encoded. It's pretty much the same scenario: you need to know implicitly what the encoding of your data is. The function utf8_encode (despite its name) is specifically meant for converting ISO-8859-1 to UTF-8. So if you use it on data that is already UTF-8 encoded, you will end up with double-encoded, garbled data.

Character sets aren't that complicated in themselves. The problem is that if you're not careful about keeping things straight, you'll mess it up. Whenever you have a string, you must be absolutely sure that you know which encoding it is in. Otherwise it's not a string; it's just a blob of binary data.
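To make the double-encoding problem concrete, here is a small sketch. It uses mb_convert_encoding from the mbstring extension (assumed to be installed) with an explicit ISO-8859-1 source, which performs the same conversion that utf8_encode does. The helper to_utf8_once is hypothetical, and its check is only a heuristic: it assumes that anything which is not valid UTF-8 is ISO-8859-1, which is something you have to know about your own data.

```php
<?php
// Hypothetical helper: convert to UTF-8 only if the input is not already valid UTF-8.
// Assumption: any input that is not valid UTF-8 is ISO-8859-1.
function to_utf8_once($s)
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave it untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";      // "café" in ISO-8859-1 (4 bytes)
$utf8   = "caf\xC3\xA9";  // "café" in UTF-8 (5 bytes)

// Converting ISO-8859-1 input once gives the correct UTF-8 bytes.
$once = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Converting again treats each UTF-8 byte as an ISO-8859-1 character: double encoding.
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');

echo bin2hex($once), "\n";   // 636166c3a9
echo bin2hex($twice), "\n";  // 636166c383c2a9
```

The second hex dump is what shows up as "cafÃ©" when viewed as UTF-8.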

+7




"test" is all ASCII, so there is no need to use UTF-8 for this.



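In fact, the first 128 characters of Unicode are the same as ASCII, and UTF-8 encodes those characters with exactly the same bytes as ASCII does. An all-ASCII file is therefore byte-for-byte identical in either encoding, and an editor has nothing to tell them apart by. See Wikipedia's description of UTF-8 for more details.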
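A quick way to convince yourself of this is to look at the bytes. The helper is_pure_ascii below is a made-up name for illustration, and the conversion line assumes the mbstring extension is available.

```php
<?php
$text = "test";

// The four bytes of "test"; they are the same in ASCII, ISO-8859-1 and UTF-8.
echo bin2hex($text), "\n"; // 74657374

// Hypothetical helper: true when every byte is below 0x80, i.e. the string is pure ASCII.
function is_pure_ascii($s)
{
    return preg_match('/^[\x00-\x7F]*\z/', $s) === 1;
}

// Converting pure ASCII from ISO-8859-1 to UTF-8 changes nothing.
$converted = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $text); // bool(true)
```

So for a string like this, converting or not converting makes no difference to what ends up in the file.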

+6




Once you download the file, it no longer carries any encoding information, so Notepad++ has to guess from the content. There is a thing called a byte order mark (BOM), which lets you mark the content as a UTF encoding by prefixing it with a few special bytes.

See the question "When a BOM is used, is it only in 16-bit Unicode text?" ...

I would guess that using something like echo "\xEF\xBB\xBF"; before writing the actual content will make Notepad++ recognize the file correctly.
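Spelled out a bit more, the idea could look like the sketch below. Both function names are hypothetical, and the headers simply mirror the ones in the question. Keep in mind that a BOM is optional for UTF-8 and some tools treat it as stray bytes, so for the XML feed the XML declaration is probably the safer signal.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
define('UTF8_BOM', "\xEF\xBB\xBF");

// Hypothetical helper: prefix content with the BOM unless it already starts with one.
function with_utf8_bom($content)
{
    if (strncmp($content, UTF8_BOM, 3) === 0) {
        return $content;
    }
    return UTF8_BOM . $content;
}

// Hypothetical helper mirroring the download script from the question.
function send_utf8_text_download($content, $filename)
{
    header('Content-Type: text/plain; charset=utf-8');
    header('Content-Disposition: attachment; filename=' . $filename);
    echo with_utf8_bom($content);
}
```

In the question's script this would be called as send_utf8_text_download("test", "test.txt"); followed by exit();.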

+5




There is no such thing as headers for downloaded txt files. Since you are going to create XML files anyway, and you can specify the encoding in the XML declaration, try creating a simple XML structure and saving and opening that. It should work as long as the OS supports UTF-8, which any modern Linux distribution should.

+2



