How do I create a Persian .txt file and then explode it?
I have a lot of Persian text that I want to explode, so I saved it in file.txt.
My problem is the encoding. When I save the text to file.txt, I get this warning:
This file contains Unicode characters that will be lost if you save this file as an ANSI-encoded text file. To keep the information in Unicode, click Undo below and then select one of the Unicode options from the Encoding drop-down list. Proceed?
I proceed anyway. When I then open file.txt, all the characters look fine, but once the text is exploded all the characters come out broken.
Note: when I put the text directly into a PHP variable everything is fine; my problem is only with file.txt.
What should I do?
My code (for the explode):
$text=file_get_contents('file.txt');
$var = explode("\n", $text);
foreach ($var as $sentence) {
echo $sentence.'<br>'; // or save into the database
}
Be sure to save the text file in UTF-8 encoding. (Use UTF-8 for HTML output and database connection to match.)
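As a minimal sketch of that advice, the file can be checked for valid UTF-8 before it is split, and the output escaped as UTF-8. The helper names `read_utf8_lines` and `render_lines` are hypothetical, and the sketch assumes PHP 7 or later with the mbstring extension enabled.

```php
<?php
// Hypothetical helper: read a text file, insist on valid UTF-8, and split it into lines.
function read_utf8_lines(string $path): array
{
    $text = file_get_contents($path);
    if ($text === false) {
        throw new RuntimeException("Cannot read $path");
    }
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new RuntimeException("$path is not valid UTF-8; re-save it as UTF-8");
    }
    // \R matches \n, \r\n and \r (plus other Unicode line breaks),
    // so Windows line endings leave no stray \r behind.
    return preg_split('/\R/u', $text);
}

// Hypothetical helper: escape each line for HTML, declaring UTF-8 explicitly.
function render_lines(array $lines): string
{
    $html = '';
    foreach ($lines as $line) {
        $html .= htmlspecialchars($line, ENT_QUOTES, 'UTF-8') . "<br>\n";
    }
    return $html;
}
```

In a page this might be used as `echo render_lines(read_utf8_lines('file.txt'));`, after sending `header('Content-Type: text/html; charset=utf-8');` so the browser decodes the output with the same encoding.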
If you save the file in the encoding that Microsoft misleadingly calls "Unicode", you end up with UTF-16LE, an encoding built from two-byte units that is not ASCII-compatible, which is generally a bad idea.
PHP's basic string operations such as explode work on a byte basis, so if you split UTF-16 on the single byte \n, you end up splitting a two-byte character in the middle and throwing off the byte alignment of the next line (and of every alternate line).
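The misalignment can be reproduced with a short sketch. The sample text is a hypothetical two-line Persian string written with `\u{...}` escapes (PHP 7+), and the mbstring extension is assumed to be available for the conversions.

```php
<?php
// Hypothetical sample: two short Persian words separated by a newline, in UTF-8.
$utf8 = "\u{0633}\u{0644}\u{0627}\u{0645}\n\u{062F}\u{0646}\u{06CC}\u{0627}";

// Simulate the bytes of a file saved as UTF-16LE (BOM left out to keep the example small).
$utf16 = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Byte-wise split: the newline is the byte pair 0A 00, but only the 0A byte is removed.
$broken = explode("\n", $utf16);

// The second piece begins with the leftover 00 byte, so its length is odd
// and every following two-byte unit is read one byte out of step.
var_dump(strlen($broken[1]) % 2); // int(1)

// Converting to UTF-8 first keeps every character whole, because in UTF-8
// the byte 0A never occurs inside a multi-byte character.
$fixed = explode("\n", mb_convert_encoding($utf16, 'UTF-8', 'UTF-16LE'));
var_dump($fixed[1] === "\u{062F}\u{0646}\u{06CC}\u{0627}"); // bool(true)
```

Converting on read is only a fallback for files that already exist as UTF-16; saving the file as UTF-8 in the first place avoids the conversion entirely.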
Use a decent text editor that lets you save as UTF-8 without a BOM. Notepad puts a UTF-8 faux-BOM at the beginning of the file, which means that when you read it into PHP your first line (but none of the other lines) will have a U+FEFF byte order mark at the beginning, which can cause stray output and failed string comparisons.
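If a file with a BOM has to be read anyway, the three BOM bytes can be dropped before splitting. This is a minimal sketch, not a built-in feature: `strip_utf8_bom` is a hypothetical helper name, and the sample string stands in for the contents of a file saved by Notepad as UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): drop a leading UTF-8 BOM if present.
function strip_utf8_bom(string $text): string
{
    $bom = "\xEF\xBB\xBF"; // U+FEFF encoded as UTF-8
    return strncmp($text, $bom, 3) === 0 ? substr($text, 3) : $text;
}

// Hypothetical sample standing in for file_get_contents('file.txt').
$raw = "\xEF\xBB\xBF" . "\u{0633}\u{0644}\u{0627}\u{0645}\n\u{062F}\u{0646}\u{06CC}\u{0627}";

// Without stripping, the first line carries three extra bytes in front of the text.
$lines = explode("\n", strip_utf8_bom($raw));
```

Text that has no BOM passes through the helper unchanged, so it is safe to apply to every file.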
Prefer a text editor that saves as BOM-free UTF-8 by default. Notepad's choice of ANSI, UTF-16LE and faux-BOM UTF-8 makes it a pretty terrible tool for editing files for the web.