What is the character set if default_charset is empty?

From PHP 5.6 onwards, the default_charset setting defaults to "UTF-8", as described e.g. in the php.ini documentation. It says the value is empty for earlier versions.

Since I am creating a Java library to communicate with PHP, I need to know what values I should expect when a string is handled internally as bytes. What happens if default_charset is empty and a (literal) string contains characters outside the ASCII range? Should I expect the platform's default encoding, or the character encoding used for the source file?



2 answers


Short answer

For literal strings, it is always the source file encoding. The default_charset value does nothing here.

Longer answer

PHP strings are "binary", meaning they have no internal string encoding. Basically, a string in PHP is just a byte buffer.

For literal strings, e.g. $s = "Ä", this means the string will contain whatever bytes were stored in the file between the quotes. If the file was saved in UTF-8, it is equivalent to $s = "\xc3\x84"; if the file was saved in ISO-8859-1 (latin1), it is equivalent to $s = "\xc4".
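Seen from the Java side, the same two byte sequences can be reproduced by encoding the character explicitly. This is only a minimal sketch and not part of the original answer; the class name LiteralBytes and the helper hex are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class LiteralBytes {
    // Hypothetical helper: render a byte array as lowercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A with diaeresis (U+00C4), written as an escape so that the
        // encoding of this Java source file does not matter.
        String s = "\u00c4";

        // The bytes a PHP literal would hold if the PHP file was saved as UTF-8.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // c384

        // The bytes it would hold if the PHP file was saved as ISO-8859-1.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // c4
    }
}
```

Either byte sequence is a perfectly valid PHP string; nothing in the string itself records which encoding produced it.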

The default_charset setting has no effect on the bytes stored in strings.

What does default_charset do?



Some functions that have to treat strings as text, and therefore need to know the encoding, take $encoding as an argument (usually optional). This tells the function which encoding the text in the string uses.

Before PHP 5.6, the default value of these optional $encoding arguments was either hard-coded in the function definition (for example htmlspecialchars()) or configured in separate php.ini settings for each extension (for example mbstring.internal_encoding, iconv.input_encoding).

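PHP 5.6 turned the php.ini setting default_charset into the common default for these functions (the setting itself existed earlier, but its default value was empty). The old per-extension settings were deprecated, and all functions that take an optional $encoding argument should now default to the default_charset value when no encoding is explicitly specified.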

However, it is the developer's responsibility to ensure that the text in the string is actually encoded in the encoding that was specified.
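For a Java client this means agreeing on one encoding with the PHP side and decoding the received bytes with that charset explicitly, instead of relying on the platform default. The following is a rough sketch under that assumption; PhpBytes and decodeStrict are hypothetical names, not an existing API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PhpBytes {
    // Hypothetical helper: decode bytes received from PHP and fail loudly
    // if they are not valid in the agreed-upon charset.
    static String decodeStrict(byte[] bytes, Charset charset) throws CharacterCodingException {
        // A fresh CharsetDecoder reports malformed input by default,
        // unlike new String(bytes, charset), which substitutes U+FFFD.
        return charset.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] utf8 = {(byte) 0xc3, (byte) 0x84}; // from a PHP file saved as UTF-8
        byte[] latin1 = {(byte) 0xc4};            // from a PHP file saved as ISO-8859-1

        // Decoding with the matching charset recovers U+00C4 in both cases.
        System.out.println(decodeStrict(utf8, StandardCharsets.UTF_8).equals("\u00c4"));
        System.out.println(decodeStrict(latin1, StandardCharsets.ISO_8859_1).equals("\u00c4"));

        // Decoding the ISO-8859-1 byte as UTF-8 is rejected as malformed.
        try {
            decodeStrict(latin1, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("not valid UTF-8");
        }
    }
}
```

Strict decoding only catches byte sequences that are invalid in the chosen charset. UTF-8 bytes decoded as ISO-8859-1 never fail, they just produce the wrong characters, so the encoding still has to be agreed on out of band.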




It seems you shouldn't rely on the internal encoding. The internal character encoding can be viewed or set with mb_internal_encoding().

phpinfo() example

  • PHP version 5.5.9-1ubuntu4.5
  • default_charset no value

file1.php

<?php
$string = "e";
echo mb_internal_encoding(); //ISO-8859-1

      



file2.php

<?php
$string = "É";
echo mb_internal_encoding(); //ISO-8859-1

      

Both files will output ISO-8859-1 unless you change the internal encoding manually.

<?php
echo bin2hex("ö"); //c3b6 (utf-8)

      

Getting the hex representation of that character returns its UTF-8 encoding. If you save the file as UTF-8, the string in this example will be 2 bytes long, even though the internal encoding is not set to UTF-8. Therefore, you must rely on the character encoding used for the source file.
