Corrupted data using UTF-8 and mb_substr

Question

I am getting data from a MySQL database (varchar(255), utf8_general_ci) and trying to write the text to a PDF from PHP. I need to determine the length of a string in the PDF in order to limit the text output in a table. But I noticed that the output of mb_substr / substr is really strange.

For example:

mb_internal_encoding("UTF-8");

$_tmpStr = $vfrow['title'];
$_tmpStrLen = mb_strlen($vfrow['title']);
for($i=$_tmpStrLen; $i >= 0; $i--){
     file_put_contents('cutoffattributes.txt',$vfrow['field']." ".$_tmpStr."\n",FILE_APPEND);
     file_put_contents('cutoffattributes.txt',$vfrow['field']." ".mb_substr($_tmpStr, 0, $i)."\n",FILE_APPEND);
}

outputs this:

[Screenshot of the output in Notepad++, with a link to the output file]

Database:

[Screenshot of the database row]

My question is, where did the additional character come from?

+3

php utf-8 mbstring

b3wii Apr 22 '15 at 16:08

3 answers

The extra character is the first byte of a two-byte UTF-8 sequence. You may have a problem with the internal encoding of the multibyte string functions: your code treats the text as a fixed-width, one-byte encoding. For example, ń is C5 84 in UTF-8; read one byte at a time, those bytes become Ĺ„ in CP-1250 and Ĺ followed by the control character IND in ISO-8859-2, i.e. two characters.
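To make the byte-versus-character difference concrete, here is a minimal sketch. The sample string "Poznań" is an assumption chosen for illustration (it is not from the question), and ń is written as explicit bytes so the result does not depend on how the script file itself is saved.

```php
<?php
// Sample text (an assumption for illustration): "Poznań" with ń as the bytes C5 84.
$title = "Pozna\xC5\x84";

$byteLength = strlen($title);             // counts bytes: 7
$charLength = mb_strlen($title, 'UTF-8'); // counts characters: 6

// Byte-based cut: stops after C5, so half of the two-byte sequence is left behind.
$byteCut = substr($title, 0, 6);

// Character-based cut with an explicit encoding: ń stays intact.
$charCut = mb_substr($title, 0, 6, 'UTF-8');

// The same two bytes counted under a single-byte encoding are two characters.
$misreadLength = mb_strlen("\xC5\x84", 'ISO-8859-2'); // 2

// false: the byte-based cut is no longer valid UTF-8.
var_dump(mb_check_encoding($byteCut, 'UTF-8'));
```

The stray character in the output would be what a viewer shows for that orphaned C5 byte.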

Try this at the top of your script:

mb_internal_encoding("UTF-8");

http://php.net/manual/en/function.mb-internal-encoding.php

+1

Michas Apr 22 '15 at 18:07

Besides the table and field being set to UTF-8, you need to set the connection charset to UTF-8 as well, e.g. with mysqli_set_charset($link, 'utf8') if you are using mysqli.
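A minimal sketch of setting the connection charset with mysqli follows. The helper name openUtf8Connection and its parameters are hypothetical, and the host, credentials and database name passed to it would be your own.

```php
<?php
// Hypothetical helper: opens a mysqli connection and switches it to UTF-8.
function openUtf8Connection(string $host, string $user, string $password, string $database): mysqli
{
    // Make mysqli throw exceptions instead of failing silently.
    mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);

    $db = new mysqli($host, $user, $password, $database);

    // MySQL names this charset 'utf8' (or 'utf8mb4' for 4-byte characters), not 'UTF-8'.
    $db->set_charset('utf8');

    return $db;
}
```

With the connection charset set, strings fetched from a utf8_general_ci column should arrive as UTF-8 bytes, which is what the mb_ functions expect here.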

And have you tried?

$_tmpStr = utf8_encode( $vfrow['title'] );

0

Izzy Apr 22 '15 at 16:50

deceze · Accepted Answer · 2015-04-22T18:43:23+0000

1. You need to make sure that you are actually getting data from the database in UTF-8 encoding by properly setting the connection encoding. How to do that depends on your database adapter; see "UTF-8 throughout" for details.
2. You need to tell your mb_ functions that the data is in UTF-8 so that they can process it correctly. Either set this globally for all functions using mb_internal_encoding, or pass the $encoding parameter to the function when you call it:
```
mb_substr($_tmpStr, 0, $i, 'UTF-8')
```
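Both points can be combined in a small helper. This is only a sketch: the name truncateUtf8 is hypothetical, and the validity check is an added assumption meant to surface a wrong connection encoding early, not something the answer prescribes.

```php
<?php
// Hypothetical helper: cut $text to at most $maxChars characters, treating it as UTF-8.
function truncateUtf8(string $text, int $maxChars): string
{
    // Fail early if the bytes are not valid UTF-8, for instance because the
    // database connection delivered the data in another encoding.
    if (!mb_check_encoding($text, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Pass the encoding explicitly instead of relying on mb_internal_encoding().
    return mb_substr($text, 0, $maxChars, 'UTF-8');
}

// Usage: "Poznań city" (ń written as the bytes C5 84) cut to six characters.
$short = truncateUtf8("Pozna\xC5\x84 city", 6);
```

Passing the encoding on every call keeps the function independent of global settings, at the cost of repeating 'UTF-8' wherever an mb_ function is used.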
