PHP: MySQL to CSV to Excel with UTF-8

I know this has been discussed several times, but the issue is still driving me crazy. I have a form with a submit.php action. At first I didn't change any encodings and didn't send any UTF-8 header information; as a result, I could read all the ä, ö, ü etc. correctly inside the database. But when I export them to .csv and import that into Excel as UTF-8 (I also tried all the other charsets), the encoding comes out wrong.

Now I tried:

PHP:

header("Content-Type: text/html; charset=utf-8");
$mysqli->set_charset("utf8");

      

MySQL: I deleted my database and created a new one:

create database db CHARACTER SET utf8 COLLATE utf8_general_ci;
create table ...

      

I changed my.cnf and restarted my SQL server:

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

[mysql]
default-character-set=utf8

      

If I connect to my db via bash, I get the following output:

| character_set_client     | utf8                             |
| character_set_connection | utf8                             |
| character_set_database   | utf8                             |
| character_set_filesystem | binary                           |
| character_set_results    | utf8                             |
| character_set_server     | utf8                             |
| character_set_system     | utf8                             |
| character_sets_dir       | /usr/local/mysql/share/charsets/ |

      

Php test:

var_dump($mysqli->get_charset());

      

Giving me:

Current character set: utf8
object(stdClass)#3 (8) {
  ["charset"]=> string(4) "utf8"
  ["collation"]=> string(15) "utf8_general_ci"
  ["dir"]=> string(0) ""
  ["min_length"]=> int(1)
  ["max_length"]=> int(3)
  ["number"]=> int(33)
  ["state"]=> int(1)
  ["comment"]=> string(13) "UTF-8 Unicode"
}

      

Now I am using:

mysql -uroot -ppw db < require.sql > /tmp/test.csv

      

require.sql is just a

select * from table;

      

And again, I cannot import it as a CSV into Excel, no matter whether I choose UTF-8 or anything else. It always gives me garbled characters ...
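For reference, the redirected output of the `mysql ... < require.sql` call above is tab-separated text with a header row; it can be sanity-checked with Python's csv module. A minimal sketch, where the sample string is a made-up stand-in for the real dump:

```python
import csv
import io

# `mysql ... > /tmp/test.csv` produces tab-separated text with a header row.
sample = "name\tstreet\nMüller\tHauptstraße 5\n"   # stand-in for the dump
rows = list(csv.reader(io.StringIO(sample), dialect=csv.excel_tab))
print(rows)   # [['name', 'street'], ['Müller', 'Hauptstraße 5']]
```

If the rows parse cleanly here, the export itself is fine and the problem lies in how Excel guesses the encoding and delimiter.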

Hopefully someone has a hint about what might be going wrong here.

Greetings

E: TextMate shows me the correct output, so the conversion itself seems to have worked; is this a problem with Excel, then? I am using Microsoft Office 2011.

E2: I also tried the same with latin1; same problem, I can't import special characters into Excel without breaking them. Any hints or workarounds?

E3: I found a workaround that works with Excel's import function, but not when double-clicking the .csv:

    iconv -f utf8 -t ISO-8859-1 test.csv > test_ISO.csv

      

I can now import the CSV into Excel by choosing Windows (ANSI). It is still annoying to have to use the import function instead of double-clicking. I also don't understand why UTF-8 doesn't work, even via the import function, with a BOM added, and with the whole database in UTF-8.
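The iconv step can be reproduced in Python as well, which makes it easy to check which characters survive the conversion to ISO-8859-1. A minimal sketch with a made-up sample string:

```python
# Re-encode a UTF-8 string as ISO-8859-1, like `iconv -f utf8 -t ISO-8859-1`.
utf8_bytes = "Müller;Straße".encode("utf-8")
latin1_bytes = utf8_bytes.decode("utf-8").encode("iso-8859-1")
print(latin1_bytes)   # b'M\xfcller;Stra\xdfe'
```

Characters outside ISO-8859-1 (the € sign, for example) raise a UnicodeEncodeError here, which is one reason this workaround only suits Western European text.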

Separating with a comma turned out to be useless:

1. CONCAT_WS only partly works, because it adds a silly concat_ws(...) header row to the CSV. Besides, the exported test.csv doesn't actually contain commas; everything is tab-separated, and Excel doesn't notice the separation when double-clicking.

2. sed/awk: I found some code snippets, but they all split the table too aggressively. For example, a column like street "street number" became "street", "number", so one column turned into two and the table was broken.
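The breakage described in point 2 is why a CSV-aware parser beats sed/awk: quoted fields may contain the delimiter itself. A small illustration with Python's csv module (the sample row is made up):

```python
import csv
import io

# One field contains the delimiter; naive splitting breaks it apart.
line = '"Hauptstraße 5","Berlin, Mitte"\n'
row = next(csv.reader(io.StringIO(line)))
print(row)                # ['Hauptstraße 5', 'Berlin, Mitte']
print(line.split(","))    # naive split yields three pieces
```

The csv reader respects the quotes and returns two fields, while a plain split on the comma tears the quoted field apart, exactly like the sed/awk snippets did.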

So it seems to me that Excel can only open a .csv with a double click if it is a) encoded as ISO-8859-1 (and only under Windows, because the default Mac encoding is Macintosh) and b) actually recognized as comma-separated. This means that if I generate the .csv myself via the output itself,

file test1.csv

      

will be

test1.csv: ISO-8859 text, with CRLF line terminators

      

while the iconv-converted file, with commas added via a regex, looks like this:

test1.csv: ISO-8859 text

      

Quite strange behavior; maybe someone has a working solution.


2 answers


Thanks everyone for the help. I finally managed to get a file that opens correctly with a double click and displays the special characters properly. For those interested in a working workflow, here it is:

1.) My database is fully UTF-8.

2.) I export the form data to my database via PHP. I use mysqli and this header information:

header("Content-Type: text/html; charset=ISO-8859-1");

      

I know this makes everything look broken inside the database; feel free to use UTF-8 instead so it looks correct there, but it doesn't matter in my case.

3.) I wrote a script, executed by the cron daemon, that

a) deletes the previously created CSV files:

rm -f path/to/csv ## I have 3 of them due to some renaming, see below

      

b) creates a new CSV using mysql (it is still UTF-8):

mysql -hSERVERIP -uUSER -pPASS DBNAME -e "select * from DBTABLE;" > PATH/TO/output.csv

      

Now you have a tab-separated .csv, and (if you exported from PHP as UTF-8) it will display correctly in OpenOffice etc., but not in Excel; even importing it as UTF-8 doesn't work.

c) Converting the file to SEMICOLON separation (the Excel default; double-clicking a comma-separated file will not work, at least not with a European version of Excel). I used a small Python script, semicolon.py:

import sys
import csv

# Read tab-separated rows from stdin, write them out semicolon-separated.
tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
semiout = csv.writer(sys.stdout, delimiter=";")
for row in tabin:
    semiout.writerow(row)

      



d) Then I call the script inside my cron sh file:

/usr/bin/python PATH/TO/semicolon.py < output.csv > output_semi.csv

      

Make sure you use the full path for each file if you run the script from cron.

e) Change the encoding from UTF-8 to ISO-8859-1 (the Windows ANSI default that Excel expects) using iconv:

iconv -f utf8 -t ISO-8859-1 output_semi.csv > output_final.csv

      

That's it: the .csv opens with a double click in Mac/Windows Excel 2010 (tested).
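Steps c) and e) can also be combined into a single pass with Python 3. A sketch under the same assumptions; the sample content and file names are made-up stand-ins for the real dump:

```python
import csv

# Stand-in for the real tab-separated UTF-8 dump produced by mysql.
with open("output.csv", "w", newline="", encoding="utf-8") as f:
    f.write("name\tcity\nMüller\tKöln\n")

def tsv_to_excel_csv(src_path, dst_path):
    # Read the UTF-8 tab-separated dump and write a semicolon-separated
    # ISO-8859-1 file, replacing semicolon.py plus the iconv step.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="iso-8859-1") as dst:
        writer = csv.writer(dst, delimiter=";")
        for row in csv.reader(src, dialect=csv.excel_tab):
            writer.writerow(row)

tsv_to_excel_csv("output.csv", "output_final.csv")
```

One process instead of two also avoids the intermediate output_semi.csv file in the cron job.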

Perhaps this helps someone with similar problems. It was driving me crazy.

Edit: on some servers you don't need iconv, because the output from the database is already ISO-8859. You should check your CSV after running the mysql command:

file output.csv

      

Use iconv only if the encoding is not already ISO-8859-1.



This is how I store data fetched from UTF-8 MySQL tables. You need to write a BOM first. Example:

<?php
$fp = fopen(dirname(__FILE__).'/'.$filename, 'wb');
fputs($fp, "\xEF\xBB\xBF"); // UTF-8 BOM so Excel detects the encoding
fputcsv($fp, array($utfstr_1, $utfstr_2));
fclose($fp);
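In Python the same BOM trick is the `utf-8-sig` codec, which prepends the EF BB BF bytes automatically. A sketch; the file name and values are made up:

```python
import csv

# 'utf-8-sig' writes the UTF-8 BOM first, which lets Excel detect
# the encoding when the file is opened by double click.
with open("export.csv", "w", newline="", encoding="utf-8-sig") as f:
    csv.writer(f, delimiter=";").writerow(["Müller", "Köln"])

print(open("export.csv", "rb").read()[:3])   # b'\xef\xbb\xbf'
```

The csv writer takes care of quoting, so this is the Python counterpart of the fputs/fputcsv pair above.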

      

Make sure you also tell MySQL that you are going to use UTF-8:

mysql_query("SET CHARACTER SET utf8"); 
mysql_query("SET NAMES utf8"); 

      



You need to do this before you select any data.

It probably won't hurt to set the locale as well: setlocale(LC_ALL, "en_US.UTF-8");

Hope it helps.
