Getting UTF-8 data from MySQL into a Linux C++ application

I'm having big problems displaying UTF-8 data fetched from MySQL in a Linux C++ application. The UTF-8 text is displayed as question marks.

The app uses the MySQL C API, so after mysql_init() and before mysql_real_connect() I set the connection to UTF-8:

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, 'utf8');

and

mysql_options(&mysql,MYSQL_INIT_COMMAND, 'SET NAMES utf8');

But no luck: the text still displays as question marks. I ran some tests with a Perl script (I'm more familiar with Perl ;)), and there the text is displayed correctly if I enable the UTF-8 option on the connection:

$dbh->{'mysql_enable_utf8'} = 1;
$dbh->do('SET NAMES utf8');

Any idea how to properly display UTF-8 data in C++?

2 answers


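It could be a simple typo. You wrote:

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, 'utf8');

Single quotes are for character literals, not strings: 'utf8' is a multi-character constant, an int with an implementation-defined value, not a pointer to the string "utf8". So change this to: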

mysql_options(&mysql, MYSQL_SET_CHARSET_NAME, "utf8");


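Also check the type of mysql. If it is declared as a MYSQL * (for example, the pointer returned by mysql_init(NULL)), pass it directly instead of taking its address:

mysql_options(mysql, MYSQL_SET_CHARSET_NAME, "utf8");

The same applies to the line with MYSQL_INIT_COMMAND.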


You don't need to tweak the character set parameters to get the result you want. They just help the database do sane things with sorting, etc.

I suspect that you are in fact receiving your data as UTF-8 but simply aren't handling it correctly. Passing UTF-8 around in C is the easiest thing in the world. Getting it to print correctly can be harder, but that is not a MySQL issue.

Based on the tags on this post, I'm assuming you are running this program on Linux. If so, simply printing the data to the console (printf(), cout, whatever) should display it correctly, since Linux consoles almost always default to UTF-8 nowadays. Check your LANG environment variable.



When working with Unicode, it helps to write test programs that fetch only a very small amount of non-ASCII data (ideally a single character), print only that, and redirect the program's output to a file. Then view the file in a hex editor and compare the bytes with that character's encoding in UTF-8 and, at the very least, UCS-2LE, to make sure you are not seeing an unexpected encoding.

I maintain MySQL++, and I can say that MySQL++ handles UTF-8 quite naturally on Linux, yet we don't play any games to make it do so. I don't see why straight C API code shouldn't behave just as well. You could try building MySQL++ on your system and running its examples, since they include UTF-8 tests: run resetdb to set up the test database, then simple1 to display the UTF-8 data that resetdb puts into it. See the distribution's README-examples.txt for details.

I'm not telling you to switch to MySQL++; I'm just suggesting it as a known-good reference point. Once you have the examples working, you can modify them to run against your own database and see whether they break.
