LAMP stack / user input and character encoding

Is there a one-stop solution to all character encoding problems? I always run into trouble somewhere along the line between user input (HTML forms), database storage and data retrieval. I want all my data and web pages to be UTF-8 encoded, but it seems that I always end up with invalid UTF-8 characters.

I am not very experienced with character encoding, but ever since I started working with French characters I keep running into problems. One of the other developers urlencodes everything before inserting it into the database and then urldecodes everything on retrieval, which makes me cringe.

As I understand it, an HTML form will accept any characters depending on the user's environment, and it is up to the server side to try to convert them to UTF-8 or whatever?

Any additional information would be greatly appreciated!

2 answers


Using UTF-8 throughout is the only solution. Unfortunately, that also means understanding the problems that arise in practice. If you have a specific issue, please ask a specific question on SO.



Regarding HTML forms: no, it doesn't really depend on the user's environment. The browser will (or at least really should) send data in the same encoding as the page that contains the form. Make sure every HTML page you send to the user has a charset= parameter in the HTTP Content-Type header; for good measure, also put the http-equiv meta tag in the HTML file itself (which helps if the user caches or saves the HTML page). So when the HTML page is in UTF-8, the data sent by the browser is also in UTF-8.
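To make that concrete, here is a minimal sketch in Python (not taken from the question, which does not say which server-side language is used). It shows the header and meta tag described above, plus a hypothetical helper, decode_form_body, that decodes a posted form body and refuses anything that is not valid UTF-8. The names CONTENT_TYPE, META_TAG and decode_form_body are illustrative assumptions, not part of any framework.

```python
from urllib.parse import parse_qs

# Header to send with every HTML page, so the browser knows the page is UTF-8.
CONTENT_TYPE = ("Content-Type", "text/html; charset=utf-8")

# Matching declaration inside the HTML itself, for saved or cached copies.
META_TAG = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'


def decode_form_body(raw: bytes) -> dict:
    """Decode an application/x-www-form-urlencoded body, insisting on UTF-8.

    Hypothetical helper: raises UnicodeDecodeError instead of silently
    storing bytes that are not valid UTF-8.
    """
    # A urlencoded body should be plain ASCII with %XX escapes.
    text = raw.decode("ascii")
    # errors="strict" makes an invalid byte sequence fail loudly.
    return parse_qs(text, encoding="utf-8", errors="strict",
                    keep_blank_values=True)
```

For example, decode_form_body(b"name=Fran%C3%A7ois") returns {'name': ['François']}, while the Latin-1 form of the same name, b"name=Fran%E7ois", raises UnicodeDecodeError. Rejecting bad input at the boundary is one possible design choice; it keeps invalid sequences out of the database rather than trying to guess the sender's encoding.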

In my projects, the first query I send to the database, right after establishing the MySQL connection, is:

SET NAMES 'utf8';



The same goes for data dumps. When I dump the database to a .sql file, I insert the above query at the top.

It has worked for me for several years now without issue, across many hosting companies and dedicated servers.
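As a rough illustration of why the connection encoding matters, the Python sketch below simulates what can happen when UTF-8 bytes are interpreted as Latin-1 somewhere between the application and the database. It does not talk to MySQL at all; the assumption that the other side treats the bytes as Latin-1, and the helper names misread_as_latin1 and repair, are hypothetical and only there to show the effect.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as if they were Latin-1.

    This mimics a client sending UTF-8 over a connection that the other
    side assumes is Latin-1 (an assumption made for this demo).
    """
    return text.encode("utf-8").decode("latin-1")


def repair(garbled: str) -> str:
    """Undo the misreading above by reversing the two steps."""
    return garbled.encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    garbled = misread_as_latin1("français")
    print(garbled)          # the 'ç' shows up as two characters
    print(repair(garbled))  # back to the original text
```

Here misread_as_latin1("français") gives "franÃ§ais", the familiar doubled-character garbage. Declaring the same encoding on both ends of the connection avoids the mismatch in the first place, which is better than repairing the data afterwards.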
