How to fix ANSI characters in a SQL Server table for UTF-8
I have a data import process that loads data from a CSV file into a SQL Server table.
I noticed that some columns contain garbled accented characters.
For example, I found the following text in the database table:
CAFÃ‰
I opened a new file in Notepad++, changed the encoding to ANSI, and saved the file with the above text.
Then I changed the encoding to UTF-8.
The result was:
CAFÉ
I'm not sure what went wrong. But is there a way to fix this problem in the database table?
I would like the database table to show CAFÉ instead of CAFÃ‰.
This matters because when the column is displayed on a website, the string still appears as CAFÃ‰ instead of CAFÉ, even with UTF-8 encoding on the web pages.
I also checked the collation type of the column:
SQL_Latin1_General_CP1_CI_AS
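For reference, this pattern is what typically results when UTF-8 bytes are decoded as Windows-1252, the code page behind the CP1 collations. A minimal Python sketch of that round trip follows; it assumes the import read a UTF-8 file as Windows-1252, which the symptom suggests but which has not been confirmed for this import process.

```python
# Sketch: reproduce the symptom by decoding UTF-8 bytes as Windows-1252.
original = "CAFÉ"
utf8_bytes = original.encode("utf-8")   # b'CAF\xc3\x89': 'É' takes two bytes
garbled = utf8_bytes.decode("cp1252")   # 0xC3 becomes 'Ã', 0x89 becomes '‰'

# Reversing the two steps recovers the original text.
restored = garbled.encode("cp1252").decode("utf-8")
```

Here garbled holds CAFÃ‰ and restored holds CAFÉ again, which matches what the Notepad++ experiment showed.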
Thanks,
I found a solution to this problem by creating a mapping table between the expected and the actual characters, based on this website: http://www.i18nqa.com/debug/utf8-debug.html
Once I had the mapping table, I joined my original table to it using LIKE on the actual characters and replaced those characters with the expected ones.
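-- Pass 1: three-character sequences. The longest sequences go first so that
-- a shorter sequence cannot match part of a longer one.
UPDATE rd
SET rd.Name = REPLACE(rd.Name, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m ON rd.Name LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 3;

-- Pass 2: two-character sequences.
UPDATE rd
SET rd.Name = REPLACE(rd.Name, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m ON rd.Name LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 2;

-- Pass 3: single characters.
UPDATE rd
SET rd.Name = REPLACE(rd.Name, m.Actual, m.Expected)
FROM RawData rd
INNER JOIN dbo.UtfMapping m ON rd.Name LIKE '%' + m.Actual + '%' AND LEN(m.Actual) = 1;

-- Note: when several mapping rows match the same RawData row, one UPDATE applies
-- only one of them, so a pass may need to be run more than once.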
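The rows for a mapping table such as dbo.UtfMapping do not have to be copied by hand. As a minimal Python sketch (not part of the original solution), the pairs can be derived by encoding each expected character as UTF-8 and decoding those bytes as Windows-1252. The function name mapping_rows and the default range U+00A0 to U+00FF are assumptions for illustration. Characters whose UTF-8 bytes have no Windows-1252 equivalent (for example Á, whose second byte is 0x81) are skipped here and would need separate handling.

```python
def mapping_rows(first=0x00A0, last=0x00FF):
    """Return (actual, expected) pairs for UTF-8 text misread as Windows-1252.

    Hypothetical helper: the name and the default code point range are
    illustrative assumptions, not taken from the original answer.
    """
    rows = []
    for code_point in range(first, last + 1):
        expected = chr(code_point)
        try:
            # What the character looks like after the wrong decoding.
            actual = expected.encode("utf-8").decode("cp1252")
        except UnicodeDecodeError:
            # One of the bytes is undefined in Python's cp1252 codec; skip it.
            continue
        rows.append((actual, expected))
    return rows


rows = mapping_rows()
# Longest garbled sequences first, mirroring the order of the UPDATE passes.
rows.sort(key=lambda pair: len(pair[0]), reverse=True)
```

Each tuple is ordered (Actual, Expected) to line up with the column names used in the UPDATE statements. Loading the pairs into SQL Server is left out, since that depends on the tooling in use.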