Classic ASP, SQL Server and character encodings
I have a classic ASP page that gets POSTed to. The data is POSTed as UTF-8 (I can see this in Fiddler). Then I open an ADODB connection to the database and store the data in a VARCHAR field. If the data can be represented in ISO-8859-1 (for example, iñtërnâtiônàlizætiøn), it is stored correctly in the VARCHAR field. If I try strings that cannot be represented in 8859-1 (e.g. a short phrase in a non-Latin script ending in !), I get ????????????!. This all makes sense, since a VARCHAR field cannot contain Unicode. I also understand that using an NVARCHAR field should allow me to store Unicode strings.
My question is this: what settings in SQL Server or in the ADODB objects control how strings are converted from UTF-8 to 8859-1? Does VBScript (ASP) pass strings to ADODB.Connection.Execute as UTF-8 (or, as I suspect, as UTF-16), with the database itself handling the conversion? Is this controlled by the database collation (SQL_Latin1_General_CP1_CI_AS in this case)?
If you switch to NVARCHAR, you also need to remember to use the N prefix in your SQL commands whenever you write a Unicode string literal, for example:
INSERT INTO SOME_TABLE (someField) VALUES (N'Some Unicode Text')
SELECT * FROM SOME_TABLE WHERE someField=N'Some Unicode Text'
If you don't, the literals are not treated as Unicode, and your data is silently converted to Latin-1 (or whichever non-Unicode code page the database's default collation uses), even if the target field is NVARCHAR.
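To illustrate why the N prefix matters, here is a rough Python model of how the two literal forms are evaluated. It is not SQL Server code: `literal_value` is a hypothetical helper, and it assumes the database default collation uses code page 1252 (as SQL_Latin1_General_CP1_CI_AS does). Real SQL Server may substitute a similar-looking Latin letter ("best-fit" mapping) for some characters instead of `?`; Python's codec always uses `?`.

```python
def literal_value(literal: str, codepage: str = "cp1252") -> str:
    """Model the value SQL Server gets from a literal such as 'abc' or N'abc'.

    Hypothetical helper. Assumes the database default collation uses
    code page 1252, as SQL_Latin1_General_CP1_CI_AS does.
    """
    is_unicode = literal.startswith("N")
    body = literal[1:] if is_unicode else literal
    if len(body) < 2 or body[0] != "'" or body[-1] != "'":
        raise ValueError(f"not a quoted string literal: {literal!r}")
    text = body[1:-1].replace("''", "'")  # '' inside a literal is one quote
    if is_unicode:
        # N'...' is an NVARCHAR literal: the text stays Unicode.
        return text
    # '...' is a VARCHAR literal: it is converted to the database code page
    # first, so characters missing from that code page become '?' here.
    return text.encode(codepage, errors="replace").decode(codepage)


if __name__ == "__main__":
    # Either value would then be stored in an NVARCHAR column unchanged;
    # any loss already happened when the literal was evaluated.
    print(literal_value("N'Привет!'"))  # Привет!
    print(literal_value("'Привет!'"))   # ??????!
```

The point of the model is where the loss happens: the NVARCHAR column could hold the text either way, but the unprefixed literal has already been squeezed through the code page.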
You're right. VBScript and ADODB only know strings as Unicode (specifically UTF-16).
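A small Python sketch of that path, with Python standing in for VBScript (the field name `greeting` and the sample text are made up): the form body arrives as percent-encoded UTF-8 bytes, and once it is decoded you have an ordinary Unicode string, which VBScript and ADODB would hold as UTF-16. That ASP decodes `Request.Form` as UTF-8 depends on the page's code page being 65001; treat that as an assumption about your setup.

```python
from urllib.parse import parse_qs

# What arrives on the wire (as seen in Fiddler): a form body whose
# non-ASCII characters are percent-encoded UTF-8 bytes.
# "greeting" and the sample text are made up for illustration.
RAW_BODY = "greeting=%D0%9F%D1%80%D0%B8%D0%B2%D0%B5%D1%82%21"


def decode_form(body: str) -> dict[str, str]:
    """Percent-decode a form body, interpreting the bytes as UTF-8.

    Roughly what ASP does with Request.Form when the page code page is
    65001 (UTF-8); that setting is an assumption here.
    """
    return {key: values[0] for key, values in parse_qs(body, encoding="utf-8").items()}


if __name__ == "__main__":
    value = decode_form(RAW_BODY)["greeting"]
    # After decoding there is no UTF-8 left: value is a plain Unicode string.
    print(value)                           # Привет!
    print(len(value.encode("utf-8")))      # 13 (bytes as UTF-8)
    print(len(value.encode("utf-16-le")))  # 14 (bytes as UTF-16 code units)
```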
The conversion is controlled by the collation settings, which determine the code page used for VARCHAR fields. In SQL_Latin1_General_CP1_CI_AS, it is really the CP1 part that determines which code page to use. In this case, CP1 is a legacy reference to Windows-1252, which is a superset of ISO-8859-1.
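The collation-to-code-page step can be sketched in Python as well. The lookup table is a hypothetical stand-in: on a real server you can ask SQL Server directly, e.g. `SELECT COLLATIONPROPERTY('SQL_Latin1_General_CP1_CI_AS', 'CodePage')`, which should return 1252. The last function checks the superset claim: every printable ISO-8859-1 byte decodes to the same character in Windows-1252, while Windows-1252 additionally maps bytes such as 0x80 (€) that ISO-8859-1 reserves for control codes.

```python
# Hypothetical lookup table; a real server can report this itself via
# COLLATIONPROPERTY('<collation name>', 'CodePage').
COLLATION_CODEPAGE = {
    "SQL_Latin1_General_CP1_CI_AS": 1252,  # the "CP1" part means Windows-1252
}


def varchar_codec(collation: str) -> str:
    """Python codec name for the code page a VARCHAR column with this collation uses."""
    return f"cp{COLLATION_CODEPAGE[collation]}"


def fits_in_varchar(text: str, collation: str) -> bool:
    """True if every character of text exists in the collation's code page."""
    try:
        text.encode(varchar_codec(collation))
    except UnicodeEncodeError:
        return False
    return True


def latin1_printable_matches_cp1252() -> bool:
    """Check that printable ISO-8859-1 bytes decode identically in Windows-1252."""
    printable = bytes(range(0x20, 0x7F)) + bytes(range(0xA0, 0x100))
    return printable.decode("latin-1") == printable.decode("cp1252")


if __name__ == "__main__":
    collation = "SQL_Latin1_General_CP1_CI_AS"
    print(fits_in_varchar("iñtërnâtiônàlizætiøn", collation))  # True
    print(fits_in_varchar("Привет!", collation))               # False
    print(fits_in_varchar("€", collation))                     # True (0x80 in cp1252)
    print(latin1_printable_matches_cp1252())                   # True
```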