Gracefully convert from Unicode to single byte encoding
My database (10gR2) is single byte (NLS_CHARACTERSET = WE8DEC).
I have a Unicode XML file that I would like to parse. If I read the file into a CLOB and try to convert it to XMLType, Oracle throttles when the XML contains special characters (in this case, Norwegian characters such as "øæå").
ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00216: invalid character 184 (0xB8)
If I read the file into NCLOB, then explicitly convert it to CLOB with TO_CLOB, the XMLType constructor succeeds. However, this transformation leads to "ugly" results. For example,
bølle gjær
becomes
bÿlle gjÿr
Is there any way to convert from NCLOB with Unicode to single-byte CLOB and keep special characters? (I'm especially interested in the correct transformation of only the three Norse characters "øæå", the other special characters and characters are not that important in this case.)
source to share
TO_CLOB
it is supposed to convert from national character set to database character set correctly. You won't have a problem if all the characters can be displayed.
Then I suspect that your problem occurs when reading the file in the NCLOB part. Unicode is rather vague information:
- XML files are very often encoded in the UTF-8 character set (with or without Byte Order Sign ).
- By default, the national character set is set to UTF-16 (AL16UTF16) on Oracle.
To move from one to the other requires a specific transformation. You must first make sure that the NCLOB containing your XML file contains the correct information.
source to share
I don't know the exact answer to your question, but this method may be helpful for you.
Here is the query I'm using to convert from a character set to another.
SELECT CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC')
FROM table
Try:
NE8ISO8859P10 ISO 8859-10 Northern Europe
NEE8ISO8859P4 ISO 8859-4 North and North East Europe
This page is a list of Oracle 8i NLS settings
source to share