Gracefully convert from Unicode to single byte encoding

Question

Gracefully convert from Unicode to single byte encoding

My database (10gR2) is single byte (NLS_CHARACTERSET = WE8DEC).

I have a Unicode XML file that I would like to parse. If I read the file into a CLOB and try to convert it to XMLType, Oracle throttles when the XML contains special characters (in this case, Norwegian characters such as "øæå").

ORA-31011: XML parsing failed
ORA-19202: Error occurred in XML processing
LPX-00216: invalid character 184 (0xB8)

If I read the file into NCLOB, then explicitly convert it to CLOB with TO_CLOB, the XMLType constructor succeeds. However, this transformation leads to "ugly" results. For example,

bølle gjær

becomes

bÃ¿lle gjÃ¿r

Is there any way to convert from NCLOB with Unicode to single-byte CLOB and keep special characters? (I'm especially interested in the correct transformation of only the three Norse characters "øæå", the other special characters and characters are not that important in this case.)

+2

oracle plsql unicode xmltype

ObiWanKenobi 10 Sep 09 at 16:03

source to share

3 answers

Adam Hawkes · Answer 1 · 2009-09-10T18:50:09+0000

It is possible to re-encode those characters that do not fit into one byte using character references. This can be done by looking at the unicode value by placing it in a link. For example, it A

will look likeA

Mac · Answer 2 · 2009-09-11T15:06:44+0000

TO_CLOB

it is supposed to convert from national character set to database character set correctly. You won't have a problem if all the characters can be displayed.

Then I suspect that your problem occurs when reading the file in the NCLOB part. Unicode is rather vague information:

XML files are very often encoded in the UTF-8 character set (with or without Byte Order Sign ).
By default, the national character set is set to UTF-16 (AL16UTF16) on Oracle.

To move from one to the other requires a specific transformation. You must first make sure that the NCLOB containing your XML file contains the correct information.

Jean-philippe martin · Answer 3 · 2009-09-12T01:58:03+0000

I don't know the exact answer to your question, but this method may be helpful for you.

Here is the query I'm using to convert from a character set to another.

SELECT CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC')  
  FROM table

Try:

NE8ISO8859P10 ISO 8859-10 Northern Europe

NEE8ISO8859P4 ISO 8859-4 North and North East Europe

This page is a list of Oracle 8i NLS settings

Gracefully convert from Unicode to single byte encoding

More articles: