I have a field in a table which holds XML entities for special characters, since the table is in latin-1.
E.g. "Hallöle slovenčina
" (the "ö" is in latin-1, but the "č" in "slovenčina" had to be converted to an entity by some application that stores the values into the database)
Now I need to export the table into a utf-8 encoded file by converting the XML entities to their original characters.
Is there a function in Oracle that might handle this for me, or do I really need to create a huge key/value map for that?
Any help is greatly appreciated.
EDIT: I found the function DBMS_XMLGEN.convert
, but it only works on <
,>
and &
. Not on &#NNN;
:-(
I believe the problem with dbms_xmlgen is that there are technically only five XML entities. Your example has a numeric HTML entity, which corresponds with Unicode:
http://theorem.ca/~mvcorks/cgi-bin/unicode.pl.cgi?start=0100&end=017F
Oracle has a function UNISTR which is helpful here:
I've converted 269 to its hex equivalent
010d
in the example above (in Unicode it isU+010D
). However, you could pass a decimal number and do a conversion like this:EDIT: The PL/SQL solution:
Here's an example I've rigged up for you. This should loop over and replace any occurrences for each row you select out of your table(s).
Notice that I've stubbed in calls to the UTL_FILE function to write NVARCHAR lines (Oracle's extended character set) to a file on the database server. The dbms_output, while great for debugging, doesn't seem to support extended characters, but this shouldn't be a problem if you use UTL_FILE to write to a file. Here's the DBMS_OUTPUT: