I have a XML string encoded in big5:
atob('PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iYmlnNSIgPz48dGl0bGU+pKSk5TwvdGl0bGU+')
(<?xml version="1.0" encoding="big5" ?><title>中文</title> in UTF-8.)
I'd like to extract the content of <title>. How can I do that with pure Javascript in browsers? Better to have lightweight solutions without jquery or emscripten.
Have tried DOMParser:
(new DOMParser()).parseFromString(atob('PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iYmlnNSIgPz48dGl0bGU+pKSk5TwvdGl0bGU+'), 'text/xml')
But neither Chromium nor Firefox respects the encoding attribute. Is it a standard that DOMParser supports UTF-8 only?
I suspect the issue isn't
DOMParser, butatob, which can't properly decode what was originally a non-ascii string.*You will need to use another method to get at the original bytes, such as using https://github.com/danguer/blog-examples/blob/master/js/base64-binary.js
and then some method to convert the bytes (i.e. decode the big5 data) into a Javascript string. For Firefox / Chrome, you can use
TextDecoder:And then pass to
DOMParserYou can see this at https://plnkr.co/edit/TBspXlF2vNbNaKq8UxhW?p=preview
*One way of understanding why:
atobdoesn't take the encoding of the original string as a parameter, so while it must internally decode base64 encoded data to bytes, it has to make an assumption on what character encoding those bytes are to then give you a Javascript string of characters, which I believe is internally encoded as UTF-16.