I'm trying to use juniversalchardet to auto-detect the encoding of a saved web page. My first test uses www.wikipedia.org, which is UTF-8 encoded according to the HTTP response header (this information is lost once the page is saved to disk).
This is my Scala code:
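import org.mozilla.universalchardet.UniversalDetector

val content = <...load Wikipedia.html from disk...>
val charsetD = new UniversalDetector(null) // null: no CharsetListener callback
charsetD.handleData(content, 0, content.length) // feeds the whole file in one call
val charset = charsetD.getDetectedCharset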
However, regardless of what I load, the detected charset is always null. Is the juniversalchardet library defective, or am I using it wrong?
Problem solved: charsetD.handleData(content, 0, content.length) cannot handle a batch longer than 4096 bytes. Everything works once the function is called several times on smaller chunks of data.
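For anyone hitting the same thing, here is a minimal sketch of the chunked approach, not a definitive implementation. The object name DetectCharset, the helper detect, the chunkSize parameter and the file path "Wikipedia.html" are placeholders of mine. It also calls dataEnd() before reading the result, as the library's own usage example does; my original snippet left that call out, so it may have been part of the problem too.

```scala
import java.nio.file.{Files, Paths}
import org.mozilla.universalchardet.UniversalDetector

object DetectCharset {
  // Hypothetical helper: feeds the bytes to the detector in chunks of at most
  // chunkSize bytes and returns the detected charset name, or None if the
  // detector could not decide.
  def detect(content: Array[Byte], chunkSize: Int = 4096): Option[String] = {
    val detector = new UniversalDetector(null) // null: no listener callback
    var offset = 0
    // Stop early once the detector reports that it is confident enough.
    while (offset < content.length && !detector.isDone()) {
      val len = math.min(chunkSize, content.length - offset)
      detector.handleData(content, offset, len)
      offset += len
    }
    detector.dataEnd() // signal end of input before asking for the result
    val charset = Option(detector.getDetectedCharset()) // null becomes None
    detector.reset() // leave the detector in a clean state
    charset
  }

  def main(args: Array[String]): Unit = {
    // Placeholder path: point this at the saved page.
    val bytes = Files.readAllBytes(Paths.get("Wikipedia.html"))
    println(detect(bytes).getOrElse("unknown"))
  }
}
```

Wrapping the result in Option avoids passing a null charset name around when detection fails.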