Chinese Character Encoding (UTF-8, GBK)

I have a web crawler that is run on different websites (Chinese in this case).

Now when I retrieve the data and display it on my website, the Chinese characters all end up as garbage. I read about character encoding and found out that UTF-8 is generally the best encoding.

The problem is that when I use UTF-8, the data crawled from WEBSITE-1 is shown correctly, but the data from WEBSITE-2 is not.

For WEBSITE-2, the gb18030 character encoding works correctly.

My question is: is there a way to find out the character encoding of a website so that I can build a generic solution? I mean, I want to render a page on my own website already knowing which character encoding to use. That way I can handle the encoding in the backend and not worry on the front end about which encoding is required to open a page.

Right now I have two pages: one for UTF-8 Chinese characters and one for GB18030 Chinese characters.

There is 1 answer below

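Read the encoding from the page's meta tags. In HTML older than 5 it is declared in the "Content-Type" meta tag, for example <meta http-equiv="Content-Type" content="text/html; charset=gb2312">. In HTML5 it is declared in the "charset" attribute of a meta tag (not "char-set"), for example <meta charset="utf-8">.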

W3schools charset
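
To make this generic in the backend, the crawler could look for a declared charset and decode each page to Unicode before storing it, so the front end only ever serves UTF-8. The following is a minimal sketch of that idea using only the Python standard library, not a definitive implementation. The function names detect_charset and to_unicode are hypothetical, and three things are assumptions on my part rather than anything stated above: checking the HTTP Content-Type header before the meta tags, falling back to UTF-8 when nothing is declared, and decoding pages declared as GBK or GB2312 with gb18030 (which is a superset of both).

```python
import codecs
import re

# Matches both <meta charset="gbk"> and
# <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
META_CHARSET_RE = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_\-:.]+)', re.IGNORECASE
)
HEADER_CHARSET_RE = re.compile(r'charset\s*=\s*["\']?([^\s;"\']+)', re.IGNORECASE)


def _normalize(label):
    """Return Python's canonical codec name for label, or None if unknown."""
    try:
        name = codecs.lookup(label).name
    except LookupError:
        return None
    # Assumption: decode the older GB encodings with their superset, gb18030.
    return "gb18030" if name in ("gbk", "gb2312") else name


def detect_charset(raw, content_type_header=None, default="utf-8"):
    """Guess the encoding of a fetched page from its declared charset."""
    # 1. Charset parameter of the HTTP Content-Type header, if one was sent.
    if content_type_header:
        match = HEADER_CHARSET_RE.search(content_type_header)
        if match:
            name = _normalize(match.group(1))
            if name:
                return name
    # 2. Meta tag near the top of the document (first 4096 bytes only).
    match = META_CHARSET_RE.search(raw[:4096])
    if match:
        name = _normalize(match.group(1).decode("ascii"))
        if name:
            return name
    # 3. Nothing declared: fall back to the default (an assumption).
    return default


def to_unicode(raw, content_type_header=None):
    """Decode raw page bytes to str; undecodable bytes become U+FFFD."""
    return raw.decode(detect_charset(raw, content_type_header), errors="replace")
```

With something like this in place, the crawler would call to_unicode on the raw response bytes and store the resulting text, and a single page template served as UTF-8 could then display content from both WEBSITE-1 and WEBSITE-2. Pages that declare no charset, or declare the wrong one, would still need a detection library or manual handling.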