I have a web crawler that runs on different websites (Chinese ones in this case).
When I retrieve the data and display it on my website, the Chinese characters all end up as garbage. I read about character encoding and found that UTF-8 is generally the best encoding.
The problem is that when I use UTF-8, the data crawled from WEBSITE-1 is shown correctly, but not the data from WEBSITE-2.
For WEBSITE-2, the character encoding gb18030 works correctly.
My question is: is there a way to detect the character encoding of a website so that I can build a generic solution? That way I could render a page on my local site knowing which character encoding to use, handle the encoding in the backend, and not have to worry on the front end about which encoding a page requires.
Right now I have two pages: one for UTF-8 Chinese characters and one for GB18030 Chinese characters.
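For a generic solution, a crawler can look at the page's declared encoding before decoding: first the HTTP `Content-Type` response header, then the HTML meta tags in the document itself. Here is a minimal sketch in Python; `detect_charset` is a hypothetical helper name, and the `headers` dict and 4 KB meta-tag scan window are assumptions of this sketch, not part of any standard API.

```python
import re

def detect_charset(headers, body_bytes, default="utf-8"):
    """Guess a page's charset: HTTP header first, then HTML meta tags."""
    # 1. HTTP Content-Type header, e.g. "text/html; charset=gb18030"
    content_type = headers.get("Content-Type", "")
    m = re.search(r"charset=([\w-]+)", content_type, re.I)
    if m:
        return m.group(1).lower()
    # 2. <meta charset="..."> (HTML5) or the http-equiv form (HTML 4).
    #    Meta tags are ASCII, so decoding the prefix as latin-1 is safe.
    head = body_bytes[:4096].decode("latin-1", errors="replace")
    m = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', head, re.I)
    if m:
        return m.group(1).lower()
    return default

# Usage: decode the crawled bytes with whatever charset was detected.
html = '<html><head><meta charset="gb18030"></head><body>你好</body></html>'
raw = html.encode("gb18030")
charset = detect_charset({}, raw)  # no header here, so the meta tag wins
text = raw.decode(charset)
```

If neither the header nor a meta tag declares an encoding, a statistical detector (e.g. the chardet library) can serve as a last resort, but the declared encoding should be trusted first when present.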
Check the page's declared encoding: the HTTP `Content-Type` response header takes precedence, then the `<meta http-equiv="Content-Type">` tag for HTML 4, or the `<meta charset>` attribute for HTML5.
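For example, a page encoded in GB18030 would typically declare it with one of these tags in its `<head>` (gb18030 here stands in for whatever charset the page actually uses):

```html
<!-- HTML 4 / XHTML -->
<meta http-equiv="Content-Type" content="text/html; charset=gb18030">

<!-- HTML5 -->
<meta charset="gb18030">
```

Your crawler can read this tag from the raw bytes (meta tags are plain ASCII in practice) and decode the rest of the document with whatever charset it names.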
See the W3Schools charset reference.