I'm struggling with a problem in a Python crawler.
First, the websites use two different hexadecimal representations of Chinese characters. I can convert one of them (E4BDA0E5A5BD), but I have found no way to convert the other one (C4E3BAC3), or maybe I am missing a method. Both hexadecimal values stand for '你好' in Chinese.
Second, I found a website that can do the hexadecimal conversion, and to my surprise its answer is exactly the one I cannot produce myself.
The url is http://www.uol123.com/hantohex.html
That raised another question: how do I get the result shown in the text box (I don't know exactly what it is called)? I used Firefox + HttpFox to inspect the POST request, and I found that the result converted by the website is in the Content.
When I print the POST request, it shows the POST data and some headers, but no information about the Content.
Third, I googled how to use Ajax and found some code showing how to do it.
Here is the URL: http://outofmemory.cn/code-snippet/1885/python-moni-ajax-request-get-ajax-request-response But when I run it, it fails with the error "ValueError: No JSON object could be decoded".
Sorry, I am new here, so I cannot post images.
I am looking forward to your help sincerely.
Any help will be appreciated.
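You're talking about different encodings of these Chinese characters. There are at least three widely used encodings: guobiao (GB2312/GBK, for mainland China), big5 (in Taiwan) and unicode (everywhere else, usually serialized as UTF-8). Your E4BDA0E5A5BD is the UTF-8 byte sequence for 你好, and C4E3BAC3 is the GB2312 byte sequence for the same two characters. Here's how to convert your characters into the different encodings: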
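A minimal sketch of that conversion, assuming Python 3 and the standard codec names utf-8, gb2312 and big5. The helper name to_hex is only for illustration:

```python
def to_hex(text, codec):
    """Encode a Unicode string with the given codec and return uppercase hex."""
    return text.encode(codec).hex().upper()


text = "你好"

# The same two characters produce different bytes under each encoding.
for codec in ("utf-8", "gb2312", "big5"):
    print(codec, to_hex(text, codec))
```

The utf-8 line should match the E4BDA0E5A5BD value from the question and the gb2312 line should match C4E3BAC3.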
You may check other available encodings here.
Ah, almost forgot. To convert from Unicode into an encoding, use the encode() method. To convert back from the encoded contents of the web site, use the decode() method. Just don't forget to specify the correct encoding.
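The reverse direction can be sketched the same way, again assuming Python 3. The helper name from_hex is hypothetical; it turns a hex string like the ones in the question into bytes and decodes them with the codec you name:

```python
def from_hex(hex_string, codec):
    """Turn a hex string into bytes, then decode those bytes with the given codec."""
    return bytes.fromhex(hex_string).decode(codec)


print(from_hex("E4BDA0E5A5BD", "utf-8"))  # UTF-8 bytes from the question
print(from_hex("C4E3BAC3", "gb2312"))     # GB2312 bytes from the question

# Decoding with the wrong codec fails (or yields garbage), which is why
# the encoding has to be specified correctly.
try:
    from_hex("C4E3BAC3", "utf-8")
except UnicodeDecodeError as err:
    print("wrong codec:", err)
```

For a crawler this means decoding the raw response bytes with the encoding the site actually uses, rather than assuming UTF-8.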