I want to count the number of words in a passage that contains both English and Chinese. For English, it's simple. Each word is a word. For Chinese, we count each character as a word. Therefore, 香港人 is three words here.
So for example, "I am a 香港人" should have a word count of 6.
Any idea how I can count it in JavaScript/jQuery?
Thanks!
Try a regex like this:
/[\u00ff-\uffff]|\S+/g
For example,
"I am a 香港人".match(/[\u00ff-\uffff]|\S+/g)
gives:
["I", "am", "a", "香", "港", "人"]
Then you can just check the length of the resulting array.
The \u00ff-\uffff part of the regex is a Unicode character range; you probably want to narrow this down to just the characters you want to count as words. For example, CJK Unified would be \u4e00-\u9fcc.
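Putting the pieces together, here is a minimal sketch of a counting function. The name countWords is hypothetical, and the pattern is a variation on the one above rather than the exact same regex: it uses the narrower CJK Unified range, and it excludes that range from the second alternative so that a Latin word written directly against Chinese text (no space in between) is not swallowed into one token by \S+. It also guards against match() returning null when nothing matches.

```javascript
// Hypothetical helper, not part of the original answer.
// Assumption: only the range \u4e00-\u9fcc counts as "one character = one word".
function countWords(text) {
  // Either a single CJK Unified ideograph, or a run of characters that are
  // neither whitespace nor CJK Unified ideographs.
  var pattern = /[\u4e00-\u9fcc]|[^\s\u4e00-\u9fcc]+/g;
  var tokens = text.match(pattern);
  // String.prototype.match returns null (not an empty array) on no match.
  return tokens ? tokens.length : 0;
}

console.log(countWords("I am a 香港人"));
```

Note that punctuation is not treated specially here: a punctuation mark attached to a word is counted as part of that word, and a stand-alone mark counts as a word of its own. Adjust the character classes if that is not what you want.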