How to identify the tones in Chinese text?


Is there a programmatic way to identify the tones in Chinese text?

For an input string like 苹果, I need to extract the tones 2 and 3, as indicated in the pinyin transliteration píng guǒ or ping2 guo3.

I guess a possible workaround would be to convert the Chinese hanzi text to pinyin (e.g. with pinyin4j) and then extract the tones from the pinyin, but I assume there must be a better, more direct way to do it.
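To make the second half of that workaround concrete, here is a minimal sketch of pulling tone numbers out of a pinyin string once some converter has produced it. It assumes space-separated syllables in either numbered form (ping2) or diacritic form (píng), and it reports a syllable with no tone mark as 5 (neutral). The function names are made up for illustration, and the hanzi-to-pinyin step itself is not shown.

```python
import unicodedata

# Combining marks that carry the four tones in diacritic pinyin.
TONE_MARKS = {
    "\u0304": 1,  # macron, as in ā
    "\u0301": 2,  # acute,  as in á
    "\u030c": 3,  # caron,  as in ǎ
    "\u0300": 4,  # grave,  as in à
}


def syllable_tone(syllable):
    """Return the tone (1-4) of one pinyin syllable, or 5 if no tone is marked."""
    # Numbered pinyin such as "ping2" carries the tone as a trailing digit.
    if syllable and syllable[-1] in "12345":
        return int(syllable[-1])
    # Otherwise decompose accented vowels and look for a tone mark.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            return TONE_MARKS[ch]
    return 5  # assumption: an unmarked syllable is treated as neutral tone


def tones(pinyin):
    """Return the list of tones for a space-separated pinyin string."""
    return [syllable_tone(s) for s in pinyin.split()]


print(tones("píng guǒ"))    # [2, 3]
print(tones("ping2 guo3"))  # [2, 3]
```

This only reads tones off pinyin that already exists; whether those tones are right for the context depends entirely on the converter that produced the pinyin.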

Context

The question is whether there is an algorithmic way to identify the tones, or whether the only way is a map lookup against an authoritative source, e.g. the publicly available CEDICT database.
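For reference, the map-lookup option could look roughly like the sketch below. It assumes the CC-CEDICT line layout "Traditional Simplified [pin1 yin1] /gloss/" and uses two sample lines typed in by hand rather than the real file. The names SAMPLE, ENTRY and load_tones are hypothetical, and keeping only the first reading per headword is a simplification.

```python
import re

# Sample entries in the assumed CC-CEDICT layout, typed in for illustration.
SAMPLE = """\
# comment lines start with a hash
蘋果 苹果 [ping2 guo3] /apple/
中文 中文 [Zhong1 wen2] /Chinese language/
"""

# traditional, simplified, then the bracketed numbered pinyin
ENTRY = re.compile(r"^(\S+) (\S+) \[([^\]]+)\] /")


def load_tones(lines):
    """Build a {headword: [tone, ...]} table from dictionary lines."""
    table = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment lines
        match = ENTRY.match(line)
        if not match:
            continue  # skip anything that does not fit the assumed layout
        trad, simp, pinyin = match.groups()
        # Each numbered syllable ends in its tone digit (5 = neutral).
        entry_tones = [int(s[-1]) for s in pinyin.split() if s[-1] in "12345"]
        # Simplification: the first reading seen for a headword wins.
        table.setdefault(simp, entry_tones)
        table.setdefault(trad, entry_tones)
    return table


table = load_tones(SAMPLE.splitlines())
print(table["苹果"])  # [2, 3]
```

A lookup like this works on whole dictionary headwords, so running text would first have to be segmented into words, and headwords with several readings would still need context to pick the right one.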

Best answer

I'm a native speaker, and I doubt that it's possible. A Chinese character can have multiple tones depending on the context. The only reliable way to do this is to call an API or tool that is given the full context.

Since you can't tell a character's tone by looking at it in isolation, there is no "algorithm" that maps individual characters to their tones.

For instance, "一" can be tone 1, 2, 4, or neutral depending on the context.
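To illustrate why the tone depends on context, here is a minimal sketch of the commonly taught sandhi rule for 一: tone 2 before a fourth-tone syllable, tone 4 before a first, second, or third tone, and tone 1 when nothing follows. The function name yi_tone is made up for this example. The sketch deliberately ignores the cases that need real context, such as ordinals and number sequences (which keep tone 1) and the neutral reading in patterns like 看一看.

```python
def yi_tone(next_tone):
    """Tone of 一 given the tone of the following syllable (None if there is none)."""
    if next_tone is None:
        return 1  # citation tone: 一 on its own or at the end of a phrase
    if next_tone == 4:
        return 2  # yí before a fourth tone, e.g. 一样 (yí yàng)
    if next_tone in (1, 2, 3):
        return 4  # yì before tones 1-3, e.g. 一起 (yì qǐ)
    return 1      # assumption: anything else is left at the citation tone


print(yi_tone(4))     # 2
print(yi_tone(3))     # 4
print(yi_tone(None))  # 1
```

Even this one rule needs the tone of the next syllable as input, which is itself context-dependent, so a per-character lookup alone cannot produce it.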