I am looking to get the pinyin of Simplified Mandarin characters, and have come across two packages:
- pinyin 0.4.0, which is 6 years old (GitHub repo here)
- pinyin_jyutping_sentence, which is more than 2 years old (GitHub repo here)
Both offer similar features in terms of the ability to print character pinyin with and without the diacritics, but I am curious if one is more efficient than the other.
Right off the bat, I noticed that on the first import, pinyin_jyutping_sentence builds a prefix dict:
import pinyin_jyutping_sentence as pnyn
Building prefix dict from Path\to\python\lib\site-packages\pinyin_jyutping_sentence\dict.txt.big ...
Dumping model to file cache Path\to\AppData\Local\Temp\jieba.ue5a383df573783d4e379d21ab891d92a.cache
Loading model cost 0.793 seconds.
Prefix dict has been built successfully.
Whereas running import pinyin did not result in the creation of any kind of dictionary.
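The one-time cost of each import can be measured directly. The following is a minimal sketch, assuming both packages are installed; the helper name `time_first_import` is hypothetical and not part of either package. Note that the log above shows the prefix dict being dumped to a file cache, so a later run may load faster than the very first one.

```python
import importlib
import sys
import time


def time_first_import(module_name):
    """Return the seconds spent importing module_name.

    Returns None if the module is already imported (the timing would only
    measure a cache lookup) or if it is not installed.
    """
    if module_name in sys.modules:
        return None
    start = time.perf_counter()
    try:
        importlib.import_module(module_name)
    except ImportError:
        return None
    return time.perf_counter() - start


if __name__ == "__main__":
    for name in ("pinyin", "pinyin_jyutping_sentence"):
        elapsed = time_first_import(name)
        if elapsed is None:
            print(f"{name}: not measured (already imported or not installed)")
        else:
            print(f"{name}: first import took {elapsed:.3f} s")
```

Run this in a fresh interpreter, since a module that is already in `sys.modules` is skipped.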
Is there a difference between the two packages in speed and accuracy?
NOTE: Due to StackOverflow's rules about the inclusion of Mandarin characters, I was unable to include either the 294-character Mandarin string or the 8-item list of Mandarin names I used for testing.
Because this seems to be an obscure question with no existing questions or answers here on StackOverflow, I did some quick efficiency/accuracy analysis of each package using timeit and datetime. Here is the code:
With the following output:
Based on the output of the timeit and datetime modules, pinyin_jyutping_sentence is much slower than pinyin. However, after comparing the pinyin output of pinyin_jyutping_sentence and pinyin against one another and against the original Mandarin characters, pinyin_jyutping_sentence is far more accurate and readable.* pinyin contained several errors in its output for the 294-character string, and on closer examination of the pinyin output for the list of names, pinyin got the character tone wrong in several places, whereas pinyin_jyutping_sentence got it right in (as far as I was able to identify) every case. I will update this answer if I find and test other Mandarin-to-pinyin packages in Python.

*Interestingly, pinyin_jyutping_sentence converted numbers in the string into the numbers' corresponding pinyin.
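A timing comparison of this kind can be sketched as follows. This is not the benchmark used for the results above. It assumes that `pinyin.get` and `pinyin_jyutping_sentence.pinyin` are the conversion entry points (as described in each package's README, to my understanding), and the sample string is a short placeholder rather than the original test data. The helper names are hypothetical.

```python
import timeit
from datetime import datetime


def time_with_timeit(convert, text, repeats=5, number=10):
    """Return the best observed per-call time in seconds for convert(text)."""
    timer = timeit.Timer(lambda: convert(text))
    return min(timer.repeat(repeat=repeats, number=number)) / number


def time_with_datetime(convert, text):
    """Run convert(text) once; return (result, elapsed seconds)."""
    start = datetime.now()
    result = convert(text)
    elapsed = (datetime.now() - start).total_seconds()
    return result, elapsed


def load_converters():
    """Collect the conversion function of whichever package is installed."""
    converters = {}
    try:
        import pinyin  # assumed entry point: pinyin.get
        converters["pinyin"] = pinyin.get
    except ImportError:
        pass
    try:
        import pinyin_jyutping_sentence  # assumed entry point: .pinyin
        converters["pinyin_jyutping_sentence"] = pinyin_jyutping_sentence.pinyin
    except ImportError:
        pass
    return converters


if __name__ == "__main__":
    sample = "你好，世界"  # placeholder text, not the original test string
    for name, convert in load_converters().items():
        output, single_run = time_with_datetime(convert, sample)
        best = time_with_timeit(convert, sample)
        print(f"{name}: {output!r}")
        print(f"  single run: {single_run:.6f} s, best of repeats: {best:.6f} s")
```

Printing each package's output next to its timings makes it easier to check accuracy by eye while comparing speed. Taking the minimum over several `timeit` repeats reduces noise from other processes on the machine.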