I'm using fugashi to extract words from sentences. How do I add new terms that aren't in fugashi's dictionary?
For example, ユーチューブ (YouTube) is split into ユー ("You") and チューブ ("Tube"):
```python
import fugashi

tagger = fugashi.Tagger()
nodes = tagger.parseToNodeList("ユーチューブ")

# keep only nouns (名詞)
goodpos = ['名詞']
nodes = [nn.surface for nn in nodes if nn.feature.pos1 in goodpos]
print(nodes)  # => ['ユー', 'チューブ']
```
I haven't gotten around to writing a proper guide for this yet, but basically you should follow the MeCab docs, using `fugashi-build-dict` instead of `mecab-dict-index`.

To give brief instructions: first, make a CSV file that uses the same format as your system dictionary. These instructions are based on `unidic-lite`. You can make the file by copying entries from the UniDic source and editing their fields. Then you run this command:
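From Python, a minimal sketch of writing the CSV and running the build might look like this. It assumes `fugashi-build-dict` accepts the same flags as `mecab-dict-index` in the MeCab docs (`-d` system dictionary directory, `-u` output user dictionary, `-f`/`-t` charsets of the CSV and of the compiled dictionary) and that your CSV is UTF-8. The helper names `write_user_csv`, `build_command` and `build_user_dict` are hypothetical:

```python
import csv
import subprocess


def write_user_csv(rows, csv_path="mydic.csv"):
    """Write entries to a MeCab-style dictionary CSV.

    Each row is a list of strings in the system dictionary's column order:
    surface, left context ID, right context ID, cost, then the feature
    fields. The number of feature columns must match the system dictionary,
    so start from a copied UniDic entry and edit it.
    """
    with open(csv_path, "w", encoding="utf-8", newline="") as f:
        # Plain "\n" line endings, so no stray "\r" can end up in a field.
        csv.writer(f, lineterminator="\n").writerows(rows)


def build_command(dicdir, csv_path="mydic.csv", out_path="mydic.dic"):
    """Return the argv for compiling csv_path into a user dictionary.

    ASSUMPTION: fugashi-build-dict takes mecab-dict-index's flags.
    """
    return [
        "fugashi-build-dict",
        "-d", str(dicdir),    # location of the system dictionary
        "-u", str(out_path),  # user dictionary to create
        "-f", "utf-8",        # charset of the input CSV
        "-t", "utf-8",        # charset of the compiled dictionary
        str(csv_path),
    ]


def build_user_dict(dicdir, csv_path="mydic.csv", out_path="mydic.dic"):
    # Raises subprocess.CalledProcessError if the build fails.
    subprocess.run(build_command(dicdir, csv_path, out_path), check=True)
```

With `unidic-lite` installed, `unidic_lite.DICDIR` should point at the system dictionary directory (worth checking in your install), so `build_user_dict(unidic_lite.DICDIR)` would compile `mydic.csv` into `mydic.dic`.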
`dicdir` is the location of your system dictionary and `mydic.csv` is the CSV file you made. This creates the `mydic.dic` file, which you can then use with fugashi by specifying `-u mydic.dic`.
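Loading the result might look like the following sketch, which reruns the noun filter from the question with the user dictionary passed as a MeCab argument. It assumes `mydic.dic` is in the working directory. `tagger_args` and `nouns` are hypothetical helper names, and the plain string formatting does not handle paths containing spaces:

```python
NOUN = "名詞"  # UniDic pos1 value for nouns, as in the question


def tagger_args(user_dic=None):
    """Build the MeCab argument string; -u adds a user dictionary."""
    return f"-u {user_dic}" if user_dic else ""


def nouns(text, user_dic=None):
    """Return the surface forms of the nouns fugashi finds in text."""
    import fugashi  # imported lazily: only needed when actually tokenizing

    tagger = fugashi.Tagger(tagger_args(user_dic))
    return [
        node.surface
        for node in tagger.parseToNodeList(text)
        if node.feature.pos1 == NOUN
    ]
```

Comparing `nouns("ユーチューブ")` with `nouns("ユーチューブ", "mydic.dic")` is a quick way to check whether the new entry is picked up. If the word still splits, the cost in your CSV row may be too high, since MeCab prefers the lowest-cost path.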