Morphological text analysis with Python using *.dic and *.aff files

I have two files in Hunspell format (.dic and .aff) for the Ukrainian language. My program has to get the base form of an input word, so it can use the word forms from the .dic file and the affixes from the .aff file. I don't know how to achieve this even with the Hunspell utility, but I suppose it is possible.

Which Python libraries can get the base form of a word using .dic and .aff files?
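
To make the idea in the question concrete: finding a base form means working backwards from the input word, undoing a suffix rule from the .aff file and checking that the resulting candidate is listed in the .dic file with that rule's flag. The sketch below is a toy, pure-Python illustration of that lookup, not Hunspell's actual implementation. The AFF_TEXT and DIC_TEXT contents and the parse_aff, parse_dic and stems names are hypothetical. It assumes single-character flags and handles only one SFX suffix per word, ignoring prefixes (PFX), FLAG settings, continuation classes and file encodings that real dictionaries may use.

```python
import re

# Hypothetical toy .aff excerpt in Hunspell SFX syntax:
# header "SFX flag cross_product count", then "SFX flag strip affix condition".
# "0" means "nothing" for strip/affix.
AFF_TEXT = """\
SFX D Y 3
SFX D 0 d e
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
"""

# Hypothetical toy .dic excerpt: first line is the entry count, then word/FLAGS.
DIC_TEXT = """\
3
link/D
love/D
try/D
"""


def parse_aff(text):
    """Return a list of (flag, strip, affix, condition_regex) suffix rules."""
    rules = []
    for line in text.splitlines():
        parts = line.split()
        # Rule lines have at least 5 fields; the header line has 4.
        if len(parts) >= 5 and parts[0] == "SFX":
            _, flag, strip, affix, cond = parts[:5]
            strip = "" if strip == "0" else strip
            affix = "" if affix == "0" else affix
            # The condition must match the end of the base word.
            rules.append((flag, strip, affix, re.compile(cond + "$")))
    return rules


def parse_dic(text):
    """Return a dict mapping each base word to its set of single-char flags."""
    words = {}
    for line in text.splitlines()[1:]:  # skip the entry-count line
        if not line.strip():
            continue
        word, _, flags = line.partition("/")
        words[word] = set(flags)
    return words


def stems(word, rules, dic):
    """Find base forms by undoing one suffix rule and looking up the .dic."""
    found = []
    if word in dic:  # the word may already be a base form
        found.append(word)
    for flag, strip, affix, cond in rules:
        if not word.endswith(affix):
            continue
        # Remove the added affix and restore the stripped characters.
        base = word[: len(word) - len(affix)] + strip
        # Accept only if the condition holds and the base carries this flag.
        if cond.search(base) and flag in dic.get(base, ()):
            found.append(base)
    return found


rules = parse_aff(AFF_TEXT)
dic = parse_dic(DIC_TEXT)
print(stems("linked", rules, dic))  # ['link']
print(stems("tried", rules, dic))   # ['try']
```

The condition check is what stops "linked" from also yielding "linke" through the "0 d e" rule: that candidate passes the condition but is not in the .dic.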

Accepted answer:

As said before, Hunspell (through the pyhunspell bindings) is the library you need. Examples from https://code.google.com/p/pyhunspell/wiki/UsingPyHunspell:

>>> import hunspell
>>> hobj = hunspell.HunSpell('/usr/share/myspell/en_US.dic', '/usr/share/myspell/en_US.aff')
>>> hobj.spell('spookie')    # is the word in the dictionary?
False
>>> hobj.suggest('spookie')  # spelling suggestions
['spookier', 'spookiness', 'spooky', 'spook', 'spoonbill']
>>> hobj.spell('spooky')
True
>>> hobj.analyze('linked')   # morphological analysis: st: is the stem, fl: the affix flag
[' st:link fl:D']
>>> hobj.stem('linked')      # base form(s) of the word
['link']
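
On Python 3, newer pyhunspell releases may return bytes rather than str from stem() and analyze(), which matters when comparing base forms of non-ASCII (e.g. Cyrillic) words. The following is a small sketch under that assumption: base_forms is a hypothetical helper, and it relies on pyhunspell's get_dic_encoding() to report the dictionary's encoding. Hunspell SET names that Python's codecs do not recognise would need to be mapped by hand.

```python
def base_forms(hobj, word):
    """Return the stems of `word` as str objects.

    Hypothetical helper. Assumes `hobj` is a pyhunspell HunSpell object whose
    stem() may return bytes (as on Python 3), encoded in the dictionary's
    encoding as reported by get_dic_encoding().
    """
    encoding = hobj.get_dic_encoding()
    # Decode bytes results; pass through results that are already str.
    return [s.decode(encoding) if isinstance(s, bytes) else s
            for s in hobj.stem(word)]


# Example usage (requires pyhunspell and dictionary files on disk; the paths
# below are placeholders for wherever your Ukrainian .dic/.aff files live):
# import hunspell
# hobj = hunspell.HunSpell('/path/to/uk.dic', '/path/to/uk.aff')
# print(base_forms(hobj, 'слова'))
```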

Another answer:

Just an update: the pyhunspell project is no longer hosted on Google Code. Here are the new links:

As for the add() function (mentioned in a comment on the first answer), it is now documented in the pydoc.