After reading through the Hunspell docs, I started looking at what seems to be the most advanced set of Hunspell dictionary files, and the Hungarian one (Hun-garian Spell) appears to be the most robust.
I have a few questions that the 17-page PDF docs (apparently the only real resource on Hunspell, other than the source code) don't seem to answer.
1. The meaning of the decimal numbers?
For example, the number 1547. We see it here:
AF @ # 1547
And it is used in PFX but not SFX:
PFX r 0 legújra/1547 . 24583
PFX r 0 legújjá/1547 . 24584
PFX r 0 legössze/1547 . 24585
PFX r 0 legát/1547 . 24586
PFX r 0 legáltal/1547 . 24587
PFX r 0 legvégig/1547 . 24588
PFX r 0 legvégbe/1547 . 24589
...
The thing after the slash is a flag, as far as I learned, but where is that flag defined? The line AF @ # 1547 has 1547 as a comment, so I'm not sure. Looking further at AF, it appears the first line, AF 1548, means there are 1548 AF values that follow, and AF @ is the second-to-last one in the list, so maybe that's it?!
So then what does the @ symbol mean in regards to AF, which is described as:
Hunspell can substitute affix flag sets with ordinal numbers in affix rules (alias compression, see makealias tool).
I'm not following...
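If that hunch is right, "ordinal numbers" would mean that /1547 points at the 1547th AF line after the AF 1548 header, which is exactly the AF @ line whose trailing comment reads # 1547. Resolving such a number is then just counting. A minimal sketch under those assumptions (1-based numbering, the first AF line is the count, everything after the second field is ignored); the function names and the three-entry AF table are made up for illustration:

```python
def parse_af_aliases(aff_text: str) -> list[str]:
    """Collect AF flag sets in file order; list index 0 is alias number 1."""
    aliases: list[str] = []
    expected = None
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "AF":
            continue
        if expected is None:
            # The first AF line is a header giving the number of alias lines.
            expected = int(fields[1])
            continue
        # Only the second field is kept; anything after it (such as the
        # "# 1547" in hu_HU.aff) is treated as a trailing comment.
        aliases.append(fields[1])
    if expected is not None and expected != len(aliases):
        raise ValueError(f"header announces {expected} AF lines, found {len(aliases)}")
    return aliases


def resolve_flag_alias(aliases: list[str], number: int) -> str:
    """Map a numeric flag such as the 1547 in 'legát/1547' to its flag set (1-based)."""
    if not 1 <= number <= len(aliases):
        raise IndexError(f"no AF alias number {number}")
    return aliases[number - 1]


# Invented three-entry AF table standing in for hu_HU.aff's 1548 entries.
sample_aff = """AF 3
AF Ab # 1
AF Cd # 2
AF @ # 3
PFX r 0 legát/3 . 24586
"""
aliases = parse_af_aliases(sample_aff)
print(resolve_flag_alias(aliases, 3))  # prints: @
```

Under this reading, @ is simply the flag set that the 1547th alias stands for, and legát/1547 means "legát with flag @".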
2. The meaning of the last decimal numbers on PFX?
Like we have from above:
PFX r 0 legát/1547 . 24586
That is the only place 24586 appears in the .aff file. So what does it mean? Same for all the numbers in that position. Line #24586 in the .dic file doesn't seem related either:
lódenkabát/39 1
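For reference, the Hunspell man page gives an affix rule line the layout PFX flag stripping prefix [condition [morphological_fields...]], with optional continuation flags after a slash on the affix. Splitting the example along that layout puts 24586 in the morphological-fields slot, which under AM alias compression would be an AM ordinal rather than literal text (the alias2 test discussed further down points the same way). A minimal parser sketch; AffixRule and parse_affix_rule are hypothetical names, and defaulting a missing condition to "." is an assumption:

```python
from dataclasses import dataclass, field


@dataclass
class AffixRule:
    kind: str          # "PFX" or "SFX"
    flag: str          # the rule's own flag, e.g. "r"
    strip: str         # characters stripped from the word ("0" in the file = nothing)
    affix: str         # characters added ("0" in the file = nothing)
    cont_flags: str    # text after "/": continuation flags, or an AF alias number
    condition: str     # condition on the word; "." matches anything
    morph: list[str] = field(default_factory=list)  # morph fields, or an AM alias number


def parse_affix_rule(line: str) -> AffixRule:
    """Split one PFX/SFX rule line (not the 'PFX flag cross_product count' header)."""
    kind, flag, strip, affix_field, *rest = line.split()
    affix, _, cont = affix_field.partition("/")
    return AffixRule(
        kind=kind,
        flag=flag,
        strip="" if strip == "0" else strip,
        affix="" if affix == "0" else affix,
        cont_flags=cont,
        condition=rest[0] if rest else ".",  # assumed default when omitted
        morph=rest[1:],
    )


rule = parse_affix_rule("PFX r 0 legát/1547 . 24586")
print(rule.affix, rule.cont_flags, rule.condition, rule.morph)
# prints: legát 1547 . ['24586']
```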
3. What do the /number values mean in the .dic file?
Regarding that last example:
lódenkabát/39 1
What do the /39 and the 1 mean? Where are those defined? I would have expected to find a PFX 39 or SFX 39 defined in the .aff file, but I don't see one.
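Under the same alias reading, lódenkabát/39 1 would split into the word, an AF ordinal (39) and an AM ordinal (1), which would explain the missing PFX 39 or SFX 39: 39 would not be a flag name but a pointer to the 39th AF line. A sketch assuming AF and AM aliases are both in use (so an all-digit field is always an alias); the function names and the tiny AF/AM tables are hypothetical, and the .dic count line and escaped slashes are not handled:

```python
def parse_dic_line(line: str) -> tuple[str, str, list[str]]:
    """Split 'word/flags field field...' into word, flag text and morph fields."""
    word_field, *morph = line.split()
    word, _, flags = word_field.partition("/")
    return word, flags, morph


def expand_dic_line(line: str, af: list[str], am: list[str]) -> tuple[str, str, list[str]]:
    """Replace numeric flag and morphology fields with their 1-based AF/AM entries."""
    word, flags, morph = parse_dic_line(line)
    if flags.isdigit():
        flags = af[int(flags) - 1]
    morph = [am[int(m) - 1] if m.isdigit() else m for m in morph]
    return word, flags, morph


print(parse_dic_line("lódenkabát/39 1"))
# prints: ('lódenkabát', '39', ['1'])

# Invented tables: AF 1 = "AB", AM 1 = "po:noun".
print(expand_dic_line("foo/1 1", af=["AB"], am=["po:noun"]))
# prints: ('foo', 'AB', ['po:noun'])
```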
I learned more by looking at the tests around alias2.aff (and the other alias2 files):
Files
alias2.aff:
alias2.dic:
alias2.good:
alias2.morph:
Explanation
Explaining the AM
Stands for "morphological alias"?
So this is saying we are dealing with line numbers relative to where the AM and AF lists start (1 is the first entry after the count line)! That is crazy to me, so brittle. But anyway... That 1 is referring to AM morphological_fields (from the docs). So it is marking this suffix as AM 1, which is the first AM: is:affix_x. That corresponds to our alias2.morph file, where it shows:
Notice the is:affix_x.
Now, foox has more. This is because in the .dic file, it says:
That 3 is pointing to another AM, which is the last one. So that gives us all three of the AMs shown in the alias2.morph file.
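The AM lookup works the same way as the AF one: count entries after the AM header line. A sketch under that 1-based reading; only AM 1 (is:affix_x) comes from the text, AM 2 and AM 3 are placeholders, the .dic entry is assumed to be foo/1 with AM 3, and joining with spaces is a simplification rather than Hunspell's exact analyze output:

```python
def parse_am_aliases(aff_text: str) -> list[str]:
    """Collect AM entries in file order; list index 0 is alias number 1."""
    aliases: list[str] = []
    seen_header = False
    for line in aff_text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) < 2 or parts[0] != "AM":
            continue
        if not seen_header:
            # The first AM line only gives the number of aliases that follow.
            seen_header = True
            continue
        aliases.append(parts[1].strip())
    return aliases


def combine_morph(am: list[str], *alias_numbers: str) -> str:
    """Join the AM entries for a stem and its affixes (order is an assumption)."""
    return " ".join(am[int(n) - 1] for n in alias_numbers)


# AM 1 is the is:affix_x from the text; AM 2 and AM 3 are placeholders.
sample_aff = """AM 3
AM is:affix_x
AM placeholder:am_2
AM placeholder:am_3
"""
am = parse_am_aliases(sample_aff)
# Assumed .dic entry foo/1 3, plus the x suffix rule whose AM is 1:
print(combine_morph(am, "3", "1"))  # prints: placeholder:am_3 is:affix_x
```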
Explaining the AF
Stands for "affix flag".
The /1 here in the .dic references the AF position:
And the /2 in the .aff does as well:
So for the y/2, that is saying that the x suffix can come after suffix y (as in fooyx), since 2 links to AF 2, which is AF A, which links to SFX A, which is the x suffix.
I'm a bit confused at foo/1, which is an alias to foo/AB: couldn't you just write foo/A and have it allow foo/AB because of the y/2 definition? Or foo/1 (foo/AB) must be saying that foo/A and foo/B are both allowed, and that the B suffix (y) can additionally be followed by A (x), as per the SFX B definition. That must be it.
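The y/2 reading matches the twofold-suffix behavior described in the Hunspell man page: flags attached to a suffix let another suffix follow it. A toy model of the alias2 rules as described in this post shows what foo/1 versus foo/A would produce. It is heavily simplified (suffixes only, no stripping, no conditions), the AF and SFX tables are a reconstruction for illustration rather than the actual test files, and it is not Hunspell's matching algorithm:

```python
# Hypothetical, simplified model of the alias2 rules described above.
AF = ["AB", "A"]  # AF 1 = AB, AF 2 = A

# flag -> list of (suffix text, continuation flags written after "/")
SFX = {
    "A": [("x", "")],   # the x suffix, no continuation flags
    "B": [("y", "2")],  # the y/2 suffix: continuation flags = AF 2 = "A"
}


def flags_of(field: str) -> str:
    """Expand an AF alias number to its flag set; return other text unchanged."""
    return AF[int(field) - 1] if field.isdigit() else field


def word_forms(stem: str, dic_flags: str) -> list[str]:
    """List the stem plus forms with one suffix, or two via continuation flags."""
    forms = [stem]
    for flag in flags_of(dic_flags):
        for suffix, cont in SFX.get(flag, []):
            first = stem + suffix
            forms.append(first)
            # Continuation flags on the first suffix allow one more suffix
            # to be appended after it (twofold suffixing, no deeper).
            for flag2 in flags_of(cont):
                for suffix2, _ in SFX.get(flag2, []):
                    forms.append(first + suffix2)
    return forms


print(word_forms("foo", "1"))  # foo/1, i.e. foo/AB
# prints: ['foo', 'foox', 'fooy', 'fooyx']
print(word_forms("foo", "A"))  # foo/A only
# prints: ['foo', 'foox']
```

The second call suggests why foo/A alone would not be enough: without B among the stem's own flags, neither fooy nor fooyx is reachable, because y/2 only governs what may follow y.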