What is the use of mkcls in GIZA++?
After running mkcls for GIZA++, I get four files: *.vcb.classes and *.vcb.classes.cats for both the source and the target language.
The contents of one *.vcb.classes file are:
. 9
book 10
gave 4
he 3
him 5
i 7
loved 8
read 8
the 2
What do these numbers refer to? Are they word class numbers? If so, how are they generated, and on what basis are words assigned to the different classes?
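The 'mkcls' program groups words into equivalence classes, so yes, the number next to each word in *.vcb.classes is its class ID. Words with the same number (here "loved" and "read", both 8) belong to the same class. The classes are not taken from a dictionary or a tagger: they are learned automatically from the training corpus, with mkcls searching for the class assignment that optimizes a maximum-likelihood criterion under a class-based bigram model. GIZA++ then uses these classes during word alignment. For the details of the method, see Franz Josef Och, "An Efficient Method for Determining Bilingual Word Classes" (EACL 1999).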
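To see which words ended up in the same class, it can help to load the file and invert the mapping. The following is only a minimal sketch: it assumes each line is a word followed by whitespace and an integer class ID, as in the listing in the question, and the function names parse_classes and group_by_class are made up for illustration (they are not part of mkcls or GIZA++).

```python
from collections import defaultdict


def parse_classes(lines):
    """Map each word to its integer class ID.

    Assumes one "word <whitespace> class" pair per line, as in the
    *.vcb.classes listing from the question.
    """
    word_to_class = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the last run of whitespace so the class ID is always last.
        word, cls = line.rsplit(None, 1)
        word_to_class[word] = int(cls)
    return word_to_class


def group_by_class(word_to_class):
    """Invert the mapping: class ID -> list of words in that class."""
    groups = defaultdict(list)
    for word, cls in word_to_class.items():
        groups[cls].append(word)
    return dict(groups)


# The listing from the question, inlined so the sketch runs on its own.
sample = """\
. 9
book 10
gave 4
he 3
him 5
i 7
loved 8
read 8
the 2
""".splitlines()

word_to_class = parse_classes(sample)
groups = group_by_class(word_to_class)
print(groups[8])  # ['loved', 'read']
```

For a real file, the same functions can be fed an open file handle instead of the inlined sample, for example parse_classes(open(path, encoding="utf-8")) with path pointing at your own *.vcb.classes file.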