maximum score in WordNet based similarity

1.7k Views Asked by At

some of simlarity score are ranged between 0 and 1 such as shortest path and WuP. therefore the similarity between car and automobile will be 1 but other measures such as LCh will be

lch( car, automobile ) = 3.6889

I want to know the maximum score for these measures. Is 3.6889 consider to be the maximum value? Is these means LCH score between 0 and 3.6889.

I addition the following measures

jcn( car, automobile ) = 12876699.5
res( car, automobile ) = 9.3679
lesk( car, automobile ) = 9519 
1

There are 1 best solutions below

3
On

It seems that 3.6375861597263857 is the maximum for lch_similarity (I can't get 3.6889...). lch_similarity, according to the documentation has the following properties:

Leacock Chodorow Similarity:
        Return a score denoting how similar two word senses are, based on the
        shortest path that connects the senses (as above) and the maximum depth
        of the taxonomy in which the senses occur. The relationship is given as
        -log(p/2d) where p is the shortest path length and d is the taxonomy
        depth.
...
:return: A score denoting the similarity of the two ``Synset`` objects,
            normally greater than 0. None is returned if no connecting path
            could be found. If a ``Synset`` is compared with itself, the
            maximum score is returned, which varies depending on the taxonomy
            depth.

Given that rock_hind.n.01 is at the deepest level (19) in the WordNet taxonomy and that change.n.06 is at the shallowest level (2), we can experiment with varying depths:

>>> from nltk.corpus import wordnet as wn
>>> rock = wn.synset('rock_hind.n.01')
>>> change = wn.synset('change.n.06')
>>> rock.lch_similarity(rock)
3.6375861597263857
>>> change.lch_similarity(change)
3.6375861597263857
>>> change.lch_similarity(rock)
0.7472144018302211
>>> rock.lch_similarity(change)
0.7472144018302211

Similar experiments can be made for the other measures, where the ranges seem quite a bit larger:

>>> from nltk.corpus import wordnet_ic, genesis
>>> brown_ic = wordnet_ic.ic('ic-brown.dat')
>>> semcor_ic = wordnet_ic.ic('ic-semcor.dat')
>>> genesis_ic = wn.ic(genesis, False, 0.0)
>>> rock.res_similarity(rock, brown_ic) # res_similarity, brown
1e+300
>>> rock.res_similarity(change, brown_ic)
-0.0
>>> rock.res_similarity(rock, semcor_ic) # res_similarity, semcor
1e+300
>>> rock.res_similarity(change, semcor_ic)
-0.0
>>> rock.res_similarity(rock, genesis_ic) # res_similarity, genesis
1e+300
>>> rock.res_similarity(change, genesis_ic)
-0.08306855877006339
>>> change.res_similarity(rock, genesis_ic)
-0.08306855877006339
>>> rock.jcn_similarity(rock, brown_ic) # jcn, brown - results are identical with semcor and genesis
1e+300
>>> rock.jcn_similarity(change, brown_ic)
1e-300
>>> change.jcn_similarity(rock, brown_ic)
1e-300