Similarity measure between 2 semantic vectors with COLT


I'm using spreading-activation to get related concepts to a given one.

If I want to calculate the similarity between 'London' and 'Paris', I get 2 vectors such as:

vector for 'Paris':
Paris : 1.0
City : 0.9
Capital : 0.7
France : 0.6
Europe : 0.5
...

vector for 'London':
London : 1.0
City : 0.9
England : 0.9
United Kingdom : 0.8
Europe : 0.5
...

The issue is that the vectors can have different lengths, and they do not share the same set of concepts. What similarity measure can be used in this situation? As far as I know, the cosine measure can be applied only to vectors of the same size.

I found these packages: SimMetrics: http://staffwww.dcs.shef.ac.uk/people/S.Chapman/simmetrics.html and COLT: http://nlp.stanford.edu/nlp/javadoc/colt-docs/overview-summary.html

How is it possible to use them in my scenario?

Thanks! Mulone


There is 1 best solution below


You could just default all unassigned values to 0, so that both vectors are defined over the same set of concepts and have the same length, and then use any distance metric of your choice (the cosine measure included). You probably want to have some way of weighting the different attributes, though, since some are likely to be better signifiers of relevance than others.
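As a rough illustration of the zero-fill idea, here is a minimal sketch in plain Java. It does not use COLT or SimMetrics; it simply keeps each vector as a concept-to-weight map and treats any concept missing from a map as 0, which gives the same result as padding both vectors to a common length and taking the cosine. The class name SparseCosine and its methods are made up for this example, and only the entries actually listed in the question are used (the "..." parts are left out).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of COLT or SimMetrics.
class SparseCosine {

    // Cosine similarity of two sparse vectors stored as concept -> weight maps.
    // A concept that is missing from a map is treated as having weight 0,
    // so only concepts present in both maps contribute to the dot product.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here: an all-zero vector is similar to nothing
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double x : v.values()) {
            sum += x * x;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Only the entries shown in the question; the elided ones are omitted.
        Map<String, Double> paris = new HashMap<>();
        paris.put("Paris", 1.0);
        paris.put("City", 0.9);
        paris.put("Capital", 0.7);
        paris.put("France", 0.6);
        paris.put("Europe", 0.5);

        Map<String, Double> london = new HashMap<>();
        london.put("London", 1.0);
        london.put("City", 0.9);
        london.put("England", 0.9);
        london.put("United Kingdom", 0.8);
        london.put("Europe", 0.5);

        System.out.println(cosine(paris, london));
    }
}
```

Here only "City" and "Europe" are shared, so they are the only terms in the dot product, while every entry still counts toward each vector's length. Per-attribute weighting could be added by multiplying each value by a weight before calling cosine.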

Also, by what measure is London more "Europe" than Paris?