Why is ConceptNet Numberbatch word embedding giving poor results for analogy queries?


I've been playing around with analogy queries over some publicly available word embeddings, in particular GloVe and ConceptNet Numberbatch.

I'm doing some basic queries that include (where queryTarget is what I am looking for):

baseSource:baseTarget :: querySource:queryTarget e.g. man:woman :: king:queen

  • maximize cosine_similarity(baseTarget-baseSource, queryTarget-querySource)
  • maximize cosine_similarity(baseTarget-baseSource, queryTarget-querySource) * cosine_similarity(baseTarget-queryTarget, baseSource-querySource)
  • minimize L2norm(baseTarget-baseSource+querySource, queryTarget)
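For reference, here is roughly how I am computing the three scores above with numpy (a minimal sketch; the embedding lookup is assumed to be a plain dict mapping word to vector, and `analogy_scores` is just an illustrative helper name):

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two vectors
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def analogy_scores(emb, base_source, base_target, query_source, candidate):
    """Score a candidate queryTarget for baseSource:baseTarget :: querySource:candidate.

    emb: dict mapping word -> np.ndarray
    Returns (s1, s2, s3) corresponding to the three strategies:
      s1, s2 are maximized over candidates; s3 is minimized.
    """
    bs, bt = emb[base_source], emb[base_target]
    qs, qt = emb[query_source], emb[candidate]
    # strategy 1: match the offset directions
    s1 = cosine(bt - bs, qt - qs)
    # strategy 2: offset match weighted by the cross-pair offset match
    s2 = s1 * cosine(bt - qt, bs - qs)
    # strategy 3: L2 distance from the additive-offset prediction
    s3 = np.linalg.norm(bt - bs + qs - qt)
    return s1, s2, s3
```

On a toy embedding where `queen` sits exactly at `king + (woman - man)`, strategy 1 scores 1.0 and strategy 3 scores 0.0, so the helper behaves as the definitions require.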

For the query: man:woman :: king:?

The GloVe data gives me the correct queen, lady, princess results across the various matching strategies. However, ConceptNet Numberbatch gives female_person, adult_female, king_david's_harp as its top 3, which I would not expect (queen is not even in the top 20). Similarly, across other queries, unexpected results regularly displace the expected answers that GloVe does return.

Does the ConceptNet embedding require some sort of additional preprocessing before I can use it this way? Or is it simply not suited to English analogy queries?
