Word2Vec- does the word embedding change?


I just wanted to know: given these two sentences,

  1. The bank remains closed on public holidays
  2. Don't go near the river bank

will the word 'bank' have different word embeddings, or the same one, if we use word2vec or GloVe?


Best Answer

The short answer: both word2vec and GloVe learn a single vector per word, so 'bank' would get the same embedding in both sentences; the training contexts only shape what that one vector is. Beyond that, you can't meaningfully train a dense word embedding on just 2 texts. You'd need these, and dozens (or ideally hundreds) more, examples of the use of 'bank' in subtly-varying contexts to get a good word-vector for it. (And that word-vector would only have meaning in comparison to the vectors of other well-sampled words in the same trained model.)
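The one-vector-per-word behavior can be sketched with a toy lookup table (random vectors standing in for a trained model; the vocabulary and 4-d size here are arbitrary choices for illustration):

```python
import numpy as np

# Toy illustration (not a trained model): static embeddings like
# word2vec's or GloVe's are just a lookup table with one fixed vector
# per vocabulary word.
rng = np.random.default_rng(0)
vocab = ["the", "bank", "remains", "closed", "near", "river", "go"]
table = {w: rng.standard_normal(4) for w in vocab}  # 4-d toy vectors

sent1 = "the bank remains closed".split()
sent2 = "go near the river bank".split()

# Both sentences consult the same table, so 'bank' resolves to the
# identical vector regardless of its surrounding words.
vec_in_sent1 = table["bank"]
vec_in_sent2 = table["bank"]
print(np.array_equal(vec_in_sent1, vec_in_sent2))  # True
```

(A contextual model like BERT, by contrast, would produce different vectors for the two occurrences.)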

Let's assume you do have a large, diverse training corpus with many examples of 'bank' in varied contexts, and you've trained a model, either word2vec or GloVe, on that corpus.

Then, imagine that corpus was changed so that there were relatively more contexts that included the 'river' sense. (Perhaps, a bunch of new texts are added that talk about nature, parks, boating, & irrigation.) Then, you retrain your model, from scratch, on the new corpus.

In the new model, 'bank' (and related words) will typically have been nudged to have more 'river bank'-like neighbors.
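This nudging effect can be seen even with raw co-occurrence counts, which are a crude stand-in for the corpus statistics both word2vec and GloVe learn from (the tiny corpora and vocabulary below are invented for illustration):

```python
import numpy as np

def cooc_vectors(corpus, vocab, window=2):
    # Crude symmetric co-occurrence counts within a small window -- a
    # stand-in for the statistics word2vec and GloVe are trained on.
    index = {w: i for i, w in enumerate(vocab)}
    mat = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        toks = [t for t in sent.split() if t in index]
        for i, w in enumerate(toks):
            for c in toks[max(0, i - window):i] + toks[i + 1:i + 1 + window]:
                mat[index[w], index[c]] += 1.0
    return mat, index

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

vocab = ["bank", "money", "loan", "deposit", "river", "water", "muddy", "the"]
base = [
    "the bank approved the loan",
    "deposit money at the bank",
    "the bank holds money",
]
river_texts = [
    "the river bank was muddy",
    "water flowed past the river bank",
    "the muddy river bank",
]

mat1, idx = cooc_vectors(base, vocab)
mat2, _ = cooc_vectors(base + river_texts, vocab)

sim_before = cosine(mat1[idx["bank"]], mat1[idx["river"]])
sim_after = cosine(mat2[idx["bank"]], mat2[idx["river"]])
# After the river-heavy texts are added, 'bank' sits closer to 'river'.
print(sim_before, sim_after)
```

Real models apply much more machinery (weighting, dimensionality reduction, negative sampling), but the direction of the effect is the same: more 'river' contexts pull 'bank' toward 'river'-like neighbors.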

These words may be in totally different coordinates, overall, as each run includes enough randomness to change words' ending positions a lot. But their relative neighborhoods & relative directions will tend to stay similarly useful across runs, and changes in the mix of examples will tend to nudge results in one direction or another.
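One source of that coordinates-differ-but-neighborhoods-agree behavior can be illustrated by rotating an embedding space: the raw coordinates change completely while every pairwise cosine similarity is preserved. (This is only an illustration of the space's arbitrary orientation; real re-runs also differ through sampling and scheduling randomness, so agreement there is approximate rather than exact.)

```python
import numpy as np

rng = np.random.default_rng(1)
emb_run_a = rng.standard_normal((5, 8))  # 5 toy word vectors from "run A"

# Model "run B" as an arbitrary orthogonal rotation of run A's space:
# the raw coordinates come out completely different...
q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
emb_run_b = emb_run_a @ q

def cosine_sims(m):
    n = m / np.linalg.norm(m, axis=1, keepdims=True)
    return n @ n.T

# ...yet the relative geometry (all pairwise cosines) is unchanged.
print(np.allclose(emb_run_a, emb_run_b))                             # False
print(np.allclose(cosine_sims(emb_run_a), cosine_sims(emb_run_b)))   # True
```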

This is the case for both GloVe and word2vec: the end results of both will be influenced by the relative preponderance of alternate word senses.

(That words have multiple contrasting meanings is generally referred to in the relevant literature as 'polysemy', so searches like [polysemy word-vectors] should turn up a lot more work related to your question.)