I’m working on word embeddings and am a little confused about the number of dimensions of the word vectors. Take word2vec as an example: why should we use, say, 100 hidden neurons in the hidden layer? Does this number have any meaning or logic behind it, or is it arbitrary? If it's arbitrary, why not 300, or 10? Why not more or less? As we all know, the simplest way to display vectors is in a 2-dimensional space (only X and Y), so why use more dimensions? I read some resources about this, and in one example they choose 100 dimensions, while in others they choose different numbers like 150, 200, 80, etc.
I know that the larger the number, the bigger the space for representing relations between words, but couldn't we represent those relations in a 2-dimensional vector space (only X and Y)? Why do we need a bigger space? Each word is represented by a vector, so why do we have to use a high-dimensional space when we could represent vectors in 2 or 3 dimensions? And isn't it simpler to use similarity techniques like cosine similarity in 2 or 3 dimensions rather than 100 (from a computation-time viewpoint)?
Well, if just displaying the vectors is your end goal, then 2- or 3-dimensional vectors work best.
Often in NLP, though, we have well-defined tasks like tagging, parsing, semantic understanding, etc. For all of these purposes, higher-dimensional vectors will almost always perform better than 2-d or 3-d vectors, because they have more degrees of freedom to capture the relationships you are after: they can hold richer information. See the sketch below.
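As a rough illustration (a minimal sketch using the gensim library; the toy corpus, the hyperparameters, and the particular `vector_size` values are just placeholders, not recommendations), the dimensionality is simply a training parameter, and you can train the same model with different sizes:

```python
# Minimal sketch: train word2vec with different embedding sizes (assumes gensim 4.x is installed).
# The corpus and hyperparameters below are placeholders, not a recommendation.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

for dim in (2, 100, 300):
    model = Word2Vec(
        sentences=corpus,
        vector_size=dim,  # embedding dimensionality (the "hidden layer" size)
        window=2,
        min_count=1,
        epochs=50,
    )
    # Each word is now a dim-dimensional vector; more dimensions give the model
    # more degrees of freedom to encode relationships (given enough data).
    print(dim, model.wv["cat"].shape)
```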
isn't it simpler to use similarity techniques like cosine similarity in 2 or 3 dimensions rather than 100 (from a computation-time viewpoint)?
No, not really. That is like saying adding 2 numbers is simpler than adding 100 numbers: the method (cosine distance) is exactly the same, only the number of terms changes, and that difference in computation time is negligible.
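To make that concrete, here is a small sketch (plain NumPy, with randomly generated vectors standing in for real embeddings) showing that the same cosine-similarity code works unchanged whether the vectors have 2 or 100 dimensions; only the dot product has more terms:

```python
# Cosine similarity is the same formula regardless of dimensionality.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

rng = np.random.default_rng(0)

# 2-dimensional vectors
u2, v2 = rng.normal(size=2), rng.normal(size=2)
# 100-dimensional vectors
u100, v100 = rng.normal(size=100), rng.normal(size=100)

print(cosine_similarity(u2, v2))      # same code path...
print(cosine_similarity(u100, v100))  # ...just a longer sum inside the dot product
```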