To calculate self-attention, we create a Query vector, a Key vector, and a Value vector for each word. These vectors are created by multiplying the word's embedding by three weight matrices, WQ, WK, and WV, which are learned during training.
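A minimal NumPy sketch of this projection step (the dimensions and random weights below are illustrative, not from the original text). Each word's embedding is a row of X, and the same three matrices WQ, WK, WV are applied to every row in a single matrix multiplication:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (illustrative assumptions):
seq_len, d_model, d_k = 4, 8, 3   # 4 words, embedding size 8, Q/K/V size 3

# One embedding vector per word, stacked as rows of X.
X = rng.standard_normal((seq_len, d_model))

# The three learned projection matrices.
WQ = rng.standard_normal((d_model, d_k))
WK = rng.standard_normal((d_model, d_k))
WV = rng.standard_normal((d_model, d_k))

# Every word's embedding is multiplied by the same three matrices:
Q = X @ WQ   # one Query vector per word  -> shape (seq_len, d_k)
K = X @ WK   # one Key vector per word    -> shape (seq_len, d_k)
V = X @ WV   # one Value vector per word  -> shape (seq_len, d_k)

print(Q.shape, K.shape, V.shape)
```

Note that a single matmul `X @ WQ` computes the Query vectors for all words at once, which is only possible because one WQ is shared across positions.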
Question: are these matrices WQ, WK, WV the same for every input word (embedding), or are they different for different words?