How is a query in this example turned into a unit vector using term frequency and normalization?

734 Views Asked by At

In the Stanford Information Retreival book, I'm trying to figure out how a query is turned into a unit vector. Consider the query q = jealous gossip. This query turns into the unit vector ⃗v(q) = (0, 0.707, 0.707) on the three coordinates of Figures 6.12 and 6.13. How is (0, 0.707, 0.707) achieved?

enter image description here

enter image description here

1

There are 1 best solutions below

0
On BEST ANSWER

It's a standard vector normalization. A vector is said to be normalized if its distance from the 0 [(0,0,0) in your case] is exactly 1, which as you can see is the case in this vector (up to rounding difference).

Normalization is done by:

Let the vector be x = (x_1,x_2,...,x_n)
let s = x_1^2 + x_2^2 + ... + x_n^2 [sum of squares]
let the normalized vector n = x/sqrt(s) = (x_1/sqrt(s), x_2/sqrt(s), ..., x_n/sqrt(s))

In your example, the query "jealous gossip" produces the unormalzied vector (0,1,1) (since the dictionary consists of the 3 words affection, jealous, gossip). By invoking the above algorithm you get s=2, and thus the normalized vector is n=(0/sqrt(2), 1/sqrt(2), 1/sqrt(2)) ~= (0,0.707,0.707)