I need to code a similarity score in python to find matches based on movie genre.
The comparison is for 1 user to find similarity between their genre scores in binary and a dataframe of genre scores in binary for 40,000 movie titles. I need to iterate through the dataframe and compare each item with the users score to find similarity.
For instance take user 1: Score [0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,1]
Compare similarity to Movies dataframe: Movies Dataframe
I would like to come up with a score for a similarity measure between the user and each title in order to rank the titles that are most similar to the users preference.
I have found that Hamming distance is probably the best method for binary values. How can I implement this? Thanks
Try:
You may wish to go one step further and define a function that will tell you the index of most similar output for an index of input of interest: