Hey, I'm trying to learn some of the recommendation algorithms used on websites like Amazon.com. I have a simple Java (Spring, Hibernate, Postgres) book store application where a Book has the attributes title, category, tags and author. For simplicity there's no content inside the book; a book is identified by its title, category, author and tags. For each user logging into the application I should be able to recommend some books. Each user can view a book, add it to the cart and buy it at any time. So in the database I'm storing how many times each user looked at a book, the books in their cart and the books they have bought. At the moment there's no rating option, but that can be added too.
So can someone tell me which algorithms I could use to demonstrate some recommendation of books for each user? I want to keep it really simple. It's not a project to sell, only a way to expand my knowledge of recommendation algorithms. So assume there are only about 30 books in total (5 categories with 6 books in each). It would be really helpful if someone could also tell me which attributes I should use to calculate similarities between two users, and how to go about it with the recommended algorithms.
Thanks in advance. SerotoninChase.
As a particular concrete example, one option is a "nearest K neighbours" algorithm.
To simplify things, imagine you only had ten books and were only tracking how many times each user viewed each book. Then, for each user, you might have an array int timesViewed[10], where the value of timesViewed[i] is the number of times the user has viewed book number i.
You can then compare the user to all of the other users using a correlation function, such as the Pearson correlation. Computing the correlation between the current user c and another user o gives a value between -1.0 and 1.0, where -1.0 means "this user c is the complete opposite of the other user o", and 1.0 means "this user c is the same as the other user o".
If you compute the correlation between c and every other user, you get a list of results showing how similar the user's viewing pattern is to that of each other user. You then pick the K (e.g. 5, 10, 20) most similar results (hence the name of the algorithm), that is, the K users with the correlation scores closest to 1.0.
Now, you can do a weighted average of each of those users' timesViewed arrays. For example, we'll say averageTimesViewed[0] is the average of the timesViewed[0] for each of those K users, weighted by their correlation score. Then do the same for each other averageTimesViewed[i].
Now you have an array averageTimesViewed which contains, roughly speaking, the average number of times the K users with the most similar viewing patterns to c have viewed each book. Recommend the book which has the highest averageTimesViewed score, since this is the book the other users have shown most interest in.
It's usually worth also excluding books the user has already viewed from being recommended, but it is still important to keep those accounted for when computing similarity/correlation.
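The steps above could be sketched in Java roughly as follows. This is only an illustration under a few assumptions of mine: the class and method names (KnnRecommender, pearson, recommend) are made up, a user with zero variance in their view counts gets a correlation of 0.0 (the Pearson coefficient is undefined there), neighbours with a non-positive correlation are skipped so the weights stay positive, and -1 is returned when nothing can be recommended.

```java
import java.util.ArrayList;
import java.util.List;

class KnnRecommender {

    /** Pearson correlation of two equal-length view-count arrays. */
    static double pearson(int[] a, int[] b) {
        int n = a.length;
        double meanA = 0, meanB = 0;
        for (int i = 0; i < n; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= n;
        meanB /= n;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < n; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        // Assumption: treat the undefined zero-variance case as "no correlation".
        if (varA == 0 || varB == 0) {
            return 0.0;
        }
        return cov / Math.sqrt(varA * varB);
    }

    /**
     * timesViewed[u][i] = number of times user u viewed book i.
     * Returns the index of the book to recommend to user c, or -1 if none.
     */
    static int recommend(int[][] timesViewed, int c, int k) {
        int books = timesViewed[c].length;

        // Correlation between c and every other user, stored as {userIndex, correlation}.
        List<double[]> neighbours = new ArrayList<>();
        for (int o = 0; o < timesViewed.length; o++) {
            if (o != c) {
                neighbours.add(new double[] {o, pearson(timesViewed[c], timesViewed[o])});
            }
        }
        // Most similar users (correlation closest to 1.0) first.
        neighbours.sort((x, y) -> Double.compare(y[1], x[1]));

        // Correlation-weighted sum of the K nearest neighbours' view counts.
        double[] weightedSum = new double[books];
        double totalWeight = 0;
        for (int j = 0; j < Math.min(k, neighbours.size()); j++) {
            double weight = neighbours.get(j)[1];
            if (weight <= 0) {
                continue; // assumption: ignore neighbours that are not positively correlated
            }
            int o = (int) neighbours.get(j)[0];
            for (int i = 0; i < books; i++) {
                weightedSum[i] += weight * timesViewed[o][i];
            }
            totalWeight += weight;
        }
        if (totalWeight == 0) {
            return -1;
        }

        // Pick the unviewed book with the highest weighted average.
        int best = -1;
        double bestScore = 0;
        for (int i = 0; i < books; i++) {
            if (timesViewed[c][i] > 0) {
                continue; // already viewed by c, so not worth recommending
            }
            double averageTimesViewed = weightedSum[i] / totalWeight;
            if (best == -1 || averageTimesViewed > bestScore) {
                best = i;
                bestScore = averageTimesViewed;
            }
        }
        return best;
    }
}
```

Calling KnnRecommender.recommend(timesViewed, c, k) with the full user-by-book matrix gives the suggested book index for user c. Note that the similarity is still computed over all books, including the ones c has already viewed; those are only filtered out at the final step.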
Also note that this can be trivially extended to take other data into account (such as cart lists etc). Also, you can select all users if you want (i.e. K = number of users), but that doesn't always produce meaningful results, and usually picking a reasonably small K is sufficient for good results, and is quicker to compute.
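One possible way to fold the cart and purchase data in is to combine everything into a single per-book interest score and run the same correlation on that instead of the raw view counts. The sketch below is only a guess at how that might look: the class name InterestScores, the combine method and the three weights are all hypothetical, and the weight values are arbitrary numbers picked for illustration, not tuned or recommended ones.

```java
class InterestScores {

    // Hypothetical weights: a purchase counts for more than a cart entry,
    // which counts for more than a single view. Values are illustrative only.
    static final double VIEW_WEIGHT = 1.0;
    static final double CART_WEIGHT = 3.0;
    static final double PURCHASE_WEIGHT = 5.0;

    /**
     * Combines one user's view counts, cart contents and purchases into
     * a single interest score per book.
     */
    static double[] combine(int[] timesViewed, boolean[] inCart, boolean[] bought) {
        double[] score = new double[timesViewed.length];
        for (int i = 0; i < timesViewed.length; i++) {
            score[i] = VIEW_WEIGHT * timesViewed[i]
                    + (inCart[i] ? CART_WEIGHT : 0.0)
                    + (bought[i] ? PURCHASE_WEIGHT : 0.0);
        }
        return score;
    }
}
```

The resulting double[] per user would then take the place of the timesViewed array when computing correlations and weighted averages, so the correlation function would need to accept double[] rather than int[].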