I have built a content-based recommender that recommends 10 similar products based on their descriptions. Now I want to evaluate its accuracy and efficiency. Everything has worked well so far, but I am stuck on evaluating the accuracy. The formulas I found on Google evaluate accuracy from rating values only (comparing predicted and actual ratings, as RMSE does). I did not convert the similarity scores into ratings (on a scale from 1 to 5), so I couldn't apply any of them.
I used a TF-IDF vectorizer and cosine similarity. When I tried the Surprise library for cross-validation, a "no raw rating" error occurred. I need some metric to evaluate the accuracy and efficiency of the recommendation system.
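To make the problem concrete, this is roughly what the rating-based formulas compute, as far as I understand them. It is only a minimal sketch: the `rmse` helper and both rating lists are made up for illustration and are not part of my data.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 scale, invented for this example only.
predicted_ratings = [4.0, 3.0, 5.0, 2.0]
actual_ratings = [5.0, 3.0, 4.0, 2.0]
score = rmse(predicted_ratings, actual_ratings)
print(score)
```

My recommender only produces similarity scores between products, so I have neither of the two lists this kind of metric needs.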
Code for TF-IDF:
from sklearn.feature_extraction.text import TfidfVectorizer

# Remove English stop words
tfidf = TfidfVectorizer(stop_words='english')
# Replace NaN descriptions with an empty string
df1['product_desc'] = df1['product_desc'].fillna('')
# Build the TF-IDF matrix (rows: products, columns: terms)
tfidf_matrix = tfidf.fit_transform(df1['product_desc'])
tfidf_matrix.shape
And the cosine similarity part:
# Cosine similarity (TF-IDF rows are L2-normalized by default, so the dot product equals the cosine)
import pandas as pd
from sklearn.metrics.pairwise import linear_kernel

cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

# Reverse map from product name to row index
indices = pd.Series(df1.index, index=df1['product_name']).drop_duplicates()

def get_recommendation(title, cosine_sim=cosine_sim):
    idx = indices[title]
    sim_score = list(enumerate(cosine_sim[idx]))
    sim_score = sorted(sim_score, key=lambda x: x[1], reverse=True)
    # Skip the first entry (the product itself) and keep the next 10
    sim_score = sim_score[1:11]
    p_indices = [i[0] for i in sim_score]
    # Return the names of the recommended products
    return df1['product_name'].iloc[p_indices]
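In case it helps to reproduce the setup without my data, here is a small self-contained sketch of the same pipeline. The three descriptions and all the `toy_*` names are made up for illustration. It also checks that `linear_kernel` gives the same scores as `cosine_similarity` here, which I believe holds because `TfidfVectorizer` L2-normalizes each row by default.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical product descriptions, invented for this example only.
toy_descriptions = [
    "red cotton t-shirt for men",
    "blue cotton t-shirt for women",
    "stainless steel kitchen knife",
]

toy_tfidf = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfidf.fit_transform(toy_descriptions)

# Dot products of L2-normalized rows versus explicit cosine similarity
dot_products = linear_kernel(toy_matrix, toy_matrix)
cosines = cosine_similarity(toy_matrix, toy_matrix)
same_scores = np.allclose(dot_products, cosines)

# Rank all items by similarity to item 0, then drop item 0 itself
ranked = np.argsort(-dot_products[0])
nearest_to_first = [int(i) for i in ranked if i != 0][0]

print(same_scores, nearest_to_first)
```

On this toy data the second t-shirt comes out as the nearest neighbour of the first one, while the knife shares no terms with it.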
So I need some formula to evaluate a content-based recommender.