Can anyone tell me how Mahout's RecommenderIRStatsEvaluator works? More specifically, how does it randomly split the data into training and test sets, and what is the result compared against? Based on my understanding, you need some sort of ideal/expected result to compare against the actual output of the recommendation algorithm in order to find the TPs and FPs and thus compute precision or recall. But it looks like Mahout provides a precision/recall score without that ideal/expected result.
How mahout's recommendation evaluator works
2.7k Views Asked by rusho1234 At
The data is split into training and test sets using a relevance threshold that you supply to the `evaluate` method of the `RecommenderIRStatsEvaluator` class. If you do not supply one (by passing `GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD`, which is `Double.NaN`), a method named `computeThreshold` computes it. The class that splits the data into training and test sets is `GenericRelevantItemsDataSplitter`. If you look at the code, you can see that the preferences of each user are first sorted by value in descending order, and then only those with a value greater than the `relevanceThreshold` are taken as relevant. Also notice that at most `at` items are put into this set.
How the precision and the recall are computed can be seen in the `RecommenderIRStatsEvaluator.evaluate` method. In short: only one user is evaluated at a time. That user's preference values are split into relevant ones (as described above) and the rest. The relevant ones are used as the test set, and the rest, together with the preferences of all other users, as the training set. Then the top-`at` recommendations are produced for this user. Next, the method counts how many of the items that were set aside as the test set appear among the recommendations.
The precision is then computed as that count divided by `numRecommendedItems`, where `numRecommendedItems` is usually your `at` if the recommender produces at least `at` recommendations, and smaller otherwise.
Similarly, the recall is computed as the same count divided by `numRelevantItems`, where `numRelevantItems` is the number of items in the test set for this user.
The final precision and recall are the macro averages of the per-user precisions and recalls over all users.
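To make the steps above concrete, here is a minimal sketch of the per-user computation in plain Java. It does not use or reproduce Mahout's code: the class `IrStatsSketch`, its method names, the sample ratings and the "recommended" list are all made up for illustration, and the strict "greater than" comparison simply follows the description above. It assumes Java 9+ for `Map.of` and `List.of`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: plain Java, no Mahout classes. All names are hypothetical.
final class IrStatsSketch {

    // Held-out "relevant" items for one user: preferences sorted by value (descending),
    // keeping those strictly above the threshold, and at most `at` of them.
    public static Set<Long> relevantItems(Map<Long, Double> prefs, double relevanceThreshold, int at) {
        List<Map.Entry<Long, Double>> sorted = new ArrayList<>(prefs.entrySet());
        sorted.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> pref : sorted) {
            if (relevant.size() >= at || pref.getValue() <= relevanceThreshold) {
                break; // the list is sorted, so nothing further can qualify
            }
            relevant.add(pref.getKey());
        }
        return relevant;
    }

    // Number of recommended items that were held out as relevant.
    public static int countHits(List<Long> recommended, Set<Long> relevant) {
        int hits = 0;
        for (long itemId : recommended) {
            if (relevant.contains(itemId)) {
                hits++;
            }
        }
        return hits;
    }

    // Assumes numRecommendedItems > 0; a real evaluator has to handle the empty case.
    public static double precision(int hits, int numRecommendedItems) {
        return (double) hits / numRecommendedItems;
    }

    // Assumes numRelevantItems > 0; a real evaluator has to handle the empty case.
    public static double recall(int hits, int numRelevantItems) {
        return (double) hits / numRelevantItems;
    }

    // Macro average: every user contributes one value with equal weight.
    public static double macroAverage(List<Double> perUserValues) {
        double sum = 0.0;
        for (double value : perUserValues) {
            sum += value;
        }
        return sum / perUserValues.size();
    }

    public static void main(String[] args) {
        // Made-up ratings for one user: itemID -> preference value.
        Map<Long, Double> prefs = Map.of(1L, 5.0, 2L, 4.5, 3L, 4.0, 4L, 2.0, 5L, 1.0);
        int at = 2;
        Set<Long> relevant = relevantItems(prefs, 3.5, at);   // items 1 and 2 are held out
        List<Long> recommended = List.of(2L, 7L);             // made-up top-2 recommender output
        int hits = countHits(recommended, relevant);
        System.out.println("precision@" + at + " = " + precision(hits, recommended.size()));
        System.out.println("recall@" + at + " = " + recall(hits, relevant.size()));
    }
}
```

This also shows where the "expected result" from the question comes from: the held-out, highly rated items of each user play that role, so no external ground truth is needed.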
Hope this answers your question.
EDIT: To continue with your question: evaluating IR statistics (precision and recall) for recommender systems is very tricky, especially if you have only a small number of preferences per user. In this book you can find very useful insights. It says that