When I run the LensKit demo program I get this error:
[main] ERROR org.grouplens.lenskit.data.dao.DelimitedTextRatingCursor - C:\Users\sean\Desktop\ml-100k\u - Copy.data:4: invalid input, skipping line
I reworked the ML 100k data set so that it only holds this line although I dont see how this would effect it:
196 242 3 881250949
186 302 3 891717742
22 377 1 878887116
244
Here is the code I am using too:
public class HelloLenskit implements Runnable {
public static void main(String[] args) {
HelloLenskit hello = new HelloLenskit(args);
try {
hello.run();
} catch (RuntimeException e) {
System.err.println(e.getMessage());
System.exit(1);
}
}
private String delimiter = "\t";
private File inputFile = new File("C:\\Users\\sean\\Desktop\\ml-100k\\u - Copy.data");
private List<Long> users;
public HelloLenskit(String[] args) {
int nextArg = 0;
boolean done = false;
while (!done && nextArg < args.length) {
String arg = args[nextArg];
if (arg.equals("-e")) {
delimiter = args[nextArg + 1];
nextArg += 2;
} else if (arg.startsWith("-")) {
throw new RuntimeException("unknown option: " + arg);
} else {
inputFile = new File(arg);
nextArg += 1;
done = true;
}
}
users = new ArrayList<Long>(args.length - nextArg);
for (; nextArg < args.length; nextArg++) {
users.add(Long.parseLong(args[nextArg]));
}
}
public void run() {
// We first need to configure the data access.
// We will use a simple delimited file; you can use something else like
// a database (see JDBCRatingDAO).
EventDAO base = new SimpleFileRatingDAO(inputFile, "\t");
// Reading directly from CSV files is slow, so we'll cache it in memory.
// You can use SoftFactory here to allow ratings to be expunged and re-read
// as memory limits demand. If you're using a database, just use it directly.
EventDAO dao = new EventCollectionDAO(Cursors.makeList(base.streamEvents()));
// Second step is to create the LensKit configuration...
LenskitConfiguration config = new LenskitConfiguration();
// ... configure the data source
config.bind(EventDAO.class).to(dao);
// ... and configure the item scorer. The bind and set methods
// are what you use to do that. Here, we want an item-item scorer.
config.bind(ItemScorer.class)
.to(ItemItemScorer.class);
// let's use personalized mean rating as the baseline/fallback predictor.
// 2-step process:
// First, use the user mean rating as the baseline scorer
config.bind(BaselineScorer.class, ItemScorer.class)
.to(UserMeanItemScorer.class);
// Second, use the item mean rating as the base for user means
config.bind(UserMeanBaseline.class, ItemScorer.class)
.to(ItemMeanRatingItemScorer.class);
// and normalize ratings by baseline prior to computing similarities
config.bind(UserVectorNormalizer.class)
.to(BaselineSubtractingUserVectorNormalizer.class);
// There are more parameters, roles, and components that can be set. See the
// JavaDoc for each recommender algorithm for more information.
// Now that we have a factory, build a recommender from the configuration
// and data source. This will compute the similarity matrix and return a recommender
// that uses it.
Recommender rec = null;
try {
rec = LenskitRecommender.build(config);
} catch (RecommenderBuildException e) {
throw new RuntimeException("recommender build failed", e);
}
// we want to recommend items
ItemRecommender irec = rec.getItemRecommender();
assert irec != null; // not null because we configured one
// for users
for (long user: users) {
// get 10 recommendation for the user
List<ScoredId> recs = irec.recommend(user, 10);
System.out.format("Recommendations for %d:\n", user);
for (ScoredId item: recs) {
System.out.format("\t%d\n", item.getId());
}
}
}
}
I am really lost on this one and would appreciate any help. Thanks for your time.
The last line of your input file only contains one field. Each input file line needs to contain 3 or 4 fields.