I'm trying to do some crawling with Nutch and I'd like to test out Cassandra as a backend, however using the latest version of nutch and its dependencies Cassandra throws a variety of errors as you move through the inject, generate, fetch, etc. process.
The errors are all related to actual problems in code, not out of memory or configuration. I've fixed some of them by modifying code within gora-cassandra, but it's still not functional.
My question is, does a working version of these 2 projects exist? By working i mean you can run through inject, generate, fech, parse, updatedb on at least a small set of urls, without error.
Here's an example of one of the classes giving an error during fetch:
java.lang.NullPointerException at org.apache.gora.cassandra.query.CassandraSuperColumn.getUnionIndex
I have used HBase as the backend and that just works, although HBase itself is a monster to manage so that's why i'd like to test out Cassandra. However, i'm about to give up on this as I don't think I should be having to modify gora-cassandra code just to get a basic example to run.
Thanks
According to this link it's just broken, which is about 3 months old http://lucene.472066.n3.nabble.com/Re-user-Digest-3-Jun-2017-19-27-20-0000-Issue-2758-td4339060.html
I would agree with the guy making the original statement, Its unclear why backends that do not work are even documented.
Really tired of this project and its lack usable documentation.