Galago 3.5 Indexing


I downloaded the Galago 3.5 binary release and tried to index wiki-small.corpus following this guide. Strangely, the build index command fails with a File Not Found exception for the .index file. That error goes away when I pass inputPath and indexPath explicitly, but then I get this exception instead:

Created executor: org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor@69107c05
Running without server! Use --server=true to enable web-based status page.
Stage inputSplit completed with 0 errors.
Mar 14, 2014 3:26:01 PM org.lemurproject.galago.core.parse.UniversalParser process
INFO: Processing split: /Users/nanz/Downloads/wiki-small.corpus
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:137)
    at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:52)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$TupleUnshredder.processTuple(DocumentSplit.java:2033)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$DuplicateEliminator.processTuple(DocumentSplit.java:1989)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyTuples(DocumentSplit.java:1705)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntilFileId(DocumentSplit.java:1732)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedBuffer.copyUntil(DocumentSplit.java:1740)
    at org.lemurproject.galago.core.types.DocumentSplit$FileIdOrder$ShreddedReader.run(DocumentSplit.java:1940)
    at org.lemurproject.galago.tupleflow.FileOrderedReader.run(FileOrderedReader.java:76)
    at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:96)
    at java.lang.Thread.run(Thread.java:695)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.lemurproject.galago.core.parse.UniversalParser.constructParserWithSplit(UniversalParser.java:213)
    at org.lemurproject.galago.core.parse.UniversalParser.process(UniversalParser.java:132)
    ... 10 more
Caused by: java.lang.NullPointerException
    at org.lemurproject.galago.core.index.KeyValueReader.getManifest(KeyValueReader.java:35)
    at org.lemurproject.galago.core.index.corpus.CorpusReader.init(CorpusReader.java:41)
    at org.lemurproject.galago.core.index.corpus.CorpusReader.<init>(CorpusReader.java:32)
    at org.lemurproject.galago.core.parse.CorpusSplitParser.<init>(CorpusSplitParser.java:33)
    ... 16 more
Stage parsePostings completed with 1 errors.
java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
Exception in thread "main" java.util.concurrent.ExecutionException: Stage threw an exception:
    at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.waitForStages(JobExecutor.java:1062)
    at org.lemurproject.galago.tupleflow.execution.JobExecutor$JobExecutionStatus.run(JobExecutor.java:971)
    at org.lemurproject.galago.tupleflow.execution.JobExecutor.runWithoutServer(JobExecutor.java:1122)
    at org.lemurproject.galago.tupleflow.execution.JobExecutor.runLocally(JobExecutor.java:1177)
    at org.lemurproject.galago.core.tools.AppFunction.runTupleFlowJob(AppFunction.java:101)
    at org.lemurproject.galago.core.tools.apps.BuildIndex.run(BuildIndex.java:789)
    at org.lemurproject.galago.core.tools.AppFunction.run(AppFunction.java:55)
    at org.lemurproject.galago.core.tools.App.run(App.java:82)
    at org.lemurproject.galago.core.tools.App.run(App.java:73)
    at org.lemurproject.galago.core.tools.App.main(App.java:69)
Caused by: java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.lemurproject.galago.tupleflow.execution.LocalCheckpointedStageExecutor$LocalExecutionStatus.run(LocalCheckpointedStageExecutor.java:99)
    at java.lang.Thread.run(Thread.java:695)

I also tried building from source and got the same result. Can somebody point out where I am going wrong? Hardly anybody seems to have hit this issue, so a simple Google search does not turn up much.


There are 2 best solutions below


Solved. In case someone else faces this issue: a friend of mine figured out that Galago does not work directly on the wiki-small.corpus file, because it looks for corpus.keys, which does not exist for this file. Replace the .corpus file with the directory of documents and everything works fine. Do specify the indexPath and inputPath parameters explicitly. Use "galago build help" to view the exact syntax. Cheers.
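For anyone who wants to see roughly what that looks like, here is a minimal sketch. It assumes the parameters are passed as --indexPath= and --inputPath= (taken from the parameter names above, so check the tool's own help output for the exact syntax on your version). The function name galago_build_cmd and the paths in the usage comment are made up for illustration. The function only prints the command, and it refuses anything that is not a directory, since pointing at the old .corpus file is what caused the error in the question.

```shell
#!/bin/sh
# Sketch only: prints a Galago build command instead of running it.
# Assumption: the flag spellings --indexPath= and --inputPath= match
# your Galago version; verify with the tool's help output first.

# galago_build_cmd INPUT_DIR INDEX_PATH  (hypothetical helper name)
galago_build_cmd() {
    input_dir=$1
    index_path=$2

    # The input must be a directory of documents, not the .corpus file.
    if [ ! -d "$input_dir" ]; then
        echo "input path is not a directory: $input_dir" >&2
        return 1
    fi

    printf 'galago build --indexPath=%s --inputPath=%s\n' "$index_path" "$input_dir"
}

# Example with hypothetical paths:
#   galago_build_cmd "$HOME/Downloads/wiki-small" "$HOME/Downloads/wiki-small.index"
```

Once the printed command looks right for your setup, run it from the directory that contains the galago script. Paths containing spaces would need quoting before the printed line is pasted into a shell.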


I know this is late, but the wiki-small.corpus file from the textbook's website was built with an old version of Galago, namely the 1.0 series, which is preserved in this Google Code repository: https://code.google.com/p/galagosearch/

The newer releases of Galago (2.0 ... 3.5 ... 3.7) are part of newer development under the Lemur Project on SourceForge, and the corpus format has changed since the 1.0 series. If you had a corpus file built with Galago 3.5, your commands should have worked.