Working with RAMDictionary and Hadoop

I am trying to use the MIT JWI WordNet interface from Hadoop. This interface uses a RAMDictionary object whose constructor takes a File pointing to the WordNet folder. I have copied this folder to HDFS, but I can't create a File object from it, only a Path.

Does anyone know how I can work around this?

It depends on what you are trying to do.

You say you are working from Hadoop. Are you trying to use Hadoop for processing the WordNet dictionary files themselves? If so, you may not need the RAMDictionary, just the parser. For example:

import edu.mit.jwi.data.parse.DataLineParser;
import edu.mit.jwi.item.ISynset;

// for each line in each WordNet data file
ISynset synset = DataLineParser.getInstance().parseLine(line);
// do stuff with each synset

However, if you are processing something else and would like to use the WordNet dictionary as a tool to help you do it, then yes, this is slightly more complicated. You could:

  1. Convert the Path to a File as described in "How to convert a Hadoop Path object into a Java File object" (but the accepted answer suggests that this is not sensible)
  2. Extend JWI to work with a Path instead of a File
  3. Use WordNet remotely. There is a REST interface provided by abbreviations.com. If that isn't suitable, you could write your own, or you could even import WordNet into a database (e.g. Titan or Neo4j) and then query that from anywhere in your Hadoop cluster.
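
For option 1, one possible approach is to copy the folder out of HDFS onto the local disk of the machine running the task and hand that local copy to JWI as an ordinary File. The following is only a minimal sketch of that idea, not a tested Hadoop job. LocalStager, Copier and stage are hypothetical names. The copier used in main is a stand-in that just creates an empty index.noun so the sketch runs without a cluster. The comment shows where a call such as FileSystem.copyToLocalFile might go, and the /wordnet/dict path in it is an assumption.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothetical helper (not part of JWI or Hadoop): stages a remote folder on
// local disk so that a File-only library can read it.
final class LocalStager {

    // Hypothetical callback: must create `target` and fill it from remote storage.
    interface Copier {
        void copyTo(File target) throws IOException;
    }

    // Picks a fresh local path <tmp>/<name>, lets the copier populate it,
    // and returns it as a plain java.io.File.
    static File stage(String name, Copier copier) throws IOException {
        File parent = Files.createTempDirectory("staged-").toFile();
        File target = new File(parent, name);
        copier.copyTo(target);
        if (!target.isDirectory()) {
            throw new IOException("Copier did not create " + target);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in copier so this runs without a cluster. With Hadoop on the
        // classpath, the body could instead be something like (path assumed):
        //   FileSystem.get(conf).copyToLocalFile(
        //       new Path("/wordnet/dict"), new Path(target.getAbsolutePath()));
        File dict = stage("dict", target -> {
            Files.createDirectories(target.toPath());
            Files.createFile(new File(target, "index.noun").toPath());
        });
        // `dict` is now an ordinary File that could be passed to the
        // RAMDictionary constructor mentioned in the question.
        System.out.println("Staged WordNet folder at " + dict);
    }
}
```

Whether this is sensible depends on how large the dictionary is and how many tasks would each pull their own copy, so treat it as a starting point rather than a recommendation.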