Lucene 4 - How to discard numeric terms in index?


I'm using Apache Tika to parse an XML document before indexing it with Apache Lucene.

This is the Tika part:

  // Write limit of 10*1024*1024 characters for the extracted text
  BodyContentHandler handler = new BodyContentHandler(10 * 1024 * 1024);
  Metadata metadata = new Metadata();
  ParseContext pcontext = new ParseContext();

  // XML parser; try-with-resources closes the stream after parsing
  XMLParser xmlparser = new XMLParser();
  try (FileInputStream inputstream = new FileInputStream(f)) {
      xmlparser.parse(inputstream, handler, metadata, pcontext);
  }

  return handler.toString(); // return the extracted plain text

I use StandardAnalyzer with a stop word list to tokenize my document:

 analyzer = new StandardAnalyzer(StandardAnalyzer.STOP_WORDS_SET);  // using stop words

Can I discard numeric terms from the index, since I don't need them?

Thanks for your help.
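
To make the goal concrete, here is a minimal sketch of the behaviour I'm after, written in plain Java with no Lucene dependency. It strips purely numeric tokens from the text Tika returns before that text reaches the analyzer. The class name `NumericTermStripper`, its two methods, and the definition of "numeric" (digits with optional `.` or `,` separators) are my own assumptions for illustration; a token filter inside the analysis chain would probably be the more idiomatic place to do this, which is what I'm asking about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika or Lucene.
public class NumericTermStripper {

    // Assumption: a "numeric term" is digits only, optionally grouped by '.' or ','
    // (e.g. 42, 3.14, 1,000). Tokens with trailing punctuation such as "2014." are kept.
    private static final Pattern NUMERIC = Pattern.compile("\\d+([.,]\\d+)*");

    static boolean isNumeric(String token) {
        return NUMERIC.matcher(token).matches();
    }

    // Splits on whitespace, drops numeric tokens, and rejoins with single spaces.
    static String stripNumericTerms(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty() && !isNumeric(token)) {
                kept.add(token);
            }
        }
        return String.join(" ", kept);
    }
}
```

The result of `stripNumericTerms(handler.toString())` could then be indexed instead of the raw text. Note that this collapses runs of whitespace, which should not matter for tokenization.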
