Apache Solr - Search Text in File


I'm looking into Solr to meet the following specific requirement:

Requirements:

There is a folder named "X" that contains thousands of XML files. I want to search for a term (e.g. "Hello World") and, as the result, get the number of files that contain "Hello World".

Can this be achieved with Solr? If so, could anyone give me some guidance on how to do it?

Note: the XML files can be in any format, e.g. (https://i.stack.imgur.com/wNPTW.png)

Question: Is a structure like the one in "wNPTW.png" valid for Solr to search text in, or do we have to rely on Solr's specific document structure, e.g. (https://i.stack.imgur.com/sqn5q.png)?

In addition, performance is my primary requirement.

Please suggest how I can move ahead with this. If there is any other technology that would fit better, kindly suggest it.

Looking forward to hearing from you guys :)

Yes.

If the XML format is more or less identical across all documents, you can use the Data Import Handler (DIH) to configure a mapping (using XPath) from XML nodes to Solr fields. If the XML files aren't consistently structured, you can still use this to map almost any XML element to a common Solr field.
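To show what an XPath-style "node to field" mapping does, here is a minimal offline sketch in Python. It is not DIH itself: DIH is configured in XML (`data-config.xml`, using `XPathEntityProcessor`), and it was deprecated in Solr 8.6 and removed in Solr 9.0. This sketch instead uses the limited XPath subset supported by the standard library's `xml.etree.ElementTree`. The field names and paths in `FIELD_PATHS` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Solr field names to ElementTree paths.
# ElementTree supports only a subset of XPath (e.g. ".//tag", "tag/child").
FIELD_PATHS = {
    "title": ".//title",
    "body": ".//body",
}


def map_xml_to_fields(xml_text, field_paths=FIELD_PATHS):
    """Return a dict of field -> list of non-empty text values found at each path."""
    root = ET.fromstring(xml_text)
    doc = {}
    for field, path in field_paths.items():
        values = [el.text.strip() for el in root.findall(path)
                  if el.text and el.text.strip()]
        if values:  # skip fields with no matching nodes
            doc[field] = values
    return doc


if __name__ == "__main__":
    sample = "<note><title>Greeting</title><body>Hello World</body></note>"
    print(map_xml_to_fields(sample))
```

Files whose structure differs simply produce fewer fields. Mapping several paths to the same field name is how heterogeneous files can end up sharing one common searchable field.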

Another option is to use Solr's built-in Apache Tika integration (Solr Cell) to parse the files, extract their text into a content field, and search against that field.
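A minimal sketch of the Tika route in Python. It assumes that Solr Cell (the extraction module behind `/update/extract`) is enabled, that the core is named `xmlfiles` (hypothetical), and that the schema has a text field called `content`. `literal.id` gives each file its own document ID, and `fmap.content=content` sends Tika's extracted body text into `content`. Because each file becomes exactly one document, the `numFound` of a phrase query with `rows=0` is the number of files containing the phrase, which is the count the question asks for.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical core name; adjust to your installation.
SOLR_CORE_URL = "http://localhost:8983/solr/xmlfiles"


def build_extract_request(xml_bytes, doc_id, core_url=SOLR_CORE_URL):
    """Build a POST that sends one raw file to Solr Cell for Tika parsing."""
    params = urllib.parse.urlencode({
        "literal.id": doc_id,        # one Solr document per file
        "fmap.content": "content",   # store Tika's extracted text in 'content'
    })
    return urllib.request.Request(
        f"{core_url}/update/extract?{params}",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def build_count_url(phrase, field="content", core_url=SOLR_CORE_URL):
    """Phrase query with rows=0, so only the match count comes back.

    The phrase is not escaped; one containing double quotes would need escaping.
    """
    params = urllib.parse.urlencode({
        "q": f'{field}:"{phrase}"',
        "rows": "0",
        "wt": "json",
    })
    return f"{core_url}/select?{params}"


def count_matching_files(phrase, field="content", core_url=SOLR_CORE_URL):
    """Return numFound, i.e. the number of indexed files matching the phrase."""
    with urllib.request.urlopen(build_count_url(phrase, field, core_url)) as resp:
        return json.load(resp)["response"]["numFound"]
```

With a running Solr, you would send each file with `urllib.request.urlopen(build_extract_request(data, path))`, commit once after all files are loaded (for example a request to `/update?commit=true`), and then call `count_matching_files("Hello World")`. Committing once rather than per file avoids a lot of overhead when loading thousands of files.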

If you need more specific handling of the files, writing a small indexer that performs the required transformation before sending documents to Solr is probably the easiest path ahead.
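A minimal sketch of such a small indexer in Python, assuming a hypothetical core `xmlfiles` whose schema has `id` as the uniqueKey and a tokenized text field `content`. It walks the folder and flattens all text nodes of each XML file into `content`, ignoring element names and attributes, so the files can have any structure. It then posts the documents as JSON arrays to Solr's `/update` handler in batches and commits once at the end. The batch size of 500 is an arbitrary choice, not a recommendation from Solr.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical core name; the schema is assumed to have 'id' as uniqueKey
# and a tokenized text field named 'content'.
SOLR_CORE_URL = "http://localhost:8983/solr/xmlfiles"
BATCH_SIZE = 500  # arbitrary; tune for your document sizes


def xml_to_solr_doc(doc_id, xml_bytes):
    """Flatten every text node of an arbitrary XML file into one 'content' field.

    Element names and attributes are ignored, so the file's structure
    does not matter.
    """
    root = ET.fromstring(xml_bytes)
    text = " ".join(t.strip() for t in root.itertext() if t.strip())
    return {"id": doc_id, "content": text}


def iter_folder_docs(folder):
    """Yield one Solr document per *.xml file under `folder`, recursively."""
    for path in sorted(Path(folder).rglob("*.xml")):
        yield xml_to_solr_doc(str(path), path.read_bytes())


def build_update_request(docs, core_url=SOLR_CORE_URL):
    """Build a POST of a JSON array of documents to Solr's /update handler."""
    return urllib.request.Request(
        f"{core_url}/update",
        data=json.dumps(list(docs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_folder(folder, core_url=SOLR_CORE_URL):
    """Send documents in batches, then commit once at the end."""
    batch = []
    for doc in iter_folder_docs(folder):
        batch.append(doc)
        if len(batch) == BATCH_SIZE:
            urllib.request.urlopen(build_update_request(batch, core_url)).close()
            batch = []
    if batch:
        urllib.request.urlopen(build_update_request(batch, core_url)).close()
    urllib.request.urlopen(f"{core_url}/update?commit=true").close()


if __name__ == "__main__":
    index_folder("X")  # "X" is the folder named in the question
```

Since each file maps to one document with its path as `id`, a phrase query on `content` with `rows=0` then reports the number of matching files in `numFound`. Doing the transformation in your own code like this also lets you keep selected elements in separate fields later without changing how files are counted.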