Can we search for .txt files in Solr search engine?


I am using the Solr search engine for document retrieval in my project. My dataset is in .txt format, but Solr only seems to offer options for JSON, XML, PDF, and some other file formats. There is no option for text files.
Do I need to modify Solr to use .txt files as a dataset?

There are 5 best solutions below

0
Mysterion On

All you need to do is index your .txt file.

For more information and concrete examples, take a look here: http://www.slideshare.net/LucidImagination/indexing-text-and-html-files-with-solr-4063407

0
javacreed On

Most likely your .txt files contain space-separated documents. To index them, you can write a Python script that streams the documents to Solr and then performs a commit.
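A minimal sketch of such a script, assuming one Solr document per .txt file. The core name `mycore` and the field name `content_txt` are placeholders rather than anything from the question, so adjust them to match your own schema. It posts a JSON array to the update handler with `commit=true`.

```python
import json
import urllib.request
from pathlib import Path

# Assumed values: change the core name and field names to fit your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update?commit=true"


def build_docs(folder):
    """Turn every .txt file under `folder` into one Solr document."""
    docs = []
    for path in sorted(Path(folder).rglob("*.txt")):
        docs.append({
            "id": str(path),  # the file path doubles as the unique key
            "content_txt": path.read_text(encoding="utf-8", errors="replace"),
        })
    return docs


def post_docs(docs, url=SOLR_UPDATE_URL):
    """Send the documents as a JSON array; commit=true in the URL commits them."""
    request = urllib.request.Request(
        url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a running Solr instance, `post_docs(build_docs("docs"))` would index everything under `docs/` in one request. For a large dataset you would probably want to send the documents in batches instead.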

0
Jayesh Chandrapal On

Apart from txt files, Solr can also index several other document formats. Take a look at Apache Tika for details.

0
Marty On

You can use the CSV request handler to take care of this (see https://wiki.apache.org/solr/UpdateCSV). It lets you configure the delimiter and escape characters. For example, if you have a "|"-delimited file, you can specify "&separator=|".

Below is an example for indexing a tab-delimited text file:

curl 'http://localhost:8983/solr/update/csv?commit=true&separator=%09&escape=\&stream.file=/tmp/result.txt'
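If you would rather build that request from a script, here is a rough Python equivalent of the curl call. The helper names are made up for illustration. Note that `stream.file` refers to a path on the Solr server, and as far as I know it only works when remote streaming is enabled in the Solr configuration.

```python
import urllib.request
from urllib.parse import urlencode

# Same base URL as the curl example; newer Solr versions usually need the
# core name in the path, e.g. http://localhost:8983/solr/mycore/update/csv
CSV_HANDLER = "http://localhost:8983/solr/update/csv"


def csv_update_url(path, separator="\t", escape="\\", base=CSV_HANDLER):
    """Build the update URL; urlencode percent-encodes the tab and backslash."""
    params = {
        "commit": "true",
        "separator": separator,
        "escape": escape,
        "stream.file": path,
    }
    return base + "?" + urlencode(params)


def index_file(path):
    """Ask Solr to read and index the server-side file at `path`."""
    with urllib.request.urlopen(csv_update_url(path)) as response:
        return response.status
```

Letting `urlencode` do the escaping means you do not have to remember that a tab is `%09`, and a different delimiter such as "|" can be passed as `separator="|"`.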

0
Nate On

I found a very useful line in the quickstart guide https://lucene.apache.org/solr/5_3_1/quickstart.html

java -classpath /solr-5.0.0/dist/solr-core-5.0.0.jar -Dauto=yes \
  -Dc=gettingstarted -Ddata=files -Drecursive=yes org.apache.solr.util.SimplePostTool docs/

The part that is especially useful for me is -Dauto=yes. When this option is turned on, Solr can handle many types of files (don't ask me why). The tool prints:

Entering auto mode. File endings considered are xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log

All I know is that I turned that option on, and now my instance will accept pdf, xml and txt files.
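My understanding of what auto mode does: the post tool guesses a content type from each file ending, sends xml, json and csv to the regular update handler, and routes everything else (including txt) through the extracting handler, which uses Apache Tika to pull out the text. The sketch below is a simplified Python illustration of that idea, not the tool's actual code. The function name and return values are made up.

```python
from pathlib import Path

# File endings taken from the "Entering auto mode" message above.
AUTO_ENDINGS = set(
    "xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,"
    "ott,otp,ots,rtf,htm,html,txt,log".split(",")
)

# Assumption: these structured formats go straight to the update handler.
STRUCTURED = {"xml", "json", "csv"}


def choose_handler(filename):
    """Return the handler path a file would be sent to, or None if skipped."""
    ending = Path(filename).suffix.lstrip(".").lower()
    if ending not in AUTO_ENDINGS:
        return None  # unknown file ending, not posted in auto mode
    return "/update" if ending in STRUCTURED else "/update/extract"
```

So a plain `notes.txt` would end up at the extracting handler just like a PDF, which would explain why one switch makes txt, pdf and xml all work.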