In Solr, how can I index a plain text file that contains special characters?


[Image: screenshot of the Windows attempt]

In Solr, how can I index a plain text file that contains special characters?

The attempt above was made in a Windows environment.

In a Linux environment, I tried it with one of the example documents.

[Image: screenshot of the Linux attempt]

But that failed too.


There is 1 solution below

Jang-Ho Bae

Thanks, MatsLindh. I succeeded in indexing PDF and TXT files on Linux, but the same thing failed on Windows. My configuration for the Extracting Request Handler was the same in both environments. This is my solrconfig.xml file:

<lib dir="${solr.install.dir:../../../..}/contrib/extraction/lib" regex=".*\.jar" />
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-cell-\d.*\.jar" />
.
.
.
<requestHandler name="/update/extract"
            startup="lazy"
            class="solr.extraction.ExtractingRequestHandler" >
    <lst name="defaults">
        <str name="lowernames">true</str>
        <str name="fmap.content">_text_</str>
    </lst>
</requestHandler>

And this is the command that failed on Windows:

E:\work\private\JAVA\solr8>java -Dc=test -Dparams="literal.id=doc1" -jar ./bin/post.jar "./example/exampledocs/solr-word.pdf"
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/test/update?literal.id=doc1 using content-type application/xml...
POSTing file solr-word.pdf to [base]
SimplePostTool: WARNING: Solr returned an error #400 (Bad Request) for url: http://localhost:8983/solr/test/update?literal.id=doc1
SimplePostTool: WARNING: Response: <?xml version="1.0" encoding="UTF-8"?>
<response>

<lst name="responseHeader">
  <int name="status">400</int>
  <int name="QTime">0</int>
</lst>
<lst name="error">
  <lst name="metadata">
    <str name="error-class">org.apache.solr.common.SolrException</str>
    <str name="root-error-class">java.io.CharConversionException</str>
  </lst>
  <str name="msg">Invalid UTF-8 middle byte 0xe5 (at char #10, byte #-1)</str>
  <int name="code">400</int>
</lst>
</response>
SimplePostTool: WARNING: IOException while reading response: java.io.IOException: Server returned HTTP response code: 400 for URL: http://localhost:8983/solr/test/update?literal.id=doc1
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/test/update?literal.id=doc1...
Time spent: 0:00:00.064

Why did this not work on Windows?
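
One possible reading of the log above: the tool reports that it posted solr-word.pdf to /update with content-type application/xml, not to /update/extract, so Solr may have tried to parse the raw PDF bytes as UTF-8 XML. That would fit the "Invalid UTF-8 middle byte 0xe5 (at char #10 ...)" message, because many PDFs follow the "%PDF-x.y" line with a comment line of high-bit bytes. The sketch below only illustrates that failure mode in Python. The header bytes in SAMPLE_PDF_HEADER are an assumption for illustration and were not read from the actual file, and first_utf8_error is a hypothetical helper name.

```python
from typing import Optional

# ASSUMPTION: illustrative header bytes, not taken from solr-word.pdf.
# The first line is the PDF version marker; the second is a comment line
# made of high-bit bytes, which is common in binary PDFs.
SAMPLE_PDF_HEADER = b"%PDF-1.3\n%\xc4\xe5\xf2\xe5\xeb\xa7\xf3\xa0\xd0\xc4\xc6\n"


def first_utf8_error(data: bytes) -> Optional[int]:
    """Return the byte offset where strict UTF-8 decoding fails, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first undecodable byte sequence.
        return err.start
    return None


if __name__ == "__main__":
    # Binary PDF-like bytes: decoding fails right after the 10-byte ASCII prefix.
    print(first_utf8_error(SAMPLE_PDF_HEADER))
    # A small, well-formed XML payload decodes without error.
    print(first_utf8_error(b"<add><doc/></add>"))
```

If this reading is right, the difference between the two environments may lie in how the file was posted and not in solrconfig.xml. SimplePostTool has an auto mode (-Dauto=yes) that guesses the content type from the file extension and sends rich documents to /update/extract. Whether adding that flag resolves this particular case has not been verified here.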