NoClassDefFoundError MimeTypeException with PDF extraction

I am getting an exception when trying to use update/extract with PDF files.

My setup is: Ubuntu Server 11.10, Tomcat 6, Solr 3.5.0.2011.11.22.15.54.38

I can browse to solr/admin without problems.

I have put all the contrib/extraction libraries and apache-solr-cell-3.5.0.jar into the Tomcat folder webapps/solr/WEB-INF/lib

I am calling extract using:

curl "http://localhost:8080/solr/update/extract?uprefix=attr_&fmap.content=attr_content&commit=true" -F "file=/path/to/my.pdf"

The error is:

java.lang.NoClassDefFoundError: org/apache/tika/mime/MimeTypeException
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:383)
at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:425)
at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:461)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:248)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:239)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)

I would appreciate any pointers. The only other place this error seems to come up is with Nutch and cached results.

I have also tried sending the MIME type in the query string, and tried a *.doc file, but got the same error.

There are 3 best solutions below

0
On BEST ANSWER

This was due to a basic error: I had copied the necessary Tika libraries to tomcat6/webapps/solr/WEB-INF/lib but left the jar files owned by root instead of chown-ing them to the tomcat6 user. After fixing the ownership and restarting Tomcat, it started working.
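
The ownership check and fix described here could be sketched roughly as follows. The helper name `jars_not_owned_by`, the `tomcat6` user and group, and the `/var/lib/tomcat6/...` path are assumptions based on a typical Ubuntu Tomcat 6 package install, not details given in this answer, so adjust them to your system.

```shell
# Hypothetical helper: print every .jar directly inside DIR that is NOT
# owned by USER (a user name or a numeric uid).
jars_not_owned_by() {
    user=$1
    dir=$2
    find "$dir" -maxdepth 1 -type f -name '*.jar' ! -user "$user"
}

# Example usage (paths and user name are assumptions, adjust as needed):
#   LIB_DIR=/var/lib/tomcat6/webapps/solr/WEB-INF/lib
#   jars_not_owned_by tomcat6 "$LIB_DIR"        # lists jars with the wrong owner
#   sudo chown tomcat6:tomcat6 "$LIB_DIR"/*.jar # hand them over to Tomcat's user
#   sudo service tomcat6 restart
```

If the helper prints nothing, every jar in the directory already belongs to the given user and ownership is probably not the cause.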

0
On

According to the error message, the exception you are getting is not a MimeTypeException. The problem is a NoClassDefFoundError: Solr cannot load the class MimeTypeException.

Normally this class is present in tika-core.jar.

Make sure you actually have that file, and also check that you have a lib statement in your solrconfig.xml pointing to the right directory.
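
A minimal sketch of those checks from a shell, assuming a Unix-like host. The helper name `has_jar` and all paths are placeholders for illustration, and the jar is assumed to be named with a `tika-core` prefix followed by a version.

```shell
# Hypothetical helper: succeed if DIR contains a jar whose file name
# starts with PREFIX (for example "tika-core").
has_jar() {
    prefix=$1
    dir=$2
    for jar in "$dir"/"$prefix"*.jar; do
        [ -e "$jar" ] && return 0
    done
    return 1
}

# Example usage (all paths are assumptions):
#   has_jar tika-core /var/lib/tomcat6/webapps/solr/WEB-INF/lib \
#       || echo "tika-core jar is missing"
#   # Confirm the class is really inside the jar:
#   unzip -l /path/to/tika-core-*.jar | grep 'org/apache/tika/mime/MimeTypeException'
#   # Show any lib statements in the Solr config:
#   grep -n '<lib ' /path/to/solr/conf/solrconfig.xml
```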

0
On

I found the solution to this problem. I was using SolrJ to index my PDF files.

After deploying Solr to Tomcat, I had not copied the extraction libraries (from the directory shown below) into the Tomcat webapp,

so I ran into all the lazy-loading problems. I even tried getting Apache Tika separately, until I did this:

Shut down Tomcat.

\apache-solr-3.5.0\contrib\extraction

Copy the libraries from the directory above into the directory below:

\apache-tomcat-7.0.26\webapps\solr\WEB-INF\lib

Start Tomcat again.
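
On a Unix-like system, the copy step could look roughly like the sketch below. The helper name `copy_jars` and the `/opt/...` install locations are hypothetical, and the sketch assumes the jars sit in a `lib` subdirectory of `contrib/extraction`; check where they actually are in your Solr download.

```shell
# Hypothetical helper: copy every .jar found directly in SRC into DEST
# and print how many jars were copied.
copy_jars() {
    src=$1
    dest=$2
    count=0
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue      # skip the unexpanded glob if SRC has no jars
        cp "$jar" "$dest"/ || return 1
        count=$((count + 1))
    done
    echo "$count"
}

# Example usage with Tomcat stopped (paths are assumptions):
#   copy_jars /opt/apache-solr-3.5.0/contrib/extraction/lib \
#             /opt/apache-tomcat-7.0.26/webapps/solr/WEB-INF/lib
```

A printed count of 0 suggests the source directory is wrong or the jars live one level deeper.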

cheers