Tika server can't parse text from an encrypted document


Any file saved from LibreOffice (odt, docx, etc.) that is encrypted with a password produces the following exception on Tika server:

WARN  [qtp310212872-45] 08:52:13,393 org.apache.tika.server.core.resource.TikaResource tika/text: Text extraction failed (null)
org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.odf.OpenDocumentParser@6bc5bd75
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:304) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:203) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:357) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.server.core.resource.TikaResource.parseToMetadata(TikaResource.java:611) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.server.core.resource.TikaResource.getJson(TikaResource.java:581) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at jdk.internal.reflect.GeneratedMethodAccessor3.invoke(Unknown Source) ~[?:?]
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?]
    at java.lang.reflect.Method.invoke(Method.java:568) ~[?:?]
    at...
Caused by: java.util.zip.ZipException: only DEFLATED entries can have EXT descriptor
    at java.util.zip.ZipInputStream.readLOC(ZipInputStream.java:312) ~[?:?]
    at java.util.zip.ZipInputStream.getNextEntry(ZipInputStream.java:124) ~[?:?]
    at org.apache.tika.parser.odf.OpenDocumentParser.handleZipStream(OpenDocumentParser.java:219) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.parser.odf.OpenDocumentParser.parse(OpenDocumentParser.java:170) ~[tika-server-standard-2.9.1.jar:2.9.1]
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298) ~[tika-server-standard-2.9.1.jar:2.9.1]
    ... 43 more

TIKA-198 is an old, closed issue.

Also, I can't unpack any zip, even when I send the password. The server returns 204 No Content for such zip files (the password is correct).

P.S. I use the "Password" header when I send the request to the server.
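
For anyone trying to reproduce this, a request of the kind described can be sketched roughly as below. It is only an illustration: the `tika/text` path and JSON response are taken from the log and stack trace above, while the host, the default port 9998, the file name, and the helper name `build_tika_request` are assumptions and not part of the original setup.

```python
import urllib.request


def build_tika_request(data: bytes, password: str,
                       base_url: str = "http://localhost:9998") -> urllib.request.Request:
    """Build a PUT request for Tika server's tika/text endpoint.

    base_url is an assumption (a locally running server on the default port).
    """
    return urllib.request.Request(
        url=f"{base_url}/tika/text",
        data=data,                      # raw bytes of the encrypted document
        method="PUT",
        headers={
            "Accept": "application/json",
            "Password": password,       # the header mentioned in the question
        },
    )


if __name__ == "__main__":
    # "encrypted.odt" is a hypothetical file name used for illustration.
    with open("encrypted.odt", "rb") as f:
        request = build_tika_request(f.read(), "secret")
    with urllib.request.urlopen(request) as response:
        print(response.status)
        print(response.read().decode("utf-8"))
```

Printing the status code alongside the body makes it easier to tell an empty 204 response apart from a parse failure.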
