Tika takes more than 30 seconds to parse a 2 MB text-only DOCX file


I am using Tika 2.6.x with the Java options -XX:MaxMetaspaceSize=200M -Xss512K -XX:MaxDirectMemorySize=64M for the code below. Processing time is very high (around a minute) for a .docx file of 2 MB or more that contains only text. The same code handles 2 MB CSV, PPTX, and other files efficiently and returns the response in under 5 seconds. Is any additional configuration needed? Please suggest. Thanks.

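// Needs imports from org.apache.tika.parser, org.apache.tika.sax, org.apache.tika.extractor and java.nio.file
Parser parser = new AutoDetectParser();
Writer writer = Files.newBufferedWriter(Paths.get("MyFile"));
ContentHandler handler = new BodyContentHandler(writer);
ParseContext context = new ParseContext();
// some code logic for embedded images (imageChecker is my EmbeddedDocumentExtractor)
context.set(EmbeddedDocumentExtractor.class, imageChecker);
// tikaInputStream and metadata are created earlier
parser.parse(tikaInputStream, handler, metadata, context);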
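
One check that may help narrow this down, offered as a sketch rather than a confirmed diagnosis: a .docx file is a ZIP package, and the body text normally lives in the word/document.xml part, so a 2 MB file on disk can hold a much larger XML document once decompressed. The helper below uses only the JDK to list the uncompressed size of each part. The class name DocxEntrySizes and its method are hypothetical names chosen for illustration and are not part of Tika.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper (not part of Tika): reports how large each
// part of a .docx package is once decompressed.
public class DocxEntrySizes {

    // Returns entry name -> uncompressed size in bytes
    // (-1 if the archive does not record the size).
    public static Map<String, Long> uncompressedSizes(Path docx) throws IOException {
        Map<String, Long> sizes = new LinkedHashMap<>();
        try (ZipFile zip = new ZipFile(docx.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                sizes.put(entry.getName(), entry.getSize());
            }
        }
        return sizes;
    }

    public static void main(String[] args) throws IOException {
        // The path to the slow document is passed on the command line.
        uncompressedSizes(Paths.get(args[0]))
            .forEach((name, size) -> System.out.println(size + "\t" + name));
    }
}
```

If word/document.xml turns out to be many times larger than the file on disk, the time is probably going into parsing that XML rather than into the embedded-image handling, which would be a useful detail to add to the question.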