High CPU consumption by Apache Tika


I am using Apache Tika to extract text from PDF files. My problem is that the Tika service shows CPU spikes of between 100% and 400% on Linux.

I'm using Tika 2.9.1, the latest stable version. I observed the same CPU spikes with Tika 1.20.

I'm using this code to extract the text from a PDF file:

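import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext pcontext = new ParseContext();
PDFParser pdfparser = new PDFParser();

// Parse the document with the PDF parser; try-with-resources closes the stream
try (InputStream inputstream = new FileInputStream("Example.pdf")) {
    pdfparser.parse(inputstream, handler, metadata, pcontext);
}

// Print the extracted text
System.out.println("Contents of the PDF :" + handler.toString());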

Is there any parameter that I can set to reduce Tika's CPU usage?
