I am using Apache Tika to extract text from PDF files. My problem is that the Tika service's CPU usage spikes to between 100% and 400% on Linux.
I'm using Tika 2.9.1, which is the latest stable version of Tika. I also observed the same CPU spikes when using Tika 1.20.
I'm using this code to extract the text from a PDF file:
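import java.io.File;
import java.io.FileInputStream;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.sax.BodyContentHandler;

BodyContentHandler handler = new BodyContentHandler();
Metadata metadata = new Metadata();
ParseContext pcontext = new ParseContext();
// Parse the document with the PDF parser; try-with-resources closes the stream.
PDFParser pdfparser = new PDFParser();
try (FileInputStream inputstream = new FileInputStream(new File("Example.pdf"))) {
    pdfparser.parse(inputstream, handler, metadata, pcontext);
}
// Print the extracted content of the document.
System.out.println("Contents of the PDF :" + handler.toString());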
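To quantify how much CPU a single parse call costs, one option is to compare the calling thread's CPU time with the wall-clock time around the call. The following is only a minimal sketch using the JDK's `ThreadMXBean`; the class name `CpuTimeProbe`, the `Sample` type, and the `measure` helper are hypothetical names made up for illustration, and the loop in `main` is a stand-in workload, not the Tika call itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeProbe {

    /** One measurement: CPU time of the calling thread and wall-clock time, in nanoseconds. */
    public static final class Sample {
        public final long cpuNanos;
        public final long wallNanos;

        Sample(long cpuNanos, long wallNanos) {
            this.cpuNanos = cpuNanos;
            this.wallNanos = wallNanos;
        }
    }

    /**
     * Runs the task on the current thread and reports how much CPU time
     * that thread consumed, next to the elapsed wall-clock time.
     * Work done by other threads (for example GC) is not included.
     */
    public static Sample measure(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            throw new UnsupportedOperationException(
                    "Thread CPU time is not supported on this JVM");
        }
        long cpuStart = threads.getCurrentThreadCpuTime();
        long wallStart = System.nanoTime();
        task.run();
        long wallNanos = System.nanoTime() - wallStart;
        long cpuNanos = threads.getCurrentThreadCpuTime() - cpuStart;
        return new Sample(cpuNanos, wallNanos);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): in the real service this lambda
        // would wrap the pdfparser.parse(...) call instead.
        Sample sample = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i;
            }
            if (sum == 42) {
                System.out.println(sum); // keeps the loop from being optimized away
            }
        });
        System.out.printf("cpu=%d ms, wall=%d ms%n",
                sample.cpuNanos / 1_000_000, sample.wallNanos / 1_000_000);
    }
}
```

If the CPU time is close to the wall-clock time, the parse is keeping one core busy for its whole duration. Since this only measures the calling thread, anything above 100% in `top` would have to come from other threads, such as concurrent parse requests or JVM background threads.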
Is there any parameter that I can set to reduce Tika's CPU usage?