Hi, I have a PDF file and I need to search for a particular string in it. I have tried various methods and can read all of the contents of the PDF file, but I am unable to find a particular string.
In this file, I need to search for strings such as Telephone, Garbage, Rent, etc. individually.
Could you please help me?
I have the code below for reading the file.
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    // PDFBox 1.x package names (assumed from the PDFParser(InputStream) constructor used below)
    import org.apache.pdfbox.cos.COSDocument;
    import org.apache.pdfbox.pdfparser.PDFParser;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.util.PDFTextStripper;

    public class PDFBoxReader {
        private PDFParser parser;
        private PDFTextStripper pdfStripper;
        private PDDocument pdDoc;
        private COSDocument cosDoc;
        private String Text;
        private String filePath;
        private File file;

        public PDFBoxReader() {
        }

        public String ToText() throws IOException
        {
            this.pdfStripper = null;
            this.pdDoc = null;
            this.cosDoc = null;
            file = new File("D:\\report.pdf");
            parser = new PDFParser(new FileInputStream(file));
            parser.parse();
            cosDoc = parser.getDocument();
            pdfStripper = new PDFTextStripper();
            pdDoc = new PDDocument(cosDoc);
            pdDoc.getNumberOfPages();
            pdfStripper.setStartPage(1);
            pdfStripper.setEndPage(10);
            // reading text from page 1 to 10
            // if you want to get text from full pdf file use this code
            // pdfStripper.setEndPage(pdDoc.getNumberOfPages());
            Text = pdfStripper.getText(pdDoc);
            return Text;
        }

        public void setFilePath(String filePath) {
            this.filePath = filePath;
        }
    }
It would be great if someone could help me with code that searches for a particular string. Thanks in advance.
Try
    text.indexOf("substring")
with text being the String returned from your ToText() method, and substring the string you wish to search for. (Side note: the custom in Java is camel-case method names, which would be toText() in this case.)
This method finds the index of the first occurrence of the given substring in your long String of text, or returns -1 if the substring does not occur. So you could do text.indexOf("Telephone") to find the first occurrence of the word Telephone in your String.
If you want the stuff directly after that substring, the index would simply be
    text.indexOf("substring") + "substring".length()
You can even find the next occurrence (or the next after that) with another variation of this method:
    text.indexOf("substring", indexOfLastOccurrence + "substring".length())
Example:
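Here is a minimal sketch of the indexOf approach. It assumes the extracted PDF text is already in a String; the sample text below is made up and stands in for whatever ToText() returns. The class name SubstringSearch and the helpers findAll and restOfLineAfter are hypothetical names chosen for illustration, not part of PDFBox or the Java API.

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringSearch {

    // Returns the start index of every occurrence of 'needle' in 'text'.
    static List<Integer> findAll(String text, String needle) {
        List<Integer> hits = new ArrayList<>();
        if (needle.isEmpty()) {
            return hits; // an empty needle would match at every position
        }
        int index = text.indexOf(needle);
        while (index != -1) {
            hits.add(index);
            // Resume the search just past the current match.
            index = text.indexOf(needle, index + needle.length());
        }
        return hits;
    }

    // Returns the remainder of the line after the first occurrence of
    // 'needle', trimmed, or null if 'needle' does not occur in 'text'.
    static String restOfLineAfter(String text, String needle) {
        int index = text.indexOf(needle);
        if (index == -1) {
            return null;
        }
        int start = index + needle.length();
        int end = text.indexOf('\n', start);
        if (end == -1) {
            end = text.length();
        }
        return text.substring(start, end).trim();
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the String returned by ToText().
        String text = "Rent 500\nTelephone 40\nGarbage 15\nTelephone 12\n";

        System.out.println(findAll(text, "Telephone"));       // prints [9, 33]
        System.out.println(restOfLineAfter(text, "Garbage")); // prints 15
    }
}
```

Note that indexOf is case-sensitive, so searching for "telephone" would not match "Telephone"; lower-casing both strings first is one way around that if case should not matter.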
Both methods can be found in the Java API: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#indexOf(java.lang.String)