I want to highlight several keywords in a set of PDF files. First, I have to identify the individual words and match them against my keywords. I found an example:
using System.Collections.Generic;
using iTextSharp.text.pdf.parser;

class MyLocationTextExtractionStrategy : LocationTextExtractionStrategy
{
    //Hold each coordinate
    public List<RectAndText> myPoints = new List<RectAndText>();
    List<string> topicTerms;
    public MyLocationTextExtractionStrategy(List<string> topicTerms)
    {
        this.topicTerms = topicTerms;
    }
    //Automatically called for each chunk of text in the PDF
    public override void RenderText(TextRenderInfo renderInfo)
    {
        base.RenderText(renderInfo);
        //Get the bounding box for the chunk of text
        var bottomLeft = renderInfo.GetDescentLine().GetStartPoint();
        var topRight = renderInfo.GetAscentLine().GetEndPoint();
        //Create a rectangle from it
        var rect = new iTextSharp.text.Rectangle(
                                                bottomLeft[Vector.I1],
                                                bottomLeft[Vector.I2],
                                                topRight[Vector.I1],
                                                topRight[Vector.I2]
                                                );
        //Add this to our main collection
        //filter the meaningless words
        string text = renderInfo.GetText();
        this.myPoints.Add(new RectAndText(rect, text));
    }
}
However, I found that many words come out broken. For example, "stop" is returned as "st" and "op". Is there another way to identify a single word and its position?
 
                        
When you want to collect single words and their coordinates, the better way is to override the existing LocationTextExtractionStrategy. Here is my code:
myPoints is the list that ends up holding everything we want.
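
As a rough illustration of the idea (a sketch, not the code referred to above), consecutive chunks can be merged back into words after they have been collected. The example below does not depend on iTextSharp: TextBox, WordMerger, gapTolerance and lineTolerance are hypothetical names introduced here. It assumes the chunks arrive in reading order and that the characters inside a chunk are all equally wide, which is only an approximation.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical container: a piece of text and its bounding box.
public class TextBox
{
    public string Text;
    public float Left, Bottom, Right, Top;
    public TextBox(string text, float left, float bottom, float right, float top)
    { Text = text; Left = left; Bottom = bottom; Right = right; Top = top; }
}

public static class WordMerger
{
    // Merges chunks (assumed to be in reading order) into whole words.
    // A word ends at whitespace, at a horizontal gap wider than gapTolerance,
    // or when the baseline moves by more than lineTolerance.
    public static List<TextBox> Merge(IEnumerable<TextBox> chunks,
                                      float gapTolerance = 1f,
                                      float lineTolerance = 2f)
    {
        var words = new List<TextBox>();
        var text = new StringBuilder();
        float left = 0, bottom = 0, right = 0, top = 0;

        // Emits the word built so far, if any, and starts a new one.
        void Flush()
        {
            if (text.Length > 0)
                words.Add(new TextBox(text.ToString(), left, bottom, right, top));
            text.Clear();
        }

        foreach (var c in chunks)
        {
            if (string.IsNullOrEmpty(c.Text)) continue;

            // A new line or a visible gap ends the current word.
            if (text.Length > 0 &&
                (Math.Abs(c.Bottom - bottom) > lineTolerance ||
                 c.Left - right > gapTolerance))
                Flush();

            // Approximation: every character in the chunk has the same width.
            float charWidth = (c.Right - c.Left) / c.Text.Length;
            for (int i = 0; i < c.Text.Length; i++)
            {
                if (char.IsWhiteSpace(c.Text[i])) { Flush(); continue; }
                float x = c.Left + i * charWidth;
                if (text.Length == 0) { left = x; bottom = c.Bottom; top = c.Top; }
                bottom = Math.Min(bottom, c.Bottom);
                top = Math.Max(top, c.Top);
                right = x + charWidth;
                text.Append(c.Text[i]);
            }
        }
        Flush();
        return words;
    }
}

public static class Demo
{
    public static void Main()
    {
        // "stop" was split into "st" and "op"; the third chunk starts with a space.
        var chunks = new List<TextBox>
        {
            new TextBox("st", 10, 100, 20, 110),
            new TextBox("op", 20, 100, 30, 110),
            new TextBox(" here", 30, 100, 55, 110)
        };
        foreach (var w in WordMerger.Merge(chunks))
            Console.WriteLine(w.Text + " [" + w.Left + ", " + w.Right + "]");
    }
}
```

In a real extraction strategy the same merge could be run over myPoints once the page has been processed. The tolerances would need tuning per document, since a suitable gap depends on the font size.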