Extract data between two specific texts using Jericho


I am using Jericho to parse HTML. I have an HTML page from which I need to extract the data between two specific pieces of text.

    <table width="100%" align="left">
        <tr><td>
            <b>  Item 7. </b>
        </td></tr>
    </table>
    ...........other data...........
    other tags
    <table width="100%" align="left">
        <tr><td>
            <b>  fd ..fds   </b>
        </td></tr>
    </table>

    ...........other data ends...........

    <table width="100%" align="left">
        <tr><td>
            <b>  Item 8. </b>
        </td></tr>
    </table>

How can I extract the data between Item 7. and Item 8. using Jericho?

Thanks in Advance

There are 1 best solutions below

In my case, 'Item 7' and 'Item 8' appear inside bold (b) elements followed by

I iterated over the list of elements. My code:

import java.util.List;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;

// Assumption: "source" is a net.htmlparser.jericho.Source already built from the page.
List<Element> allElements = source.getAllElements();
StringBuilder sBuff = new StringBuilder();
boolean strtInd = false;   // true once the "Item 7." table has been found
int lastEnd = 0;           // end offset of the last element already handled

for (Element allElement : allElements) {

    // Lower-cased text of the first <b> in a <table>; "" for any other element.
    String boldText = "";
    if (HTMLElementName.TABLE.equals(allElement.getName())) {
        List<Element> boldElem = allElement.getAllElements(HTMLElementName.B);
        if (!boldElem.isEmpty()) {
            boldText = boldElem.get(0).getTextExtractor().toString().toLowerCase();
        }
    }

    if (!strtInd) {
        if (boldText.startsWith("item 7.")) {
            strtInd = true;
            lastEnd = allElement.getEnd();   // skip the marker table's own <tr>/<td>/<b>
        }
    } else {
        if (boldText.startsWith("item 8.")) {
            break;                           // end marker reached
        }
        // getAllElements() also returns nested elements; append only elements
        // that start after the last appended one, so nothing is added twice.
        if (allElement.getBegin() >= lastEnd) {
            sBuff.append(allElement);
            lastEnd = allElement.getEnd();
        }
    }
}
System.out.println(sBuff);
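
One limitation, as far as I can tell: the loop above collects elements only, so loose text that sits between tags (outside any element) may be lost. A different way to get everything between the two markers is to work with character offsets and cut the raw HTML from the end of the 'Item 7.' table to the start of the 'Item 8.' table. The sketch below is not the approach from the answer and does not use Jericho; it uses plain java.util.regex so that it runs on its own. The class name BetweenMarkers and the methods find and between are made up for illustration. It assumes each marker is written as <b> immediately followed by the label, and that the marker tables are not nested inside other tables.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenMarkers {

    // First case-insensitive match of regex at or after 'from', or null if none.
    static Matcher find(String html, String regex, int from) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(html);
        return m.find(from) ? m : null;
    }

    // Raw HTML between the table holding startLabel and the table holding
    // endLabel, or null if either marker cannot be located.
    static String between(String html, String startLabel, String endLabel) {
        // Start marker: <b> followed by optional whitespace and the label.
        Matcher startBold = find(html, "<b>\\s*" + Pattern.quote(startLabel), 0);
        if (startBold == null) return null;

        // The extracted range begins right after that table's closing tag.
        Matcher startClose = find(html, "</table\\s*>", startBold.end());
        if (startClose == null) return null;
        int from = startClose.end();

        // End marker, searched only after the start table.
        Matcher endBold = find(html, "<b>\\s*" + Pattern.quote(endLabel), from);
        if (endBold == null) return null;

        // The range ends at the last <table that opens before the end marker.
        Matcher open = Pattern.compile("<table\\b", Pattern.CASE_INSENSITIVE).matcher(html);
        open.region(from, endBold.start());
        int to = -1;
        while (open.find()) {
            to = open.start();
        }
        return to < 0 ? null : html.substring(from, to);
    }

    public static void main(String[] args) {
        String html = "<table><tr><td><b> Item 7. </b></td></tr></table>"
                + "loose text <p>para</p>"
                + "<table><tr><td><b> Item 8. </b></td></tr></table>";
        // prints: loose text <p>para</p>
        System.out.println(between(html, "Item 7.", "Item 8."));
    }
}
```

If you prefer to stay within Jericho, the same two offsets should be obtainable from the marker table elements through getEnd() and getBegin(), and the text between them can then be taken from the source by offset instead of appending element by element.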