How to fetch data from a website using the Jericho HTML Parser?

I am using the Jericho HTML Parser in Java and want to fetch data from a website. The HTML content on the site looks like this:

<div class="class_div">
    <div class="class_div2">All contents...</div>
    <span class="equals">Content 1</span>
    <span class="equals">Content 2</span>
    <span class="equals">Content 3</span>
    <span class="equals">Content 4</span>
</div>

I want to fetch Content 1, Content 2, Content 3, and Content 4. How can I do this?

I am using this code:

String sourceUrlString="<website url>";
if (sourceUrlString.indexOf(':')==-1)
    sourceUrlString="http:"+sourceUrlString;
Source source=new Source(new URL(sourceUrlString));
Element bodyContent = source.getElementByClass("equals");

There is 1 solution below

Where's the problem? Call getAllElementsByClass to get every element with that class, then read the text of each one:

Source source = new Source(/* ... */);
List<Element> elements = source.getAllElementsByClass("equals");

for( Element element : elements )
{
    /*
     * 'element.getTextExtractor().toString()' returns the text of the element
     */
    System.out.println(element.getTextExtractor().toString());
}

Output:

Content 1
Content 2
Content 3
Content 4
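
For anyone who wants to try this without a live website, here is a minimal self-contained sketch of the same approach. It assumes the Jericho jar is on the classpath and parses the markup from the question out of a string instead of a URL. The class name EqualsSpanText, the method extractTexts, and the constant SAMPLE_HTML are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.Source;

// Hypothetical helper class, not part of Jericho.
final class EqualsSpanText {

    // Markup copied from the question. For a real page, build the Source
    // with new Source(new URL(...)) as in the question's code.
    static final String SAMPLE_HTML =
        "<div class=\"class_div\">"
        + "<div class=\"class_div2\">All contents...</div>"
        + "<span class=\"equals\">Content 1</span>"
        + "<span class=\"equals\">Content 2</span>"
        + "<span class=\"equals\">Content 3</span>"
        + "<span class=\"equals\">Content 4</span>"
        + "</div>";

    // Returns the text of every element whose class attribute contains "equals".
    static List<String> extractTexts(String html) {
        Source source = new Source(html);
        List<String> texts = new ArrayList<>();
        for (Element element : source.getAllElementsByClass("equals")) {
            texts.add(element.getTextExtractor().toString());
        }
        return texts;
    }

    public static void main(String[] args) {
        for (String text : extractTexts(SAMPLE_HTML)) {
            System.out.println(text);
        }
    }
}
```

Running main should print the same four lines as the output above. If other parts of the page also use the class "equals", one option is to call getAllElementsByClass on the Element returned by source.getFirstElementByClass("class_div") rather than on the whole Source, so that only the spans inside that div are matched.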