This is the HTML code:
<!DOCTYPE html>
<html>
<title>Instructor's Page</title>
<body>
<h1>Instructor's Page</h1>
<div class="check1"> <div id="check2">
<span id="check3" class="check4"> <strong class="check5"><link href="http://schema.org/t"/>Instructor-1 name</strong>
</span>
</div>
<div class="check1"> <div id="check2">
<span id="check3" class="check4"> <strong class="check6">Instructor-2 name</strong>
</span>
</body>
</html>
I am very new to Jsoup. How can I extract the instructors' names from the given HTML page?
Currently, I only know how to print the title.
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.io.File;
import java.io.IOException;
public class crawl {
    public static void main(String[] args) {
        Document doc1;
        try {
            File input = new File("t.html");
            doc1 = Jsoup.parse(input, "UTF-8");
            // get page title
            String title1 = doc1.title();
            System.out.println("title : " + title1);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Use the select method to pick out the elements you want from the HTML page. It takes a CSS-style selector pattern as its argument, describing which elements to match, such as a specific tag with a certain id or class. A browser tool that helps you identify tags in the HTML source can be helpful for working out the right pattern.
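As a rough illustration of the kinds of patterns select accepts, here is a minimal sketch. The class name SelectDemo, the helper countMatches, and the inline HTML string are assumptions made up for this example (the inline string is a trimmed version of the page in the question, used so the snippet runs without t.html).

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class SelectDemo {
    // Hypothetical helper: returns how many elements in the HTML match the CSS selector.
    static int countMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements matches = doc.select(cssQuery);
        return matches.size();
    }

    public static void main(String[] args) {
        // Trimmed-down stand-in for the page in the question (an assumption for this sketch).
        String html = "<div class=\"check1\"><span id=\"check3\" class=\"check4\">"
                + "<strong class=\"check5\">Instructor-1 name</strong></span></div>";

        System.out.println(countMatches(html, "span"));              // select by tag
        System.out.println(countMatches(html, "span.check4"));       // tag with a class
        System.out.println(countMatches(html, "#check3"));           // select by id
        System.out.println(countMatches(html, "div.check1 strong")); // descendant of a div
    }
}
```

Each of these four selectors should match exactly one element in the sample string. Which pattern is best for the real page depends on which tags, ids, and classes are stable there.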
As to your original question on how to select instructor names, see below.
If the structure of the HTML is always the same and you are certain that each instructor's name will be inside a span tag, then you can simply read the text of the span elements. Printing that text will output each instructor's name.
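The span-based approach could be sketched roughly as follows. This is not necessarily the code the answer originally contained. The class name InstructorNames, the method extract, and the inline HTML string are assumptions for illustration. With the file from the question, Jsoup.parse(new File("t.html"), "UTF-8") would replace the string parsing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class InstructorNames {
    // Hypothetical helper: collects the text of every <span> element in the HTML.
    static List<String> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<String> names = new ArrayList<>();
        for (Element span : doc.select("span")) {
            // text() returns the combined, whitespace-normalized text of the
            // span and its children, so the nested <strong> text is included.
            names.add(span.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Simplified stand-in for the page in the question (an assumption for this sketch).
        String html = "<html><body><h1>Instructor's Page</h1>"
                + "<div class=\"check1\"><span class=\"check4\"> "
                + "<strong class=\"check5\">Instructor-1 name</strong></span></div>"
                + "<div class=\"check1\"><span class=\"check4\"> "
                + "<strong class=\"check6\">Instructor-2 name</strong></span></div>"
                + "</body></html>";

        for (String name : extract(html)) {
            System.out.println(name);
        }
    }
}
```

If the page also contains span tags that do not hold instructor names, a narrower selector such as "span.check4 strong" may be a safer choice than plain "span".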