I am trying to read an HTML page that contains something like this
<html>
<head>
<title>
Title
</title>
</head>
<body>
Name1 Age1 Hometown1<br>
Name2 Age2 Hometown2<br>
Name3 Age3 Hometown3<br>
</body>
</html>
I read it with a method readData(String[] urls), where urls is an array of one or more URL strings. I am only interested in what is in the HTML body of each URL, so I loop while readLine() != null and keep the lines for which contains("<br>") is true. However, my code seems to read only the first line of the body block (the line right after <body>, as I want) and does not continue to the following lines up to </body>. How can I make my code read past the first line?
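import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public void readData(String[] urls) {
    for (int i = 0; i < urls.length; i++) {
        // Collects the matching lines for the current URL only.
        String str = "";
        try {
            URL url = new URL(urls[i]);
            URLConnection conn = url.openConnection();
            BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String s;
            // Keep only the lines that contain a <br> tag.
            while ((s = in.readLine()) != null) {
                if (s.contains("<br>")) {
                    str += s;
                }
            }
            in.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}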
EDIT1: The issue appears to be that the entire input arrives as one line rather than as multiple lines, as I expected. How can I split that one line into multiple lines so that I can read each one?
EDIT2:
Thanks everyone, I've figured that out. I still read the input as one long String, but I now split it into a String array using .split() and read each element of that. However, there is a new problem: only the first element of String[] urls is read. I cannot read anything beyond the first element, although I want to read every URL in urls. Any ideas?
I think that the goal of this question is to get the information in the body and separate it at the <br> tags.
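The readLine() method takes care of reading the individual lines, but it only splits the input at line terminators (\n, \r, or \r\n). If the server sends the page without them, the whole document arrives as a single line, and I do not think there is anything you can do about that unless you are also involved with the code that writes the page. More detail about the source of your data would help.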
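To illustrate the readLine() behaviour, here is a minimal sketch that feeds the same text to a BufferedReader with and without line terminators. The class name ReadLineDemo and the helper readLines are made up for this example, and a StringReader stands in for the network stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question.
class ReadLineDemo {
    // Collects every line that BufferedReader.readLine() returns for the given text.
    static List<String> readLines(String text) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // With line terminators: one entry per source line, prints 2.
        System.out.println(readLines("Name1 Age1 Hometown1<br>\nName2 Age2 Hometown2<br>\n").size());
        // Without line terminators: the whole text is a single line, prints 1.
        System.out.println(readLines("Name1 Age1 Hometown1<br>Name2 Age2 Hometown2<br>").size());
    }
}
```

The <br> tags make no difference to readLine(); only the \n characters in the first input produce separate lines.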
For dividing up a single line, you could start with methods from the String class.
Use String.indexOf("<body>") to find the position of the body tag. Then use a combination of String.substring(int, int) and indexOf(String, int) to work out the rest of the details.
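The steps above could be sketched roughly as follows. This is only an illustration: the class name BodyRows and the method extractRows are hypothetical, and the code assumes a lowercase <body> tag without attributes and rows terminated by a literal <br>, as in the sample page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question.
class BodyRows {
    // Returns the text between <body> and </body>, cut at each <br>, blanks dropped.
    static List<String> extractRows(String html) {
        List<String> rows = new ArrayList<>();
        int open = html.indexOf("<body>");
        if (open < 0) {
            return rows; // no body tag found
        }
        int start = open + "<body>".length();
        int end = html.indexOf("</body>", start);
        if (end < 0) {
            end = html.length(); // tolerate a missing closing tag
        }
        int pos = start;
        while (pos < end) {
            // Find the next <br> at or after the current position.
            int br = html.indexOf("<br>", pos);
            if (br < 0 || br > end) {
                br = end; // last chunk has no trailing <br>
            }
            String row = html.substring(pos, br).trim();
            if (!row.isEmpty()) {
                rows.add(row);
            }
            pos = br + "<br>".length();
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Title</title></head><body>"
                + "Name1 Age1 Hometown1<br>Name2 Age2 Hometown2<br>Name3 Age3 Hometown3<br>"
                + "</body></html>";
        // prints [Name1 Age1 Hometown1, Name2 Age2 Hometown2, Name3 Age3 Hometown3]
        System.out.println(extractRows(html));
    }
}
```

Because it works on the whole document as one String, this approach does not depend on whether the server sent line terminators. For anything beyond simple pages like the sample, a real HTML parser would be more robust than string searching.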