Trying to get BufferedReader to read past the first line


I am trying to read an HTML page from a URL that contains something like this:

<html>
<head>
<title>
Title
</title>
</head>
<body>
Name1 Age1 Hometown1<br>
Name2 Age2 Hometown2<br>
Name3 Age3 Hometown3<br>
</body>
</html>

I read it with a method readData(String[] urls), where urls is an array of one or more URL strings. I'm only interested in what's inside the HTML body of each URL, so I loop while readLine() != null and keep the lines that contain "<br>". However, my code only reads the first line of the body block (the line after <body>, as I want) and does not continue to the following lines up to </body>. How can I make my code read past the first line?

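import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public void readData(String[] urls) {
    for (int i = 0; i < urls.length; i++) {
        String str = ""; // accumulates the "<br>" lines of the current URL
        try {
            URL url = new URL(urls[i]);
            URLConnection conn = url.openConnection();
            // try-with-resources closes the stream even if reading fails
            try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
                String s;
                // readLine() returns null at the end of the stream
                while ((s = in.readLine()) != null) {
                    if (s.contains("<br>")) {
                        str += s;
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}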

EDIT1: The issue appears to be that the entire input arrives as one line rather than as multiple lines, as it should. How can I split that one line into multiple lines so that I can read each one?
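EDIT1's diagnosis can be checked in isolation. The sketch below (a hypothetical class ReadLineDemo that reads from a StringReader instead of a URL) runs the same readLine() loop over the body content with and without line terminators. BufferedReader.readLine() ends a line at "\n", "\r" or "\r\n", so input that contains none of them comes back as one single line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineDemo {
    // Collects every line that BufferedReader.readLine() returns for the given input.
    static List<String> readAllLines(String input) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Same body content, once with line terminators and once without.
        String withNewlines = "Name1 Age1 Hometown1<br>\nName2 Age2 Hometown2<br>\nName3 Age3 Hometown3<br>\n";
        String oneLine = "Name1 Age1 Hometown1<br>Name2 Age2 Hometown2<br>Name3 Age3 Hometown3<br>";

        System.out.println(readAllLines(withNewlines).size()); // 3
        System.out.println(readAllLines(oneLine).size());      // 1
    }
}
```

In other words, the loop itself is fine; what it sees depends entirely on whether the server sends line breaks.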

EDIT2: Thanks, everyone. I've figured that out. I still read the input as a single long String, but I split it into a String array with .split() and process each element. However, there is now a new problem: for my String[] urls, I only read the first element. I cannot read anything beyond the first element of urls, when I actually want to read all of them. Any ideas?


Answer 1

I think the goal of this question is to get the information in the body and split it at the <br> tags.

The readLine() method already takes care of reading individual lines, so if the whole page comes back as one line, the data itself contains no line breaks. I do not think there is anything you can do about that unless you are also involved with the code that writes the page. More detail about the source of your data would help.

For dividing up a single line, you could start with methods from the String class.

Use String.indexOf("<body>") to find the position of the body. Then use a combination of String.substring(int, int) and indexOf(String, int) to work out the rest of the details.
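A minimal sketch of this indexOf/substring approach, assuming the tags are written exactly as lowercase <body>, </body> and <br> with no attributes. The class and method names (BodyIndexOf, extractBody, entries) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BodyIndexOf {
    // Returns the text between <body> and </body>, or null if either tag is missing.
    static String extractBody(String html) {
        int start = html.indexOf("<body>");
        if (start < 0) return null;
        start += "<body>".length();
        int end = html.indexOf("</body>", start);
        if (end < 0) return null;
        return html.substring(start, end);
    }

    // Walks the body with indexOf(String, int), collecting the text before each <br>.
    // Any text after the last <br> is ignored.
    static List<String> entries(String body) {
        List<String> result = new ArrayList<>();
        int from = 0;
        int br;
        while ((br = body.indexOf("<br>", from)) >= 0) {
            String entry = body.substring(from, br).trim();
            if (!entry.isEmpty()) result.add(entry);
            from = br + "<br>".length();
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Title</title></head><body>"
                + "Name1 Age1 Hometown1<br>Name2 Age2 Hometown2<br>Name3 Age3 Hometown3<br>"
                + "</body></html>";
        for (String entry : entries(extractBody(html))) {
            System.out.println(entry);
        }
    }
}
```

The trim() call also makes this work when the body does contain line breaks, since the newline after each <br> ends up at the start of the next entry.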

Answer 2

I'd try splitting the input string with its .split("<body>") method. The second element of the resulting array holds the content of your body tag (plus the closing </body></html>, which you can cut off with another split("</body>")). If you then split the body content on "<br>", you get an array with 3 elements for your example, as long as the last <br> tag is the very last content of your body.
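A minimal sketch of the split-based approach, assuming the input contains exactly one lowercase <body> tag (otherwise parts[1] throws ArrayIndexOutOfBoundsException). BodySplit and entries are hypothetical names. It relies on String.split(String) discarding trailing empty strings, which is why a final <br> does not produce an extra empty element.

```java
import java.util.Arrays;

public class BodySplit {
    static String[] entries(String html) {
        // parts[1] is everything after <body>, including "</body></html>".
        String[] parts = html.split("<body>");
        // Keep only what comes before </body>.
        String body = parts[1].split("</body>")[0];
        // Trailing empty strings are dropped, so a final <br> adds no extra element.
        return body.split("<br>");
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Title</title></head><body>"
                + "Name1 Age1 Hometown1<br>Name2 Age2 Hometown2<br>Name3 Age3 Hometown3<br>"
                + "</body></html>";
        // prints [Name1 Age1 Hometown1, Name2 Age2 Hometown2, Name3 Age3 Hometown3]
        System.out.println(Arrays.toString(entries(html)));
    }
}
```

Note that split() takes a regular expression; these tag strings contain no regex metacharacters, so they match literally.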

EDIT: It also matters whether you receive the whole HTML file or only the response body. If you only receive the body, I'd use Sean Pedersen's solution.

Answer 3

How would I separate that one line into multiple lines as it should be so I can read each?

I may be completely wrong about this, but if your data seems to have newlines, they may actually be carriage returns.

Check out String.split()

Also check out the difference between \n and \r

You can try something like String[] textStr = yourString.split("\\r?\\n");
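A small sketch comparing the regex above with Java 8's \R line-break matcher. This kind of split applies to a String you already hold (for example, text you concatenated yourself); BufferedReader.readLine() already ends lines at "\n", "\r" and "\r\n". The class and method names are hypothetical.

```java
public class SplitLines {
    // Splits on "\n" or "\r\n" (the regex from this answer); a lone "\r" is not a separator here.
    static String[] splitCrLf(String text) {
        return text.split("\\r?\\n");
    }

    // Splits on any line break, including a lone "\r" (\R requires Java 8 or later).
    static String[] splitAnyBreak(String text) {
        return text.split("\\R");
    }

    public static void main(String[] args) {
        String yourString = "Name1 Age1 Hometown1<br>\r\nName2 Age2 Hometown2<br>\nName3 Age3 Hometown3<br>";
        for (String line : splitCrLf(yourString)) {
            System.out.println(line);
        }
    }
}
```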

As a side note, StringBuilder was built for this: appending to it in a loop is cheaper than repeated str += s.
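A minimal sketch of the question's loop rewritten around a StringBuilder, assuming a BufferedReader is passed in (here built over a StringReader so it runs without a network). CollectBrLines and collectBrLines are hypothetical names. Because readLine() strips line terminators, the sketch appends '\n' after each kept line so the lines stay distinguishable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CollectBrLines {
    // Appends every line containing "<br>" to a StringBuilder instead of using str += s.
    static String collectBrLines(BufferedReader in) {
        StringBuilder sb = new StringBuilder();
        try {
            String s;
            while ((s = in.readLine()) != null) {
                if (s.contains("<br>")) {
                    // Re-add a separator, since readLine() removes it.
                    sb.append(s).append('\n');
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String page = "<html>\n<body>\nName1 Age1 Hometown1<br>\nName2 Age2 Hometown2<br>\n</body>\n</html>\n";
        System.out.print(collectBrLines(new BufferedReader(new StringReader(page))));
    }
}
```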