java get web content for many webpages


So previously I had a program that would go to a lot of websites and pull the part of the source code I wanted out of them. However, the websites were recently updated to load that information dynamically, and now I no longer get it.

I have made another version of my program using Selenium that worked, but it took too long to be practical. Is there a faster way of getting the content? One thing I noticed is that Internet Explorer 11 still loads the website content the way it used to; can I get the source specifically from there?

The way I was getting it before that worked was like this:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public static void main(String[] args) throws IOException {

    String example = getSource("http://www.google.com");

    System.out.println(example);
}

public static String getSource(String urlToGoTo) throws IOException
{
    URL url = new URL(urlToGoTo);
    URLConnection connection = url.openConnection();
    BufferedReader in = new BufferedReader(new InputStreamReader(
            connection.getInputStream()));
    String inputLine;
    StringBuilder a = new StringBuilder();
    while ((inputLine = in.readLine()) != null)
        a.append(inputLine).append('\n'); // readLine() strips the newline, so re-add it
    in.close();

    return a.toString();
}
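Since IE11 still receives the old markup, one thing worth trying is that many sites decide which version of a page to serve from the User-Agent header. Sending IE11's User-Agent string with the same URLConnection approach as getSource above might return the static source. This is only a sketch (the class name LegacySource is made up, and the site may not key on User-Agent at all):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class LegacySource {

    // IE11's User-Agent string; the hope (untested for your site) is that
    // the server falls back to the pre-rendered markup it serves to IE11.
    static final String IE11_UA =
            "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko";

    public static String getSource(String urlToGoTo) throws IOException {
        URL url = new URL(urlToGoTo);
        URLConnection connection = url.openConnection();
        // Must be set before the connection is opened with getInputStream()
        connection.setRequestProperty("User-Agent", IE11_UA);
        StringBuilder a = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream()))) {
            String inputLine;
            while ((inputLine = in.readLine()) != null) {
                a.append(inputLine).append('\n');
            }
        }
        return a.toString();
    }
}
```

If this works, it avoids starting a browser entirely and should be roughly as fast as the original program.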

Any ideas are welcome. I've been trying to find a way to get this to work for way too long, given that it sounds like it shouldn't be too complicated.



It seems you are trying to get the page source. There's a method for that in Selenium. You may use it instead of your

getSource("http://www.google.com");

Create a WebDriver instance, navigate to your URL, and get the page source.

Code snippet:

WebDriver driver = new FirefoxDriver();
driver.get("your URL");
String pageSource = driver.getPageSource();
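Since the original concern was speed, note that most of Selenium's cost is browser startup. A sketch of reusing one driver for every page instead of creating a new one each time (the URL list is hypothetical, and this assumes the Firefox driver is on your PATH):

```java
import java.util.Arrays;
import java.util.List;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class BulkPageSource {
    public static void main(String[] args) {
        // Hypothetical list of pages; substitute your own URLs.
        List<String> urls = Arrays.asList(
                "http://www.google.com",
                "http://www.example.com");

        WebDriver driver = new FirefoxDriver(); // start one browser for all pages
        try {
            for (String url : urls) {
                driver.get(url);
                String pageSource = driver.getPageSource();
                // ... extract the part of the source you want here ...
            }
        } finally {
            driver.quit(); // always close the browser, even on failure
        }
    }
}
```

This amortizes the startup cost across all the sites, which usually makes a large difference when visiting many pages.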