Trying to get the exact source code of a web page, as I see it in my browser, using Java


I'm very new to programming; this is my second day. I am looking at a finance web page and trying to extract the stock symbols from it. From the page's source code, I'd like to build a list that looks like ADK-A,AEH,AED, etc., that is, the symbols as they appear on the page and in the source code the browser shows.

When I view the source in Chrome I can see the stock symbols. With Java I get some of the source code, but whichever way I try, the stock symbols and plenty of other markup never show up in what I download.

I have tried implementations using the URL class, the URLConnection class, and the HtmlUnit library. I don't know much, but I'm guessing this part of the source is generated by some sort of JavaScript? I figured HtmlUnit would help, since it is supposed to be able to handle scripts. It didn't, at least not the way I am using it. Anyway, this is what I tried:

private static String name1 = "http://www.quantumonline.com/pfdtable.cfm?Type=TaxAdvPfds&SortColumn=Company&SortOrder=ASC";

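// Implementation 1: read the page with URL.openStream() and print every line containing "href"
// (needs java.io.BufferedReader, java.io.IOException, java.io.InputStreamReader, java.net.URL)

public static void main(String[] args) throws IOException {
    URL thisUrl = new URL(name1);
    // try-with-resources closes the reader once the loop is done
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(thisUrl.openStream()))) {
        String currentLine;
        while ((currentLine = reader.readLine()) != null) {
            if (currentLine.contains("href")) {
                System.out.println(currentLine);
            }
        }
    }
}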
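
Once the lines with the symbols do come back, building the ADK-A,AEH,AED style list might look like the sketch below. It assumes, without this being confirmed against the real page, that each symbol appears in a link as the value of a "tickersymbol" query parameter. The class name SymbolExtractor, the pattern, and the sample HTML are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolExtractor {
    // Assumption: a symbol is the value of a "tickersymbol" query parameter
    // and consists of letters, digits, dots, and hyphens.
    private static final Pattern SYMBOL =
            Pattern.compile("tickersymbol=([A-Za-z0-9.\\-]+)", Pattern.CASE_INSENSITIVE);

    // Returns each distinct symbol found in the HTML, in order of first appearance.
    static List<String> extractSymbols(String html) {
        List<String> symbols = new ArrayList<>();
        Matcher matcher = SYMBOL.matcher(html);
        while (matcher.find()) {
            String symbol = matcher.group(1);
            if (!symbols.contains(symbol)) {
                symbols.add(symbol);
            }
        }
        return symbols;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup, not copied from the real page.
        String sample = "<a href=\"search.cfm?tickersymbol=ADK-A&sopt=symbol\">ADK-A</a>"
                + "<a href=\"search.cfm?tickersymbol=AEH&sopt=symbol\">AEH</a>";
        System.out.println(String.join(",", extractSymbols(sample))); // prints ADK-A,AEH
    }
}
```

A regular expression is only good enough here because the target is one narrow pattern. For anything more structural, an HTML parser would be the safer choice.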

// Implementation 2. My understanding of setting a request property on the URLConnection was that it makes sure the website isn't restricting me based on my user agent. I honestly don't really know what it does, but I tried with and without it, and neither helped.

public static void main(String[] args) throws IOException {
    URL thisUrl = new URL(name1);
    URLConnection thisUrlConnect = thisUrl.openConnection();
    // Send a browser-like User-Agent header instead of Java's default one
    thisUrlConnect.addRequestProperty("User-Agent", "the user agent i got from http://whatsmyuseragent.com/");
    // try-with-resources closes the reader (and the underlying stream) at the end
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(thisUrlConnect.getInputStream()))) {
        String currentLine;
        while ((currentLine = reader.readLine()) != null) {
            System.out.println(currentLine);
        }
    }
}
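
For what the request property actually does: it sets an HTTP header that is sent along with the request, and User-Agent is the header a server reads to guess which client is asking. The sketch below only prepares a connection and reads the header back, so it never touches the network. The class name UserAgentDemo, the example URL, and the agent string are placeholders, not values from the real setup.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

class UserAgentDemo {
    // Prepares (but does not open) a connection that will send the given User-Agent.
    static URLConnection open(String url, String userAgent) {
        try {
            // openConnection() only creates the object; nothing is sent until
            // connect() or getInputStream() is called.
            URLConnection connection = URI.create(url).toURL().openConnection();
            // setRequestProperty replaces any existing value for the header;
            // addRequestProperty would add another value under the same name.
            connection.setRequestProperty("User-Agent", userAgent);
            return connection;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL and agent string, for illustration only.
        URLConnection connection = open("http://example.com/", "Mozilla/5.0 (placeholder)");
        System.out.println(connection.getRequestProperty("User-Agent")); // prints Mozilla/5.0 (placeholder)
    }
}
```

Changing this header only affects how the server treats the request. It does not make Java run any scripts contained in the page that comes back.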

// Implementation 3. I also tried new WebClient(BrowserVersion.CHROME) and all the other browser versions; nothing worked.

public static void main(String[] args) throws Exception {
    WebClient webClient = new WebClient();
    HtmlPage page = webClient.getPage(name1);
    System.out.println(page.asXml());
}

Anyway, if anyone has any ideas, I'm all ears. Thanks!
