I'm trying to practice extracting data from tables using JSoup, and I can't figure out why I can't pull the "Shares Outstanding" field from
https://finance.yahoo.com/q/ks?s=AAPL+Key+Statistics
Here are two attempts, where `s` is `AAPL`:
```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class YahooStatistics {
    String sharesOutstanding = "Shares Outstanding:";

    public YahooStatistics(String s) {
        String keyStatisticsURL = ("https://finance.yahoo.com/q/ks?s=" + s + "+Key+Statistics");

        //Attempt 1
        try {
            Document doc = Jsoup.connect(keyStatisticsURL).get();
            for (Element table : doc.select("table.yfnc_datamodoutline1")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    for (Element td : tds.select(sharesOutstanding)) {
                        System.out.println(td.ownText());
                    }
                }
            }
        }
        catch (IOException ex) {
            ex.printStackTrace();
        }

        //Attempt 2
        try {
            Document doc = Jsoup.connect(keyStatisticsURL).get();
            for (Element table : doc.select("table.yfnc_datamodoutline1")) {
                for (Element row : table.select("tr")) {
                    Elements tds = row.select("td");
                    for (int j = 0; j < tds.size() - 1; j++) {
                        Element td = tds.get(j);
                        if ((td.ownText()).equals(sharesOutstanding)) {
                            System.out.println(tds.get(j + 1).ownText());
                        }
                    }
                }
            }
        }
        catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
```
Neither attempt prints anything; the only output is "BUILD SUCCESSFUL".
I've disabled JavaScript in my browser and the table still shows, so I'm assuming it is plain HTML rather than generated by JavaScript.
Any suggestions are appreciated.
Notes about your source after the edit:
1. Use `ownText()` rather than `text()`. `.text()` gives you the combined text of the element and all its sub-elements. In this case the element contains `Shares Outstanding<font size="-1"><sup>5</sup></font>:`, so its combined text is `"Shares Outstanding5:"`. If you use `ownText()` it will just be `"Shares Outstanding:"`.
2. Note the trailing colon (`:`) in that text. Update the value in `sharesOutstanding` accordingly.
3. … `+` following the `AAPL`.
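The difference described in the first two points can be checked offline. This is a minimal sketch, assuming jsoup is on the classpath; the class name `OwnTextDemo` and the HTML fragment (modelled on the label cell quoted above, with a placeholder value cell) are hypothetical and not copied from the live page:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class OwnTextDemo {
    // Hypothetical fragment modelled on the label cell described above.
    // The <table>/<tr> wrapper is needed: the HTML parser drops a bare <td>
    // that appears outside a table. "42" is a placeholder, not real data.
    static final String HTML =
        "<table><tr>"
        + "<td class=\"yfnc_tablehead1\">Shares Outstanding<font size=\"-1\"><sup>5</sup></font>:</td>"
        + "<td class=\"yfnc_tabledata1\">42</td>"
        + "</tr></table>";

    static Element labelCell() {
        Document doc = Jsoup.parse(HTML);
        return doc.select("td.yfnc_tablehead1").first();
    }

    public static void main(String[] args) {
        Element td = labelCell();
        // text() includes the footnote digit from the nested <font><sup>
        System.out.println(td.text());    // prints: Shares Outstanding5:
        // ownText() keeps only the text nodes that are direct children of the <td>
        System.out.println(td.ownText()); // prints: Shares Outstanding:
    }
}
```

Because the colon is a direct text child of the `<td>`, it survives in `ownText()`, which is why the comparison string has to include it.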

You can either break out of your loops once you find a match, go back to your original version (with the corrections above; see the note below), or use a more sophisticated query that will only match once:
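Here is a sketch of the first and last options, run against a small hypothetical copy of the table so it works without network access. It assumes jsoup is on the classpath and that the page still uses the `yfnc_datamodoutline1`, `yfnc_tablehead1` and `yfnc_tabledata1` class names; the class `SharesOutstandingLookup`, its methods and the sample markup are illustrative only. The selector is reconstructed from the description that follows and uses jsoup's `:containsOwn(text)` pseudo-selector together with the `+` (immediately preceding sibling) combinator:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class SharesOutstandingLookup {
    static final String LABEL = "Shares Outstanding:";

    // Hypothetical stand-in for the fetched page; "Other label:", "7" and "42" are placeholders.
    static Document sampleDocument() {
        return Jsoup.parse(
            "<table class=\"yfnc_datamodoutline1\">"
            + "<tr><td class=\"yfnc_tablehead1\">Other label:</td><td class=\"yfnc_tabledata1\">7</td></tr>"
            + "<tr><td class=\"yfnc_tablehead1\">Shares Outstanding<font size=\"-1\"><sup>5</sup></font>:</td>"
            + "<td class=\"yfnc_tabledata1\">42</td></tr>"
            + "</table>");
    }

    // Option 1: the loop from Attempt 2, returning as soon as the label cell is found.
    static String findWithLoop(Document doc) {
        for (Element table : doc.select("table.yfnc_datamodoutline1")) {
            for (Element row : table.select("tr")) {
                Elements tds = row.select("td");
                for (int j = 0; j < tds.size() - 1; j++) {
                    if (tds.get(j).ownText().equals(LABEL)) {
                        return tds.get(j + 1).ownText(); // first match ends all three loops
                    }
                }
            }
        }
        return null; // label not found
    }

    // Option 2: one selector for a td.yfnc_tabledata1 whose immediately preceding
    // sibling is a td.yfnc_tablehead1 whose own text contains the label.
    static String findWithSelector(Document doc) {
        Element td = doc.select(
            "td.yfnc_tablehead1:containsOwn(" + LABEL + ") + td.yfnc_tabledata1").first();
        return td == null ? null : td.ownText();
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(findWithLoop(doc));     // prints: 42
        System.out.println(findWithSelector(doc)); // prints: 42
    }
}
```

Against the real page you would pass `Jsoup.connect(keyStatisticsURL).get()` instead of `sampleDocument()`. Note that `:containsOwn` matches case-insensitively and by substring, so it is slightly looser than the `equals` check in the loop.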
This selector gives you all the `td` elements whose class is `yfnc_tabledata1` and whose immediately preceding sibling is a `td` element with class `yfnc_tablehead1` whose own text contains the string "Shares Outstanding:". This should select exactly the TD you need.

Note: the previous version of this answer was a long rant about the difference between `Elements.select()` and `Element.select()`. It turns out that I was dead wrong, and your original version should have worked if you had corrected the four points above. So, to set the record straight: `select()` on an `Elements` actually does look inside each element, and the resulting list may contain descendants of any of the elements in the original list that match the selector. Sorry about that.
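The corrected point is easy to confirm. This is a minimal sketch, assuming jsoup is on the classpath; the class `ElementsSelectDemo` and its two-table markup are hypothetical. It calls `select("td")` on an `Elements` list holding two tables and gets back the cells from inside both:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class ElementsSelectDemo {
    // Hypothetical markup: two tables with two cells each.
    static final String HTML =
        "<table class=\"a\"><tr><td>1</td><td>2</td></tr></table>"
        + "<table class=\"b\"><tr><td>3</td><td>4</td></tr></table>";

    static Elements cellsOfAllTables() {
        Document doc = Jsoup.parse(HTML);
        Elements tables = doc.select("table"); // the two <table> elements
        return tables.select("td");            // searches the descendants of each table in the list
    }

    public static void main(String[] args) {
        Elements tds = cellsOfAllTables();
        System.out.println(tds.size()); // prints: 4
        System.out.println(tds.text()); // prints: 1 2 3 4
    }
}
```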