Cannot Get HTML Elements (JSOUP)


I am trying to get the website title and some elements from a site with JSOUP for my Android application. I can get the title, but I cannot get an element (the article count in this example) by its id. I have tried both the select() and getElementById() methods, but neither works.

Related HTML source code:

<div id="articlecount">
    <a href="/wiki/Special:Statistics" title="Special:Statistics">4,891,985</a> articles in
    <a href="/wiki/English_language" title="English language">English</a>
</div>

I want to get the article count and show it in the tv2 TextView.

Java Code:

public class MainActivity extends ActionBarActivity {

String URL = "https://en.wikipedia.org/wiki/Main_Page";
String title;
Element article;
TextView tv1, tv2;
ProgressDialog mProgressDialog;

@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);

    tv1 = (TextView)findViewById(R.id.tv1);
    tv2 = (TextView)findViewById(R.id.tv2);

    new FetchWebsiteData().execute();
}

private class FetchWebsiteData extends AsyncTask<Void, Void, Void> {

    @Override
    protected void onPreExecute() {
        super.onPreExecute();
        mProgressDialog = new ProgressDialog(MainActivity.this);
        mProgressDialog.setMessage("Loading...");
        mProgressDialog.setIndeterminate(false);
        mProgressDialog.show();
    }

    @Override
    protected Void doInBackground(Void... params) {
        try {
            Document doc = Jsoup.connect(URL).get();
            title = doc.title();
            article = doc.select("div#articlecount > a").first();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    @Override
    protected void onPostExecute(Void result) {
        tv1.setText(title + " ...");
        tv2.setText(article.text());
        mProgressDialog.dismiss();
    }   
 }   

 ...
}

The app crashes with the following error:

...
06-15 11:34:45.744  13540-13540/com.samet.webparser E/AndroidRuntime﹕ FATAL EXCEPTION: main
  Process: com.samet.webparser, PID: 13540
  java.lang.NullPointerException: Attempt to invoke virtual method 'java.lang.String org.jsoup.nodes.Element.text()' on a null object reference
  at com.samet.webparser.MainActivity$FetchWebsiteData.onPostExecute(MainActivity.java:62)
  at com.samet.webparser.MainActivity$FetchWebsiteData.onPostExecute(MainActivity.java:36)
  at android.os.AsyncTask.finish(AsyncTask.java:632)
  at android.os.AsyncTask.access$600(AsyncTask.java:177)
  at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:645)
...

Thanks for your help.


There are 2 best solutions below

Best answer

Did you debug your code? It's obvious that

article = doc.select("div#articlecount > a").first();

returns null. This is also documented in the API:

public Element first()
Get the first matched element.
Returns: The first matched element, or null if contents is empty.

So your selector seems to be incorrect. First you should debug your code or post the full HTML doc.
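To make the null case concrete, here is a minimal sketch of guarding the result of first() before calling text(). It parses an HTML string offline with Jsoup.parse() so it runs without a network connection. The class name ArticleCountParser, the method extractArticleCount, and the second HTML snippet are made up for illustration and are not part of the question's code.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleCountParser {

    // Returns the text of the first link inside div#articlecount,
    // or null when the document contains no such element.
    static String extractArticleCount(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("div#articlecount > a").first();
        // first() returns null when the selector matched nothing,
        // so check before calling text().
        return (link == null) ? null : link.text();
    }

    public static void main(String[] args) {
        // Markup taken from the question.
        String desktopHtml = "<div id=\"articlecount\">"
                + "<a href=\"/wiki/Special:Statistics\" title=\"Special:Statistics\">4,891,985</a> articles in "
                + "<a href=\"/wiki/English_language\" title=\"English language\">English</a></div>";
        // Hypothetical stand-in for a page variant that lacks the element.
        String otherHtml = "<div id=\"content\"><p>No counter here</p></div>";

        System.out.println(extractArticleCount(desktopHtml)); // prints 4,891,985
        System.out.println(extractArticleCount(otherHtml));   // prints null
    }
}
```

In the AsyncTask from the question, the same check in onPostExecute() would let the app show a fallback message instead of crashing.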

Edit: I set up a project and tested your code, comparing the HTML that Jsoup received with the original page you use. The problem was the user agent. When the request comes from a mobile device, Wikipedia delivers a special mobile version of its homepage, and that markup does not match the selector you used. Send a desktop user agent and the selector works:

Document doc = Jsoup.connect(URL).userAgent("Mozilla").get();
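
As a slightly fuller sketch of the same idea, the connection can be built in one place with an explicit user agent and timeout. The class name DesktopFetch, the helper desktopConnection, the user-agent string, and the 10-second timeout are assumptions chosen for illustration. Whether the server returns the desktop markup for this string is up to the server.

```java
import java.io.IOException;

import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class DesktopFetch {

    // Assumed example value; any desktop-browser-like string could be used.
    static final String DESKTOP_USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Builds a connection that identifies itself as a desktop browser.
    // No request is sent until get() is called on the result.
    static Connection desktopConnection(String url) {
        return Jsoup.connect(url)
                .userAgent(DESKTOP_USER_AGENT)
                .timeout(10000); // milliseconds
    }

    public static void main(String[] args) throws IOException {
        // Requires network access; on Android this must run off the main thread.
        Document doc = desktopConnection("https://en.wikipedia.org/wiki/Main_Page").get();
        System.out.println(doc.title());
    }
}
```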

@and_dev Right. With the desktop user agent set, you can also get the element by its id like this:

Element articlecount = doc.getElementById("articlecount");
Element article = articlecount.select("a").first();
System.out.println(article.text()); // My Test