Java Goose not extracting content on Android

I'm trying to set up a small Android application that extracts content from a web page using the Goose library. Since the library is written in Scala, I'm using the .jar I found here. The problem is that when I try to extract content from a page, it returns nothing. The Article object is created successfully from the URL I pass in, but its values (title, domain, topImage, etc.) are all null. I tried different URLs to see whether the problem was isolated to a single website, but it doesn't appear to be.

The code I use to set up the Goose instance is this:

File gooseDir = context.getCacheDir();
Configuration config = new Configuration();
config.setLocalStoragePath(gooseDir.getAbsolutePath());
Goose goose = new Goose(config);

And then I just create the Article instance like so:

Article article = goose.extractContent(url);

Any advice?

1 answer below

Actually you can't use the Goose library on Android due to incompatibilities, but you can use my Android version: https://github.com/milosmns/goose

It does almost the same thing as Goose, but it works well on Android.
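
Whichever build you end up using, it can help to check exactly which fields come back null, and to run the extraction off the main thread, since Android does not allow network calls there by default. The following is only a minimal sketch in plain Java, not part of either library: `ExtractionCheck`, `Fields`, `missing` and `extractAndCheck` are hypothetical names, and `Fields` is a stand-in for whatever values you copy out of the Article object. It does not fix the incompatibility itself; it only makes the failure easier to see.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExtractionCheck {

    /** Hypothetical stand-in for the values copied out of an Article. */
    public static final class Fields {
        final String title;
        final String domain;
        final String topImage;

        public Fields(String title, String domain, String topImage) {
            this.title = title;
            this.domain = domain;
            this.topImage = topImage;
        }
    }

    /** Returns the names of the fields that came back null. */
    public static List<String> missing(Fields f) {
        List<String> names = new ArrayList<>();
        if (f.title == null) names.add("title");
        if (f.domain == null) names.add("domain");
        if (f.topImage == null) names.add("topImage");
        return names;
    }

    /** Runs the extraction on a worker thread and reports the null fields. */
    public static List<String> extractAndCheck(Callable<Fields> extraction) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Fields> result = executor.submit(extraction);
            // get() blocks the caller and rethrows any failure inside the
            // extraction as an ExecutionException instead of hiding it.
            return missing(result.get());
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: in a real app the lambda would call the extractor and
        // copy its results into Fields; here the values are hard-coded.
        List<String> nullFields = extractAndCheck(() -> new Fields(null, "example.com", null));
        System.out.println("Null fields: " + nullFields);
    }
}
```

Because `get()` blocks, in a real app you would call `extractAndCheck` from a background component rather than directly from the UI thread. If every field is reported as null for every URL, that points to the library itself rather than to a particular website.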