I'm trying to set up a small Android application that extracts content from a web page using the Goose library. Since the library is written in Scala, I'm using the .jar I found here. The problem is that when I try to extract content from a page, it returns nothing. I successfully create an Article object using the URL I need, but the values of the object (title, domain, topImage, etc.) are all null. I tried different URLs to see whether the problem was isolated to a single website, but it doesn't appear to be.
The code I use to set up the Goose instance is this:
gooseDir = context.getCacheDir();
Configuration config = new Configuration();
config.setLocalStoragePath(gooseDir.getAbsolutePath());
Goose goose = new Goose(config);
And then I just create the Article instance like so:
Article article = goose.extractContent(url);
Any advice?
You can't actually use the Goose library on Android because of incompatibilities, but you can use my Android version: https://github.com/milosmns/goose
It does almost the same thing as Goose, but it works well on Android.