WebHarvest implementation in Eclipse


I have an XML config (ScreenScraper) that does what I want when run in the executable version of WebHarvest. I am unsure how to execute it from Java.


All you need to do is import a few classes from the library:

    import java.io.FileNotFoundException;
    import java.util.List;
    import org.webharvest.definition.ScraperConfiguration;
    import org.webharvest.runtime.Scraper;
    import org.webharvest.runtime.variables.Variable;

Create a ScraperConfiguration object from your config.xml file:

    ScraperConfiguration config = null;
    try {
        config = new ScraperConfiguration("/path/to/config.xml");
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }

Create a Scraper object, passing the configuration and the path to a working directory:

    Scraper scraper = new Scraper(config, "/tmp/");
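Note that the snippet above only prints a stack trace when the config file is missing, so `config` stays `null` and the failure is likely to surface later in a less obvious place. A minimal sketch of checking both paths up front, using only the JDK, might look like the following. The class `ScraperPaths` and its two helper methods are hypothetical names introduced here for illustration; they are not part of WebHarvest.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of WebHarvest): validates paths before use.
final class ScraperPaths {

    // Returns the absolute path of the config file, or throws if it is
    // missing or unreadable, so the problem is reported immediately.
    static String requireReadableFile(String path) throws FileNotFoundException {
        Path p = Paths.get(path).toAbsolutePath().normalize();
        if (!Files.isRegularFile(p) || !Files.isReadable(p)) {
            throw new FileNotFoundException("Config not found or unreadable: " + p);
        }
        return p.toString();
    }

    // Creates the working directory (and any missing parents) if needed
    // and returns its absolute path.
    static String ensureWorkingDir(String dir) throws IOException {
        Path p = Paths.get(dir).toAbsolutePath().normalize();
        Files.createDirectories(p);
        return p.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder defaults taken from the snippets above.
        String configArg = args.length > 0 ? args[0] : "/path/to/config.xml";
        String workArg = args.length > 1 ? args[1] : "/tmp/";
        try {
            String configPath = requireReadableFile(configArg);
            String workDir = ensureWorkingDir(workArg);
            System.out.println("Config: " + configPath);
            System.out.println("Working dir: " + workDir);
        } catch (FileNotFoundException e) {
            // Stop here instead of continuing with a null configuration.
            System.err.println(e.getMessage());
        }
    }
}
```

The returned strings can then be passed to `new ScraperConfiguration(...)` and `new Scraper(config, ...)` as in the snippets above.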

Then execute the configuration:

    scraper.execute();

After the configuration has executed, you can also read the variables it defined:

    String stringVar =
        ((Variable)scraper.getContext().getVar("my_string_var")).toString();
    List<Variable> listVar =
        ((Variable) scraper.getContext().getVar("my_list_var")).toList();

You can see an example here

And also the API here