Dynamically adding seeds from a database in Crawler4J


I am trying to read a list of seed URLs from a CSV file and load them into the crawl controller using the code below:

public class BasicCrawlController {

    public static void main(String[] args) throws Exception {

        // CrawlConfig / CrawlController setup omitted for brevity (see note below)

        ArrayList<String> sl = Globals.INSTANCE.getSeeds();
        System.out.println("Seeds to add: " + sl.size());
        for (String url : sl) {
            System.out.println("Adding to seed: " + url);
            controller.addSeed(url);
        }
        controller.start(BasicCrawler.class, numberOfCrawlers);
    }
}
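For reference, here is a minimal sketch of how the seed list might be produced from the CSV file. The `Globals.INSTANCE.getSeeds()` helper above is not shown in the question, so the `SeedLoader` class and its `parseSeeds` method below are hypothetical stand-ins, not the actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class SeedLoader {

    // Parse CSV content (one seed URL per line, optionally comma-separated)
    // into a clean list, trimming whitespace and skipping blank entries.
    public static List<String> parseSeeds(String csvContent) {
        List<String> seeds = new ArrayList<>();
        for (String line : csvContent.split("\\R")) {  // split on any line break
            for (String field : line.split(",")) {
                String url = field.trim();
                if (!url.isEmpty()) {
                    seeds.add(url);
                }
            }
        }
        return seeds;
    }

    public static void main(String[] args) {
        String csv = "http://xxxxx.com\nhttp://yyyyy.com\n\nhttp://zzzzz.com\n";
        List<String> seeds = parseSeeds(csv);
        System.out.println("Seeds to add: " + seeds.size()); // Seeds to add: 3
        for (String url : seeds) {
            System.out.println("Adding to seed: " + url);
        }
    }
}
```

Each URL returned by such a loader would then be passed to `controller.addSeed(url)` in a loop before `controller.start(...)`, exactly as in the code above.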

The output I received on the console is as follows:

Seeds to add: 3
Adding to seed: http://xxxxx.com
Adding to seed: http://yyyyy.com
Adding to seed: http://zzzzz.com
 INFO [main] Crawler 1 started.
 INFO [main] Crawler 2 started.
 INFO [main] Crawler 3 started.
 INFO [main] Crawler 4 started.
 INFO [main] Crawler 5 started.
 INFO [main] Crawler 6 started.
 INFO [main] Crawler 7 started.
 INFO [main] Crawler 8 started.
 INFO [main] Crawler 9 started.
 INFO [main] Crawler 10 started.
ERROR [Crawler 1] String index out of range: -8, while processing: http://yyyyy.com/
ERROR [Crawler 1] String index out of range: -8, while processing: http://zzzzz.com/
 INFO [Thread-2] It looks like no thread is working, waiting for 10 seconds to make sure...
 INFO [Thread-2] No thread is working and no more URLs are in queue waiting for another 10 seconds to make sure...
 INFO [Thread-2] All of the crawlers are stopped. Finishing the process...
 INFO [Thread-2] Waiting for 10 seconds before final clean up...

Am I missing something that is required to allow dynamic adding of seeds before calling controller.start()?

The specification of the number of crawlers and the rest of the necessary crawler4j setup in the crawl controller has been omitted from the code above to keep it short and easy to read.
