crawler4j compile error with class CrawlConfig - VariableDeclaratorId Expected

Question

crawler4j compile error with class CrawlConfig - VariableDeclaratorId Expected

752 Views Asked by Trevor Oakley At 08 August 2013 at 06:48

The code will not compile. I changed the JRE to 1.7. The compiler does not highlight the class in Eclipse and the CrawlConfig appears to fail in the compiler. The class should be run from the command line in Linux.

Any ideas?

Compiler Error - Description Resource Path Location Type Syntax error on token "crawlStorageFolder", VariableDeclaratorId expected after this token zeocrawler.java /zeowebcrawler/src/main/java/com/example line 95 Java Problem

import edu.uci.ics.crawler4j.crawler.CrawlConfig;
import edu.uci.ics.crawler4j.crawler.CrawlController;
import edu.uci.ics.crawler4j.crawler.Page;
import edu.uci.ics.crawler4j.crawler.WebCrawler;
import edu.uci.ics.crawler4j.fetcher.PageFetcher;
import edu.uci.ics.crawler4j.parser.HtmlParseData;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtConfig;
import edu.uci.ics.crawler4j.robotstxt.RobotstxtServer;
import edu.uci.ics.crawler4j.url.WebURL;

    public class Controller {   

         String crawlStorageFolder = "/data/crawl/root";
        int numberOfCrawlers = 7;

        CrawlConfig config = new CrawlConfig();

        config.setCrawlStorageFolder(crawlStorageFolder);

        PageFetcher pageFetcher = new PageFetcher(config);
        RobotstxtConfig robotstxtConfig = new RobotstxtConfig();
        RobotstxtServer robotstxtServer = new RobotstxtServer(robotstxtConfig, pageFetcher);
        CrawlController controller = new CrawlController(config, pageFetcher, robotstxtServer);

           controller.addSeed("http://www.senym.com");
            controller.addSeed("http://www.merrows.co.uk");
            controller.addSeed("http://www.zeoic.com");


            controller.start(MyCrawler.class, numberOfCrawlers);








    }
    public URLConnection connectURL(String strURL) {
        URLConnection conn =null;
        try {
        URL inputURL = new URL(strURL);
        conn = inputURL.openConnection();
            int test = 0;

        }catch(MalformedURLException e) {
        System.out.println("Please input a valid URL");
        }catch(IOException ioe) {
        System.out.println("Can not connect to the URL");
        }
        return conn;
        }

    public static void updatelongurl()
    {

//      System.out.println("Short URL: "+ shortURL);
//        urlConn =  connectURL(shortURL);
 //       urlConn.getHeaderFields();
 //       System.out.println("Original URL: "+ urlConn.getURL());

/* connectURL - This function will take a valid url and return a 
URL object representing the url address. */



    }






    public class MyCrawler extends WebCrawler {

        private      Pattern FILTERS = Pattern.compile(".*(\\.(css|js|bmp|gif|jpe?g" 
                                                          + "|png|tiff?|mid|mp2|mp3|mp4"
                                                          + "|wav|avi|mov|mpeg|ram|m4v|pdf" 
                                                          + "|rm|smil|wmv|swf|wma|zip|rar|gz))$");

        /**
         * You should implement this function to specify whether
         * the given url should be crawled or not (based on your
         * crawling logic).
         */
        @Override
        public boolean shouldVisit(WebURL url) {
                String href = url.getURL().toLowerCase();
                return !FILTERS.matcher(href).matches() && href.startsWith("http://www.ics.uci.edu/");
        }

        /**
         * This function is called when a page is fetched and ready 
         * to be processed by your program.
         */
        @Override
        public void visit(Page page) {          
                String url = page.getWebURL().getURL();
                System.out.println("URL: " + url);

                if (page.getParseData() instanceof HtmlParseData) {
                        HtmlParseData htmlParseData = (HtmlParseData) page.getParseData();
                        String text = htmlParseData.getText();
                        String html = htmlParseData.getHtml();
                        List<WebURL> links = htmlParseData.getOutgoingUrls();

                        System.out.println("Text length: " + text.length());
                        System.out.println("Html length: " + html.length());
                        System.out.println("Number of outgoing links: " + links.size());
                }
        }
}

Original Q&A

There are 2 best solutions below

**Julien** · Answer 1 · 2013-08-18T10:03:46.353000

Julien On 18 August 2013 at 10:03

This is a pretty strange error since the code seems to be clean. Try to start eclipse with the -clean option on command line.

**xli** · Answer 2 · 2014-02-02T03:51:14.593000

xli On 02 February 2014 at 03:51

Change

String crawlStorageFolder = "/data/crawl/root";

to

String crawlStorageFolder = "./data/crawl/root";

i.e. add a leading .

crawler4j compile error with class CrawlConfig - VariableDeclaratorId Expected

There are 2 best solutions below

Related Questions in CRAWLER4J

Trending Questions

Popular # Hahtags

Popular Questions