nutch error: Illegal to have multiple roots (start tag in epilog?)

$ bin/nutch inject crawl/crawldb urls
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.RuntimeException: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:3092)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:3041)
        at org.apache.hadoop.conf.Configuration.loadProps(Configuration.java:2914)
        at org.apache.nutch.crawl.Injector.main(Injector.java:533)
Caused by: com.ctc.wstx.exc.WstxParsingException: Illegal to have multiple roots (start tag in epilog?).
 at [row,col,system-id]: [9,2,"file:/C:/Users/Gjergj%20Kadriu/Documents/apache-nutch-1.19/conf/nutch-site.xml"]
        at com.ctc.wstx.sr.StreamScanner.constructWfcException(StreamScanner.java:634)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:504)
        at com.ctc.wstx.sr.StreamScanner.throwParseError(StreamScanner.java:488)
        ... 13 more

I tried different configurations of nutch-site.xml, using the defaults from nutch-default.xml. I'm using Cygwin on Windows 10. I also tried troubleshooting environment variables and so on, but nothing works. Any ideas on how to approach this error?

Answer by Sebastian Nagel:

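The file nutch-site.xml must be a well-formed XML document with exactly one root element. The error message indicates that the parser found a second root element at row 9, column 2 of your file, that is, a start tag after the first root element had already been closed. For example, the error is reproducible with the following nutch-site.xml: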

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
<configuration>
</configuration>

Once the XML syntax is fixed, Nutch should be able to read the configuration file.
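
As a quick way to check the file outside of Nutch, here is a minimal sketch in Python that parses the broken example above and a corrected version with a single root element. The names BROKEN, FIXED and check_well_formed are made up for this illustration. Python uses the expat parser rather than the Woodstox parser that Hadoop uses, so the wording of the error differs, but both reject a document with more than one root.

```python
import xml.etree.ElementTree as ET

# The example from the answer: a second <configuration> root follows the first.
BROKEN = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
<configuration>
</configuration>
"""

# Corrected version: the extra root element is removed, so every
# <property> lives inside one <configuration> element.
FIXED = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>http.agent.name</name>
  <value>my-first-web-crawler</value>
</property>
</configuration>
"""


def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise a short error description."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # where the parser gave up
        return f"line {line}, column {column}: {err}"
    return None


if __name__ == "__main__":
    print("broken:", check_well_formed(BROKEN))
    print("fixed: ", check_well_formed(FIXED))
```

To check the real file instead of a string, ET.parse("conf/nutch-site.xml") raises the same ParseError (the relative path assumes you run it from the Nutch installation directory).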