Unable to Inject URL seed file in stormcrawler


I am new to StormCrawler and am using SC with Elasticsearch. I am unable to inject the URLs from my seed file (seeds.txt), which contains 10 URLs. I am following the README instructions. Here is the command that I am using to inject the URLs:

storm local target/stormcrawler-1.0-SNAPSHOT.jar --local-ttl 3600 com.digitalpebble.ESCrawlTopology -- -conf crawler-conf.yaml -conf es-conf.yaml . seeds.txt
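For context, my es-conf.yaml points the status index at the local Elasticsearch instance. This is a sketch of the relevant part (the addresses value is what I assume the default setup expects; my actual file may differ):

```yaml
# es-conf.yaml (excerpt) - assumed default endpoint, adjust host/port/scheme as needed
es.status.addresses: "http://localhost:9200"
es.status.index.name: "status"
```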

Here is the error I get while injecting the URLs:

21:08:04.267 [I/O dispatcher 2] ERROR c.d.s.e.p.AggregationSpout -  Exception with ES query
org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution
at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:80) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:72) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.ListenableFuture.notifyListenerDirectly(ListenableFuture.java:112) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.ListenableFuture.done(ListenableFuture.java:100) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.BaseFuture.setException(BaseFuture.java:149) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.ListenableFuture.onFailure(ListenableFuture.java:147) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.client.RestHighLevelClient$5.onFailure(RestHighLevelClient.java:2756) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.client.RestClient$FailureTrackingResponseListener.onDefinitiveFailure(RestClient.java:686) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.client.RestClient$1.failed(RestClient.java:422) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.concurrent.BasicFuture.failed(BasicFuture.java:137) ~[httpcore-4.4.16.jar:4.4.16]
at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.executionFailed(DefaultClientExchangeHandlerImpl.java:101) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.failed(AbstractClientExchangeHandler.java:426) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.endOfInput(HttpAsyncRequestExecutor.java:356) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.DefaultNHttpClientConnection.consumeInput(DefaultNHttpClientConnection.java:261) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:81) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.client.InternalIODispatch.onInputReady(InternalIODispatch.java:39) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.inputReady(AbstractIODispatch.java:114) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.readable(BaseIOReactor.java:162) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvent(AbstractIOReactor.java:337) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.processEvents(AbstractIOReactor.java:315) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:276) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
Caused by: java.util.concurrent.ExecutionException: org.apache.http.ConnectionClosedException: Connection is closed
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.getValue(BaseFuture.java:257) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.BaseFuture$Sync.get(BaseFuture.java:231) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.BaseFuture.get(BaseFuture.java:53) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
at org.elasticsearch.common.util.concurrent.FutureUtils.get(FutureUtils.java:65) ~[stormcrawler-1.0-SNAPSHOT.jar:?]
... 22 more
Caused by: org.apache.http.ConnectionClosedException: Connection is closed
... 12 more
21:08:05.298 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027 10:parse       JSoupParserBolt         {}
21:08:05.434 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027 17:tika       ParserBolt             {}
21:08:05.817 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       num_queues             0
21:08:05.817 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       fetcher_average_perdoc {}
21:08:05.817 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       fetcher_counter         {}
21:08:05.817 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       activethreads           0
21:08:05.817 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       fetcher_average_persec {}
21:08:05.818 [Thread-55] INFO  o.a.s.m.LoggingMetricsConsumer - 1702174085          storm:1027  6:fetch       in_queues               0
21:08:06.269 [Thread-39-spout-executor[14, 14]] INFO  c.d.s.e.p.AggregationSpout -  Populating buffer with nextFetchDate <= 2023-12-09T21:07:37-05:00
21:08:06.275 [I/O dispatcher 3] ERROR c.d.s.e.p.AggregationSpout -  Exception with ES query
org.elasticsearch.common.util.concurrent.UncategorizedExecutionException: Failed execution
at org.elasticsearch.common.util.concurrent.FutureUtils.rethrowExecutionException(FutureUtils.java:80) ~[stormcrawler-1.0-SNAPSHOT.jar:?]

I have tried a couple of times and also reduced the number of URLs, but the issue remains the same. Any help would be much appreciated.
