Hadoop error when using spark-submit


I am trying to run spark-submit against an Amazon EC2 cluster with the following command:

spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 --master spark://amazonaws.com SimpleApp.py

and I end up with the following error. It seems to be looking for Hadoop. My EC2 cluster was created using the spark-ec2 command.

Ivy Default Cache set to: /home/adas/.ivy2/cache
The jars for the packages stored in: /home/adas/.ivy2/jars
:: loading settings :: url = jar:file:/home/adas/spark/spark-2.1.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 66439ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.apache.hadoop#hadoop-aws;2.7.1

    ==== local-m2-cache: tried

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== local-ivy-cache: tried

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/ivys/ivy.xml

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/jars/hadoop-aws.jar

    ==== central: tried

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.apache.hadoop#hadoop-aws;2.7.1: not found

        ::::::::::::::::::::::::::::::::::::::::::::::


:::: ERRORS
    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hadoop#hadoop-aws;2.7.1: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

There are 2 best solutions below

1
On BEST ANSWER

Sorry everyone, it was just some local proxy issues.

2
On

You are submitting the job with the --packages org.apache.hadoop:hadoop-aws:2.7.1 option, so spark-submit tries to resolve that dependency by downloading the package from the public Maven repository. However, this error indicates that it cannot reach the repository:

Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

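You might want to check whether the machine running spark-submit has internet access, since that is where the --packages dependencies are resolved. A proxy or firewall on that machine can produce this error.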
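One quick way to test this is a small Python check, run on the same machine and as the same user that runs spark-submit. This is only a sketch: the helper names `can_reach` and `report` are made up for illustration, and the hosts and ports are simply the ones that appear in the error log above. It opens a direct TCP connection and ignores any HTTP proxy settings, so an "unreachable" result on a machine where a browser works would point to a proxy that the JVM is not configured to use.

```python
import socket


def can_reach(host, port=443, timeout=5.0):
    """Return True if a direct TCP connection to host:port succeeds.

    DNS failures, refused connections, unreachable hosts, and timeouts
    all raise OSError (or a subclass), which is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(targets, timeout=5.0):
    """Check each (host, port) pair and return a list of status lines."""
    lines = []
    for host, port in targets:
        status = "reachable" if can_reach(host, port, timeout) else "unreachable"
        lines.append(f"{host}:{port} is {status}")
    return lines
```

For example, `print("\n".join(report([("repo1.maven.org", 443), ("dl.bintray.com", 80)])))` prints one status line per repository host. If both come back unreachable, the problem is network or proxy configuration on that machine rather than anything in Spark or Hadoop.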