Automated integration testing for a Spark job


I am using the Maven Failsafe plugin along with the docker-maven-plugin to implement automated integration testing for a Spark job.

My project is checked in here:

https://github.com/abhitechdojo/maven-scala-spark

The Dockerfile that starts the Spark job looks like this:

FROM cloudera/quickstart

MAINTAINER abhishek "http://www.foobar.com"

# COPY THE UBER JAR IN THE / OF THE FILE SYSTEM

ADD /SparkIntegrationTestsAssembly.jar /

# CREATE INPUT DIRECTORY
CMD ["hadoop", "fs", "-mkdir", "-p", "/input"]

# COPY INPUT FILES
CMD ["hadoop", "fs", "-put", "/twitter.avro", "/input/twitter.avro"]

EXPOSE 8020:8020
EXPOSE 50070:50070
EXPOSE 50010:50010
EXPOSE 50020:50020
EXPOSE 50075:50075
EXPOSE 8030:8030
EXPOSE 8031:8031
EXPOSE 8032:8032
EXPOSE 8033:8033
EXPOSE 8088:8088
EXPOSE 8040:8040
EXPOSE 8042:8042
EXPOSE 10020:10020
EXPOSE 19888:19888
EXPOSE 11000:11000
EXPOSE 8888:8888
EXPOSE 18080:18080
EXPOSE 7077:7077

# RUN ALL SERVICES
ENTRYPOINT ["/usr/bin/docker-quickstart"]

# RUN SPARK JOB
CMD ["spark-submit", "--class", "com.abhi.HelloWorld", "--master", "local[2]", "SparkIntegrationTestsAssembly.jar", "/input", "/output"]

But when I try to start a container based on this image, it just keeps printing:

[INFO] Waiting for container 'cloudera/quickstart' to finish startup (max 10000 sec.)
[INFO] Waiting for container 'cloudera/quickstart' to finish startup (max 10000 sec.)
[INFO] Waiting for container 'cloudera/quickstart' to finish startup (max 10000 sec.)
[INFO] Waiting for container 'cloudera/quickstart' to finish startup (max 10000 sec.)
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 07:50 min
[INFO] Finished at: 2016-04-15T01:45:02-05:00
[INFO] Final Memory: 37M/831M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal net.wouterdanes.docker:docker-maven-plugin:5.0.0:start-containers (start) on project SparkIntegrationTest: Execution start of goal net.wouterdanes.docker:docker-maven-plugin:5.0.0:start-containers failed: java.net.SocketException: Socket closed -> [Help 1]

My pom.xml is too large to post here, but it is available at:

https://github.com/abhitechdojo/maven-scala-spark/blob/master/pom.xml

I was expecting the docker-maven-plugin to run the image built from my Dockerfile. The Dockerfile would start all services via the ENTRYPOINT and then run the Spark job with spark-submit, and the plugin would wait until it saw the "Stopping spark context" line in the container output.
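The kind of wait described above (block until a marker line shows up in the container output, or give up after a deadline) can be sketched as follows. This is a minimal illustration in Python, not the docker-maven-plugin's actual implementation; the function name `wait_for_log_line` and its parameters are hypothetical, and it assumes the container output is available as an iterable of text lines. It separates two failure modes: the marker not arriving before the deadline, and the log stream ending first, which is what happens if the container exits before the Spark job ever logs the marker.

```python
import queue
import threading
import time
from typing import Iterable

_END = object()  # sentinel: the log stream has ended


def wait_for_log_line(lines: Iterable[str], marker: str, timeout_s: float) -> str:
    """Return the first line containing `marker` (hypothetical helper).

    Raises TimeoutError if no such line arrives within `timeout_s` seconds,
    and RuntimeError if the stream ends first (e.g. the container exited).
    """
    q: "queue.Queue[object]" = queue.Queue()

    def pump() -> None:
        # Read the (possibly blocking) stream in a background thread so that
        # a silent container cannot stop the deadline from being enforced.
        for line in lines:
            q.put(line)
        q.put(_END)

    threading.Thread(target=pump, daemon=True).start()
    deadline = time.monotonic() + timeout_s
    timeout_msg = f"{marker!r} not seen within {timeout_s} s"

    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(timeout_msg)
        try:
            item = q.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError(timeout_msg) from None
        if item is _END:
            raise RuntimeError(f"log stream ended before {marker!r} appeared")
        line = str(item)
        if marker in line:
            return line
```

As an assumed usage, the stream could be the stdout of `subprocess.Popen(["docker", "logs", "-f", container_id], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)`, where `container_id` is a hypothetical variable holding the container's ID. Reading in a separate thread is the design choice that matters: iterating the stream directly would block forever on a container that stops printing, and the deadline would never be checked.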

Once that line appeared, the integration tests would run and validate the data produced by the Spark job.
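A very small output check that such an integration test might start with is sketched below, in Python for illustration only. It assumes the job's output has already been copied out of HDFS to a local directory (for example with `hadoop fs -get /output ./output`) and that the job writes through Hadoop's default output committer, which leaves an empty `_SUCCESS` marker next to the `part-*` data files. The function name `check_spark_output` is hypothetical; real tests would go on to read and validate the records themselves.

```python
from pathlib import Path
from typing import List


def check_spark_output(output_dir: str) -> List[str]:
    """Return the problems found in a local copy of a Spark output directory.

    An empty list means the basic checks passed (hypothetical helper).
    """
    root = Path(output_dir)
    if not root.is_dir():
        return [f"{output_dir} is not a directory"]

    problems: List[str] = []
    # The default Hadoop output committer writes _SUCCESS only when the job commits.
    if not (root / "_SUCCESS").exists():
        problems.append("missing _SUCCESS marker")

    # Spark names its data files part-00000, part-00001, ... (possibly with a suffix).
    parts = sorted(root.glob("part-*"))
    if not parts:
        problems.append("no part-* files")
    elif all(p.stat().st_size == 0 for p in parts):
        problems.append("all part-* files are empty")
    return problems
```

Checking for `_SUCCESS` first separates "the job never finished" from "the job finished but produced the wrong data", which makes a failing test easier to diagnose.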

But the plugin just waits forever. I have tried increasing the timeout to 2000, but the behavior is the same.
