Missing dependencies in Apache Crunch Scala build


I'm trying to build the Apache Crunch source code on my CentOS 7 machine, but am getting the following error in the crunch-spark project when I execute mvn package:

[ERROR] /home/bwatson/programming/git/crunch/crunch-spark/src/it/scala/org/apache/crunch/scrunch/spark/PageRankClassTest.scala:71: error: bad symbolic reference. A signature in PTypeH.class refers to term protobuf
[ERROR] in package com.google which is not available.
[ERROR] It may be completely missing from the current classpath, or the version on
[ERROR] the classpath might be incompatible with the version used when compiling PTypeH.class.
[ERROR]       .map(line => { val urls = line.split("\\t"); (urls(0), urls(1)) })
[ERROR]           ^

Other SO questions about similar errors (here and here) seem to involve PATH or version issues. I've been experimenting with those suggestions but can't seem to resolve the error. For completeness:

[bwatson@ben-pc crunch]$ scala -version
Scala code runner version 2.11.5 -- Copyright 2002-2013, LAMP/EPFL

[bwatson@ben-pc crunch]$ java -version
java version "1.8.0_31"
Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)

[bwatson@ben-pc crunch]$ mvn -version
Apache Maven 3.0.5 (Red Hat 3.0.5-16)
Maven home: /usr/share/maven
Java version: 1.8.0_31, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_31/jre
Default locale: en_GB, platform encoding: UTF-8
OS name: "linux", version: "3.10.0-123.20.1.el7.x86_64", arch: "amd64", family: "unix"

Any advice? I'm not really sure where Scala is looking for its dependencies, but I'd have thought that Maven would take care of it.
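One way to check what is actually visible is to probe the classpath directly. The sketch below is a hypothetical diagnostic: the object `ClasspathProbe` and its `locate` helper are illustrative names, not part of Crunch. It reports whether a class can be loaded and, if so, which jar or directory it came from. The error mentions package `com.google.protobuf`, so `com.google.protobuf.Message` is used as the default probe, on the assumption that any class from that package would serve. This only inspects the classpath of whatever launches it, which may differ from the classpath Maven hands to scalac.

```scala
import scala.util.Try

// Hypothetical diagnostic, not part of Crunch: reports whether a class can be
// loaded from the current classpath and, if so, where it was loaded from.
object ClasspathProbe {

  /** Returns Some(location) if `className` is loadable, None otherwise.
    * The location is the jar or directory URL. Classes from the bootstrap
    * loader (e.g. java.lang.String) typically have no CodeSource, so they
    * are reported as "<bootstrap>".
    */
  def locate(className: String): Option[String] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption.map { cls =>
      Option(cls.getProtectionDomain.getCodeSource)
        .map(_.getLocation.toString)
        .getOrElse("<bootstrap>")
    }

  def main(args: Array[String]): Unit = {
    // Default probe: a core protobuf class (assumption: any class in
    // com.google.protobuf would do). Pass other class names as arguments.
    val names = if (args.nonEmpty) args.toSeq else Seq("com.google.protobuf.Message")
    names.foreach { name =>
      locate(name) match {
        case Some(where) => println(s"$name -> $where")
        case None        => println(s"$name -> NOT on classpath")
      }
    }
  }
}
```

To approximate what the compiler sees, you could run it with the classpath printed by `mvn dependency:build-classpath` in the crunch-spark module.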

Accepted answer:

It turns out the official documentation for Crunch was missing a Maven parameter. The issue was solved by building using:

mvn package -Dcrunch.platform=2

Another answer:

Unfortunately, different Scala versions (such as 2.10 and 2.11) are not binary compatible. By default, Apache Spark currently uses Scala 2.10.4, not Scala 2.11, and Apache Scrunch depends on Spark. Maven knows nothing about this, so it can't help. Some modifications to Scrunch are needed to get it to compile for Scala 2.11 / JDK 1.8. I am working on this at the moment, but I don't have a solution yet.

However, I get the error message you report when I compile with Scala 2.10.4 on JDK 1.8, not Scala 2.11, so I don't think the build is doing quite what you intend. The error seems to be coming from the Protobuf compiler or jar, but I don't know why that is.
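To make the compatibility rule concrete: for Scala 2.x, releases within the same major.minor series (the "binary version", e.g. 2.10.0 through 2.10.4) are binary compatible with each other, while 2.10 and 2.11 are not. The sketch below uses hypothetical helper names (`binaryVersion`, `binaryCompatible`) to extract and compare binary versions, and prints the version of the scala-library jar actually loaded at runtime, which is governed by the project's dependencies rather than by the `scala` runner on the PATH.

```scala
// Hypothetical helpers illustrating the Scala 2.x binary-compatibility rule:
// only the major.minor prefix (the "binary version") has to match.
// (Scala 3 follows different rules; this sketch covers 2.x only.)
object ScalaBinaryVersion {

  /** "2.10.4" -> "2.10", "2.11.0-RC1" -> "2.11". */
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  /** True when two 2.x versions share a binary version. */
  def binaryCompatible(a: String, b: String): Boolean =
    binaryVersion(a) == binaryVersion(b)

  def main(args: Array[String]): Unit = {
    // Version of the scala-library jar on the runtime classpath,
    // which may differ from what `scala -version` reports.
    val runtime = scala.util.Properties.versionNumberString
    println(s"scala-library on classpath: $runtime (binary ${binaryVersion(runtime)})")

    // Versions mentioned in this thread.
    println(binaryCompatible("2.10.4", "2.11.5")) // false
    println(binaryCompatible("2.11.0", "2.11.5")) // true
  }
}
```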

When I solve it myself, I will report back!