Trouble getting latest Sparkling Water (2.2) to work with R (via rsparkling)


I'm having issues updating rsparkling to work with Sparkling Water 2.2 and Spark 2.2. Everything worked with previous versions (<2.1).

I have installed the rsparkling R package that comes with the latest Sparkling Water 2.2 binaries (as per https://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html), and set the Sparkling Water location to the install directory (i.e. options(rsparkling.sparklingwater.location = "/Users/me/sparkling-water-2.2.0/")).

I can now connect to my cluster, but I get this error:

java.lang.ClassNotFoundException: org.apache.spark.h2o.H2OContext

I think this may have to do with the h2o version I am using (3.14.0.2), which is the version recommended on the install page.

Does anyone know which version of h2o Sparkling Water 2.2 works with? The rsparkling documentation (https://github.com/h2oai/rsparkling) has not been updated for 2.2. Could this error be the result of something else?

I am connecting to a standalone spark cluster, and my setup is:

Cluster/local Spark version: 2.2
R: 3.4.2
RStudio: 1.0.153
Sparklyr: 0.6.2
h2o: 3.14.0.2
rsparkling: 2.1

There are 2 best solutions below

Answer 1

Maybe you have not correctly installed all the dependencies for Spark, h2o and RStudio. I had this issue, and by following the documentation I noticed that I was missing some of the packages.

This is how I fixed the issue, following the documentation.

Make sure you have devtools installed. In RStudio, run this command: install.packages('devtools').

Then:

library(devtools)
devtools::install_github("h2oai/rsparkling", ref = "master")
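After reinstalling, it can help to confirm which package versions R actually sees before connecting. The following is only a minimal sketch: installed_version is a hypothetical helper (not part of sparklyr, h2o, or rsparkling), and the package names are the ones listed in the question.

```r
# Hypothetical helper: returns the installed version of a package as a
# string, or NA if the package is not installed. It reads the package
# metadata only and does not load the package.
installed_version <- function(pkg) {
  if (nzchar(system.file(package = pkg))) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
}

# Print one line per package, e.g. "h2o: <version>" or "h2o: NA".
for (pkg in c("sparklyr", "h2o", "rsparkling")) {
  message(pkg, ": ", installed_version(pkg))
}
```

The printed h2o version can then be compared against the one named on the Sparkling Water download page.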

Hope this helps.

Answer 2

I ran into the same problem and solved it by aligning the h2o and Sparkling Water versions.

  1. https://github.com/h2oai/rsparkling shows a version matching table. Since your h2o is 3.14.0.2, the backend Spark should be 2.2.0.
  2. https://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/2/index.html has three lines below the download link that state the exact versions this Sparkling Water release is built on. For example, Sparkling Water 2.2.2 matches H2O 3.14.0.7. Here is the key problem: if you use H2O 3.14.0.6 with Sparkling Water 2.2.2, or H2O 3.14.0.7 with Sparkling Water 2.2.1, you will get this error.
  3. Read this information carefully and choose your downloads accordingly (the Sparkling Water, H2O, and Spark versions must match exactly).
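The exact-match rule above can be checked before connecting. This is a rough sketch, not part of rsparkling: expected_h2o and versions_match are hypothetical names, and the table holds only the two pairs mentioned in this thread (2.2.0 with 3.14.0.2, 2.2.2 with 3.14.0.7). For any other release, take the pair from its download page.

```r
# Sparkling Water release -> H2O version it is built on.
# Only the pairs mentioned in this thread; extend from the release pages.
expected_h2o <- c(
  "2.2.0" = "3.14.0.2",
  "2.2.2" = "3.14.0.7"
)

# Hypothetical helper: TRUE only when the H2O version is exactly the one
# listed for the given Sparkling Water release.
versions_match <- function(sw_version, h2o_version) {
  if (!sw_version %in% names(expected_h2o)) {
    stop("No H2O version recorded for Sparkling Water ", sw_version)
  }
  identical(expected_h2o[[sw_version]], as.character(h2o_version))
}

# Usage: compare the installed h2o package (if any) against 2.2.0.
if (nzchar(system.file(package = "h2o"))) {
  installed <- as.character(utils::packageVersion("h2o"))
  message("h2o ", installed, " matches Sparkling Water 2.2.0: ",
          versions_match("2.2.0", installed))
}
```

The comparison is deliberately strict (string equality), because per the answer even a neighbouring patch release such as 3.14.0.6 against 2.2.2 triggers the error.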

Here is a solution:

Cluster/local Spark version: 2.2
R: 3.4.2
RStudio: 1.0.153
Sparklyr: 0.6.2
h2o: 3.14.0.2
Sparkling Water: 2.2.0, downloaded from https://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/0/index.html

## Sparkling Water 2.2.0 (set these options before loading rsparkling)
options(rsparkling.sparklingwater.version = "2.2.0")
options(rsparkling.sparklingwater.location = "/opt/sparkling-water-2.2.0")
library(sparklyr)    # provides spark_connect()
library(rsparkling)

## Spark version 2.2.0
sc <- spark_connect(master = "local", version = "2.2.0")

## the connection succeeds and the H2O context is created
h2o_context(sc)