Run SparkR or an R package on my Cloudera 5.9 Spark


I have a 3-node cluster running Cloudera 5.9 on CentOS 6.7. I need to connect my R packages (running on my laptop) to the Spark instance running in cluster mode on Hadoop.

However, when I try to connect my local R to the cluster's Spark through sparklyr's spark_connect(), it throws an error, because it looks for the Spark home on the laptop itself.
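For reference, the connection attempt looks roughly like this (a minimal sketch; the spark_home path is the standard CDH parcel location and may differ on your install):

library(sparklyr)
# spark_connect() resolves spark_home on the machine where R runs,
# so on a laptop with no local Spark install it fails at this step.
# Run from a cluster node instead, it can attach to YARN:
sc <- spark_connect(master = "yarn-client",
                    spark_home = "/opt/cloudera/parcels/CDH/lib/spark")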

I googled and found that SparkR can be installed to use R with Spark. However, I have a few questions:

  1. I have downloaded the tar file from https://amplab-extras.github.io/SparkR-pkg/. Can I simply copy it to my Linux server and install it there?
  2. Do I have to stop or delete my existing Spark, which is not standalone and uses YARN (i.e., it runs in cluster mode)? Or can SparkR just run on top of it if I install it on the server? (See the sketch after this list.)
  3. Or do I have to run Spark in standalone mode (set up the Spark gateways and start the master/workers with the scripts) and install the package on top of that from the Linux command line?
  4. If it gets installed, will I be able to access it through the Cloudera Manager UI?
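If I understand correctly, CDH 5.9 ships Spark 1.6, and SparkR there would sit on top of the existing YARN deployment, roughly like the sketch below. This is my understanding rather than tested code, and it assumes the SparkR package is already installed on the node:

library(SparkR)                            # assumes SparkR is already on the R library path
sc <- sparkR.init(master = "yarn-client")  # attaches to the existing YARN Spark;
                                           # no standalone master/worker scripts needed
sparkR.stop()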

Please help; I am new to this and really need guidance.

Thanks, Shilpa

3 Answers


I installed RStudio on CentOS to get a GUI, following this guide: http://devopspy.com/linux/install-r-rstudio-centos-7/

I then tried to install sparklyr but ran into a lot of issues. I finally resolved them by installing these system libraries:

sudo yum install libcurl-devel    # headers needed by the curl R package
sudo yum install openssl-devel    # headers needed by the openssl R package
sudo yum install libgit2-devel    # headers needed by the git2r R package

After that, you can install the sparklyr package normally.
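With those libraries in place, the install should go through; as a quick sanity check (a minimal sketch, nothing cluster-specific):

install.packages("sparklyr")   # succeeds once the -devel libraries above are present
library(sparklyr)
packageVersion("sparklyr")     # confirms the package installed and loads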


The best way to install R and then SparkR on top of it is described here: http://blog.clairvoyantsoft.com/2016/11/installing-sparkr-on-a-hadoop-cluster/

I was able to install both by following that link. It is really useful and up to date.
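Once SparkR is installed per that guide, a quick smoke test against the existing YARN Spark looks roughly like this (a minimal sketch using the Spark 1.6 SparkR API that CDH 5.9 ships; the master setting is an assumption for a default install):

library(SparkR)
sc <- sparkR.init(master = "yarn-client")    # runs on top of the existing YARN Spark
sqlContext <- sparkRSQL.init(sc)
df <- createDataFrame(sqlContext, faithful)  # built-in R dataset, just to verify
head(df)                                     # should print the first rows back from Spark
sparkR.stop()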

Thanks, Shilpa