How to import data from a cluster into the R environment via a Spark connection?


I followed the link below to set up a connection between Spark and my R server.

Connection b/w R studio server pro and hive on GCP

I can see my dataframe, but I cannot load it into the R environment to run analysis on it. Can anyone please suggest the correct way to do this?

library(sparklyr)
library(dplyr)

sparklyr::spark_install()

# configuration
Sys.setenv(SPARK_HOME = "/usr/lib/spark")
config <- spark_config()

# connect to the cluster through YARN
sc <- spark_connect(master = "yarn-client", config = config, version = "2.2.1")
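
To double-check the connection, I think the tables Spark/Hive exposes can be listed through it with something like this (a sketch based on the dplyr/DBI generics; I am not sure it is the recommended check):

# list the tables visible through the Spark connection
src_tbls(sc)

# the same via SQL (assumes the DBI package is installed)
DBI::dbGetQuery(sc, "SHOW TABLES")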

I can see my table "rdt", but when I try to call it, R says the object is not found.

(screenshot of the rdt table)

This is what I tried:

  data <- rdt

That gives an error like this: Error: object 'rdt' not found
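
From reading the sparklyr documentation, I suspect the table has to be referenced through the connection rather than by its bare name, roughly like this (just my guess, assuming the Hive table really is registered as "rdt"):

    # reference the Hive table through the Spark connection (lazy, stays in Spark)
    rdt_spark <- dplyr::tbl(sc, "rdt")

    # pull the rows into a local R data frame for analysis
    data <- rdt_spark %>% dplyr::collect()
    str(data)

Is that the right approach, or is there a better way?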

The only workaround I found was to put the file directly onto the cluster and set the working directory to read it from there (which defeats the purpose). I want to import it the way we would usually import a data frame, in this case through the sparklyr connection:

    setwd("~/Directory")
    data2 <- read.csv("rdt.csv",header = TRUE)
    str(data2)
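
If it is easier, I would also be fine with reading the file straight from cluster storage into Spark instead of copying it locally. This is what I imagine it would look like (the HDFS path below is made up, just to illustrate):

    # read the CSV directly from cluster storage into Spark (path is hypothetical)
    rdt_spark <- spark_read_csv(sc, name = "rdt_csv",
                                path = "hdfs:///user/me/rdt.csv", header = TRUE)

    # then bring it into R as a regular data frame
    data3 <- dplyr::collect(rdt_spark)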