Additional Spark installation's access to HDFS and Hive


I have Cloudera Quickstart VM 5.5.0 installed and it has Spark v1.5.0 bundled. When I launch spark-shell

scala> val data = sc.textFile("/hdfs/path/file.csv")

Spark is able to read from HDFS (verified with data.first), even without the hdfs://namenode:port/ prefix. However, I have a use case for an older version of Spark, namely v1.4.0, so I untarred that older version into my HOME directory.

When I try to do the same thing there, sc.textFile points to the local Linux file system instead of HDFS. How can I make the additional Spark installation point to HDFS, even without specifying the hdfs://namenode:port/ URL?
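For reference, I suspect something like the following is needed, but I'm not sure it's the right approach (a sketch only; quickstart.cloudera:8020 is my assumption for the Quickstart VM's NameNode address, and pointing HADOOP_CONF_DIR at /etc/hadoop/conf before launching spark-shell may be the cleaner fix):

scala> sc.hadoopConfiguration.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020")

scala> val data = sc.textFile("/hdfs/path/file.csv")

scala> data.first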

The second thing is accessing Hive tables: I copied hive-site.xml into Spark's conf directory. After doing that in the default Spark installation, I can easily query Hive tables:

scala> val orders = sqlContext.sql("SELECT * FROM default.orders")

scala> orders.limit(5).foreach(println)

and this will display the rows.

When I try to do the same thing on Spark v1.4, I get errors. How can I access Hive tables the same way the default installation does?
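For reference, on Spark 1.4 I assume the tables would have to be reached through an explicit HiveContext, something like the sketch below (assuming hive-site.xml has already been copied into the 1.4 install's conf directory and the 1.4 build was compiled with Hive support):

scala> import org.apache.spark.sql.hive.HiveContext

scala> val hiveContext = new HiveContext(sc)

scala> val orders = hiveContext.sql("SELECT * FROM default.orders")

scala> orders.limit(5).foreach(println)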
