I have the Cloudera QuickStart VM 5.5.0 installed, and it comes with Spark v1.5.0 bundled. When I launch spark-shell and run:
scala> val data = sc.textFile("/hdfs/path/file.csv")
Spark is able to read from HDFS (verified by calling data.first), even without the hdfs://namenode:port/ URL prefix.
Because I have a use case that requires an older version of Spark (v1.4.0), I downloaded the tarball and extracted it into my HOME directory.
When I try to do the same thing from that installation, sc.textFile points to the local Linux file system instead of HDFS. How can I make the additional Spark installation point to HDFS by default, without having to specify the hdfs://namenode:port/ URL?
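For reference, here is roughly what happens in the v1.4.0 shell (namenode and port below are placeholders, not my actual values). Spelling out the full URL should work, but that is exactly what I would like to avoid:

scala> // same call as above, but from the v1.4.0 shell it resolves against the local Linux file system
scala> val data = sc.textFile("/hdfs/path/file.csv")
scala> // presumably this would hit HDFS, but I don't want to hard-code the namenode URL everywhere
scala> val data2 = sc.textFile("hdfs://namenode:port/hdfs/path/file.csv")
scala> data2.first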
The second issue is accessing Hive tables. I copied hive-site.xml into Spark's conf directory. After doing that in the default Spark installation, I can easily query Hive tables:
scala> val orders = sqlContext.sql("SELECT * FROM default.orders")
scala> orders.limit(5).foreach(println)
and the rows are displayed as expected.
When I try to do the same thing on the Spark v1.4.0 installation, I get errors. How can I access Hive tables from it the same way the default installation does?
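In case it matters, on v1.4.0 I am not sure whether the shell's built-in sqlContext is even a HiveContext. The sketch below, using the standard Spark 1.x HiveContext class, is what I would expect to be the equivalent there (assuming that build has Hive support compiled in):

scala> // assumption: construct a HiveContext explicitly, in case the 1.4.0 shell's sqlContext is a plain SQLContext
scala> val hiveCtx = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val orders = hiveCtx.sql("SELECT * FROM default.orders")
scala> orders.limit(5).foreach(println)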