Sharing data across executors in Apache Spark

883 Views Asked by A Learner At 18 December 2018 at 04:51

My Spark project (written in Java) needs to access different tables (the results of SELECT queries) from the executors.

One solution to this problem is to store the query results in a Map and pass it to the executors as a broadcast variable.

However, I have found that:

There are many complex queries whose results can't be stored directly in a Map.
The tables are very large, so creating a large Map and passing it to the executors as a broadcast variable doesn't sound efficient.
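
For reference, the Map-plus-broadcast approach mentioned above might look roughly like the sketch below. The class name BroadcastMapSketch, the method broadcastLookup and the two-column (int id, string name) layout are assumptions made for illustration; they are not from the original project. It only makes sense when the query result is small enough to be collected on the driver.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class BroadcastMapSketch {

    // Hypothetical helper: collects a small (id, name) query result on the driver,
    // copies it into a Map and ships that Map to the executors as a broadcast variable.
    static Broadcast<Map<Integer, String>> broadcastLookup(SparkSession spark, Dataset<Row> small) {
        Map<Integer, String> lookup = new HashMap<>();
        // collectAsList() pulls every row to the driver, so this does not scale to large tables.
        for (Row row : small.collectAsList()) {
            lookup.put(row.getInt(0), row.getString(1));
        }
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        return jsc.broadcast(lookup);
    }
}
```

Inside a function that runs on the executors, the data is then read with value() on the returned Broadcast, for example lookup.value().get(id). Every executor holds a full copy of the Map, which is why the size concern above applies.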

Instead, can we load the tables in memory, using load, so that they can be shared across executors?

Is the method void org.apache.spark.sql.Dataset.createOrReplaceTempView(String viewName)

or void org.apache.spark.sql.Dataset.createGlobalTempView(String viewName) throws AnalysisException

useful for this purpose?
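
To make the difference between the two methods concrete: as far as I understand, neither one copies data to the executors. Both only register a name for the Dataset's query plan. createOrReplaceTempView is scoped to the current SparkSession, while createGlobalTempView registers the view under the global_temp database so that other sessions of the same application can see it. Keeping the data in executor memory is a separate step (cache() or persist()). A minimal sketch, where the class name TempViewSketch, the method registerAndCount and the view names orders_view and orders_global are all hypothetical:

```java
import org.apache.spark.sql.AnalysisException;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class TempViewSketch {

    // Hypothetical helper: registers the same Dataset under a session-scoped view and a
    // global view, then queries both from the driver and returns the combined row count.
    static long registerAndCount(SparkSession spark, Dataset<Row> table) throws AnalysisException {
        // cache() is lazy: partitions are kept on the executors once an action computes them.
        table.cache();

        // Visible only to queries issued through this SparkSession.
        table.createOrReplaceTempView("orders_view");

        // Visible to every session of this application; throws if the name already exists.
        table.createGlobalTempView("orders_global");

        Dataset<Row> sameSession = spark.sql("SELECT * FROM orders_view");
        // A new session cannot see orders_view, but it can see the global view.
        Dataset<Row> otherSession = spark.newSession().sql("SELECT * FROM global_temp.orders_global");

        return sameSession.count() + otherSession.count();
    }
}
```

Note that the views are queried with spark.sql from the driver. The SparkSession cannot be used inside functions that run on the executors, so a view does not by itself give executor code a way to look rows up.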

Spark version: 2.3.0


There is 1 best solution below

Lior Chaga On 18 December 2018 at 06:28 BEST ANSWER

You can broadcast a DataFrame. See the documentation.
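
The answer does not say which API it means. One reading is the broadcast function in org.apache.spark.sql.functions, which marks a DataFrame as small enough to be sent to every executor for a join. A minimal sketch under that assumption, where the class name BroadcastJoinSketch, the method joinWithBroadcast and its parameters are hypothetical:

```java
import static org.apache.spark.sql.functions.broadcast;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class BroadcastJoinSketch {

    // Hypothetical helper: joins a large Dataset with a small one on a shared column,
    // hinting that the small side should be broadcast to the executors instead of shuffled.
    static Dataset<Row> joinWithBroadcast(Dataset<Row> large, Dataset<Row> small, String keyColumn) {
        return large.join(broadcast(small), keyColumn);
    }
}
```

This keeps the lookup data as a DataFrame, so results of complex queries do not have to be converted into a Map first. The broadcast side still has to fit in memory on each executor, so it likely helps with the small tables and not with the very large ones.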