How to create and read a global temp view (createOrReplaceGlobalTempView) when using static clusters

In my deployment.yaml file I have defined a static cluster as such:

custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.2.x-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "Standard_DS3_v2"

I use this for all of my tasks. In one of the tasks, I save a DataFrame using:

transactions.createOrReplaceGlobalTempView("transactions")

In another task (which depends on the previous task), I try to read the temporary view like this:

global_temp_db = session.conf.get("spark.sql.globalTempDatabase")

# Load wallet features
transactions = session.sql(f"""SELECT *
                               FROM {global_temp_db}.transactions""")

But I get the error:

AnalysisException: Table or view not found: global_temp.transactions; line 2 pos 43;
'Project [*]
+- 'UnresolvedRelation [global_temp, transactions], [], false

Both tasks run within the same SparkSession, so why can it not find my global temp view?

Best answer:

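Unfortunately, this won't work unless you're using a cluster-reuse feature. With a new_cluster definition, every task gets a new cluster each time it runs, and a global temp view exists only inside the Spark application that created it, so the second task cannot see a view registered by the first.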

A more Pythonic approach is to run the code that initializes the view in every task that needs it, e.g. if you're using the pre-defined Task class:


class TaskWithPreInitializedView(Task):

  def _add_transactions_view(self):
    transactions = ... # some code to define the view
    transactions.createOrReplaceGlobalTempView(...)

  def launch(self):
    self._add_transactions_view()

class RealTask(TaskWithPreInitializedView):
  def launch(self):
    super().launch()  # zero-argument form; super(RealTask) alone is unbound and has no launch()
    ... # your code

Creating a view is cheap: it only registers a query plan and computes no data until the view is read, so repeating it in each task adds very little overhead.
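The inheritance pattern above can be tried without a cluster. The following is only a minimal sketch: the Task class here is a hypothetical stand-in for the project's pre-defined Task class, and the `calls` list is an illustrative substitute for the real DataFrame and createOrReplaceGlobalTempView call. It also shows why the one-argument form `super(RealTask)` cannot be used to call the parent's launch().

```python
class Task:
    """Hypothetical stand-in for the project's pre-defined Task base class."""

    def __init__(self):
        # Records which steps ran, in order. Illustration only.
        self.calls = []

    def launch(self):
        raise NotImplementedError


class TaskWithPreInitializedView(Task):
    def _add_transactions_view(self):
        # A real task would build the DataFrame here and call
        # createOrReplaceGlobalTempView on it; this sketch only records the step.
        self.calls.append("view registered")

    def launch(self):
        self._add_transactions_view()


class RealTask(TaskWithPreInitializedView):
    def launch(self):
        super().launch()  # registers the view before the task's own logic
        self.calls.append("task logic")


task = RealTask()
task.launch()
print(task.calls)  # ['view registered', 'task logic']

# The one-argument form returns an unbound super object, which does not
# expose the parent's methods, so calling launch() on it fails.
try:
    super(RealTask).launch()
    unbound_error = None
except AttributeError as exc:
    unbound_error = type(exc).__name__
print(unbound_error)  # AttributeError
```

Because the view is registered inside launch(), each task rebuilds it in its own Spark application instead of relying on state left behind by an earlier task.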