I have 8 tables: one is about 1 TB and the other 7 are roughly 270 GB each. The 1 TB table needs to be left joined with each of the 7 smaller tables, keeping all of its columns and picking up one extra column from each joined table, so the result has all the original columns plus 7 new ones (roughly the shape sketched below).
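A minimal sketch of the join shape, assuming hypothetical names: the big table is registered as `fact`, the seven smaller tables as `dim1`..`dim7`, each carrying one extra column `extra_col`, and all sharing a join key `id` (none of these names are from my actual schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("wide-left-joins").getOrCreate()

// 1 TB fact table and the seven ~270 GB tables (placeholder names).
val fact = spark.table("fact")
val dims = (1 to 7).map(i => spark.table(s"dim$i"))

// Chain seven left joins, keeping all fact columns and one extra column per table.
val joined = dims.zipWithIndex.foldLeft(fact) { case (acc, (dim, i)) =>
  acc.join(dim.select(col("id"), col("extra_col").alias(s"extra_${i + 1}")), Seq("id"), "left")
}
```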
How can I optimize this join in Spark SQL, both at the application level and at the Spark config level?
PS: All the tables are already loaded into memory, so writing them out and reading them back is not an option. We also cannot use broadcast joins, since none of the tables is small enough to broadcast.
I tried caching all the tables (e.g. spark.sql("CACHE TABLE A")) and then joining them, roughly as in the sketch below.
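Roughly what I tried, with placeholder table names A (the 1 TB table) and B..H (the seven ~270 GB tables) and an assumed join key `id`:

```scala
// Cache every table first, then run the chained left joins in one SQL query.
Seq("A", "B", "C", "D", "E", "F", "G", "H").foreach { t =>
  spark.sql(s"CACHE TABLE $t")
}

val result = spark.sql("""
  SELECT a.*, b.col_b, c.col_c, d.col_d, e.col_e, f.col_f, g.col_g, h.col_h
  FROM A a
  LEFT JOIN B b ON a.id = b.id
  LEFT JOIN C c ON a.id = c.id
  LEFT JOIN D d ON a.id = d.id
  LEFT JOIN E e ON a.id = e.id
  LEFT JOIN F f ON a.id = f.id
  LEFT JOIN G g ON a.id = g.id
  LEFT JOIN H h ON a.id = h.id
""")
```

This still ends up shuffling the large table for every join, which is the part I'm hoping to optimize.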