I have 8 tables: one is about 1 TB and the other 7 are roughly 270 GB each. The 1 TB table needs to be left joined with each of the 7 smaller tables, keeping all of its columns and picking up one extra column from each joined table, so the result has all the original columns plus 7 new ones (roughly the shape sketched below).
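A minimal sketch of the join shape, assuming hypothetical names: the big table is registered as `fact`, the seven smaller tables as `dim1`..`dim7`, each carrying one extra column `extra_col`, and all sharing a join key `id` (none of these names are from my actual schema):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("wide-left-joins").getOrCreate()

// 1 TB fact table and the seven ~270 GB tables (placeholder names).
val fact = spark.table("fact")
val dims = (1 to 7).map(i => spark.table(s"dim$i"))

// Chain seven left joins, keeping all fact columns and one extra column per table.
val joined = dims.zipWithIndex.foldLeft(fact) { case (acc, (dim, i)) =>
  acc.join(dim.select(col("id"), col("extra_col").alias(s"extra_${i + 1}")), Seq("id"), "left")
}
```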
How can I optimize this join in Spark SQL, both at the application level and at the Spark config level?
PS: All the tables are already loaded into memory, so writing them out and reading them back is not an option. We also cannot use broadcast joins, since none of the tables is small enough to broadcast.
I tried caching all the tables (e.g. spark.sql("CACHE TABLE A")) and then joining them, roughly as in the sketch below.
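Roughly what I tried, with placeholder table names A (the 1 TB table) and B..H (the seven ~270 GB tables) and an assumed join key `id`:

```scala
// Cache every table first, then run the chained left joins in one SQL query.
Seq("A", "B", "C", "D", "E", "F", "G", "H").foreach { t =>
  spark.sql(s"CACHE TABLE $t")
}

val result = spark.sql("""
  SELECT a.*, b.col_b, c.col_c, d.col_d, e.col_e, f.col_f, g.col_g, h.col_h
  FROM A a
  LEFT JOIN B b ON a.id = b.id
  LEFT JOIN C c ON a.id = c.id
  LEFT JOIN D d ON a.id = d.id
  LEFT JOIN E e ON a.id = e.id
  LEFT JOIN F f ON a.id = f.id
  LEFT JOIN G g ON a.id = g.id
  LEFT JOIN H h ON a.id = h.id
""")
```

This still ends up shuffling the large table for every join, which is the part I'm hoping to optimize.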