I am trying to join two tables using a key list that was created from another table, but I couldn't figure out how to make it work.
The base data would be:
data = [('B_YEAR, B_ORG', 'YEAR, ORG', 1), ('B_YEAR', 'YEAR', 2) ]
test = spark.createDataFrame(data, ['key_orig', 'key_map', 'Fil_Number'])
fc_year == 2024
Now I try to join:
for i in range(1, x):
test_1 = test.filter(col("Fil_Number") == 1)
list_key_orig = test_1.select("key_orig").collect()
list_key_map = test_1.select("key_map").collect()
df_calc_values = spark.read.table("hive_metastore.reporting_datalake.df_calc_fc_values")
df_calc_values = df_calc_values.filter((col("GJ") == fc_year))
display(df_calc_values)
df_new = df_orig.join(df_calc_values, on=(key_orig1.key_orig == key_map1.key_map), how="left")
I have already searched on Google but found nothing.
Tables: (screenshots of the two source tables were attached here)


First of all, your question is a disaster! There is no insight into where the original DataFrame df_orig comes from. Your for loop is also not correctly indented, so it is impossible to tell what is part of the loop and what is not. Your fc_year variable is not declared. You are importing the whole namespace of PySpark's functions module with from pyspark.sql.functions import *, which you should absolutely not do, to avoid naming conflicts and similar issues in your code. You also cannot expect an answer when the snippets are plucked out of thin air and have no relation to each other. Anyway, I tried to provide an answer on how you can build a joining string out of the information you gave and how to join two DataFrames by this string.
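
Below is a minimal sketch of that idea. Everything that is not in your question is an assumption: the two stand-in DataFrames (df_orig and the hard-coded df_calc_values) only exist so the snippet runs on its own; in the real case you would keep your spark.read.table call and your own df_orig.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# mapping table exactly as in the question
data = [('B_YEAR, B_ORG', 'YEAR, ORG', 1), ('B_YEAR', 'YEAR', 2)]
test = spark.createDataFrame(data, ['key_orig', 'key_map', 'Fil_Number'])

fc_year = 2024  # a single '=' is what actually declares the variable

# stand-ins so the sketch is self-contained; replace df_calc_values with
# spark.read.table("hive_metastore.reporting_datalake.df_calc_fc_values")
# and df_orig with your original DataFrame
df_orig = spark.createDataFrame([(2024, 'A')], ['B_YEAR', 'B_ORG'])
df_calc_values = spark.createDataFrame([(2024, 'A', 2024, 10.0)],
                                       ['YEAR', 'ORG', 'GJ', 'value'])
df_calc_values = df_calc_values.filter(F.col('GJ') == fc_year)

# take the mapping row for the Fil_Number you care about and split the
# comma-separated column names into lists
row = test.filter(F.col('Fil_Number') == 1).first()
orig_cols = [c.strip() for c in row['key_orig'].split(',')]
map_cols = [c.strip() for c in row['key_map'].split(',')]

# build the joining string, e.g. "a.B_YEAR = b.YEAR AND a.B_ORG = b.ORG"
join_str = ' AND '.join(f'a.{o} = b.{m}' for o, m in zip(orig_cols, map_cols))

# alias both sides so the column references in the string are unambiguous,
# then join on the string turned into a Column expression
df_new = (
    df_orig.alias('a')
    .join(df_calc_values.alias('b'), on=F.expr(join_str), how='left')
)
df_new.show()

Collecting the mapping row to the driver with first() is fine here because test is tiny, and aliasing both DataFrames lets the generated string reference the join columns unambiguously even if both sides happen to share column names.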