I have a dataframe `Dataset<Row> fileDf`, loaded from a CSV file, that looks like this:
|COUNTRY_ID|COUNTRY_NAME|REGION_ID|
|----------|------------|---------|
|AR |Argentina | 2|
|AU |Australia | 3|
|BE |Belgium | 1|
|BR |Brazil | 2|
I will have multiple single-column dataframes (each dataframe holds the values of a single column) loaded from other CSV files:

`Dataset<Row> singleColumnDf1`
|NAMES |
|------------------------|
|Argentina |
|Australia |
|Belgium |
|Brazil |
`Dataset<Row> singleColumnDf2`
|IDs |
|---------|
| 2|
| 3|
| 1|
| 2|
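For context, the dataframes are read roughly like this (the paths are placeholders, not my real code):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder().appName("columnMatcher").getOrCreate();

// The main file with multiple columns.
Dataset<Row> fileDf = spark.read()
        .option("header", "true")
        .csv("countries.csv");          // placeholder path

// One of the single-column lookup files.
Dataset<Row> singleColumnDf1 = spark.read()
        .option("header", "true")
        .csv("names.csv");              // placeholder path
```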
and so on. I want to join singleColumnDf1 with fileDf, find out which column of fileDf contains the values from singleColumnDf1, and display that column name. Then I want to do the same with singleColumnDf2, and so on. Since fileDf will be constant, I want to perform the join only once to avoid the overhead, because my main focus is performance. In the end I want a result that tells me which columns of fileDf matched which singleColumnDfs.
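The straightforward approach I can think of probes every column of fileDf with a separate join per singleColumnDf, something like this sketch (I'm assuming the values compare as strings; `NAMES` is the column of singleColumnDf1):

```java
import static org.apache.spark.sql.functions.col;

Dataset<Row> values = singleColumnDf1
        .select(col("NAMES").cast("string").alias("VALUE"))
        .distinct();
long total = values.count();

for (String fileCol : fileDf.columns()) {
    // Inner equi-join on the shared VALUE column keeps only the
    // lookup values that also appear in this column of fileDf.
    long matched = values
            .join(fileDf.select(col(fileCol).cast("string").alias("VALUE")), "VALUE")
            .distinct()
            .count();
    if (matched == total) {
        System.out.println("for singleColumnDf1 the matched column is " + fileCol);
    }
}
```

That works, but it runs one join per column of fileDf per singleColumnDf, which is exactly the overhead I want to avoid.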
For example, I want the final result to be:
*for singleColumnDf1 the matched column is COUNTRY_NAME,*
*for singleColumnDf2 the matched column is COUNTRY_ID*
Is this possible to do in Spark? Can I combine the singleColumnDfs together, do the join with fileDf only once, and use keys to differentiate the results? Or do I have to do a join for each and every singleColumnDf?
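To make the question concrete, this is roughly what I imagine by "combining with keys", though I'm not sure it is the right way: melt fileDf once into (column name, value) pairs, tag every singleColumnDf with a key, union them, and join a single time. The `VALUE`, `SOURCE`, and `FILE_COLUMN` names are placeholders of my own:

```java
import static org.apache.spark.sql.functions.*;

// 1. Stack fileDf once: one row per (column name, cell value) pair.
StringBuilder stackExpr = new StringBuilder("stack(" + fileDf.columns().length);
for (String c : fileDf.columns()) {
    stackExpr.append(", '").append(c).append("', cast(").append(c).append(" as string)");
}
stackExpr.append(") as (FILE_COLUMN, VALUE)");
Dataset<Row> melted = fileDf.selectExpr(stackExpr.toString()).distinct();

// 2. Tag each lookup dataframe with a key and normalize its column name.
Dataset<Row> tagged = singleColumnDf1
        .select(col("NAMES").cast("string").alias("VALUE"))
        .withColumn("SOURCE", lit("singleColumnDf1"))
        .unionByName(singleColumnDf2
                .select(col("IDs").cast("string").alias("VALUE"))
                .withColumn("SOURCE", lit("singleColumnDf2")))
        .distinct();

// 3. One join, then count how many distinct values of each source
//    were found in each column of fileDf.
Dataset<Row> matches = tagged.join(melted, "VALUE")
        .groupBy("SOURCE", "FILE_COLUMN")
        .agg(countDistinct("VALUE").alias("MATCHED"));

matches.show();
```

A column would then count as "the matched column" for a source when `MATCHED` equals the number of distinct values in that source. Is this a reasonable approach, or is there a cheaper, more idiomatic way to do it in Spark?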