Use a correlated subquery in PySpark SQL


Tab1 Columns [F,S,E]

F1 S1 R
F1 S2 R2
F1 S3 R1
F2 S1 R2
F2 S4 R4
F1 S4 R

Tab2 Columns [F,S]

F1 S1
F1 S3
F2 S1
F2 S4

Take rows from Tab1 only if the (F, S) pair is present in Tab2.

RESULT Columns [F,S,E]

F1 S1 R
F1 S3 R1
F2 S1 R2
F2 S4 R4
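
For reproducibility, here is a minimal sketch that builds the two tables as DataFrames. The SparkSession variable spark and the sample rows are taken as given; the column types are assumed to be strings:

from pyspark.sql import SparkSession

# Assumed setup: recreate the sample data from the post as DataFrames.
spark = SparkSession.builder.appName("correlated-subquery-demo").getOrCreate()

Tab1 = spark.createDataFrame(
    [("F1", "S1", "R"), ("F1", "S2", "R2"), ("F1", "S3", "R1"),
     ("F2", "S1", "R2"), ("F2", "S4", "R4"), ("F1", "S4", "R")],
    ["F", "S", "E"],
)

Tab2 = spark.createDataFrame(
    [("F1", "S1"), ("F1", "S3"), ("F2", "S1"), ("F2", "S4")],
    ["F", "S"],
)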

I have the query, but I am not able to get results with PySpark. The same query runs fine on a MySQL database.

I tried to use a correlated subquery in Spark 2.4.3, but it returns 0 rows.

Tab1.createOrReplaceTempView("Tab1")
Tab2.createOrReplaceTempView("Tab2")

joined_df = spark.sql(
    """SELECT F, S, E FROM Tab1
       WHERE EXISTS (SELECT * FROM Tab2
                     WHERE Tab1.F = Tab2.F AND Tab1.S = Tab2.S)"""
)

joined_df.show(10)
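
If the EXISTS query keeps returning 0 rows, an equivalent rewrite worth trying is a left semi join, which Spark supports in both SQL and the DataFrame API. This is a sketch assuming the temp views and DataFrames above:

semi_df = spark.sql(
    """SELECT t1.F, t1.S, t1.E
       FROM Tab1 t1
       LEFT SEMI JOIN Tab2 t2
         ON t1.F = t2.F AND t1.S = t2.S"""
)
semi_df.show(10)

# Same semi join via the DataFrame API, no temp views needed:
Tab1.join(Tab2, on=["F", "S"], how="left_semi").show(10)

A semi join keeps each Tab1 row that has at least one match in Tab2, returning only Tab1 columns, which is exactly the EXISTS semantics.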