We are using PySpark to query Elasticsearch.
We have 2 indices:
index1 - with 20 docs
index2 - with 100000 docs
dataframe1 is a join between 2 dataframes:
dataframe3 - queries index1 (returns 1 row on dataframe3.collect())
dataframe4 - queries index2 (returns 1 row on dataframe4.collect())
dataframe1 = dataframe3.join(dataframe4). When I call dataframe1.collect(), it returns 1 row immediately.
dataframe2 queries index1 with a different query (returns 1 row on dataframe2.collect()).
When I run
dataframe1.union(dataframe2).collect()
it gets stuck...
What is very strange is that when I don't use dataframe4 in the join, everything works fine...
I am using elasticsearch-spark-30_2.12-8.9.0 with Elasticsearch 8.9.2.
Please help.