I have flattened incoming data in the format below in my Parquet file:
I want to convert it into the format below, where the structure is no longer flattened:
I tried the following:
Dataset<Row> rows = df.select(col("id"), col("country_cd"),
        explode(array("fullname_1", "fullname_2")).as("fullname"),
        explode(array("firstname_1", "firstname_2")).as("firstname"));
But it gives the below error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: Only one generator allowed per select clause but found 2: explode(array(fullname_1, fullname_2)), explode(array(firstname_1, firstname_2));
I understand this is because you cannot use more than one explode in a single select clause. I am looking for options to do the above in Spark Java.
This type of problem is most easily solved with a .flatMap(). A .flatMap() is like a .map(), except that it allows you to output n records for each input record, as opposed to a 1:1 ratio. This results in the following:
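To illustrate the 1:n behaviour described above, here is a minimal sketch using plain Java streams rather than Spark, so it runs without a cluster. The classes FlatRow and NameRow, the method unflatten, and the sample values are hypothetical and only mirror the column names from the question. A Spark Dataset.flatMap would follow the same shape, but would also need an encoder for the output type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class UnflattenSketch {

    // Hypothetical stand-in for one flattened input record.
    static final class FlatRow {
        final int id;
        final String countryCd;
        final String fullname1, fullname2;
        final String firstname1, firstname2;

        FlatRow(int id, String countryCd, String fullname1, String fullname2,
                String firstname1, String firstname2) {
            this.id = id;
            this.countryCd = countryCd;
            this.fullname1 = fullname1;
            this.fullname2 = fullname2;
            this.firstname1 = firstname1;
            this.firstname2 = firstname2;
        }
    }

    // Hypothetical stand-in for one un-flattened output record.
    static final class NameRow {
        final int id;
        final String countryCd;
        final String fullname;
        final String firstname;

        NameRow(int id, String countryCd, String fullname, String firstname) {
            this.id = id;
            this.countryCd = countryCd;
            this.fullname = fullname;
            this.firstname = firstname;
        }
    }

    // flatMap emits two output records per input record, keeping
    // fullname_N paired with firstname_N instead of cross-joining them.
    static List<NameRow> unflatten(List<FlatRow> rows) {
        return rows.stream()
                .flatMap(r -> Stream.of(
                        new NameRow(r.id, r.countryCd, r.fullname1, r.firstname1),
                        new NameRow(r.id, r.countryCd, r.fullname2, r.firstname2)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        List<FlatRow> input = Arrays.asList(
                new FlatRow(1, "US", "Jane Doe", "John Roe", "Jane", "John"));
        for (NameRow n : unflatten(input)) {
            System.out.println(n.id + " " + n.countryCd + " " + n.fullname + " " + n.firstname);
        }
    }
}
```

Because each output record is built from a matching pair of columns, the two name columns stay aligned, which is what two independent explode calls could not guarantee even if they were allowed.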