I have an array of dimensions 500 x 26. Using the filter operation in PySpark, I'd like to pick out the columns which are listed in another array at row i. For example, if
a[i]= [1 2 3]
then pick out columns 1, 2 and 3 for all rows. Can this be done with the filter command? If so, can someone show an example or the syntax?
It sounds like you need to filter columns, not records. To do this you can use Spark's map function to transform every row of your array, represented as an RDD. See the example below:
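A minimal sketch, assuming the data lives in an RDD of tuples; the sample rows and the index list `cols` are hypothetical, just to illustrate the map-based column projection:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Hypothetical sample data: each tuple is one row of the array
rdd = sc.parallelize([
    (10, 20, 30, 40, 50),
    (11, 21, 31, 41, 51),
    (12, 22, 32, 42, 52),
])

# Indices of the columns to keep, e.g. a[i] = [1, 2, 3]
cols = [1, 2, 3]

# map transforms every row, keeping only the requested columns
projected = rdd.map(lambda row: tuple(row[c] for c in cols))

print(projected.collect())
```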
which results in
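the following output (for the hypothetical sample data above):

```python
[(20, 30, 40), (21, 31, 41), (22, 32, 42)]
```

The key point is that `filter` decides which rows to keep, whereas selecting a subset of columns from every row is a per-row transformation, which is exactly what `map` is for.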