In Spark, does the filter function turn the data into tuples?


Just wondering: does filter turn the data into tuples? For example:

val filesLines = sc.textFile("file.txt")
val split_lines = filesLines.map(_.split(";"))

val filteredData = split_lines.filter(x => x(4)=="Blue")

// From here, if we wanted to map the data, would we use tuple syntax (e.g. x._3) or array indexing (e.g. x(3))?

val blueRecords = filteredData.map(x => (x._1, x._2))

OR

val blueRecords = filteredData.map(x => (x(0), x(1)))

There are 2 best solutions below

0
On BEST ANSWER

No. All filter does is take a predicate function and drop every element for which that predicate returns false; the elements for which it returns true are passed through unchanged to the resulting RDD. So the element type remains the same:

filesLines //RDD[String] (lines of the file)
split_lines //RDD[Array[String]] (lines delimited by semicolon)
filteredData //RDD[Array[String]] (lines delimited by semicolon where the 5th item is Blue)

So, to use filteredData, you have to access each element as an array, using parentheses with the appropriate zero-based index, e.g. x(0) and x(1).
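As a rough illustration of the point above, here is a minimal sketch that runs the same pipeline on a few in-memory lines instead of file.txt. The sample rows, the object name FilterKeepsArrays, the helper blueRecords, and the local SparkSession settings are all assumptions made for the example, not part of the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FilterKeepsArrays {
  // Hypothetical helper wrapping the pipeline from the question.
  def blueRecords(filesLines: RDD[String]): RDD[(String, String)] = {
    val split_lines = filesLines.map(_.split(";"))             // RDD[Array[String]]
    val filteredData = split_lines.filter(x => x(4) == "Blue") // still RDD[Array[String]]

    // Array indexing works here; x._1 would not compile because x is an Array.
    // The inner parentheses build a Tuple2, so only this map changes the element type.
    filteredData.map(x => (x(0), x(1)))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .appName("filter-keeps-arrays")
      .master("local[*]")
      .getOrCreate()

    // Made-up sample lines standing in for file.txt.
    val filesLines = spark.sparkContext.parallelize(Seq(
      "1;pen;small;cheap;Blue",
      "2;cup;large;cheap;Red",
      "3;hat;medium;pricey;Blue"
    ))

    blueRecords(filesLines).collect().foreach(println)

    spark.stop()
  }
}
```

With these sample rows, only the first and third lines pass the filter, and tuples appear only because the final map constructs them explicitly.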

0
On

filter will not change the element type of the RDD; the filtered data is still an RDD[Array[String]].
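One way to see this is to write the types out explicitly: RDD[T].filter takes a T => Boolean and returns an RDD[T]. The sketch below assumes a hypothetical helper onlyBlue and made-up rows; the length check is an extra safeguard added for the example (so short rows do not throw on row(4)), not something from the question.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object FilterTypes {
  // Same element type in and out: the annotations compile only because
  // filter returns RDD[Array[String]] here.
  def onlyBlue(rows: RDD[Array[String]]): RDD[Array[String]] =
    rows.filter(row => row.length > 4 && row(4) == "Blue") // length guard is an assumption

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .appName("filter-types")
      .master("local[*]")
      .getOrCreate()

    // Made-up rows, already split into fields.
    val rows: RDD[Array[String]] = spark.sparkContext.parallelize(Seq(
      Array("1", "pen", "small", "cheap", "Blue"),
      Array("2", "cup", "large", "cheap", "Red"),
      Array("3", "too", "short")
    ))

    val blue: RDD[Array[String]] = onlyBlue(rows)

    // filter changes how many elements there are, not what they look like.
    println(s"${rows.count()} rows before, ${blue.count()} after")

    spark.stop()
  }
}
```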