How does the lambda function in takeOrdered work in PySpark?


I can't quite get the behavior of the lambda in the following code:

rdd = sc.parallelize([5,3,1,2])
rdd.takeOrdered(3,lambda s: -1*s)

From what I have understood, a lambda applies an operation to all elements in a list, so I expected the above code to return

[-1,-2,-3]

But it returned

[5,3,2]

What am I missing here?

There are 7 answers below

BEST ANSWER

https://spark.apache.org/docs/1.1.1/api/python/pyspark.rdd.RDD-class.html

takeOrdered(self, num, key=None) Get the N elements from a RDD ordered in ascending order or as specified by the optional key function.

So in your example you are providing an ordering (key) function, not a transformation of the elements.
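A minimal sketch of the difference, assuming a SparkContext sc is available as in the question:

rdd = sc.parallelize([5, 3, 1, 2])

rdd.takeOrdered(3)                     # [1, 2, 3]  -- default: ascending by value
rdd.takeOrdered(3, key=lambda s: -s)   # [5, 3, 2]  -- ordered by the key -s, i.e. descending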

ANSWER

rdd.takeOrdered actually accepts a key function as its second parameter.

What you want to do is this:

rdd.map(lambda s: -1*s).takeOrdered(3)

That will map your values and then take the first 3 in ascending order.

I'm not sure what Spark is doing with the lambda you're passing it, to be honest.
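For reference, a quick sketch of what that pipeline returns, assuming the same sc.parallelize([5, 3, 1, 2]) RDD from the question:

rdd = sc.parallelize([5, 3, 1, 2])

# map negates every element; takeOrdered(3) then returns the 3 smallest results
rdd.map(lambda s: -1 * s).takeOrdered(3)   # [-5, -3, -2]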

ANSWER

Probably you want to do this:

rdd.takeOrdered(3, key = lambda s: (-1*s))

ANSWER

Try mapping first:

rdd = sc.parallelize([5,3,1,2])
newRDD = rdd.map(lambda s: -1*s)

Then call an action to return or print the result (map is only a transformation), e.g.:

newRDD.collect()   # [-5, -3, -1, -2]

Then, if you want the numbers or items in a specific order (ascending or descending), you can use takeOrdered(n, key), where n is the number of items you want and key is the function that decides the order in which they are taken (a key like -1*s reverses the order).

or

newRDD = (rdd
           .map(lambda s: -1*s)
           .takeOrdered(3, lambda s: -1*s))
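
A quick sanity check of what each step produces (a sketch, assuming the same [5, 3, 1, 2] input and using a throwaway name mapped for the intermediate RDD):

mapped = rdd.map(lambda s: -1*s)        # values become [-5, -3, -1, -2]
mapped.takeOrdered(3)                   # [-5, -3, -2]  (ascending by value)
mapped.takeOrdered(3, lambda s: -1*s)   # [-1, -2, -3]  (ordered by the key -1*s)
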
ANSWER

The following means "get the first 3 elements in descending order"; the lambda is applied to the ordering key, not to the final result.

rdd.takeOrdered(3, key = lambda s: -s)

The following means "get the first 3 elements in ascending order":

rdd.takeOrdered(3, key = lambda s: s)

What you want to do is use the map function before takeOrdered. The map function is what is actually applied to each element in the list, i.e. map is what modifies each value, producing the desired output of [-1, -2, -3]:

rdd = sc.parallelize([5,3,1,2])
rdd.map(lambda s: -s).takeOrdered(3, key = lambda s: -s)
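
For reference, a sketch of what each of the snippets above returns, assuming the same [5, 3, 1, 2] data:

rdd = sc.parallelize([5, 3, 1, 2])

rdd.takeOrdered(3, key=lambda s: -s)                      # [5, 3, 2]   (descending)
rdd.takeOrdered(3, key=lambda s: s)                       # [1, 2, 3]   (ascending)
rdd.map(lambda s: -s).takeOrdered(3, key=lambda s: -s)    # [-1, -2, -3]
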
ANSWER

It might be easier to think of the second parameter to takeOrdered, the lambda, as a "key extractor" since it doesn't do any transformation on the underlying data.

In the simple case where we've got this array of numbers, the key is just the value

rdd = sc.parallelize([5,3,1,2])
rdd.takeOrdered(3, lambda x: x)  # [1,2,3]

Or, in the code you submitted, the items are sorted by the negation of the value (-5 < -3 < -2 ...).

rdd.takeOrdered(3, lambda x: -x) #[5,3,2]

All you're doing when you give the lambda to takeOrdered is telling it what you'd like it ordered by. If you want additional transformations, they must happen in another step.

To return the output you wanted, you could map the items to their inverse and then take them sorted by the original value (inverse of the inverse):

(rdd.map(lambda x: -x)              # [-5, -3, -1, -2]
    .takeOrdered(3, lambda x: -x))  # [-1, -2, -3]
ANSWER

It's very similar to Python's built-in sorted function. Check out the examples under "Key Functions" on this page: https://wiki.python.org/moin/HowTo/Sorting

You started with [5, 3, 1, 2].

Imagine that the keys are attached as [(5, -5), (3, -3), (1, -1), (2, -2)].

Then, you sort it by keys in ascending order so you get: [(5, -5), (3, -3), (2, -2), (1, -1)].

Now, ignore the second element (the key) from each pair: [5, 3, 2, 1]

Then, select the first 3 items: [5, 3, 2]
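
The same decorate-sort-take steps can be reproduced with plain Python's sorted, which accepts the same kind of key function (a sketch, no Spark needed):

data = [5, 3, 1, 2]

# Attach the key -x to each value, as described above
decorated = [(x, -x) for x in data]        # [(5, -5), (3, -3), (1, -1), (2, -2)]

# Sort by the key (the second element of each pair), ascending
decorated.sort(key=lambda pair: pair[1])   # [(5, -5), (3, -3), (2, -2), (1, -1)]

# Drop the keys and take the first 3 items
[x for x, _ in decorated][:3]              # [5, 3, 2]

# Which is what takeOrdered(3, lambda x: -x) does in one step:
sorted(data, key=lambda x: -x)[:3]         # [5, 3, 2]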