How do I write to a global list from an RDD transformation?
Li = []

def fn(value):
    if value == 4:
        Li.append(1)

rdd.mapValues(lambda x: fn(x))
When I try to print Li, the result is: []
What I'm trying to do is build up another global list, Li1, while transforming the RDD. However, whenever I do this I always end up with an empty list; Li1 is never modified.
The reason Li is still set to [] after executing mapValues is that Spark serializes the fn function, together with all global variables it references (this is called its closure), and ships it to another machine, a worker. There is no corresponding mechanism for sending results captured in the closure back from the worker to the driver.
In order to receive results, you need to return them from your function and use an action like take() or collect(). But be careful: you don't want to send back more data than can fit into the driver's memory, otherwise the Spark application will throw an out-of-memory exception.
Also, you have not executed an action on your RDD. mapValues is a transformation, so in your example no tasks were executed on the workers at all.
Edit:
Following your problem description (based on my understanding of what you want to do):