Is there a way to use the map function to store each row of a PySpark DataFrame in a self-defined Python class object?
For example, in the picture above I have a Spark DataFrame, and I want to store every row (id, features, label) in a node object with three attributes: node_id, node_features, and node_label, something like the sketch below. I am wondering if this is feasible in PySpark.
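Roughly, the node class I have in mind would look like this (using the attribute names above):

```python
class Node:
    """Holds one DataFrame row as a plain Python object."""
    def __init__(self, node_id, node_features, node_label):
        self.node_id = node_id
        self.node_features = node_features
        self.node_label = node_label
```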
I have tried something like

```python
for row in df.rdd.collect():
    do_something(row)
```
but this cannot handle big data and is extremely slow, since collect() pulls the entire DataFrame onto the driver. I am wondering if there is a more efficient way to do this. Many thanks.
You can use the foreach method for your operation; the work will be parallelized across the Spark executors. Refer to Pyspark applying foreach if you need more details.
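Here is a minimal sketch of the idea, assuming the Node class and the id/features/label columns from the question; the toy DataFrame and the print call are placeholders for your own data and do_something logic:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Toy stand-in for the DataFrame in the question.
df = spark.createDataFrame(
    [(1, [0.1, 0.2], 0), (2, [0.3, 0.4], 1)],
    ["id", "features", "label"],
)

class Node:
    def __init__(self, node_id, node_features, node_label):
        self.node_id = node_id
        self.node_features = node_features
        self.node_label = node_label

def process_row(row):
    # Runs on the executors, one call per row; nothing is
    # pulled back to the driver the way collect() does.
    node = Node(row.id, row.features, row.label)
    print(node.node_id, node.node_label)  # replace with your do_something(node)

df.rdd.foreach(process_row)

# If you need the objects back as a distributed collection rather than
# as a side effect, map does the same conversion lazily:
nodes = df.rdd.map(lambda row: Node(row.id, row.features, row.label))
```

Note that objects created inside foreach live only on the executors and are discarded afterwards, so use foreach for side effects and map when you want to keep the converted objects as an RDD.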