Run non-Spark Python code on Spark to use its distributed compute and improve performance


Can I use my existing native Python code (non-PySpark code) on Spark to take advantage of its fast, distributed processing? I do not want to edit my existing Python code to turn it into PySpark code; I just want to run it as-is on Spark (standalone). Is this possible with spark-submit or some other way, so that I can still benefit from Spark while running my non-Spark Python code? I would really appreciate anyone's help or steps to solve this.

TIA.

P.S.: I am trying to run spark-submit on a Linux server (with Spark installed), but I have been unable to achieve this.

For example, abc.py is a Python script containing non-PySpark, native Python code. I cannot make changes to the code, but I want to run this file on Spark to use its distributed compute. Can I do that using spark-submit or any other way? Note: I cannot make any changes to the Python file, and it contains no PySpark code. A sketch of what I am attempting is shown below.
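For illustration, here is a hypothetical stand-in for abc.py and the spark-submit invocation I am attempting; the file name, paths, and column names are placeholders, not my real code:

```python
# abc.py -- hypothetical stand-in for the existing script; plain Python/pandas only,
# no PySpark imports anywhere.
#
# Submitted as-is with something like:
#   spark-submit --master spark://<master-host>:7077 abc.py
import pandas as pd

def main():
    # Single-process work on a local file (path is a placeholder).
    df = pd.read_csv("input.csv")
    df["total"] = df["price"] * df["quantity"]
    df.to_csv("output.csv", index=False)

if __name__ == "__main__":
    main()
```

spark-submit does accept and run a plain Python file like this, but as far as I can tell it just executes it as a single process on the driver.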


1 Answer

Answer from thebluephantom:

Unless your code uses RDDs or, more commonly these days, Spark DataFrames, no parallelization will occur. The same applies to pandas DataFrames: Spark will not distribute them.

That is to say, there is no point in running such code on Spark; it will simply execute as a single process on the driver. You can of course run it on Databricks to minimize the number of platforms you maintain, but it still will not be distributed.
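To illustrate, here is a rough sketch (paths, column names, and logic are made up) of the minimal kind of rewrite needed before Spark distributes any work; the plain-pandas version runs entirely on the driver, whereas the DataFrame version is partitioned across the executors:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Plain pandas (what the unmodified script does): the data is loaded and
# processed in a single process on the driver, no matter how big the cluster is.
pdf = pd.read_csv("input.csv")                     # placeholder path
pdf["total"] = pdf["price"] * pdf["quantity"]

# PySpark DataFrame equivalent: the read, the column expression, and the
# write are all executed in parallel across the executors' partitions.
sdf = spark.read.csv("input.csv", header=True, inferSchema=True)
sdf = sdf.withColumn("total", F.col("price") * F.col("quantity"))
sdf.write.mode("overwrite").parquet("output_parquet")   # placeholder path
```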