Run non-Spark Python code on Spark to use its distributed compute and improve performance


Can I use my existing native Python code (non-PySpark code) on Spark to take advantage of its fast, distributed processing? I do not want to edit my existing Python code to turn it into PySpark code; I just want to run it as-is on Spark (standalone). Is this possible with spark-submit or some other way, so that I can still benefit from Spark while running my non-Spark Python code? I would really appreciate anyone's help or steps to solve this.

TIA.

P.S.: I am trying to run spark-submit on a Linux server (with Spark installed), but I have been unable to achieve this.

For example, abc.py is a Python script containing non-PySpark, native Python code. I cannot make changes to the code, but I want to run this file on Spark to use its distributed compute. Can I do that using spark-submit or any other way? Note: I cannot make any changes to the Python file, and it contains no PySpark code. A sketch of what I am attempting is shown below.
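For illustration, here is a hypothetical stand-in for abc.py and the spark-submit invocation I am attempting; the file name, paths, and column names are placeholders, not my real code:

```python
# abc.py -- hypothetical stand-in for the existing script; plain Python/pandas only,
# no PySpark imports anywhere.
#
# Submitted as-is with something like:
#   spark-submit --master spark://<master-host>:7077 abc.py
import pandas as pd

def main():
    # Single-process work on a local file (path is a placeholder).
    df = pd.read_csv("input.csv")
    df["total"] = df["price"] * df["quantity"]
    df.to_csv("output.csv", index=False)

if __name__ == "__main__":
    main()
```

spark-submit does accept and run a plain Python file like this, but as far as I can tell it just executes it as a single process on the driver.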


1 Answer

Answer from thebluephantom:

Unless your code uses RDDs or, more commonly these days, Spark DataFrames, no parallelization will occur. The same applies to pandas DataFrames: Spark will not distribute them.

That is to say, there is no point in running such code on Spark; it will simply execute as a single process on the driver. You can of course run it on Databricks to minimize the number of platforms you maintain, but it still will not be distributed.
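To illustrate, here is a rough sketch (paths, column names, and logic are made up) of the minimal kind of rewrite needed before Spark distributes any work; the plain-pandas version runs entirely on the driver, whereas the DataFrame version is partitioned across the executors:

```python
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example").getOrCreate()

# Plain pandas (what the unmodified script does): the data is loaded and
# processed in a single process on the driver, no matter how big the cluster is.
pdf = pd.read_csv("input.csv")                     # placeholder path
pdf["total"] = pdf["price"] * pdf["quantity"]

# PySpark DataFrame equivalent: the read, the column expression, and the
# write are all executed in parallel across the executors' partitions.
sdf = spark.read.csv("input.csv", header=True, inferSchema=True)
sdf = sdf.withColumn("total", F.col("price") * F.col("quantity"))
sdf.write.mode("overwrite").parquet("output_parquet")   # placeholder path
```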