I have a Spark cluster set up with 1 master node and 2 worker nodes. I am running a PySpark application on this Spark standalone cluster, and one job writes the transformed data into a MySQL database.
My question: is the write to the database done by the driver or by the executors? I ask because when writing to a text file, the output file appears to be created on the driver.
Update:
Below is the code I used to write to a text file:
from pyspark import SparkContext

if __name__ == "__main__":
    # Connect to the standalone cluster master
    sc = SparkContext(master="spark://IP:PORT", appName="word_count_application")
    words = sc.textFile("book_2.txt")
    # Split lines into words, pair each word with 1, then sum the counts per word
    word_count = words.flatMap(lambda a: a.split(" ")).map(lambda a: (a, 1)).reduceByKey(lambda a, b: a + b)
    # saveAsTextFile writes a directory containing one part file per partition
    word_count.saveAsTextFile("book2_output.txt")
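For reference, the transformation chain above can be sanity-checked with a plain-Python equivalent (no cluster required); this is only an illustration of the logic, not how Spark executes it:

```python
# Plain-Python equivalent of the flatMap/map/reduceByKey word count above,
# useful for checking the transformation logic without a cluster.
from collections import Counter

def word_count(lines):
    # flatMap: split each line into words
    words = [w for line in lines for w in line.split(" ")]
    # map + reduceByKey: pair each word with 1, then sum counts per word
    return dict(Counter(words))

print(word_count(["a b a", "b c"]))  # → {'a': 2, 'b': 2, 'c': 1}
```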
If the writing is done using the Dataset/DataFrame API like this:
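A minimal sketch of such a DataFrame JDBC write, assuming an existing DataFrame `df`; the URL, table name, and credentials are placeholders, and the MySQL JDBC driver (mysql-connector-j) must be on the classpath:

```python
# Hypothetical sketch: writing a DataFrame to MySQL over JDBC.
# All connection details below are placeholders.

def write_to_mysql(df, jdbc_url, table, user, password):
    # Each executor writes the partitions assigned to it in parallel;
    # the driver only schedules the tasks.
    (df.write
       .format("jdbc")
       .option("url", jdbc_url)  # e.g. "jdbc:mysql://host:3306/db"
       .option("dbtable", table)
       .option("user", user)
       .option("password", password)
       .option("driver", "com.mysql.cj.jdbc.Driver")
       .mode("append")
       .save())
```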
Then it's done by the executors. That's why Spark produces multiple output files: each executor writes out the partitions assigned to it.
The driver schedules work across the executors; the actual work (reading, transforming, and writing) is done by the executors.