Currently I execute my spark-submit commands from Airflow by SSHing into the cluster with BashOperator. Our client no longer allows us to SSH into the cluster. Is it possible to run spark-submit from Airflow without SSH access to the cluster?
Trigger spark submit jobs from airflow on Dataproc Cluster without SSH
Asked by Kriz · 1.1k views · 1 answer
You can use the DataprocSubmitJobOperator to submit jobs from Airflow without SSH; just make sure to pass the correct parameters to the operator. Note that the `job` parameter is a dictionary modeled on the Dataproc Job resource, so the same operator can submit different job types (pyspark, pig, hive, etc.). The code below submits a PySpark job:
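A minimal sketch of what that `job` dictionary looks like for a PySpark job; the project, region, cluster name, and GCS paths below are placeholders you would replace with your own:

```python
# The "job" argument mirrors the Dataproc Job resource: a "reference"
# section (project), a "placement" section (target cluster), and exactly
# one job-type section -- here "pyspark_job". Swapping in "hive_job",
# "pig_job", etc. submits those job types instead.
PYSPARK_JOB = {
    "reference": {"project_id": "my-project"},            # placeholder
    "placement": {"cluster_name": "my-cluster"},          # placeholder
    "pyspark_job": {
        "main_python_file_uri": "gs://my-bucket/jobs/wordcount.py",
        "args": ["gs://my-bucket/input/"],                # job arguments
    },
}

# Inside a DAG file (with apache-airflow-providers-google installed),
# the operator call would look roughly like:
#
# from airflow.providers.google.cloud.operators.dataproc import (
#     DataprocSubmitJobOperator,
# )
#
# submit_pyspark = DataprocSubmitJobOperator(
#     task_id="submit_pyspark",
#     project_id="my-project",
#     region="us-central1",
#     job=PYSPARK_JOB,
# )
```

The operator calls the Dataproc Jobs API directly, so no SSH connection to the cluster is needed; Airflow only needs Google Cloud credentials with permission to submit Dataproc jobs.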
(Screenshots of the Airflow run, the Airflow task logs, and the resulting Dataproc job are omitted here.)