According to the documentation, there is an option to use an existing Dataproc cluster in version 6.2 and above.
We use Cloud Data Fusion 6.2.0, but the existing Dataproc cluster option does not appear when we try to create a new compute profile.
What are we doing wrong? Why does the described option not show up? Do we need any additional configuration?
UPDATE 1
When I choose Dataproc, I see the following:
UPDATE 2
When we try to use the Remote Hadoop Provisioner, we get the following error message in the /logs/program.log file. The SSH connection itself appears to succeed, because the run-id folder is created.
2021-06-15 09:40:37,617 - ERROR [main:o.a.z.s.NIOServerCnxnFactory@44] - Thread Thread[main,5,main] died
java.lang.reflect.InvocationTargetException: null
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_282]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_282]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_282]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_282]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteLauncher.main(RemoteLauncher.java:73) ~[launcher.jar:na]
Caused by: java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1357) ~[hadoop-common-3.2.2.jar:na]
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1338) ~[hadoop-common-3.2.2.jar:na]
at io.cdap.cdap.common.conf.CConfigurationUtil.copyTxProperties(CConfigurationUtil.java:100) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:62) ~[na:na]
at io.cdap.cdap.common.guice.ConfigModule.<init>(ConfigModule.java:49) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.initialize(RemoteExecutionJobMain.java:117) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.doMain(RemoteExecutionJobMain.java:98) ~[na:na]
at io.cdap.cdap.internal.app.runtime.distributed.remote.RemoteExecutionJobMain.main(RemoteExecutionJobMain.java:73) ~[na:na]
... 5 common frames omitted
I wasn't able to reproduce the exact scenario: when creating a CDF instance from scratch, the closest version I could select was Cloud Data Fusion 6.2.3.
I can confirm that version 6.2.3 offers the option to choose an Existing Dataproc Cluster. I would therefore recommend upgrading to at least that version. Follow these docs to do it safely.
As an alternative, there is a method to configure a Cloud Data Fusion pipeline to run against an existing cluster, described here. This feature is available only in the Enterprise edition of Cloud Data Fusion.
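Regarding the NoSuchMethodError in UPDATE 2: the descriptor (ZLjava/lang/String;Ljava/lang/Object;)V decodes to void checkArgument(boolean, String, Object), so the Guava Preconditions class that was actually loaded on the cluster does not have the overload that hadoop-common-3.2.2 calls. That usually points to an older Guava jar taking precedence on the classpath, although I have not verified this for your setup. Below is a minimal diagnostic sketch, not part of CDAP or Hadoop: the class name GuavaOverloadCheck and its helper methods are made up for illustration. It only reports which jar Preconditions was loaded from and whether the overload exists, assuming you run it with the same classpath as the failing launcher.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of CDAP, Hadoop, or Guava.
public class GuavaOverloadCheck {

    /** Returns true if the class can be loaded and has a public method with this exact signature. */
    public static boolean hasPublicMethod(String className, String methodName, Class<?>... paramTypes) {
        try {
            Class<?> cls = Class.forName(className);
            Method m = cls.getMethod(methodName, paramTypes);
            return m != null;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, or a placeholder if that is unknown. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            // Classes from the JDK itself typically have no code source.
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException e) {
            return "(class not on classpath)";
        }
    }

    public static void main(String[] args) {
        String cls = "com.google.common.base.Preconditions";
        // Signature taken from the trace: void checkArgument(boolean, String, Object)
        boolean present = hasPublicMethod(cls, "checkArgument",
                boolean.class, String.class, Object.class);
        System.out.println("Preconditions loaded from: " + loadedFrom(cls));
        System.out.println("checkArgument(boolean, String, Object) present: " + present);
    }
}
```

If the overload is reported as missing, the printed location shows which jar supplies the conflicting Guava classes. How to resolve the conflict depends on your cluster image and CDF version, which is another reason to try the upgrade first.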