How to connect Data Fusion to Cloud SQL Proxy


I'm on a journey trying to connect Data Fusion to Cloud SQL for MySQL over private IP. I've read many resources and it seems to be possible (at least, I'm not yet convinced that it is impossible). What I have so far:

  • a Data Fusion private instance with a private IP.
  • a Cloud SQL for MySQL instance with private IP.
  • a Cloud SQL Proxy deployed on a virtual machine.
  • everything is connected to the same default VPC network.
  • firewall fully open (ingress and egress on IP range 0.0.0.0/0, all protocols and ports)

From my VM instance I can connect to the MySQL db using the following command: mysql -u root --host 127.0.0.1 --port 3306. When I try to use the same parameters in Data Fusion, I'm not able to establish the connection. What can I check to make sure that all of this is set up correctly?

EDIT

I initially accepted the answer from Ajai but then unaccepted it, as I'm not able to make the connection work in a new project. There is probably a step that needs to be done somewhere that is missing here.


There are 2 best solutions below

0
On BEST ANSWER

I've successfully recreated the environment and here are the detailed steps, perhaps you missed a step along the way:

  1. Create a subnet in a VPC with Private Google Access enabled (see Configuring Private Google Access).
  2. Create a private Cloud Data Fusion instance attached to the same VPC.
  3. Create a firewall rule allowing the allocated Service Networking range to access the proxy VM on port 3307.
  4. Create a private CloudSQL MySQL instance attached to the same VPC.
  5. Create a VPC peering between Cloud Data Fusion and the same VPC, following the steps outlined in Set up VPC Network Peering.
  6. Deploy a VM in the subnet from step 1.
  7. Install the CloudSQL Proxy on the VM, following the steps outlined in Install the Cloud SQL Auth proxy.
  8. Run the CloudSQL Proxy with the following command line (note that 0.0.0.0 binds to all IPs):
    ./cloud_sql_proxy -instances=<Instance Connection Name>=tcp:0.0.0.0:3307
  9. Run the connection test in the CDF console: Successful Connection.
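
For step 3, the firewall rule could be created with gcloud along these lines. This is only a sketch, not the exact command I ran: the rule name, network name, source range and the cloudsql-proxy network tag are all placeholder assumptions, so substitute the range that was allocated for your Data Fusion instance. The function just prints the command so you can review it before running it.

```shell
#!/usr/bin/env bash
# Sketch: print a gcloud command that opens the proxy port to Data Fusion.
# Every value passed in at the bottom is a hypothetical placeholder.

firewall_rule_cmd() {
  local rule_name="$1" network="$2" source_range="$3" port="$4" target_tag="$5"
  printf 'gcloud compute firewall-rules create %s' "$rule_name"
  printf ' --network=%s' "$network"
  printf ' --direction=INGRESS'
  printf ' --allow=tcp:%s' "$port"
  printf ' --source-ranges=%s' "$source_range"
  printf ' --target-tags=%s\n' "$target_tag"
}

# Example with placeholder values; review the printed command before using it.
firewall_rule_cmd allow-cdf-to-sql-proxy default 10.1.0.0/22 3307 cloudsql-proxy
```

Restricting the source range and target tag like this is a narrower alternative to opening 0.0.0.0/0 on all ports.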

Once you've verified the above, you can automate the CloudSQL Proxy as a Linux service or startup script.
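
One way to run the proxy as a Linux service is a systemd unit. The sketch below renders such a unit to stdout; the binary path /usr/local/bin/cloud_sql_proxy, the unit contents and the example instance connection name are assumptions for illustration, so adjust them to wherever you installed the proxy.

```shell
#!/usr/bin/env bash
# Sketch: render a systemd unit that keeps the proxy running across reboots.
# The binary path and the unit layout are assumptions, not a tested setup.

render_proxy_unit() {
  local instance="$1" port="$2"
  cat <<EOF
[Unit]
Description=Cloud SQL Proxy
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/cloud_sql_proxy -instances=${instance}=tcp:0.0.0.0:${port}
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example with a placeholder instance connection name.
render_proxy_unit "my-project:us-central1:my-instance" 3307
```

You would save the output under /etc/systemd/system/ (for example as cloud-sql-proxy.service, a name I made up), then run sudo systemctl daemon-reload followed by sudo systemctl enable --now cloud-sql-proxy.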

P.S. thanks for quoting our article!

Edit:

If you want to use the docker version of the proxy, use the following in place of steps 7 & 8 as per Ajai's answer:

sudo docker run -d \
  -p 0.0.0.0:3307:3307 \
  gcr.io/cloudsql-docker/gce-proxy:latest /cloud_sql_proxy \
  -instances=<instance connection name>=tcp:0.0.0.0:3307

Edit 2

The first of two key things to point out about the proxy is that you might already have port 3306 bound to MySQL on the same instance. Using a port like 3307 (or another number) reduces that possibility. Note that for outbound connections to CloudSQL itself, the CloudSQL Proxy does use 3307 (see How the Cloud SQL Auth proxy works).

The second thing is setting it to listen on 0.0.0.0; as mentioned above, this binds to all IPs, allowing the proxy to accept incoming connections from any address instead of only those coming from 127.0.0.1.
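
To check which address the proxy is actually bound to, you can look at the Local Address column of a tool such as ss -ltn on the VM. The helper below is a hypothetical sketch that classifies such an address; the function name and the labels it prints are my own.

```shell
#!/usr/bin/env bash
# Sketch: classify a listen address taken from the "Local Address" column
# of a tool such as `ss -ltn`. Function name and labels are hypothetical.

bind_scope() {
  local addr="${1%:*}"   # strip the trailing :port
  case "$addr" in
    127.*|"[::1]")       echo "loopback-only" ;;     # only clients on the VM itself
    0.0.0.0|"*"|"[::]")  echo "all-interfaces" ;;    # reachable from other hosts
    *)                   echo "single-interface" ;;  # reachable only via that IP
  esac
}

bind_scope "127.0.0.1:3306"
bind_scope "0.0.0.0:3307"
```

If the proxy shows up as loopback-only, Data Fusion cannot reach it no matter how the firewall is configured.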

4
On

So far your approach seems right. The only way to connect a private CDF instance to a private CloudSQL MySQL instance is via a CloudSQL proxy.

However, there are a few things to check when following this approach:

VM setup

  • Create a private GCE VM (no external IP)
  • Grant all access scopes (Allow full access to all Cloud APIs)
  • Change the operating system to "Container-Optimized OS", which comes with Docker preinstalled
  • Use the following as the Automation startup script:
docker pull gcr.io/cloudsql-docker/gce-proxy:1.16

docker run -d \
  -p 0.0.0.0:3306:3306 \
  gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
  -instances=<cloudsql-connection-name>=tcp:0.0.0.0:3306

The last step should get a proxy up and running with the specified Docker image. More detailed documentation can be found here: https://cloud.google.com/sql/docs/mysql/connect-docker
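
With the proxy running, the connection string in Data Fusion should point at the proxy VM's internal IP rather than 127.0.0.1, since 127.0.0.1 there would refer to Data Fusion's own host and not to the VM. A minimal sketch of building that string, assuming the standard MySQL JDBC URL format; the IP address and database name are placeholders:

```shell
#!/usr/bin/env bash
# Sketch: build the JDBC connection string to enter in Data Fusion.
# The host must be the proxy VM's internal IP; values below are placeholders.

jdbc_url() {
  local host="$1" port="$2" database="$3"
  printf 'jdbc:mysql://%s:%s/%s\n' "$host" "$port" "$database"
}

jdbc_url 10.128.0.5 3306 mydb
```

The port has to match the one the proxy container publishes (3306 in the startup script above).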

CloudSQL Driver

One other thing to note: depending on the MySQL version, the 5.1.39 driver might not always work. Please check the Hub for CloudSQL MySQL-specific drivers.


I will try to update the answer if none of these suggestions work for you.

Additional resource to understand the problem

For anyone who wants to understand why a private CDF instance can't connect directly to a private CloudSQL MySQL instance, here are a couple of resources that discuss it: