GCP Composer v1.18.6 and 2.0.10 incompatible with CloudSqlProxyRunner


In my Composer Airflow DAGs, I have been using the CloudSqlProxyRunner to connect to my Cloud SQL instance.

However, after updating Google Cloud Composer from v1.18.4 to 1.18.6, my DAG started to encounter a strange error:

[2022-04-22, 23:20:18 UTC] {cloud_sql.py:462} INFO - Downloading cloud_sql_proxy from https://dl.google.com/cloudsql/cloud_sql_proxy.linux.x86_64 to /home/airflow/dXhOYoU_cloud_sql_proxy.tmp
[2022-04-22, 23:20:18 UTC] {taskinstance.py:1702} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1330, in _run_raw_task
    self._execute_task_with_callbacks(context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1457, in _execute_task_with_callbacks
    result = self._execute_task(context, self.task)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1513, in _execute_task
    result = execute_callable(context=context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/decorators/base.py", line 134, in execute
    return_value = super().execute(context)
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/operators/python.py", line 174, in execute
    return_value = self.execute_callable()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/operators/python.py", line 185, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/home/airflow/gcs/dags/real_time_scoring_pipeline.py", line 99, in get_messages_db
    with SQLConnection() as sql_conn:
  File "/home/airflow/gcs/dags/helpers/helpers.py", line 71, in __enter__
    self.proxy_runner.start_proxy()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 524, in start_proxy
    self._download_sql_proxy_if_needed()
  File "/opt/python3.8/lib/python3.8/site-packages/airflow/providers/google/cloud/hooks/cloud_sql.py", line 474, in _download_sql_proxy_if_needed
    raise AirflowException(
airflow.exceptions.AirflowException: The cloud-sql-proxy could not be downloaded. Status code = 404. Reason = Not Found

Checking manually, https://dl.google.com/cloudsql/cloud_sql_proxy.linux.x86_64 indeed returns a 404.

Looking at the function that raises the exception, _download_sql_proxy_if_needed, it has this code:

        system = platform.system().lower()
        processor = os.uname().machine
        if not self.sql_proxy_version:
            download_url = CLOUD_SQL_PROXY_DOWNLOAD_URL.format(system, processor)
        else:
            download_url = CLOUD_SQL_PROXY_VERSION_DOWNLOAD_URL.format(
                self.sql_proxy_version, system, processor
            )

In both of these latest Composer images, processor = os.uname().machine evaluates to x86_64, so the generated URL ends in linux.x86_64. Before the update, the download apparently used amd64 instead, and https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64 is in fact a valid link to the binary we need.
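The URL construction described above can be illustrated in isolation. This is only a minimal sketch: the template is inferred from the two URLs quoted in this post, and URL_TEMPLATE, MACHINE_ALIASES, and build_proxy_url are hypothetical names for illustration, not part of the Airflow provider.

```python
import os
import platform

# Assumed template, inferred from the two URLs quoted in this post.
URL_TEMPLATE = "https://dl.google.com/cloudsql/cloud_sql_proxy.{}.{}"

# Hypothetical alias table: translate the machine name reported by the OS
# into the suffix that the download link actually uses.
MACHINE_ALIASES = {"x86_64": "amd64"}


def build_proxy_url(system: str, machine: str) -> str:
    """Return the download URL, translating the machine name if an alias is known."""
    suffix = MACHINE_ALIASES.get(machine, machine)
    return URL_TEMPLATE.format(system, suffix)


if __name__ == "__main__":
    # On a Linux x86_64 host this prints the linux.amd64 URL.
    print(build_proxy_url(platform.system().lower(), os.uname().machine))
```

Without the alias table, the same call would produce the linux.x86_64 URL that returns the 404.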

I replicated this error in Composer 2.0.10 as well.

I am still investigating possible workarounds. I am posting this in case someone else runs into the same issue or has already found a workaround, and to raise it with Google engineers (who, according to Composer's docs, monitor this tag).


There are 2 best solutions below


My current workaround is patching the CloudSqlProxyRunner to hardcode the correct URL:

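import os
import shutil
from inspect import signature

import httpx
from airflow.exceptions import AirflowException
from airflow.providers.google.cloud.hooks.cloud_sql import CloudSqlProxyRunner


class PatchedCloudSqlProxyRunner(CloudSqlProxyRunner):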
    """
    This is a patched version of CloudSqlProxyRunner to provide a workaround for an incorrectly
    generated URL to the Cloud SQL proxy binary.
    """

    def _download_sql_proxy_if_needed(self) -> None:
        download_url = "https://dl.google.com/cloudsql/cloud_sql_proxy.linux.amd64"
        
        # the rest of the code is taken from the original method

        proxy_path_tmp = self.sql_proxy_path + ".tmp"
        self.log.info(
            "Downloading cloud_sql_proxy from %s to %s", download_url, proxy_path_tmp
        )
        # httpx has a breaking API change (follow_redirects vs allow_redirects)
        # and this should work with both versions (cf. issue #20088)
        if "follow_redirects" in signature(httpx.get).parameters.keys():
            response = httpx.get(download_url, follow_redirects=True)
        else:
            response = httpx.get(download_url, allow_redirects=True)  # type: ignore[call-arg]
        # Downloading to .tmp file first to avoid case where partially downloaded
        # binary is used by parallel operator which uses the same fixed binary path
        with open(proxy_path_tmp, "wb") as file:
            file.write(response.content)
        if response.status_code != 200:
            raise AirflowException(
                "The cloud-sql-proxy could not be downloaded. "
                f"Status code = {response.status_code}. Reason = {response.reason_phrase}"
            )

        self.log.info(
            "Moving sql_proxy binary from %s to %s", proxy_path_tmp, self.sql_proxy_path
        )
        shutil.move(proxy_path_tmp, self.sql_proxy_path)
        os.chmod(self.sql_proxy_path, 0o744)  # Set executable bit
        self.sql_proxy_was_downloaded = True

And then instantiate it and use it as I would the original CloudSqlProxyRunner:

proxy_runner = PatchedCloudSqlProxyRunner(path_prefix, instance_spec)
proxy_runner.start_proxy()
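Since the proxy is a separate process, it is worth making sure it gets stopped again. The following is a minimal sketch of one way to do that, assuming the runner exposes start_proxy() and stop_proxy() as CloudSqlProxyRunner does; running_proxy is a hypothetical helper name, not something from the provider package.

```python
from contextlib import contextmanager


@contextmanager
def running_proxy(proxy_runner):
    """Start the proxy on entry and stop it on exit, even if the body raises."""
    proxy_runner.start_proxy()
    try:
        yield proxy_runner
    finally:
        # Runs whether the with-block finished normally or raised.
        proxy_runner.stop_proxy()
```

It could then be used as `with running_proxy(PatchedCloudSqlProxyRunner(path_prefix, instance_spec)):` around the code that talks to the database.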

I am hoping that someone at Google fixes this properly soon, either by correcting the os.uname().machine value or by uploading a Cloud SQL proxy binary to the URL that _download_sql_proxy_if_needed currently generates.

1
On

As mentioned by @enocom, this commit, which added support for arm64 download links, had the side effect of generating broken download links. I assume the author of the commit thought that the Cloud SQL Proxy had binaries for each machine type, although in fact there is no Linux x86_64 link.

I have created an Airflow PR to fix the broken links. Hopefully it will be merged soon and resolve this. I will update this thread with any news.

Update (I've been working with Jack on this): I just merged that PR! When a new version of the providers is added to PyPI, you'll need to add it to your Composer environment. In the meantime, as a workaround, you could take the fix from Jack's PR and use it as a local dependency. (Similar to the other reply here!) If you do this, I highly recommend setting a calendar reminder (maybe a month from now?) to remove the workaround and go back to importing from the provider package, just to make sure you don't miss out on other updates to it! :)
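As a complement to the calendar reminder, the check could also live in code. This is only a sketch under assumptions: the release that contains the fix is not named in this thread, so fixed_in is a value you have to supply yourself, and parse_version, workaround_still_needed, and installed_provider_version are hypothetical helper names. The parser only handles plain X.Y.Z version strings.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

PROVIDER_PACKAGE = "apache-airflow-providers-google"


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release suffixes)."""
    return tuple(int(part) for part in text.split(".")[:3])


def workaround_still_needed(installed: str, fixed_in: str) -> bool:
    """True while the installed version is older than the release with the fix."""
    return parse_version(installed) < parse_version(fixed_in)


def installed_provider_version() -> Optional[str]:
    """Return the installed provider version, or None if the package is absent."""
    try:
        return version(PROVIDER_PACKAGE)
    except PackageNotFoundError:
        return None
```

Logging a warning when workaround_still_needed returns False would be one way to notice that the patched class can be removed.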