How to install AWS CLI on Cloud Composer?


I need to install AWS CLI tool on Google Cloud Composer to be able to use it with BashOperator from Airflow DAGs.

The AWS CLI documentation page explains how to install it as a system package, but Cloud Composer doesn't offer a supported way to install apt packages on all instances.

My motivation: I need to synchronize a large S3 bucket with another storage system. The aws s3 sync command (link) suits this perfectly. Unfortunately, I couldn't find a replacement for it among the Airflow Amazon provider operators, and the command also doesn't seem to be supported by boto or boto3 (github issue 1, issue 2).

1 Answer

BEST ANSWER

To install the AWS CLI on Cloud Composer, you can use the following steps:

  1. Create a new Airflow DAG and add a BashOperator task.
  2. In the BashOperator task, use the following command to install the AWS CLI:

pip install awscli

  3. Configure the AWS CLI with your AWS credentials. You can do this by adding the following commands to the BashOperator task:

aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID
aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY

  4. Configure the AWS CLI to use your chosen AWS region:

aws configure set default.region AWS_REGION

Replace AWS_REGION with the name of a supported AWS region, for example us-east-1.

  5. Save and run the DAG.

Note that a BashOperator task runs on a single Airflow worker, so the CLI is only guaranteed to be installed on the worker that happened to execute the task, and the installation does not persist across worker restarts or environment upgrades. Also avoid hard-coding real credentials in DAG code; prefer Airflow connections or environment variables.

Note that pip is already available in Cloud Composer environments, so no separate installation step is needed for it (and sudo is not available on the managed workers in any case).
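For completeness: rather than installing from a task, Cloud Composer does have a supported way to add PyPI packages (and awscli is distributed on PyPI) by updating the environment itself. A sketch, assuming a hypothetical environment name and location; the flag may also require a version specifier such as awscli==1.29.0:

```shell
# Hypothetical environment name and location.
# Updating the environment installs the package on every Airflow worker.
gcloud composer environments update my-composer-env \
    --location us-central1 \
    --update-pypi-package awscli
```

This approach makes the CLI available on all workers and survives restarts, at the cost of an environment update operation.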

Here is an example of a complete Airflow DAG that installs the AWS CLI and configures it with your AWS credentials:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

default_args = {
    'start_date': days_ago(2),
    'retries': 1
}

# schedule_interval=None: this is a one-off DAG, triggered manually
dag = DAG('install_aws_cli', default_args=default_args, schedule_interval=None)

install_aws_cli = BashOperator(
    task_id='install_aws_cli',
    bash_command='pip install awscli',
    dag=dag
)

# bash_command must be a single string; chain the commands with &&
configure_aws_cli = BashOperator(
    task_id='configure_aws_cli',
    bash_command=(
        'aws configure set aws_access_key_id YOUR_AWS_ACCESS_KEY_ID && '
        'aws configure set aws_secret_access_key YOUR_AWS_SECRET_ACCESS_KEY && '
        'aws configure set default.region AWS_REGION'
    ),
    dag=dag
)

install_aws_cli >> configure_aws_cli

Once you have saved the DAG, you can trigger it from the command line (on Airflow 1.x the equivalent is airflow trigger_dag install_aws_cli):

airflow dags trigger install_aws_cli

Once the DAG has run successfully, BashOperator tasks running on that same worker can use the AWS CLI to interact with AWS services.
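To close the loop on the original goal, the sync itself can then be just another BashOperator task. A minimal sketch of assembling the aws s3 sync command safely; the bucket name and destination path below are hypothetical placeholders:

```python
import shlex

def build_sync_command(source: str, dest: str, delete: bool = False) -> str:
    """Build an `aws s3 sync` command string for a BashOperator.

    shlex.quote guards against paths containing shell metacharacters.
    """
    parts = ["aws", "s3", "sync", shlex.quote(source), shlex.quote(dest)]
    if delete:
        # --delete removes files at dest that no longer exist at source
        parts.append("--delete")
    return " ".join(parts)

# Hypothetical bucket and worker-local path
cmd = build_sync_command("s3://my-source-bucket", "/home/airflow/gcs/data/mirror")
print(cmd)
# In the DAG this string would be passed as:
# BashOperator(task_id="s3_sync", bash_command=cmd, dag=dag)
```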