Choosing data catalog in pyathena?

1.1k Views Asked by At

I'm trying to use pyathena (which looks simpler than the native boto3) to perform some queries . However, I wasn't able to find how can I define which data catalog to use. For example the query execution using boto3:

    athena_client = boto3.client('athena')

    start_execution_response = athena_client.start_query_execution(
        QueryString='SELECT * FROM test_table',
        QueryExecutionContext={
            'Database': 'default',
            'Catalog': 'AwsDataCatalog'
        },
        ResultConfiguration={
            'OutputLocation': 's3://some_bucket/query_results/'
        }
    )

And using pyathena I can do something like that (and it worked):

    cursor = connect(s3_staging_dir='s3://some_bucket/query_results/',
                     schema_name="some_db",
                     ).cursor()

    cursor.execute("SELECT * FROM table1")

But how can it work when I have multiple catalogs? Am I missing something?

Thanks, Nir.

2

There are 2 best solutions below

2
On

I would assume you can also just write a query with fully qualified table name

SELECT * FROM AwsDataCatalog.default.test_table

And then reference to different catalogs.

0
On

You can specify catalog_name while creating a connection as below

 cursor = connect(s3_staging_dir='s3://some_bucket/query_results/',
                 catalog_name="catalog",
                 schema_name="some_db",
                 ).cursor()
 cursor.execute("SELECT * FROM table1")

You can refer to this source code, which mentions all the parameters it supports. https://github.com/laughingman7743/PyAthena/blob/master/pyathena/connection.py#227