Pyathena Schema does not exist

6.1k Views Asked by At

I need to process some data of a certain flow that I have in a specific folder in a bucket S3. I want to do this in Python. After searching for a while I found the library PyAthena which exactly what I was looking for!

I installed the version 1.8.0 of PyAthena.

For your information, I have my S3 bucket in the region of Paris eu-west-3 and my Athena database in the region of Francfort eu-central-1.

I used the following code that I found in the documentation PyAthena Doc:

from pyathena import connect

cursor = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
             aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
             s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
             region_name='us-west-2').cursor()
cursor.execute("SELECT * FROM one_row")
print(cursor.description)
print(cursor.fetchall())

I was at first not sure about which region_name to use, if it should be the one of Paris where the S3 bucket is, or the one of Francfort where the Athena database is!!

I tried both and following the error messages I got, I ended up using the one of my S3 bucket! However I kept getting errors of permissions about Glue, something like:

pyathena.error.OperationalError: Insufficient permissions to execute the query.  Error retrieving table : master in database : default due to : User: arn:aws:iam::<my-account-client-ID>:user/s3-test is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-west-3:<my-account-client-ID>:catalog

So I added the following strategy in the IAM:

        {
        "Sid": "VisualEditor2",
        "Effect": "Allow",
        "Action": [
            "athena:StartQueryExecution",
            "athena:GetQueryResultsStream",
            "athena:GetQueryResults",
            "athena:DeleteNamedQuery",
            "athena:GetNamedQuery",
            "athena:*",
            "athena:ListQueryExecutions",
            "athena:ListNamedQueries",
            "athena:CreateNamedQuery",
            "athena:StopQueryExecution",
            "athena:GetQueryExecution",
            "athena:BatchGetNamedQuery",
            "athena:BatchGetQueryExecution"
        ],
        "Resource": "*"
    },
    {
        "Sid": "VisualEditor3",
        "Effect": "Allow",
        "Action": [
            "glue:GetTable",
            "glue:GetTables",
            "glue:GetDatabase"
        ],
        "Resource": [
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:catalog",
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:database/*",
            "arn:aws:glue:eu-west-3:<my-account-client-ID>:table/*/*"
        ]
    }

Now I have this error message:

    cursor.execute("select * from master")
    File "/home/ubuntu/.local/lib/python3.6/site-packages/pyathena/util.py", line 28, in _wrapper
return wrapped(*args, **kwargs)
    File "/home/ubuntu/.local/lib/python3.6/site-packages/pyathena/cursor.py", line 57, in execute
raise OperationalError(query_execution.state_change_reason)
    pyathena.error.OperationalError: SYNTAX_ERROR: line 1:15: Schema default does not exist
1

There are 1 best solutions below

0
On

The problem is the select statement: if you do not indicate it you will use the database default and if you do not have such database in your environment it will fail. You should indicate your database and table:

cursor.execute("SELECT * FROM <YOUR_DATABASE>.<YOUR_TABLE>")

Or you could also specify the database name (or schema name) with a parameter in the cursor function:

cursor = connect(aws_access_key_id='YOUR_ACCESS_KEY_ID',
             aws_secret_access_key='YOUR_SECRET_ACCESS_KEY',
             s3_staging_dir='s3://YOUR_S3_BUCKET/path/to/',
             region_name='us-west-2').cursor(schema_name=<YOUR_DATABASE>)
cursor.execute("SELECT * FROM <YOUR_TABLE>")

If you do one of that you should not get the same error anymore.