AWS Glue unable to access input data set


I have a dataset registered in Glue / Athena, call it my_db.table. I'm able to query it via Athena and everything generally seems to be in order.

I'm trying to use this table in a Glue job, but am getting the following fairly opaque error message:

py4j.protocol.Py4JJavaError: An error occurred while calling o54.getCatalogSource.
: java.lang.Error: No classification or connection in my_db.table

This would appear to indicate that Glue can't see the catalog entry for my table, or can't use the information in that entry, but I don't have any further visibility than that.

Does anyone have experience with this error, or know what might be causing it?


Accepted answer

The error message actually describes the problem well: the table being queried had no classification.

Tables created via Glue (for example, by a crawler) are registered with a classification such as csv, parquet, orc, avro, or json. See "Creating Tables Using Athena for AWS Glue Jobs" in the Athena documentation.

The table I created "manually" via Athena did not have a classification, as could be seen on the Glue "Tables" page.
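You can also check this from Athena instead of the Glue console. A minimal sketch, assuming the table is named my_db.my_table (a placeholder): SHOW TBLPROPERTIES lists the table's properties, so a table created without a classification simply has no classification key in the output.

```sql
SHOW TBLPROPERTIES my_db.my_table;
```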

The solution is easy: add a classification property in the TBLPROPERTIES clause at the end of the CREATE TABLE statement, like so:

CREATE EXTERNAL TABLE IF NOT EXISTS my_db.my_table (
  `id` int,
  `description` string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
)
LOCATION 's3://my_bucket/'
TBLPROPERTIES ('classification'='csv');

Now the table has a classification within the Glue interface and is accessible via a Glue job.

Another answer

You need to add a classification to the table you've created. To add it via the Glue console, follow these steps:

  1. Go to the table in the Glue console.
  2. Click Edit table and add the classification (for example, csv).
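If you prefer not to use the console, the same property can be set on the existing table from Athena without recreating it. A minimal sketch, assuming the table is my_db.my_table (a placeholder) and its data is CSV. Substitute parquet, orc, avro, or json to match your data. Athena's ALTER TABLE ... SET TBLPROPERTIES updates the table's entry in the Glue Data Catalog.

```sql
ALTER TABLE my_db.my_table SET TBLPROPERTIES ('classification' = 'csv');
```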