AWS Athena - HIVE_UNKNOWN_ERROR: Unable to create input format on Athena

2.6k Views Asked by At

I am trying to create a GLUE table from S3 csv file , below is my CF template

#######################################################################################################
# AWS RESOURCE CHILD STACKS CONFIGURATION DETAILS
#######################################################################################################
Resources:

    TempDataMasterTable:
       Type: AWS::Glue::Table
       Properties:
         DatabaseName: "temp_db"
         CatalogId: !Ref AWS::AccountId
         TableInput:
           Name: "temp_table"
           Description: "Master table"
           TableType: EXTERNAL_TABLE
           Parameters: { "classification" : "csv", "compressionType" : "none", "typeOfData": "file" }
           StorageDescriptor:
             Location: s3://temp-location/temp-folder/
             InputFormat: ''
             OutputFormat: ''
             SerdeInfo:
               Parameters:
                 serialization.format: ','
                 field.delim: ','
             Columns:
             - {"Name": "name", "Type": "string"}
             - {"Name": "lastname", "Type": "string"}

When I go to Athena console , I can see the table has been created but on trying to query the table , I get the error - " HIVE_UNKNOWN_ERROR: Unable to create input format on Athena" so clearly I am doing something wrong while giving the input or output format in the CF template. Could you please advise what should be the format in the CF template. Also the location here -"s3://temp-location/temp-folder/" has files for each day , so format filename-date. Could you please advise what's wrong with the CF template that Athena can not read this table ?

2

There are 2 best solutions below

0
On BEST ANSWER

Finally found the solution to the problem , below were the parameters that were changed and fixed the issue

InputFormat: 'org.apache.hadoop.mapred.TextInputFormat'
             OutputFormat: 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
SerdeInfo:
               Parameters:
                 serialization.format: ','
                 field.delim: ','
               SerializationLibrary: 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

To answer this more broadly - a better solution to figure out these cryptic parameters is to create a test crawler and crawl your S3 data with it by 'Run Crawler'. Once that is done you can go to Athena and see your new table created by the Crawler. If you open the table properties you can see the following among other config details

enter image description here

and then use these properties in your cloud formation template . This is how I fixed the issue and can be used for other formats as well

0
On

You should pass org.apache.hadoop.mapred.TextInputFormat for input and org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat for output. This template also needs to have SerdeInfo which contains the library that will help you in reading the data from S3.Below is a sample template which worked for me in Athena.Refer to this for more examples.

AWSTemplateFormatVersion: 2010-09-09
Description: Glue Table
Resources:
  MyGlueTable:
    Type: AWS::Glue::Table
    Properties:
      DatabaseName: ddddd
      CatalogId: !Ref AWS::AccountId
      TableInput:
        Name: my_glue_table
        TableType: EXTERNAL_TABLE
        StorageDescriptor:
          Columns:
          - {"Name": "name", "Type": "string"}
          - {"Name": "lastname", "Type": "string"}
          Compressed: false
          InputFormat: org.apache.hadoop.mapred.TextInputFormat
          Location: s3://<>
          NumberOfBuckets: -1
          OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
          SerdeInfo:
            SerializationLibrary: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          SortColumns: []
          StoredAsSubDirectories: false