Query fails in HiveContext of pyspark when creating an external table in Avro format


I'm trying to create an external table in Avro format using the HiveContext of pyspark. The CREATE EXTERNAL TABLE query runs fine in Hive, but the same query fails in HiveContext with the error: org.apache.hadoop.hive.serde2.SerDeException: Encountered exception determining schema. Returning signal schema to indicate problem: null

My Avro schema (test_table.avsc) is as follows.

{
  "type" : "record",
  "name" : "test_table",
  "namespace" : "com.ent.dl.enh.test_table",
  "fields" : [ {
    "name" : "column1",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column2",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column3",
    "type" : [ "null", "string" ] , "default": null
  }, {
    "name" : "column4",
    "type" : [ "null", "string" ] , "default": null
  } ]
}
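
For what it's worth, the schema file itself can be sanity-checked with the avro Python package (the Python 3 port names the function avro.schema.Parse); a minimal sketch, assuming the package is installed and the .avsc is available locally:

import avro.schema

# avro.schema.parse raises SchemaParseException on a malformed schema,
# which rules a bad .avsc in or out as the cause.
with open("test_table.avsc") as f:
    parsed = avro.schema.parse(f.read())
print(parsed.name)  # test_table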

My create-table script is:

CREATE EXTERNAL TABLE test_table_enh
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://Staging/test_table/enh'
TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
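
Since the SerDe falls back to the "signal schema" when it cannot determine the real one, one thing worth confirming is that the file behind avro.schema.url is actually readable outside of Hive. A quick sketch using boto3 (bucket and key taken from the DDL above; credentials are assumed to be configured):

import boto3

# Fetch the .avsc referenced by avro.schema.url; a permissions or
# path problem here would explain the SerDe returning a null schema.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="Staging", Key="test_table/test_table.avsc")
print(obj["Body"].read())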

I'm running the code below with spark-submit:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

print("Start of program")
sc = SparkContext()
hive_context = HiveContext(sc)

# Same DDL that succeeds in Hive; this is the call that raises the
# SerDeException above.
hive_context.sql("""
CREATE EXTERNAL TABLE test_table_enh
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 's3://Staging/test_table/enh'
TBLPROPERTIES ('avro.schema.url'='s3://Staging/test_table/test_table.avsc')
""")

print("end")

Spark version: 2.2.0
OpenJDK version: 1.8.0
Hive version: 2.3.0
