How to handle multiline JSON records in a Hive table

JSON file:

{
  "buyer": {
    "legalBusinessName": "test1 Company",
    "organisationIdentifications": [
      {
        "type": "abcd",
        "identification": "test.bb@tesr"
      },
      {
        "type": "TXID",
        "identification": "12345678"
      }
    ]
  },
  "supplier": {
    "legalBusinessName": "test Company",
    "organisationIdentifications": [
      {
        "type": "abcd",
        "identification": "test28@test"
      }
    ]
  },
  "paymentRecommendationId": "1234-5678-9876-2212-123456",
  "excludedRemittanceInformation": [],
  "recommendedPaymentInstructions": [
    {
      "executionDate": "2022-06-12",
      "paymentMethod": "aaaa",
      "remittanceInformation": {
        "structured": [
          {
            "referredDocumentInformation": [
              {
                "type": "xxx",
                "number": "12341234",
                "relatedDate": "2022-06-12",
                "paymentDueDate": "2022-06-12",
                "referredDocumentAmount": {
                  "remittedAmount": 2600.5,
                  "duePayableAmount": 3000
                }
              }
            ]
          }
        ]
      }
    }
  ]
}

Create Table Statement:

CREATE EXTERNAL TABLE IF NOT EXISTS `test`.`test_rahul` (
  `buyer` STRUCT<
    `legalBusinessName`:STRING,
    `organisationIdentifications`:STRUCT<`type`:STRING, `identification`:STRING>>,
  `supplier` STRUCT<
    `legalBusinessName`:STRING,
    `organisationIdentifications`:STRUCT<`type`:STRING, `identification`:STRING>>,
  `paymentRecommendationId` STRING,
  `recommendedPaymentInstructions` ARRAY<STRUCT<
    `executionDate`:STRING,
    `paymentMethod`:STRING,
    `remittanceInformation`:STRUCT<
      `structured`:STRUCT<
        `referredDocumentInformation`:STRUCT<
          `type`:STRING,
          `number`:STRING,
          `relatedDate`:STRING,
          `paymentDueDate`:STRING,
          `referredDocumentAmount`:STRUCT<
            `remittedAmount`:DOUBLE,
            `duePayableAmount`:INT>>>>>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ("field.delim" = ",", "mapping.ts" = "number")
STORED AS TEXTFILE
LOCATION '/user/hdfs/Jsontest/';

If I write the JSON file with each record on a single line, it works fine, but if a record spans multiple lines, I get the error below.

Error message:

Error: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: Row is not a valid JSON Object - JSONException: A JSONObject text must end with '}' at 2 [character 3 line 1] (state=,code=0)

Can someone kindly suggest a solution? It looks like I need to add a line/field separator, but I cannot decide what to add so that the table handles multiline records the same way Spark does, i.e. option("multiline", true).

Answer:

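It seems that the JSON SerDe in Hive cannot handle multi-line records. A table stored as TEXTFILE is read one line at a time, so the SerDe only receives the first line of the record (here just the opening brace), which is why it complains that the JSON object does not end with '}'. You might need to flatten each JSON record into a single line, in the following format.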

{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
...
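
If the source files cannot be produced in that format directly, one option is to rewrite them before loading them into the table location. The following is only a minimal sketch in Python, not something specific to Hive: the function names flatten_json_records and convert_file are made up for illustration, and it assumes each file is small enough to read into memory and contains one or more JSON objects placed one after another.

```python
import json


def flatten_json_records(text):
    """Yield each top-level JSON value in `text` as a compact one-line string.

    Hypothetical helper: assumes `text` holds one or more JSON values
    back to back, each possibly pretty-printed across several lines.
    """
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # Skip whitespace (including newlines) between records;
        # raw_decode does not accept leading whitespace.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode parses exactly one JSON value starting at `pos`
        # and returns it together with the index just past it.
        record, pos = decoder.raw_decode(text, pos)
        # json.dumps escapes newlines inside strings, so the result
        # is always a single physical line.
        yield json.dumps(record, separators=(",", ":"))


def convert_file(src_path, dst_path):
    """Rewrite a multi-line JSON file as one JSON object per line."""
    with open(src_path, encoding="utf-8") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for line in flatten_json_records(text):
            dst.write(line + "\n")
```

Calling convert_file on the original file and uploading the output to the table's LOCATION directory should give the SerDe one complete JSON object per row. For files too large to hold in memory, a streaming parser would be a better fit than this sketch.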