Partitioning geospatial data in Hive


I am having problems creating partitions for a table that holds geospatial data (i.e. a shape column). I have a building table with geospatial data, and I want to partition it by statename.

I created a buildingpartition table as follows:

 SET hive.exec.dynamic.partition=true;
 SET hive.exec.dynamic.partition.mode=nonstrict;

 CREATE TABLE buildingpartition (
     OBJECTID string,
     RILUNIQUEID string,
     RILFEATURECODE string,
     BLDGNO string,
     BLDGNAME string,
     BLDGTYPE string,
     BLDGSUBTYPE string,
     BLDGCLASS string,
     BLDGROAD string,
     BLDGSUBROAD string,
     SUBLOCALITY string,
     BLDGSIZE string,
     TAG string,
     PINCODE string,
     NUMBEROFFLATS string,
     NUMBEROFSHOPS string,
     BLDG_TYPE string,
     CABLEOPERATORNAME string,
     AREA_1 string,
     LBU2 string,
     SHAPE binary)
 PARTITIONED BY (STATENAME string)
 ROW FORMAT SERDE 'com.esri.hadoop.hive.serde.JsonSerde'
 STORED AS INPUTFORMAT 'com.esri.json.hadoop.UnenclosedJsonInputFormat'
 OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
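As a quick check before the insert, running SET with just the property name prints its current value, so you can confirm dynamic partitioning is actually enabled in the same session:

 SET hive.exec.dynamic.partition;
 SET hive.exec.dynamic.partition.mode;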

Then I tried to insert data into this table from the existing building table, which contains 40 million records.

 INSERT INTO TABLE buildingpartition PARTITION (statename) SELECT * FROM building;

The root cause reported in the error is:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of statename=00 00 00 00 03 05 00 00 00 00 83 54 0b b0 17 22 41 f0 6d 9e 04 7b 61 43 41 c0 84 25 eb ed 17 22 41 a0 82 cd c2 80 61 43 41 01 00 00 00 05 00 00 00 00 00 00 00 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 40 a0 a2 83 ed 17 22 41 a0 82 cd c2 80 61 43 41 c0 84 25 eb ed 17 22 41 70 a7 13 4d 7b 61 43 41 40 c9 31 72 b0 17 22 41 f0 6d 9e 04 7b 61 43 41 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 in directory /apps/hive/warehouse/proddb.db/buildingpartition/.hive-staging_hive_2016-12-15_10-58-15_294_7141760028369054723-1/_task_tmp.-ext-10002 is exceeded: limit=255 length=408

Can anyone help me out with this problem? I have attached the full error trace below. Any help would be appreciated. Thanks.

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"objectid":null,"riluniqueid":"KOL_A_LO0000_0597215","rilfeaturecode":"LO0000","bldgno":"null","bldgname":"null","bldgtype":"1","bldgsubtype":"Unclassified","bldgclass":"Building","bldgroad":"PURANO BAZAR BRIDGE","bldgsubroad":"P K VISHWAS ROAD","sublocality":"Milan Garh Colony","statename":"West Bengal","bldgsize":"G + 0","tag":"null","pincode":"700119","numberofflats":"0","numberofshops":"0","bldg_type":"Others","cableoperatorname":"null","area_1":"0.0","lbu2":"null","shape":�T �"A�m�{aCA��%��"A����aCA�T �"AIXz�aCA@����"A����aCA��%��"Ap�M{aCA@�1r�"A�m�{aCA�T �"AIXz�aCA} at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:172) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"objectid":null,"riluniqueid":"KOL_A_LO0000_0597215","rilfeaturecode":"LO0000","bldgno":"null","bldgname":"null","bldgtype":"1","bldgsubtype":"Unclassified","bldgclass":"Building","bldgroad":"PURANO BAZAR BRIDGE","bldgsubroad":"P K VISHWAS ROAD","sublocality":"Milan Garh Colony","statename":"West Bengal","bldgsize":"G + 0","tag":"null","pincode":"700119","numberofflats":"0","numberofshops":"0","bldg_type":"Others","cableoperatorname":"null","area_1":"0.0","lbu2":"null","shape":�T �"A�m�{aCA��%��"A����aCA�T �"AIXz�aCA@����"A����aCA��%��"Ap�M{aCA@�1r�"A�m�{aCA�T �"AIXz�aCA} at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:163) ... 
8 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of statename=00 00 00 00 03 05 00 00 00 00 83 54 0b b0 17 22 41 f0 6d 9e 04 7b 61 43 41 c0 84 25 eb ed 17 22 41 a0 82 cd c2 80 61 43 41 01 00 00 00 05 00 00 00 00 00 00 00 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 40 a0 a2 83 ed 17 22 41 a0 82 cd c2 80 61 43 41 c0 84 25 eb ed 17 22 41 70 a7 13 4d 7b 61 43 41 40 c9 31 72 b0 17 22 41 f0 6d 9e 04 7b 61 43 41 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 in directory /apps/hive/warehouse/proddb.db/buildingpartition/.hive-staging_hive_2016-12-15_10-58-15_294_7141760028369054723-1/_task_tmp.-ext-10002 is exceeded: limit=255 length=408 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:949) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:1014) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createAncestorDirectories(FSDirMkdirOp.java:134) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2615) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2493) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2377) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:708) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:405) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:577) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:859) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:966) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:709) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838) at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:117) at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552) ... 
9 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of statename=00 00 00 00 03 05 00 00 00 00 83 54 0b b0 17 22 41 f0 6d 9e 04 7b 61 43 41 c0 84 25 eb ed 17 22 41 a0 82 cd c2 80 61 43 41 01 00 00 00 05 00 00 00 00 00 00 00 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 40 a0 a2 83 ed 17 22 41 a0 82 cd c2 80 61 43 41 c0 84 25 eb ed 17 22 41 70 a7 13 4d 7b 61 43 41 40 c9 31 72 b0 17 22 41 f0 6d 9e 04 7b 61 43 41 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 in directory /apps/hive/warehouse/proddb.db/buildingpartition/.hive-staging_hive_2016-12-15_10-58-15_294_7141760028369054723-1/_task_tmp.-ext-10002 is exceeded: limit=255 length=408 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:949) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:1014) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createAncestorDirectories(FSDirMkdirOp.java:134) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2615) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2493) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2377) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:708) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:405) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:622) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:566) ... 
18 more Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.FSLimitException$PathComponentTooLongException): The maximum path component name limit of statename=00 00 00 00 03 05 00 00 00 00 83 54 0b b0 17 22 41 f0 6d 9e 04 7b 61 43 41 c0 84 25 eb ed 17 22 41 a0 82 cd c2 80 61 43 41 01 00 00 00 05 00 00 00 00 00 00 00 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 40 a0 a2 83 ed 17 22 41 a0 82 cd c2 80 61 43 41 c0 84 25 eb ed 17 22 41 70 a7 13 4d 7b 61 43 41 40 c9 31 72 b0 17 22 41 f0 6d 9e 04 7b 61 43 41 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 in directory /apps/hive/warehouse/proddb.db/buildingpartition/.hive-staging_hive_2016-12-15_10-58-15_294_7141760028369054723-1/_task_tmp.-ext-10002 is exceeded: limit=255 length=408 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:949) at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addLastINode(FSDirectory.java:1014) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.unprotectedMkdir(FSDirMkdirOp.java:237) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createSingleDirectory(FSDirMkdirOp.java:191) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createChildrenDirectories(FSDirMkdirOp.java:166) at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.createAncestorDirectories(FSDirMkdirOp.java:134) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2615) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2493) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2377) at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:708) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:405) at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206) at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200) at org.apache.hadoop.ipc.Client.call(Client.java:1426) at org.apache.hadoop.ipc.Client.call(Client.java:1363) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229) at com.sun.proxy.$Proxy14.create(Unknown Source) at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.create(ClientNamenodeProtocolTranslatorPB.java:298) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) at com.sun.proxy.$Proxy15.create(Unknown Source) at 
org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1716) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1703) at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:1638) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:449) at org.apache.hadoop.hdfs.DistributedFileSystem$7.doCall(DistributedFileSystem.java:445) at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:445) at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:388) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909) at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:802) at org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat.getHiveRecordWriter(HiveIgnoreKeyTextOutputFormat.java:80) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246) ... 20 more Container killed by the ApplicationMaster. Container killed on request. Exit code is 143 Container exited with a non-zero exit code 143 FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask


2 Answers


Have you added hive-hcatalog-core.jar? Some distributions require this jar for the JSON SerDe; see the Apache Hive JSON SerDe instructions.
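If the jar is missing, you can add it to the session before querying the table. The path below is only an example; the actual location depends on your distribution:

 ADD JAR /usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar;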

The other possibility is an error in the SerDe you are using.


It says right there in the error that the partition directory name exceeds the HDFS path component length limit:

The maximum path component name limit of statename=00 00 00 00 03 05 00 00 00 00 83 54 0b b0 17 22 41 f0 6d 9e 04 7b 61 43 41 c0 84 25 eb ed 17 22 41 a0 82 cd c2 80 61 43 41 01 00 00 00 05 00 00 00 00 00 00 00 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 40 a0 a2 83 ed 17 22 41 a0 82 cd c2 80 61 43 41 c0 84 25 eb ed 17 22 41 70 a7 13 4d 7b 61 43 41 40 c9 31 72 b0 17 22 41 f0 6d 9e 04 7b 61 43 41 00 83 54 0b b0 17 22 41 18 49 58 7a 80 61 43 41 in directory /apps/hive/warehouse/proddb.db/buildingpartition/.hive-staging_hive_2016-12-15_10-58-15_294_7141760028369054723-1/_task_tmp.-ext-10002 is exceeded: limit=255 length=408 at org.apache.hadoop.hdfs.server.namenode.FSDirectory.verifyMaxComponentLength(FSDirectory.java:949)

Your table is being partitioned on the SHAPE column instead of the STATENAME column because you are inserting with SELECT *. Hive takes dynamic partition values from the last column(s) of the SELECT list, and with SELECT * the last column of building is SHAPE, so the binary geometry ends up as the partition directory name (hence the 408-character path component).

The column on which the table should be partitioned must be the last column in the SELECT list.

The right way to do it is:

 FROM building
 INSERT OVERWRITE TABLE buildingpartition PARTITION (statename)
 SELECT col1, col2, ..., statename;
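Spelled out against the column list from the CREATE TABLE above (assuming the source building table has the same columns plus statename), the statement would look roughly like this:

 SET hive.exec.dynamic.partition=true;
 SET hive.exec.dynamic.partition.mode=nonstrict;

 FROM building
 INSERT OVERWRITE TABLE buildingpartition PARTITION (statename)
 SELECT OBJECTID, RILUNIQUEID, RILFEATURECODE, BLDGNO, BLDGNAME, BLDGTYPE,
        BLDGSUBTYPE, BLDGCLASS, BLDGROAD, BLDGSUBROAD, SUBLOCALITY, BLDGSIZE,
        TAG, PINCODE, NUMBEROFFLATS, NUMBEROFSHOPS, BLDG_TYPE, CABLEOPERATORNAME,
        AREA_1, LBU2, SHAPE, STATENAME;

With statename as the last selected column, Hive creates directories like statename=West Bengal instead of using the binary shape value, which keeps each path component well under the 255-character limit.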