We are trying to read data from an ORC table in Hive (1.2.1) and insert it into a table that uses 'TextInputFormat'. Some records in the original data are too large, and the following error occurs during the operation:
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.tez.runtime.library.common.sort.impl.ExternalSorter$MapBufferTooSmallException: Record too large for in-memory buffer. Exceeded buffer overflow limit, bufferOverflowRecursion=2, bufferList.size=1, blockSize=1610612736
Any ideas how to fix the issue?
We are using the Tez engine for query execution; the same queries run without errors on the plain MR engine.
Query to execute:
insert overwrite table visits_text_test_1m select * from visits where dt='2016-01-19' limit 1000000;
Update: the same error occurs when copying from ORC to ORC storage.
Update 2: a simple SELECT from the ORC table works fine with either engine.
Hint #1: just switch from Tez to MapReduce before running your query - slower but more resilient.
set hive.execution.engine=mr;
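For example, applying this workaround to the original query looks like the following sketch (the setting is session-scoped, so it does not affect other users or hive-site.xml):

```sql
-- Run the failing INSERT on the MapReduce engine instead of Tez.
-- Session-level setting: reverts when the session ends.
set hive.execution.engine=mr;

insert overwrite table visits_text_test_1m
select * from visits where dt='2016-01-19' limit 1000000;
```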
Hint #2: since the exception comes out of the dreadful Tez ExternalSorter beast, dig into Tez properties such as tez.runtime.sorter.class, tez.runtime.io.sort.mb, etc. Be warned that finding a working set of properties (not even speaking of tuning them to match your hive.tez.container.size) will probably require some kind of voodoo sacrifice. Cf. HortonWorks' "Configuring Tez" manual for starters.
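As a starting point, a session-level sketch might look like the following. The property names are real Tez/Hive settings, but the specific values are illustrative guesses, not tuned recommendations - they need to be sized against your cluster's container memory:

```sql
-- Stay on Tez but adjust the sorter configuration (values are guesses).
set hive.execution.engine=tez;

-- Fall back to the legacy sorter instead of the pipelined one.
set tez.runtime.sorter.class=LEGACY;

-- Sort buffer size in MB; must fit inside the Tez container heap.
set tez.runtime.io.sort.mb=1024;

-- Container size (MB) must be large enough to accommodate the sort buffer.
set hive.tez.container.size=4096;

insert overwrite table visits_text_test_1m
select * from visits where dt='2016-01-19' limit 1000000;
```

A common rule of thumb is to keep tez.runtime.io.sort.mb well below (roughly 40% of) hive.tez.container.size, since the sorter buffer lives inside the container's heap.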