When to set hive parameters during a session?


I'm new to my role, and part of it involves creating and inserting data into both managed and external Hive tables. We run a few SET parameters at the beginning of each Hive session, but I've run into cases where the output files are merged for some partitions (a small number of files) but not for others (many smaller files), seemingly on random days.

My question is: when is it necessary to enter all of my Hive set parameters? Does it need to be done for every single insert/command/statement I'm running? Or just once at the beginning of the Hive session when I've launched Hive?

These are the standard set parameters we've been using:

SET mapred.job.queue.name=yometrics;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
SET hive.merge.tezfiles=true;


Best answer, by leftjoin:

You can put the configuration at the beginning of the script file; it will apply for the whole session, so you do not need to repeat it before every statement.
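As a rough sketch of what session scope means in practice, the script below sets a few of the parameters from the question once and then runs two inserts. The table and column names (metrics_staging, metrics_daily, metrics_summary, user_id, event_count, dt) are hypothetical and only there for illustration. Running SET with a key and no value prints the value currently in effect, which can help confirm whether a given session actually has the setting.

```sql
-- Session-level settings: run once, before the first INSERT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.merge.tezfiles=true;

-- SET with only a key prints the value currently in effect for this session.
SET hive.merge.tezfiles;

-- Hypothetical tables and columns, for illustration only.
-- The dynamic partition column (dt) must come last in the SELECT list.
INSERT OVERWRITE TABLE metrics_daily PARTITION (dt)
SELECT user_id, event_count, dt
FROM metrics_staging;

-- No SET lines are repeated here: the values set above still apply
-- because this statement runs in the same session.
INSERT OVERWRITE TABLE metrics_summary PARTITION (dt)
SELECT COUNT(*) AS events, dt
FROM metrics_staging
GROUP BY dt;
```

If an insert runs in a different session (for example, a separate scheduled job or a new connection), the settings have to be applied again there, which may be worth checking when only some partitions end up with unmerged files.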

Alternatively, you can put the common parameters in a separate file, params.hql, and load it at the beginning of each script with:

source /local/path/to/the/file/params.hql;
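A minimal sketch of a script using the params.hql approach, assuming params.hql contains the SET lines listed in the question. The table and column names are hypothetical.

```sql
-- Load the shared settings first. params.hql is assumed to contain
-- the SET statements from the question (queue name, dynamic
-- partitioning limits, hive.merge.tezfiles).
source /local/path/to/the/file/params.hql;

-- Everything after this point runs with those settings applied.
-- Hypothetical tables and columns, for illustration only.
INSERT OVERWRITE TABLE metrics_daily PARTITION (dt)
SELECT user_id, event_count, dt
FROM metrics_staging;
```

Keeping the settings in one file means every script picks up the same values, so a script that forgets a parameter is less likely to be the cause of inconsistent output.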

You can also put them in hive-site.xml, which makes them the defaults for every session that uses that configuration.

If you are on Qubole/AWS, you can also use a bootstrap script for the same purpose: https://docs.qubole.com/en/latest/user-guide/hive/bootstrap-script.html