Multiple extracts using the same Beeline connection


Currently I have a script generating 100+ csv extracts. It runs the following:

beeline -u "jdbc:hive2://<host>:<port>/default;principal=hive" --outputformat=csv -f <query.sql> > <extract_file.csv>
beeline -u "jdbc:hive2://<host>:<port>/default;principal=hive" --outputformat=csv -f <query1.sql> > <extract_file1.csv>
beeline -u "jdbc:hive2://<host>:<port>/default;principal=hive" --outputformat=csv -f <query2.sql> > <extract_file2.csv>
....
beeline -u "jdbc:hive2://<host>:<port>/default;principal=hive" --outputformat=csv -f <query100.sql> > <extract_file100.csv>

Every time this script runs, it congests the YARN queue completely and causes other jobs to fail. I have tried to separate these 100 extracts into smaller groups and run them one after another; it helps a little, but it runs a lot slower and still takes up a lot of resources from the YARN queue and affects other users/jobs. I would like to know if there is a way to make the connection once in the script and use the same connection to generate multiple csv extracts?
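
For example, is something along these lines possible? This is only a rough sketch of what I have in mind, assuming Beeline's !record and !run commands (inherited from SQLLine) can be used like this inside a script file; all_extracts.sql, the query files and the csv names are placeholders, and I expect --silent=true would be needed to keep Beeline's own messages out of the recorded files:

beeline -u "jdbc:hive2://<host>:<port>/default;principal=hive" --outputformat=csv --silent=true -f all_extracts.sql

-- all_extracts.sql (rough sketch): one connection, many extracts
!record extract_file.csv
!run query.sql
!record
!record extract_file1.csv
!run query1.sql
!record
-- ...and so on for the remaining extracts...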

Thanks

There is 1 answer below.

A couple of ways you can tune the whole process:

  1. Check if you can run some of the jobs during off-peak hours (see the cron sketch after this list).

  2. Check if you can decommission some of the extracts, or combine two or three files into one.

  3. Select only the columns that are required and remove any unnecessary ones. Similarly, add more filters to reduce the number of rows where possible.

  4. Tune each query so it takes less time and fewer resources.

  5. Check whether you are dumping static files every day. Those can be scheduled to run automatically on weekends only.

  6. Group all the tasks into multiple groups, with the long-running tasks evenly distributed across the groups (see the batching sketch after this list).

  7. Now you can dive into Hive heap memory tuning; the link below is a good starting point. Divide your queries into groups and give more memory to the long-running jobs (the batching sketch after this list shows one way to pass a per-group setting via --hiveconf). https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_hive_tuning.html

  8. If you have done all of the above steps and still need more, you can think about increasing memory and CPU :).
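
For points 1 and 5, a rough cron sketch of the scheduling idea only; the script paths and times are placeholders, not anything from your current setup:

# crontab entries (crontab -e)
# Daily extracts moved to off-peak hours (02:00) instead of business hours.
0 2 * * *  /path/to/daily_extracts.sh
# Static / rarely-changing extracts only on Saturdays.
0 3 * * 6  /path/to/static_extracts.sh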
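
For points 6 and 7, a minimal batching sketch, not your exact setup: it limits how many queries hit the YARN queue at once and gives the long-running group a bigger container via --hiveconf. It assumes Hive on Tez (hive.tez.container.size); on MapReduce you would tune mapreduce.map.memory.mb / mapreduce.reduce.memory.mb instead. The group sizes, file names, and memory values are placeholders to adjust to your queue:

#!/usr/bin/env bash
JDBC_URL="jdbc:hive2://<host>:<port>/default;principal=hive"

# extract <query.sql> <out.csv> [container_mb]: one Beeline run per query;
# the optional third argument raises the Tez container size for heavy queries.
extract() {
  beeline -u "$JDBC_URL" --outputformat=csv \
          ${3:+--hiveconf hive.tez.container.size=$3} \
          -f "$1" > "$2"
}

# Group 1: short queries, at most three running against YARN at the same time.
extract query1.sql extract_file1.csv &
extract query2.sql extract_file2.csv &
extract query3.sql extract_file3.csv &
wait

# Group 2: long-running queries, run one at a time with a bigger container.
extract query50.sql extract_file50.csv 4096
extract query51.sql extract_file51.csv 4096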