Pig script runs fine on Sandbox but fails on a real cluster

Environments:

  1. Hortonworks Sandbox running HDP 2.5
  2. Hortonworks HDP 2.5 Hadoop cluster managed by Ambari

We are facing a tricky situation. We run a Pig script from the Hadoop tutorial on a tiny data set. It works fine on the Sandbox, but it fails on the real cluster, where it complains about insufficient memory for the container. The following message can be seen in the logs:

container is running beyond physical memory limit

The tricky part is that the Sandbox has far less memory available than the real cluster (about 3 times less). Most memory settings in the Sandbox (MapReduce memory, YARN memory, YARN container sizes) also allow much less memory than the corresponding settings on the real cluster. Still, that memory is sufficient for Pig in the Sandbox but not on the real cluster.

Another note: Hive queries doing a similar job also work fine in both environments and do not complain about memory.

Apparently some setting somewhere in Environment 2 makes Pig request too much memory. Can anybody recommend which parameter should be modified to stop the Pig script from requesting so much memory?
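
To narrow down which setting differs, one option is to compare the memory-related properties of the two environments side by side. The following is only a minimal sketch, assuming copies of the same config file (for example mapred-site.xml or yarn-site.xml) have been taken from both environments. The function names and the keyword filter in MEMORY_HINTS are made up for illustration and are not part of any Hadoop or Pig tool.

```python
import xml.etree.ElementTree as ET

# Substrings that usually mark memory-related Hadoop properties.
# This filter is an assumption for illustration; extend it as needed.
MEMORY_HINTS = ("memory", "java.opts", ".mb")


def load_properties(xml_text):
    """Parse the text of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_memory_settings(sandbox, cluster):
    """Return {name: (sandbox_value, cluster_value)} for memory-related
    properties whose values differ. None marks a property that is not set."""
    diff = {}
    for name in sorted(set(sandbox) | set(cluster)):
        if not any(hint in name for hint in MEMORY_HINTS):
            continue
        left, right = sandbox.get(name), cluster.get(name)
        if left != right:
            diff[name] = (left, right)
    return diff


def diff_files(sandbox_path, cluster_path):
    """Convenience wrapper: read two config files from disk and diff them."""
    with open(sandbox_path) as a, open(cluster_path) as b:
        return diff_memory_settings(load_properties(a.read()),
                                    load_properties(b.read()))
```

Calling `diff_files` with two hypothetical paths such as `sandbox/mapred-site.xml` and `cluster/mapred-site.xml` returns only the properties that differ. A pair worth checking in the result is `mapreduce.map.memory.mb` against the `-Xmx` value in `mapreduce.map.java.opts` (and the reduce-side equivalents), since a JVM heap close to or above the container size can plausibly produce this kind of error. Whether that is the cause here is not known.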
