Files
impala/testdata/cluster
Laszlo Gaal 70568c80b3 IMPALA-10316: Increase Yarn minimum container size for dataload
This is an attempt to get rod of IMPALA-10669 and friends, crashing Tez
containers during the loading of nested ORC data.

The usual error message logged for these failures is:

Container [pid=11530,containerID=container_1618776748992_0039_01_000003]
is running 2785280B beyond the 'PHYSICAL' memory limit.
Current usage: 1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB
virtual memory used. Killing container.

https://stackoverflow.com/a/43827548/143681 explains that the tunable
setting 'yarn.scheduler.minimum-allocation-mb' in yarn-site.xml sets
both the minimum memory size and the memory size increment for Yarn
containers

This patch is an attempt to work around the failure by forcibly setting
a minimum size for the Yarn containers used in dataload that is
significantly larger than the 1 GB size reported in the failure messages.

Tested by running the dataload phase successfully on the following
platform combinations:
- Ubuntu 16.04, m6i.8xlarge (128 GB RAM, Docker)
- Ubuntu 16.04, m5.12xlarge (192 GB RAM, Docker)
- Centos 7.4, m5.4xlarge (64 GB RAM)
- Centos 7.4, r5.4xlarge (128 GB RAM)
- Ubuntu 16.04, m6i.4xlarge (64 GB RAM)

Change-Id: I77e7c9e9fa3491c6e5652351869d3a4410bbb7b8
Reviewed-on: http://gerrit.cloudera.org:8080/18630
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal (Cloudera) <laszlo.gaal@cloudera.com>
2022-06-20 18:44:24 +00:00
..
2020-06-18 01:11:18 +00:00