mirror of
https://github.com/apache/impala.git
synced 2026-01-06 15:01:43 -05:00
This is an attempt to get rid of IMPALA-10669 and friends, which crash
Tez containers during the loading of nested ORC data. The usual error
message logged for these failures is:

    Container [pid=11530,containerID=container_1618776748992_0039_01_000003]
    is running 2785280B beyond the 'PHYSICAL' memory limit. Current usage:
    1.0 GB of 1 GB physical memory used; 2.6 GB of 2.1 GB virtual memory
    used. Killing container.

https://stackoverflow.com/a/43827548/143681 explains that the tunable
setting 'yarn.scheduler.minimum-allocation-mb' in yarn-site.xml sets
both the minimum memory size and the memory size increment for Yarn
containers.

This patch tries to work around the failure by forcibly setting a
minimum size for the Yarn containers used in dataload that is
significantly larger than the 1 GB size reported in the failure
messages.

Tested by running the dataload phase successfully on the following
platform combinations:
- Ubuntu 16.04, m6i.8xlarge (128 GB RAM, Docker)
- Ubuntu 16.04, m5.12xlarge (192 GB RAM, Docker)
- Centos 7.4, m5.4xlarge (64 GB RAM)
- Centos 7.4, r5.4xlarge (128 GB RAM)
- Ubuntu 16.04, m6i.4xlarge (64 GB RAM)

Change-Id: I77e7c9e9fa3491c6e5652351869d3a4410bbb7b8
Reviewed-on: http://gerrit.cloudera.org:8080/18630
Tested-by: Impala Public Jenkins <impala-public-jenkins@cloudera.com>
Reviewed-by: Michael Smith <michael.smith@cloudera.com>
Reviewed-by: Laszlo Gaal (Cloudera) <laszlo.gaal@cloudera.com>