Recently, the full data load started failing because Hive ran out of heap space while
writing the nested tpch tables. This patch increases the heap space, and the query
now succeeds.
Change-Id: I92d0029659c41417d76a15f703df1d42e5187d5e
Reviewed-on: http://gerrit.cloudera.org:8080/776
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
Due to a possible change in behaviour in Hive/MR, it is no longer possible to use
arbitrarily large values for parquet.block.size. This breaks the loading of nested tpch
data on newer Hive. This patch addresses the problem by using a permissible value.
Change-Id: Ib5b14651fb579cec6aa8d45bd2253cecb4346eb9
Reviewed-on: http://gerrit.cloudera.org:8080/755
Reviewed-by: Martin Grund <mgrund@cloudera.com>
Tested-by: Internal Jenkins
When loading a large nested table using the GROUP_CONCAT function,
Impala runs out of memory. We prevent this from happening by adding
an option to partition the table and load one partition at a time.
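
A minimal sketch of the one-partition-at-a-time idea, not the code from this
patch. The function name, the table and column names, and the modulo-based
split are all hypothetical, chosen only to illustrate how a single large load
can be broken into smaller statements that each touch a slice of the data.

```python
def partitioned_load_statements(target, source, partition_col, num_partitions):
    """Build one INSERT statement per partition (hypothetical helper).

    Each statement selects only the rows whose partition_col value falls
    into one bucket, so the statements can be run one after another
    instead of loading the whole table in a single query.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    statements = []
    for part in range(num_partitions):
        statements.append(
            f"INSERT INTO {target} "
            f"SELECT * FROM {source} "
            f"WHERE {partition_col} % {num_partitions} = {part}"
        )
    return statements


# Hypothetical usage: split the load of a nested table into 4 statements.
stmts = partitioned_load_statements("nested_tbl", "flat_tbl", "id", 4)
```

Running the statements sequentially keeps the working set of each query to
roughly one partition's worth of rows.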
Change-Id: I8d517f94ef97e98d36eb8ebc8180865023655114
Reviewed-on: http://gerrit.cloudera.org:8080/448
Reviewed-by: Taras Bobrovytsky <tbobrovytsky@cloudera.com>
Tested-by: Internal Jenkins
The database will be used for testing in the future.
Change-Id: I60b54b36db9493a5bea308151b4027cd47d73047
Reviewed-on: http://gerrit.cloudera.org:8080/400
Reviewed-by: Ishaan Joshi <ishaan@cloudera.com>
Tested-by: Internal Jenkins