mirror of
https://github.com/apache/impala.git
synced 2026-01-08 03:02:48 -05:00
The two Kudu loads and Hive UDFs can all run in parallel. This
should shave about 4 minutes off of the data load. (Current
timings are 3.5, 4, and 0.6 minutes, see below.)
I've run dataload with this change many times.
Loading Kudu functional (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu.log)...
Loading workload 'functional-query' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 3 min 29 sec)
Loading Kudu TPCH (logging to /home/ubuntu/Impala/logs/data_loading/load-kudu-tpch.log)...
Loading workload 'tpch' using exploration strategy 'core' in table formats 'kudu/none/none' OK (Took: 4 min 0 sec)
Loading Hive UDFs (logging to /home/ubuntu/Impala/logs/data_loading/build-and-copy-hive-udfs.log)...
Loading Hive UDFs OK (Took: 0 min 41 sec)
Change-Id: I7e93ee5a77ec9271b980b88bef7ad512ecbe0407
Reviewed-on: http://gerrit.cloudera.org:8080/8822
Reviewed-by: Dimitris Tsirogiannis <dtsirogiannis@cloudera.com>
Tested-by: Impala Public Jenkins